    Entries in iPhone (2)

    Wednesday, Nov 2, 2011

    Some Comments on #Siri

    I have been using Apple's Siri quite a bit in the past while. It's extremely useful. With this entry I want to talk about a unique view people working in my field might have, some things Siri does well, and some things Apple needs to add in subsequent updates.

    My View
    For those who don't know, I work in telecommunications, specifically on Voice over IP technologies (think the telephone over the Internet, like Skype or Rogers Home Phone). Even more specifically, I work on systems that let humans interact with automated systems using the dial pad (DTMF tones) and, yes, your voice.

    Voice interaction systems use something called automatic speech recognition (ASR) to figure out what exactly it is you're saying. There used to be - and depending on who you ask, there still are - a number of competitors in this field, including IBM, but basically Nuance is the most dominant company in the market today.

    It's pretty well understood that Apple's Siri is using licensed Nuance technology on the back end to perform the actual speech recognition. But having worked with Nuance for several years now, I can say that Nuance on its own isn't as accurate as you might think, and never gives me results as accurate as Siri seems to provide. So Siri itself seems to be performing some magic. To explain, I am going to quickly lay out in point form how I think Siri is working:

    • Siri prompts you for some information. This is sometimes done with a question, and always done with the familiar Siri "ding".
    • When you stop speaking (and Siri does some amplitude/volume detection to determine this) or you push the button to indicate that you have stopped speaking, Siri packs up the audio it has recorded from you, and sends it out over the 'net to Apple's servers.
    • The first part of Siri's magic happens here: Siri provides Nuance with the audio capture, and also provides a bunch of intelligently constructed "grammars" for Nuance to work with to figure out what you want. It's also possible that Siri is just providing Nuance with a bunch of context-specific dictionary words, since it's quite good at figuring out what you're saying, and also pretty good at using your information to improve things (without fail, it spells "Katharine" in my Katharine's unique way of spelling it).
    • The next part of the magic is what Siri does after Nuance has processed your recording. Nuance provides results with confidence percentages (like how likely it is that you said "bore" vs "boar") and other things, and Siri figures out from this information the most likely thing you tried to say. It also parses the result, and does some magic to interpret what you were trying to ask it to do.
    • Siri then feeds the results back to your phone, asks you for additional information, rinse, lather, repeat.
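
    The "pick the most likely thing you said" step can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Siri or Nuance actually does it: the Hypothesis class, the rerank function, and the boost value are all made up for this example, and the confidence numbers are invented. The assumption is that the recognizer hands back a list of candidate transcriptions with confidence scores, and that words from your own data (like contact names) get a bonus.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate transcription (hypothetical structure, not a Nuance type)."""
    text: str
    confidence: float  # recognizer's score, from 0.0 to 1.0


def rerank(hypotheses, context_words, boost=0.2):
    """Return the hypothesis with the best score after boosting context matches.

    Each word that also appears in context_words (e.g. contact names)
    adds `boost` to the recognizer's confidence. The boost is an assumed value.
    """
    context = {w.lower() for w in context_words}

    def score(h):
        matches = sum(1 for w in h.text.lower().split() if w in context)
        return h.confidence + boost * matches

    return max(hypotheses, key=score)


# Invented example data: the recognizer slightly prefers the common spelling.
n_best = [
    Hypothesis("text Catherine I am running late", 0.62),
    Hypothesis("text Katharine I am running late", 0.58),
]

best = rerank(n_best, context_words=["Katharine"])
print(best.text)
```

    With the contact name supplied as context, the second candidate wins even though the recognizer alone scored it lower. With an empty context list, the first candidate would be returned instead.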

    For those wondering WHY Siri needs to be connected to the 'net, remember Watson from Jeopardy, and how much CPU was needed there? Yes, Siri is much simpler than Watson, but add in ASR, and the needed processing power increases.

    I have just explained how (I think) Siri works, and this isn't too different from how the software I work with operates. The difference is that the software I work with sends your audio IN REAL TIME to the recognition server. That has advantages: hotword recognition, for example, where you want to be able to keep talking, and an action occurs only when you say a "magic word". But it doesn't work well when data is being sent over potentially unreliable data connections, as is the case with your mobile phone (hence Siri's packaging of your audio before sending it over the 'net).

    That said, there's an important thing Siri needs to be able to do (which I will list again in my "Things Siri Needs To Do" section): barge in. Barge in is when, in the middle of being played a prompt (e.g., "What would you like to say to Katharine?"), you say something to stop the prompt, because you know what is expected next. Having to wait for the Siri tone slows me down. The phone itself already does amplitude detection to determine if you have stopped speaking, so why not use amplitude detection to determine if you have started speaking? Our software does amplitude detection for various operations to determine when you start and stop speaking, so it is possible for Apple to do this.
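
    Here is a minimal sketch of amplitude detection covering both cases, start of speech (the barge-in point) and end of speech. It is an assumption-laden illustration, not Apple's code or the software I work with: the function names, the threshold of 500, and the "three quiet frames in a row" rule are all made up, and real systems do a lot more to cope with background noise. Audio is assumed to arrive as non-empty frames of integer samples.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one non-empty frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_speech(frames, threshold=500.0, trailing_silence=3):
    """Return (start, end) frame indexes of the first utterance, or None.

    start: first frame whose RMS reaches `threshold` (where a prompt
           could be interrupted for barge in).
    end:   last loud frame before `trailing_silence` quiet frames in a row.
    Both threshold and trailing_silence are assumed values for illustration.
    """
    start = None
    last_loud = None
    quiet = 0
    for i, frame in enumerate(frames):
        loud = rms(frame) >= threshold
        if start is None:
            if loud:
                start = i
                last_loud = i
        elif loud:
            last_loud = i
            quiet = 0  # a short pause does not end the utterance
        else:
            quiet += 1
            if quiet >= trailing_silence:
                break  # enough silence: the speaker has stopped
    if start is None:
        return None
    return (start, last_loud)


# Synthetic frames: silence is all zeros, "speech" alternates +/-1000.
silence = [0] * 160
speech = [1000, -1000] * 80
frames = [silence, silence, speech, speech, silence, speech,
          silence, silence, silence, speech]

result = find_speech(frames)
print(result)
```

    In this synthetic example the single quiet frame in the middle is treated as a pause, and the utterance only ends after three quiet frames in a row, so the final burst of "speech" is not part of the result.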

    I just wanted to get my thoughts out on this since I had been thinking about it for a while.

    Things Siri Is Awesome At
    Quick point form things Siri is awesome at:

    • Reading text messages.
    • Writing text messages (especially love the punctuation, ellipsis, emoticon, and correct spelling of Katharine).
    • Setting reminders.
    • Setting alarms.
    • Finding and playing my awkwardly named playlists (Top Rated Non-Symphonic).

    Things Siri Needs To Do
    More point form stuff:

    • Support barge in (nothing is more annoying than knowing what to say next, and having to wait, or making a mistake and wanting to correct it immediately and having to wait).
    • Support email reading. I don't know what Siri is using for Text to Speech (TTS) (possibly Nuance again), but there is a standard way to support email reading, so why isn't it done?
    • Read reminders.
    • Read appointments.
    • Read anything in the notification centre.
    • Read any alert that pops up.
    • Support email writing without having to manually type into the email.
    • Support all sorts of email operations like Reply All, or Forward.

    Summary
    Been finding Siri to be more useful than I thought I would, but if Apple adds the above functionality, people are going to be blown AWAY by how they'll be able to use Siri more efficiently than they thought possible, and it will take the application over the top.

    L8r
    Paul

    Monday, Apr 5, 2010

    Marvel iPad/iPhone App

    Ever since the iPad was announced, I have told people that the number one use for me would be the ability to read comics on it. It's about the right size for a comic, has decent storage capacity, and could save a lot of box / shelf space.

    Marvel seems to have agreed, and has released its iPad app (which also happens to work on the iPhone). Not having an iPad, my experience is purely limited to the iPhone.

    The app does a good job of presenting the comics. Browsing through issues is easy, and the developers seem to understand how reading on the iPhone should work. Selection is currently quite limited, but that's no different from their website selection.

    There is one major problem with the app: My Marvel Digital Comics Subscription (over $55 a year) does not apply to the Marvel iPhone app (despite logging into the app using the same account). So even though I am paying to view the digital content on my computer (in a Flash viewer) I would have to pay AGAIN to buy individual issues for the iPhone. I have no desire to do this.

    I have cancelled my subscription to the website, since, on its own, it wasn't worth the cost when the comics were about 6 months behind the print copies. It was convenient that my subscription was set to expire shortly anyways.

    If Marvel brings a subscription model to the iPhone app, and especially if they bring a subscription model that includes the latest releases, I will be back onboard, but until then, I guess I will continue to read the comics I already read (i.e., no additional money from me for digital comics).

    L8r
    Paul