    Wednesday
    Nov 02 2011

    Some Comments on #Siri

    I have been using Apple's Siri quite a bit in the past while. It's extremely useful. With this entry I want to talk about a unique view people working in my field might have, some things Siri does well, and some things Apple needs to add in subsequent updates.

    My View
    For those who don't know, I work in telecommunications, specifically on Voice over IP technologies (think the telephone over the Internet, like Skype or Rogers Home Phone). Even more specifically, I work on systems that allow for human interaction with automated systems using your dial pad (DTMF tones) and, yes, your voice.

    Voice interaction systems use something called automatic speech recognition (ASR) to figure out what exactly it is you're saying. There used to be - and depending on who you ask, there still are - a number of competitors in this field, including IBM, but basically Nuance is the most dominant company in the market today.

    It's pretty well understood that Apple's Siri is using licensed Nuance technology on the back end to perform the actual speech recognition. But having worked with Nuance for several years now, I can say that Nuance on its own isn't as accurate as you might think, and it never gives me results as accurate as Siri seems to provide. So Siri itself seems to be performing some magic. To explain, I am going to lay out in quick point form how I think Siri is working:

    • Siri prompts you for some information. This is sometimes done with a question, and always done with the familiar Siri "ding".
    • When you stop speaking (and Siri does some amplitude/volume detection to determine this) or you push the button to indicate that you have stopped speaking, Siri packs up the audio it has recorded from you, and sends it out over the 'net to Apple's servers.
    • The first part of Siri's magic happens here: Siri provides Nuance with the audio capture, and also provides a bunch of intelligently constructed "grammars" for Nuance to work with to figure out what you want. It's also possible that Siri is just providing Nuance with a bunch of context specific dictionary words, since it's quite good at figuring out what you're saying, and also pretty good at using your information to improve things (without fail, it spells "Katharine" in my Katharine's unique way of spelling it).
    • The next part of the magic is what Siri does after Nuance has processed your recording and returned results with percent confidences (like how likely it is that you said "bore" vs "boar") and other things: Siri figures out from this information the most likely thing you tried to say. It also parses the result, and does some magic to interpret what you were trying to ask it to do.
    • Siri then feeds the results back to your phone, asks you for additional information, rinse, lather, repeat.
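
    To make the "most likely thing you tried to say" step concrete, here is a minimal sketch in Python of picking one result from a recognizer's list of candidates. This is only my guess at the general idea, not how Siri or Nuance actually do it: the Hypothesis class, the pick_best function, the boost value and the confidence numbers are all made up for illustration. Each candidate gets a bonus for every word that matches something already known about you, such as a contact name.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate transcription (hypothetical structure, for illustration)."""
    text: str
    confidence: float  # recognizer's score, assumed to be between 0.0 and 1.0


def pick_best(hypotheses, context_words, boost=0.2):
    """Return the hypothesis with the highest context-adjusted score.

    Each word that also appears in context_words (e.g. contact names)
    adds `boost` to the recognizer's own confidence. The boost value is
    an arbitrary assumption.
    """
    context = {w.lower() for w in context_words}

    def score(h):
        hits = sum(1 for w in h.text.lower().split() if w in context)
        return h.confidence + boost * hits

    return max(hypotheses, key=score)


# Made-up example: the recognizer slightly prefers the common spelling.
n_best = [
    Hypothesis("text Catherine I am running late", 0.62),
    Hypothesis("text Katharine I am running late", 0.58),
]

best = pick_best(n_best, context_words=["Katharine"])
print(best.text)
```

    With "Katharine" in the context words, the second candidate wins even though the recognizer scored it lower; with an empty context list, the first candidate wins on raw confidence alone.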

    For those wondering WHY Siri needs to be connected to the 'net, remember Watson from Jeopardy, and how much CPU was needed there? Yes, Siri is much simpler than Watson, but add in ASR, and the needed processing power increases.

    I have just explained how (I think) Siri works, and this isn't too different from how the software I work with operates. The difference is that the software I work with sends the audio you're saying IN REAL TIME to the recognition server. That has advantages, such as hotword recognition, where you want to be able to keep talking, and a certain action occurs only when you say a "magic word". But it doesn't work well when data is being sent over potentially unreliable data connections, as is the case with your mobile phone (hence Siri's packaging of your audio before sending it over the 'net).

    That said, there's an important thing Siri needs to be able to do (which I will list again in my "Things Siri Needs To Do" section): barge in. Barge in is when, in the middle of being played a prompt (e.g., "what would you like to say to Katharine?"), you say something to stop the prompt, because you know what is expected next. Having to wait for the Siri tone slows me down. The phone itself already does amplitude detection to determine if you have stopped speaking, so why not use amplitude detection to determine if you have started speaking? (Our software does amplitude detection for various operations to determine when you start and stop speaking, so it is possible for Apple to do this.)
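
    As a rough illustration of amplitude detection, here is a minimal sketch in Python that finds where speech starts and stops in a sequence of audio frames. It is not how Siri or the software I work with implements it: the function names, the threshold and the frame counts are all assumptions chosen for the example. A frame counts as "loud" when its RMS amplitude reaches the threshold; speech starts after a few loud frames in a row, and stops after a few quiet frames in a row.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of audio samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_speech_bounds(frames, threshold=500.0, start_frames=3, stop_frames=5):
    """Return (start_index, end_index) of speech; either may be None.

    start_index is the first frame of the first run of `start_frames`
    loud frames. end_index is the first frame of the first run of
    `stop_frames` quiet frames after that. All three parameters are
    illustrative assumptions, not tuned values.
    """
    start = None
    loud_run = 0
    quiet_run = 0
    for i, frame in enumerate(frames):
        loud = rms(frame) >= threshold
        if start is None:
            # Still waiting for the speaker to begin.
            loud_run = loud_run + 1 if loud else 0
            if loud_run >= start_frames:
                start = i - start_frames + 1
        else:
            # Speech has begun; wait for sustained quiet.
            quiet_run = 0 if loud else quiet_run + 1
            if quiet_run >= stop_frames:
                return start, i - stop_frames + 1
    return start, None


# Synthetic frames: 4 quiet, 6 loud, 8 quiet.
quiet = [0] * 160
loud = [1000] * 160
frames = [quiet] * 4 + [loud] * 6 + [quiet] * 8

start, end = find_speech_bounds(frames)
print(start, end)
```

    For barge in, the interesting half is the start: as soon as start is no longer None while a prompt is playing, the prompt could be cut off and recording could begin.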

    I just wanted to get my thoughts out on this since I had been thinking about it for a while.

    Things Siri Is Awesome At
    Quick point form things Siri is awesome at:

    • Reading text messages.
    • Writing text messages (especially love the punctuation, ellipsis, emoticon, and correct spelling of Katharine).
    • Setting reminders.
    • Setting alarms.
    • Finding and playing my awkwardly named playlists (Top Rated Non-Symphonic).

    Things Siri Needs To Do
    More point form stuff:

    • Support barge in (nothing is more annoying than knowing what to say next and having to wait, or making a mistake, wanting to correct it immediately, and having to wait).
    • Support email reading. I don't know what Siri is using for Text to Speech (TTS) (possibly Nuance again), but there is a standard way to support email reading, so why isn't it done?
    • Read reminders.
    • Read appointments.
    • Read anything in the notification centre.
    • Read any alert that pops up.
    • Support email writing without having to manually enter into the email.
    • Support all sorts of email operations like Reply All, or Forward.

    Summary
    Been finding Siri to be more useful than I thought I would, but if Apple adds the above functionality, people are going to be blown AWAY by how they'll be able to use Siri more efficiently than they thought possible, and it will take the application over the top.

    L8r
    Paul

    Wednesday
    Sep 28 2011

    #TIFF11 Day 9

    A Canadian war hero!

    Billy Bishop Goes to War - 3.5 / 5
    I went in thinking it would be a film adaptation of the musical of the same name, but it turns out that it's a filming of a staging of the musical. I love the performances in it, and some of the shots are well done, but I couldn't help but think that seeing it live would be MUCH better. Eric Peterson is fantastic in it, and it was great to see him in person for the Q&A after the show. It was a great way to end the festival.

    Final thoughts: It was my best festival for overall film quality. I had some real hits with You're Next, Juan of the Dead and Goon, but I regret not seeing The Raid. ALSO, I swear TIFF was reading my tweets to them. From making lines when I asked, letting me into theaters when I asked, and especially adding the fan to the AMC 6 / 7 room, I SWEAR that they must have been reading those tweets.

    Tuesday
    Sep 27 2011

    #TIFF11 Day 8

    Katharine understands some Burmese!

    The Lady - 2.5 / 5
    An important story to tell about the political situation in Burma, and with a great cast of actors, this film was beautifully shot with great colours, but was way too long. That, and there was a lot left unexplained: When did the general leading the military change? When did she get put back under house arrest? What were the monks doing when they were marching through the streets at the end of the film? When the more interesting story is about her husband, something went wrong with this film.

    My final day at TIFF11!

    Monday
    Sep 26 2011

    #TIFF11 Day 7

    Japanese manga made live.

    Smuggler - 2 / 5
    The action genre these days requires a combination of fast-paced action and the occasional slow motion sequence. This film only had the slow motion part, which, while cool for a bit, gets old quickly. I almost fell asleep. I assume that this was a faithful adaptation of the manga, with live scenes that looked very much like they were lifted straight from the manga, but it wasn't my cup of tea. The highlight was when the projector was broken before the film, and Bobcat Goldthwait performed his standup act for 20 minutes to entertain the audience.

    More films I saw with Katharine upcoming.

    Sunday
    Sep 25 2011

    #TIFF11 Day 6

    My least favourite film of the festival and my most artsy film of the festival.

    The Moth Diaries - 0.5 / 5
    Katharine won tickets to this film, but couldn't attend due to work. I could attend, so I did. A vampire movie without fangs... really? This film had bad acting, bad storytelling, scenes with singing that did not mesh with anything else in the film, and a hackneyed teacher / student romance that didn't make sense when it happened, resulting in laughs from the audience that the director did not intend to get. It had awkward brief nudity. And moths made no sense in relation to vampires. I didn't walk out because I wanted to at least know how it would end, but then it didn't really end. It was a really bad movie.

    Heleno - 3.5 / 5
    A biopic shot in black and white about the Brazilian footballer who was driven mad by, and then died from, syphilis. The cinematography was beautiful, and the lead actor was fantastic. It's a dark film, but then his life was dark. The film bounces back and forth between an asylum he is in near the end of his life and the height of his athletic career. The steady decline of his life tells a perfect tragedy. A great film to see at the festival, since I am not sure where else I would have had a chance to see it.

    Next, my first almost-fell-asleep film of the festival.