Some Comments on #Siri
Wednesday, November 2, 2011 at 7:30AM
Paul Gvildys in Siri, Technology, iPhone

I have been using Apple's Siri quite a bit in the past while. It's extremely useful. With this entry I want to talk about a unique view people working in my field might have, some things Siri does well, and some things Apple needs to add in subsequent updates.

My View
For those who don't know, I work in telecommunications, specifically on Voice over IP technologies (think telephone service over the Internet, like Skype or Rogers Home Phone). Even more specifically, I work on systems that let humans interact with automated systems using the dial pad (DTMF tones) and, yes, your voice.

Voice interaction systems use something called automatic speech recognition (ASR) to figure out what exactly it is you're saying. There used to be - and depending on who you ask, there still are - a number of competitors in this field, including IBM, but Nuance is the dominant company in the market today.

It's pretty well understood that Apple's Siri is using licensed Nuance technology on the back end to perform the actual speech recognition. But having worked with Nuance for several years now, I can say that Nuance on its own isn't as accurate as you might think, and it never gives me results as accurate as the ones Siri seems to provide. So Siri itself seems to be performing some magic. To explain, I am going to quickly point form how I think Siri is working:

For those wondering WHY Siri needs to be connected to the 'net, remember Watson from Jeopardy, and how much CPU was needed there? Yes, Siri is much simpler than Watson, but add in ASR, and the needed processing power increases.

I have just explained how (I think) Siri works, and this isn't too different from how the software I work with operates. The difference is that the software I work with sends the audio of what you're saying IN REAL TIME to the recognition server. That has advantages: hotword recognition, for example, where you want to be able to keep talking, and only when you say a "magic word" does a certain action occur. But it doesn't work well when data is being sent over potentially unreliable data connections, as is the case with your mobile phone (hence Siri's packaging of your audio before sending it over the net).
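To make the contrast concrete, here is a minimal sketch of the two sending styles and of a hotword check. It is not Siri's or Nuance's actual code; the function names, the `send` callback, and the idea of checking already-recognized words (rather than raw audio) are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional


def send_streaming(chunks: Iterable[bytes], send: Callable[[bytes], None]) -> int:
    """Real-time style: forward every audio chunk as soon as it is captured.

    Returns the number of sends, each of which could fail on a flaky link.
    """
    sends = 0
    for chunk in chunks:
        send(chunk)
        sends += 1
    return sends


def send_packaged(chunks: Iterable[bytes], send: Callable[[bytes], None]) -> int:
    """Packaged style: buffer the whole utterance, then send it once."""
    send(b"".join(chunks))
    return 1


def first_hotword(words: Iterable[str], hotword: str) -> Optional[int]:
    """Return the position of the first "magic word" in a stream of
    recognized words, or None if it never occurs (hypothetical helper)."""
    for position, word in enumerate(words):
        if word.lower() == hotword.lower():
            return position
    return None
```

With streaming, the hotword check can run while you are still talking, because words arrive as you say them. With the packaged approach, nothing reaches the server until you stop.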

That said, there's an important thing Siri needs to be able to do (which I will list again in my "Things Siri Needs To Do" section): barge in. Barge in is when, in the middle of being played a prompt (e.g., "what would you like to say to Katharine?"), you say something to stop the prompt, because you know what is expected next. Having to wait for the Siri tone slows me down. The phone itself already does amplitude detection to determine if you have stopped speaking, so why not use amplitude detection to determine if you have started speaking? Our software does amplitude detection for various operations to determine when you start and stop speaking, so it is possible for Apple to do this.
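As a rough illustration of the amplitude detection described above, here is a minimal sketch that finds where speech starts and stops in a sequence of audio frames. It is not how Siri or our software actually does it; the function names, the RMS measure, the threshold, and the frame counts are all assumed values chosen for the example.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_speech(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.1,   # assumed loudness cut-off
    start_frames: int = 3,    # loud frames in a row needed to declare "started"
    stop_frames: int = 5,     # quiet frames in a row needed to declare "stopped"
) -> Tuple[Optional[int], Optional[int]]:
    """Return (start, stop) frame indexes of the first run of speech.

    start is the first loud frame of the run; stop is the first quiet frame
    after it. Either value is None if that event never happens.
    """
    start: Optional[int] = None
    loud = 0
    quiet = 0
    for i, frame in enumerate(frames):
        is_loud = rms(frame) >= threshold
        if start is None:
            # Waiting for speech to begin: count consecutive loud frames.
            loud = loud + 1 if is_loud else 0
            if loud >= start_frames:
                start = i - start_frames + 1
        else:
            # Speech in progress: count consecutive quiet frames.
            quiet = 0 if is_loud else quiet + 1
            if quiet >= stop_frames:
                return start, i - stop_frames + 1
    return start, None
```

For barge in, the interesting moment is when `start` is first set: a prompt player could stop playback right there instead of making you wait for the tone. Requiring a few loud frames in a row is a simple guard against a cough or a door slam triggering it.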

I just wanted to get my thoughts out on this since I had been thinking about it for a while.

Things Siri Is Awesome At
Quick point form things Siri is awesome at:

Things Siri Needs To Do
More point form stuff:

Summary
Been finding Siri to be more useful than I thought I would, but if Apple adds the above functionality, people are going to be blown AWAY by how they'll be able to use Siri more efficiently than they thought possible, and it will take the application over the top.

L8r
Paul

Update on Wednesday, November 2, 2011 at 5:48PM by Paul Gvildys

Fixed a bunch of typos.

Article originally appeared on Paul Gvildys' Blog (http://pgvildys.squarespace.com/).
See website for complete article licensing information.