Ext JS: Speech Recognition Wrapper
No matter, that’s more flexible anyway. Based on this conclusion, I dove into the API and created an Ext JS wrapper for the Web Speech API. You can try out an example and grab the source on GitHub.
About the API
The API has a fair amount to it, so I’ll highlight some of the configuration options and explain the results, which can be confusing at first.
- continuous: If false, you get a one-shot stab at capturing audio: once audio is no longer detected, the capture ends automatically. If true, you can “continuously” capture audio, even when no audio is detected (such as during a pause for breath). In principle this lets you record indefinitely, although Chrome apparently caps the total recordable duration at 60 seconds. So no support for novel transcription yet.
- interimResults: If true, the recognition service returns interim results while audio capture is still in progress. This is nice if you’d like to give visual feedback while capturing. If false, only the final recognition result is returned.
- maxAlternatives: This defines the maximum number of alternatives returned per recognition result. Since the alternatives are ranked by confidence level (see below), it’s probably not terribly useful to return more than one, but from a nerd perspective it could be interesting to analyze the additional alternatives.
- minimumConfidenceLevel: While not part of the Chrome API itself, this local configuration option lets you filter out results that fall below a minimum confidence level. I’d suggest leaving it at 0, or at most 0.5, to keep feedback as fast as possible.
- logFinalOnly: Another local config, this tells the wrapper whether it should process and log only results that are marked “final”, ignoring interim results.
- chainTranscripts: Also a local config, this tells the wrapper whether it should chain final transcript results together. For example, if you want to run five 60-second sessions in succession, you could set chainTranscripts to true to end up with a single transcript covering all five sessions. If false, you’re left with the transcript of the last-captured recognition session only.
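To make these options a bit more concrete, here is a minimal sketch of how they might fit together, assuming results arrive shaped like the ones described below (a list of results, each an array-like list of alternatives with an isFinal flag). continuous, interimResults, and maxAlternatives are real SpeechRecognition properties; applyNativeOptions, sessionTranscript, mergeTranscript, and mockResult are hypothetical helpers written for illustration, not the wrapper’s actual code, and the filtering logic is just one plausible reading of the local options.

```javascript
// Hypothetical config: the names mirror the options above, but this is an
// illustrative sketch, not the wrapper's source.
const config = {
  continuous: true, // native: keep capturing through pauses
  interimResults: true, // native: also deliver non-final results
  maxAlternatives: 1, // native: alternatives per result
  minimumConfidenceLevel: 0.5, // local: drop results below this confidence
  logFinalOnly: true, // local: ignore interim results
  chainTranscripts: true, // local: append each session to the previous one
};

// Copy the native options onto a SpeechRecognition instance (or a mock).
function applyNativeOptions(recognition, cfg) {
  recognition.continuous = cfg.continuous;
  recognition.interimResults = cfg.interimResults;
  recognition.maxAlternatives = cfg.maxAlternatives;
  return recognition;
}

// Build the transcript for one session, applying the local filters.
// Each result is array-like: result[0] is its highest-confidence alternative.
function sessionTranscript(results, cfg) {
  const parts = [];
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    if (cfg.logFinalOnly && !result.isFinal) continue; // skip interim results
    const best = result[0]; // alternatives arrive best-first
    if (best.confidence < cfg.minimumConfidenceLevel) continue; // too uncertain
    parts.push(best.transcript.trim());
  }
  return parts.join(" ");
}

// Combine this session's transcript with the running one.
function mergeTranscript(previous, current, cfg) {
  if (!cfg.chainTranscripts || !previous) return current; // keep only the latest
  return current ? previous + " " + current : previous;
}

// Hypothetical helper: a mock shaped like a SpeechRecognitionResult.
function mockResult(isFinal, alternatives) {
  return Object.assign(alternatives.slice(), { isFinal });
}

// Usage with mock data (in a browser you would pass event.results instead).
const results = [
  mockResult(true, [{ transcript: "hello", confidence: 0.9 }]),
  mockResult(false, [{ transcript: "wor", confidence: 0.3 }]),
  mockResult(true, [{ transcript: " world", confidence: 0.8 }]),
];
const transcript = sessionTranscript(results, config);
console.log(transcript); // "hello world"
console.log(mergeTranscript("first session", transcript, config)); // "first session hello world"
```

With chainTranscripts set to false, mergeTranscript simply returns the latest session’s text, matching the “last-captured session” behavior described above.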
The result event deserves a closer look, because it has a lot going on. Whenever the recognition service returns a result, you receive the following:
- SpeechRecognitionResultList: A list of all recognition results returned for the current capture.
- Each list contains N SpeechRecognitionResult objects. Each SpeechRecognitionResult is either final or interim (exposed through its isFinal property).
- Each SpeechRecognitionResult is composed of N SpeechRecognitionAlternatives, which is where the transcripts of the audio capture are stored. Each SpeechRecognitionAlternative has two properties: confidence and transcript. The confidence property indicates how sure the recognition service is that the transcript matches the captured audio. Helpfully, the recognition service orders the alternatives so that the highest-confidence SpeechRecognitionAlternative comes first.
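Here’s a minimal sketch of walking that structure in plain JavaScript, flattening it into one record per alternative. It relies only on length and index access on the result list and each result, plus the isFinal, transcript, and confidence properties. flattenResults, its record field names, and the mock data are hypothetical; the wrapper’s own logging may look different.

```javascript
// Flatten a SpeechRecognitionResultList (or any array shaped like one) into
// plain records, one per SpeechRecognitionAlternative. Index-based loops are
// used so this works with array-like lists as well as real arrays.
function flattenResults(resultList) {
  const records = [];
  for (let i = 0; i < resultList.length; i++) {
    const result = resultList[i]; // one SpeechRecognitionResult
    for (let j = 0; j < result.length; j++) {
      const alternative = result[j]; // one SpeechRecognitionAlternative
      records.push({
        resultIndex: i,
        alternativeIndex: j, // 0 is the highest-confidence alternative
        isFinal: result.isFinal,
        transcript: alternative.transcript,
        confidence: alternative.confidence,
      });
    }
  }
  return records;
}

// Mock data standing in for event.results; in a browser you might instead do:
//   recognition.onresult = (event) => console.log(flattenResults(event.results));
const fakeResults = [
  Object.assign(
    [
      { transcript: "hello world", confidence: 0.92 },
      { transcript: "hello word", confidence: 0.41 },
    ],
    { isFinal: true }
  ),
  Object.assign([{ transcript: "how are", confidence: 0.6 }], { isFinal: false }),
];

const records = flattenResults(fakeResults);
console.log(records.length); // 3
// The best transcript per result is simply alternative 0:
console.log(records.filter((r) => r.alternativeIndex === 0).map((r) => r.transcript));
// [ 'hello world', 'how are' ]
```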
While all of this is nice to know, the Ext JS wrapper takes care of it for you. You can configure it to handle results in a few different ways (see the configuration options above), but out of the box you don’t need to mess with the technical details of the recognition service’s results at all.
Unless, of course, you want to. To help illuminate what’s happening, the wrapper includes some logging. Whenever a result is returned from the recognition service, a snapshot of each SpeechRecognitionAlternative is logged to a store internal to the wrapper class. If you want to see the results which have been logged to the store, simply call getResults() and you can interact with them just like you would any other Ext JS Store.
If you are interested, the wrapper also tracks the timing of four phases of its interaction with the recognition service:
- sound: When sound is detected
- speech: When speech is detected
- audio: When audio capturing is occurring
- overall: Duration between start and end events
To get the durations for these timings, simply call getSoundDuration(), getSpeechDuration(), getAudioDuration(), or getDuration(), respectively.
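These four timings map naturally onto the paired events a SpeechRecognition instance fires: start/end, audiostart/audioend, soundstart/soundend, and speechstart/speechend. A minimal sketch of such a tracker follows, assuming each duration is simply the gap between the two timestamps. TimingTracker and attachTimings are hypothetical and not the wrapper’s code; only the getter names come from the wrapper. A phase whose events never fired (for example, no speech was detected) yields null.

```javascript
// Hypothetical timing helper (not the wrapper's source). It records a
// timestamp per event type and derives durations from start/end pairs.
class TimingTracker {
  constructor() {
    this.marks = {}; // event type -> timestamp in milliseconds
  }

  mark(eventType, time = Date.now()) {
    this.marks[eventType] = time;
  }

  // Duration between two marks, or null if either event never fired.
  duration(startType, endType) {
    const start = this.marks[startType];
    const end = this.marks[endType];
    return start === undefined || end === undefined ? null : end - start;
  }

  getSoundDuration() { return this.duration("soundstart", "soundend"); }
  getSpeechDuration() { return this.duration("speechstart", "speechend"); }
  getAudioDuration() { return this.duration("audiostart", "audioend"); }
  getDuration() { return this.duration("start", "end"); }
}

// Events a SpeechRecognition instance fires over a session.
const TIMED_EVENTS = [
  "start", "audiostart", "soundstart", "speechstart",
  "speechend", "soundend", "audioend", "end",
];

// Record each event's timeStamp on the tracker.
function attachTimings(recognition, tracker) {
  for (const type of TIMED_EVENTS) {
    recognition.addEventListener(type, (event) => tracker.mark(type, event.timeStamp));
  }
}

// Usage with hand-written timestamps in place of real events:
const tracker = new TimingTracker();
const sampleTimes = {
  start: 0, audiostart: 50, soundstart: 300, speechstart: 320,
  speechend: 2100, soundend: 2150, audioend: 2400, end: 2450,
};
for (const [type, time] of Object.entries(sampleTimes)) tracker.mark(type, time);
console.log(tracker.getSpeechDuration()); // 1780
console.log(tracker.getDuration()); // 2450
```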
OK, now for the caveats:
- Chrome only! The Web Speech API is currently implemented only in WebKit (Chrome exposes it as webkitSpeechRecognition), so tough luck if you want to use it in Firefox, IE, etc.
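Given that, it’s worth feature-detecting before showing any speech UI. A minimal sketch: look for the constructor on the global object, trying the standard SpeechRecognition name first and then Chrome’s prefixed webkitSpeechRecognition. getSpeechRecognitionCtor is a hypothetical helper, not part of the wrapper; taking the global object as a parameter just makes it easy to test outside a browser.

```javascript
// Look up a SpeechRecognition constructor on a global object, preferring the
// unprefixed name and falling back to Chrome's webkit-prefixed one.
// Returns null when neither exists.
function getSpeechRecognitionCtor(globalObj) {
  if (!globalObj) return null;
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Hypothetical usage in a page:
//   const Recognition = getSpeechRecognitionCtor(window);
//   if (Recognition) {
//     const recognition = new Recognition();
//     recognition.start();
//   } else {
//     // show a "Chrome only" message instead of the speech UI
//   }

console.log(getSpeechRecognitionCtor(globalThis)); // null when run under Node.js
```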
- The Web Speech API is still being developed, so there’s no telling whether the spec will change, Chrome’s implementation will change, or both. This was really just a fun experiment, so I would strongly advise against using it for anything real unless you’re willing to make whatever changes are needed if and when the implementation changes.
- Grammars/Lang/ServiceURI: You’ll notice these configuration options in the wrapper, but the wrapper doesn’t currently support them. They are stubs for a future implementation of these aspects of the Web Speech API.
Despite the limited implementation of the Web Speech API in current browsers, I think some really cool things are coming that will leverage it once it has broader adoption. I hope this wrapper is interesting, if only as a demo of something that will be a reality in the not-too-distant future.
As always, I appreciate any constructive feedback, so please let me know what you think in the comments!