At WWDC (Apple's annual developers' conference), Apple announced something that might, at last, be full command-and-control speech recognition for the Mac.[1] None of the regular tech journalists, however, are asking the questions I desperately want answered.
Most of my questions boil down to this:
How much did the Apple developers and designers of this product work with users of Dragon NaturallySpeaking for Windows (DNS), DragonDictate for Mac (DD), and Windows Speech Recognition (WSR)?
How much did they learn about what the speech recognition community already expects as a minimal baseline, as well as what speech recognition users have been lacking in our current tools?
Because how Apple answers those first questions will inform the answers to all of these details:
- Will this allow complete hands-free command and control? In other words, will users be able to control their computer without a mouse, a keyboard, a virtual keyboard, a switch, or mouse emulation?
- Will it give access to the menus, graphical icons, or any other aspects of the standard OS X desktop chrome, as long as the code is written using Apple standards?
- How will it work with tools that are not natively enabled to use it? For example, if I install an application that runs in a virtual machine (e.g., Eclipse or Slack), which aspects of this speech recognition will be available and which won't?
- Will it require the cloud or network access to work?
- Will it have a trainable voice model?
- Will it have a configurable vocabulary?
- Will it be programmable, either with simple macros or with complex third-party tools?
- In what languages will it be available?
- Will the mobile version require a physical trigger to access, as with the built-in microphone-icon-to-dictate currently available on iOS? Or can it be left on all the time?
- How will privacy be guaranteed for any always-listening aspects?
- Does it integrate with Apple VoiceOver?
( For context, the answers to these questions for DNS and WSR )
What other questions do people have?
( Endnotes )