deborah ([personal profile] deborah) wrote, 2019-06-03 06:01 pm

speech recognition at last? I have so many questions.

At WWDC (the annual Apple developers' conference), Apple announced something which might, at last, be full command-and-control speech recognition for the Mac.[1] None of the regular tech journalists are asking the questions I desperately want answered, however.

Most of my questions boil down to this:

How much did the Apple developers and designers of this product work with users of Dragon NaturallySpeaking for Windows (DNS), DragonDictate for Mac (DD), and Windows Speech Recognition (WSR)?

How much did they learn about what the speech recognition community already expects as a minimal baseline, as well as what speech recognition users have been lacking in our current tools?


Because how Apple answers that first question will shape the answers to all of these:


  1. Will this allow complete hands-free command and control? In other words, will users be able to control their computer without a mouse, a keyboard, a virtual keyboard, a switch, or mouse emulation?

  2. Will it give access to the menus, graphical icons, or any other aspects of the standard OS X desktop chrome, as long as the code is written using Apple standards?

  3. How will it work with tools that are not natively enabled to use it? For example, if I install an application that runs in a virtual machine (e.g. Eclipse or Slack), what aspects of this speech recognition will be available and what won't?

  4. Will it require the cloud or network access to work?

  5. Will it have a trainable voice model?

  6. Will it have a configurable vocabulary?

  7. Will it be programmable, either with simple macros or with complex third-party tools?

  8. In what languages will it be available?

  9. Will the mobile version require a physical trigger to access, as with the built-in microphone-icon-to-dictate currently available on iOS? Can it be left on all the time?

  10. How will privacy be guaranteed for any always-listening aspects?

  11. Does it integrate with Apple VoiceOver?


To provide some context for Mac folks, here are the answers to these same questions for Dragon NaturallySpeaking and Windows Speech Recognition:

I am positive about the accuracy of everything I say here about DNS. I am less certain that my answers about WSR are correct.[2]



  1. Yes, they offer complete command and control for anything built as a native Windows app that is aware of MSAA (Microsoft Active Accessibility). If an application is built for platform independence using non-native frameworks such as Electron or Java, it's theoretically possible that it could still have complete hands-free access, but it's rarely implemented right.[3]

  2. Yes to full menu and icon access, with the same caveat as above. Additionally, both WSR and DNS allow integration with third-party tools, and many of these third-party tools exist to increase access to elements of the desktop chrome.

  3. For non-native applications, DNS lets you emulate any keyboard action, so as long as menus and buttons are keyboard-enabled, they are easy to control. If they require a mouse, there is some ability to emulate mouse movement, but it is slow and unpleasant. Third-party tools alleviate this somewhat, but not enough. I'm not sure about WSR.

  4. Neither DNS nor WSR requires the cloud or network access to work. Nuance has been attempting to sell people on Dragon Anywhere, its cloud-enabled product that shares certain vocabularies and settings across desktop and mobile devices. I don't know of anyone who uses DNS for accessibility who uses Dragon Anywhere, though, and I believe it is primarily being sold as a business tool.

  5. Yes, both have trainable voice models, although recent developments in speech recognition technology have diminished the need to train voice models.

  6. Yes, both allow users to add their own vocabularies.

  7. Yes. The more expensive editions of Dragon NaturallySpeaking are inherently programmable, and both DNS and WSR also include powerful APIs. There is a robust ecosystem of third-party tools around both WSR and DNS. Some are free, some cost money. Some are open source. They include simple, purpose-built tools that do one specific task, prebuilt sets of rich commands written for power users who aren't interested in doing a lot of their own configuration, libraries which can be used to write other third-party tools, and extremely flexible systems designed for hands-free developers (a sketch of what such a command grammar can look like follows this list).[4]

  8. Both are available in English, French, German, Spanish, and Japanese. WSR is also available in Mandarin. DNS is also available in Italian and Dutch.

  9. I'm comparing only to desktop software, so this question isn't applicable. I don't know how the mobile version of Dragon Anywhere works, although I know it does require the cloud.

  10. Always-listening aspects of WSR and DNS are native to the local desktop, and information is sent back to Nuance or Microsoft only if a configuration setting allows it. I believe, although I'm not 100% sure, that WSR's always-listening microphone doesn't have any of the privacy issues inherent in Cortana's microphone. Dragon Anywhere has no particular privacy guarantees.[5]

  11. The connection between screen readers and speech recognition is definitely a place where Apple could make some improvements over the Windows situation! DictationBridge is an exciting new open source tool that lets DNS and WSR communicate with the Windows screen readers JAWS and NVDA. Before DictationBridge, the only way to use speech recognition hands-free on Windows was with a shockingly expensive tool which connects DNS with JAWS (which is itself already shockingly expensive). DictationBridge is written and maintained by a very small crew of disabled developers, most of whom have day jobs. If Apple decides to throw money at the speech recognition/screen reader connection problem, that would be amazing.
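
To make the third-party ecosystem in item 7 concrete, here is a minimal sketch of a voice-command grammar written with Dragonfly, a real open source Python toolkit that can sit on top of DNS or WSR. The spoken phrases and keystrokes are invented examples for illustration, not commands from any shipping product.

    # Minimal Dragonfly command grammar: each spoken phrase maps to a
    # keyboard action, which is the keyboard emulation described in item 3.
    from dragonfly import Grammar, MappingRule, Key, Text

    class EditingRule(MappingRule):
        mapping = {
            "save file": Key("c-s"),              # press Ctrl+S
            "new line": Key("enter"),
            "sign email": Text("Best, Deborah"),  # type literal text
        }

    grammar = Grammar("editing example")
    grammar.add_rule(EditingRule())
    grammar.load()  # grammar becomes active in the running speech engine

Because each command bottoms out in keystrokes, applications whose menus and buttons are keyboard-enabled (item 3 again) are dramatically easier to drive by voice than mouse-only ones.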


What other questions do people have?



Notes


  1. Windows has had full command-and-control speech recognition for more than 20 years through third-party software, and for more than a decade natively. [back]


  2. I don't know the answers for DD for Mac, except that I know it is not fully suitable for hands-free use. [back]

  3. Developers: if you use native operating system tools, you get access to the operating system accessibility APIs for free! (The first sketch after these notes shows what those APIs expose.) [back]

  4. For example, I currently have a simple WSR tool installed to use WSR as an add-on to help control DNS. I also use multiple tools which rely on Natlink, a venerable tool that creates a Python scripting environment for DNS (a bare-bones Natlink grammar is sketched after these notes). But I also use DNS's advanced scripting, which relies on Visual Basic. [back]

  5. Like I said, I don't use it. Don't get me started on how many accessibility tools require that the disabled user be willing to give up their privacy as the cost of doing business, which ought to be considered a human rights issue. [back]
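
To illustrate footnote 3, here is a rough sketch of my own (assuming the real Python pywinauto library, which is not mentioned above) of walking the same accessibility tree that speech recognition tools and screen readers rely on. Native widgets show up in this tree for free; custom-drawn controls appear only if their developers wired up the accessibility hooks themselves.

    # Enumerate the accessible controls of a native Windows app using
    # pywinauto (pip install pywinauto) over the UI Automation API.
    from pywinauto import Application

    app = Application(backend="uia").start("notepad.exe")
    dlg = app.window(title_re=".*Notepad")
    # Dump the names and control types assistive tools can see and
    # target; this is what "for free" means in note 3.
    dlg.print_control_identifiers()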
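
And to illustrate footnote 4, this is roughly what a bare-bones Natlink grammar looks like, modeled on the pattern in Natlink's own sample macros. The spoken phrase and the keystroke it sends are invented for this example.

    # A bare-bones Natlink grammar. Natlink loads Python files like
    # this into DNS and calls gotResults_* when a rule is recognized.
    import natlink
    from natlinkutils import GrammarBase

    class ExampleGrammar(GrammarBase):
        gramSpec = """
            <start> exported = please save this;
        """

        def initialize(self):
            self.load(self.gramSpec)
            self.activateAll()

        def gotResults_start(self, words, fullResults):
            natlink.playString("{ctrl+s}")  # send Ctrl+S to the active app

    grammar = ExampleGrammar()
    grammar.initialize()

    def unload():
        # DNS calls this when the grammar file is unloaded or reloaded.
        grammar.unload()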

[personal profile] jazzyjj 2019-06-04 02:11 am (UTC)
This sounds interesting indeed, and I'm curious to see it in action. I wasn't able to tune in to today's keynote, because I was with a neighbor across the hall at his place. He has an iPad and an Android phone, but is not that technical. As a proud VoiceOver user since about 2014, I would guess that Apple's new speech recognition thing will at least be somewhat accessible with VoiceOver right from the start. I personally have found Apple to have a very good track record when it comes to their screen reader. Siri might be a somewhat different story, but I guess only time will tell.

[personal profile] jeshyr 2019-06-04 12:22 pm (UTC)
If you go to https://www.apple.com/macos/catalina-preview/features/ and scroll down to "Accessibility", a few of those questions are answered.