deborah: the Library of Congress cataloging numbers for children's literature, technology, and library science (Default)
2019-06-03 06:01 pm

speech recognition at last? I have so many questions.

At WWDC (the annual Apple developers' conference), Apple announced something which might, at last, be full command-and-control speech recognition for the Mac.[1] None of the regular tech journalists are asking the questions I desperately want answered, however.

Most of my questions boil down to this:

How much did the Apple developers and designers of this product work with users of Dragon NaturallySpeaking for Windows (DNS), DragonDictate for Mac (DD), and Windows Speech Recognition (WSR)?

How much did they learn about what the speech recognition community already expects as a minimal baseline, as well as what speech recognition users have been lacking in our current tools?


Because how Apple answers that first question will inform the answers to all of these details:


  1. Will this allow complete hands-free command and control? In other words, will users be able to control their computer without a mouse, a keyboard, a virtual keyboard, a switch, or mouse emulation?

  2. Will it give access to the menus, graphical icons, or any other aspects of the standard OS X desktop chrome, as long as the code is written using Apple standards? (A sketch of the accessibility API this would ride on follows this list.)

  3. How will it work with tools that are not natively enabled to use it? For example, if I install an application that runs in a virtual machine (e.g. Eclipse or Slack), what aspects of this speech recognition will be available and what won't?

  4. Will it require the cloud or network access to work?

  5. Will it have a trainable voice model?

  6. Will it have a configurable vocabulary?

  7. Will it be programmable, either with simple macros or with complex third-party tools?

  8. In what languages will it be available?

  9. Will the mobile version require a physical trigger to access, as with the built-in microphone-icon-to-dictate currently available on iOS? Can it be left on all the time?

  10. How will privacy be guaranteed for any always-listening aspects?

  11. Will it integrate with Apple VoiceOver?
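
To make question 2 concrete: on the Mac, assistive software generally sees an application's UI through the accessibility (AX) API, and an app's menus are discoverable only if they're exposed there, which apps built on standard Apple controls get for free. Here is a minimal Swift sketch of that mechanism; it's my own illustration, not anything Apple has announced, and it assumes the calling process has been granted accessibility permission.

```swift
import AppKit
import ApplicationServices

// Sketch: enumerate the frontmost app's menu bar titles via the AX API.
// A speech tool layered on accessibility can only see what apps expose here.
// The calling process must be granted permission under System Preferences >
// Security & Privacy > Privacy > Accessibility.
func frontmostMenuTitles() -> [String] {
    guard AXIsProcessTrusted(),
          let app = NSWorkspace.shared.frontmostApplication else { return [] }

    let axApp = AXUIElementCreateApplication(app.processIdentifier)

    var menuBarRef: CFTypeRef?
    guard AXUIElementCopyAttributeValue(axApp, kAXMenuBarAttribute as CFString,
                                        &menuBarRef) == .success
    else { return [] }  // app exposes no menu bar to assistive clients
    let menuBar = menuBarRef as! AXUIElement

    var childrenRef: CFTypeRef?
    guard AXUIElementCopyAttributeValue(menuBar, kAXChildrenAttribute as CFString,
                                        &childrenRef) == .success,
          let items = childrenRef as? [AXUIElement]
    else { return [] }

    // Pull the human-readable title of each top-level menu ("File", "Edit", ...).
    return items.compactMap { item in
        var titleRef: CFTypeRef?
        AXUIElementCopyAttributeValue(item, kAXTitleAttribute as CFString, &titleRef)
        return titleRef as? String
    }
}

print(frontmostMenuTitles())  // e.g. ["Apple", "Safari", "File", "Edit", ...]
```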


( For context, the answers to these questions for DNS and WSR )

What other questions do people have?

( Endnotes )
2014-05-23 10:33 pm

iOS 8 Accessibility wishes

ATMac has been collecting iOS 8 accessibility wishes. Here are mine, focused on mobility limitations:

  • A single tap for page down (move down one full screen) and page up (move up one full screen), which works in all apps that follow a base iOS standard
  • Hey, how about a base iOS standard widget / gesture set, which apps are encouraged to follow for consistent usability, just like on a desktop machine!
  • Tap to go to the bottom of the screen, like the tap to go to the top we currently have (see the sketch after this list)
  • Improvements to Assistive Touch favorites, starting with better documentation; ability to reorganize favorites; more favorites; more consistent favorite functionality; and easier movement between Assistive Touch favorites and the rest of the UI. I would love a favorites bar dock I could place on the screen.
  • A more functional way to use Assistive Touch with VoiceOver, maybe with a bank of predefined taps for the base VoiceOver gestures
  • Integration with alternative keyboards -- if only I could use Fleksy as my default keyboard!
  • More comprehensive Siri or voice recognition (VR) integration, especially more command-and-control
  • At least some minimal VR in the absence of a network connection, even if just a little command-and-control
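
As a sense of scale for the "tap to go to the bottom" wish, here's a sketch of the scrolling half of it in Swift. scrollToBottom(animated:) is a hypothetical helper of mine, not an existing UIKit API, and wiring it up to an actual tap target is left aside.

```swift
import UIKit

// Hypothetical counterpart to the existing tap-the-status-bar scroll-to-top
// behavior. scrollToBottom(animated:) is my own sketch, not a UIKit API.
extension UIScrollView {
    func scrollToBottom(animated: Bool) {
        // The largest valid y-offset is the content height minus the visible
        // height, adjusted for insets; clamp so short content stays put.
        let maxY = contentSize.height - bounds.height + adjustedContentInset.bottom
        let target = CGPoint(x: contentOffset.x,
                             y: max(maxY, -adjustedContentInset.top))
        setContentOffset(target, animated: animated)
    }
}
```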


Also, I keep hearing about the great accessibility features that would be available to me if I were to jailbreak my phone. I'm not going to, but why should I need to jailbreak the phone to run f.lux to dim the screen with the time of day, move multiple icons at a time, tie gestures to app activation, or get half the stuff I mentioned above regarding a richer family of gestures? If jailbreakers can do it, Apple can do it, and I hardly see how a "jump to bottom of screen" command would be a security or stability hole.
2011-03-25 03:32 pm

mobile devices and accessibility

An open letter to those implementing mobile device accessibility:

I know that hands-free mobile device control is difficult, and I am grateful for the amount of voice control which has been implemented so far. Dialing a number, sending a text, sending an e-mail, and leaving a memo are all useful. Now here's what I would like to see next:

  • A microphone which stays on until turned off, rather than tap-to-speak. I understand this could have implications for users who don't know how to use it, but then, the same goes for having a telephone in the first place.

  • A 36-item vocabulary, probably native to the phone, of the letters in the alpha-bravo alphabet and the digits 0-9.

  • The ability to start an app installed on the phone by saying "start [app name]", e.g. "start Angry Birds". (No, I have no idea how to control Angry Birds by voice. I just don't know the names of many mobile applications, as I don't have a mobile device, because I still can't use one. Hence this post.)

  • A seven-item vocabulary, probably native to the phone, that can be used in webpages: page up; page down; back; forward; show numbers; go to address; press enter. "Show numbers" would put a number next to every clickable or selectable element (much like the Firefox extension Mouseless Browsing), allowing those items to be selected by dictating from the digit vocabulary.

  • The command "microphone off".

  • The command "dictate here", allowing the user to open up a remote-processed standard dictation window in any field or application.


Now, I will admit that I have never done any mobile programming, and I have no idea what the limitations are for vocabulary recognition. Am I mistaken in my belief that adding another 46 items to the local-to-the-device vocabulary (on top of the ones that already exist, such as "send a memo to") is something a contemporary mobile device should be able to handle?

As a bonus, I see in the Android accessibility best practices that all applications should be designed to pay attention to the directional controller as well as the touchscreen. Great, that opens up the possibility of four more voice commands: up, down, left, and right. That brings us up to 50 desirable items in the native vocabulary.
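
Since I'm doing the arithmetic out loud anyway, the whole wished-for vocabulary fits in a few lines. The groupings and names below (webCommands and so on) are mine, purely illustrative; "start [app name]" stands in for one command template.

```swift
// Back-of-the-envelope check of the wished-for on-device vocabulary.
let natoAlphabet = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot",
                    "golf", "hotel", "india", "juliett", "kilo", "lima",
                    "mike", "november", "oscar", "papa", "quebec", "romeo",
                    "sierra", "tango", "uniform", "victor", "whiskey",
                    "xray", "yankee", "zulu"]                      // 26 letters
let digits = (0...9).map(String.init)                              // 10 digits
let webCommands = ["page up", "page down", "back", "forward",
                   "show numbers", "go to address", "press enter"] //  7 items
let globalCommands = ["microphone off", "dictate here",
                      "start [app name]"]                          //  3 items
let dpad = ["up", "down", "left", "right"]                         //  4 items

let requested = natoAlphabet.count + digits.count
              + webCommands.count + globalCommands.count           // 36 + 7 + 3 = 46
let withDpad = requested + dpad.count                              // 50
print(requested, withDpad)                                         // prints "46 50"
```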

Can your phone handle that? And if not, can the next generation of your phone handle that? And if not, why not?



(Geeze, I'm starting to feel like I should add HV1569.5 to my default icon.)