Coding by Voice with Voice Attack: a Practical Guide for Programmers

January 13, 2020

Programming with voice recognition is something I've wanted to try for a while. In this video, I let you know what I think and how to set it up for yourself using VoiceAttack.

I first learned about VoiceAttack while playing Elite Dangerous.

Links
My Python VoiceAttack profile: https://github.com/learncodebygaming/voiceattack
Get VoiceAttack: https://voiceattack.com/
Tavis Rudd's presentation: https://www.youtube.com/watch?v=8SkdfdXWYaI
VoiceAttack manual: https://www.voiceattack.com/VoiceAttackHelp.pdf

VoiceAttack uses the Windows Speech Engine that's built into Windows to perform its voice recognition. VoiceAttack provides the glue between that speech recognition engine and your macros, executing them in response to what the engine hears.

In the options, which you get to from the wrench in the lower right, under "Recognition", I checked "show confidence level" so I can get a feel for how well the voice recognition is understanding me. You'll want to monitor the log when you're first using this software to get a feel for how it's working.

You can create different profiles to use with different applications. You can download my Python VoiceAttack profile if you want to start with that, or you can create a new one. Let me show you how I set up these commands so you can build something custom for yourself.

Let's start with the "slap" command, which in my profile just presses "Enter" on the keyboard. To do that you'd just edit the profile and create a new command. Set "When I say" to "slap", and under "When this command executes" select "Key Press". In the window that pops up, you can just press the key you want on your keyboard, so in this case "Enter", click "OK", and then just save it by clicking "OK" again.

In the "Edit a Profile" window click "Apply", and now with VSCode in focus, when I say "slap" a carriage return will happen. If you can't get this to work, check whether your IDE is running in administrator mode like mine. If it is, VoiceAttack must also be run as an administrator.

One thing to watch out for is that if I say "slap slap" (quickly twice like that), VoiceAttack will consider that its own command. So it won't do an "Enter" at all in that case; it will just show up as an unrecognized command. To handle that, since I say this often, I just set up a new command for "slap slap" that hits "Enter" twice.

That was an easy one, so let's look at something more interesting. I use "new function" to quickly outline a new function for me. To do that, I'm using "Quick Input", which you can find under "Other", "VoiceAttack Action". With this you can just give it a string and it will type out that string for you. You'll want to change the "Hold keys down for..." setting. I've got it at 0.030 seconds, because the default of 0.150 seconds types pretty slowly.

In this "new function" command I also have a few more steps. I want the function name to get highlighted for me, so I can immediately type in the name for what I want to call my function. To do this I have an action to press the up key, then another one to press Ctrl + D, which in VSCode selects the entire word.

The "Quick Input" feature is nice, but it's still a little slow. Something you can do instead is copy the text you want to the Windows clipboard, and then just paste that. If we look at my setup for "new main", this generates all the boilerplate code I want for a main script. Under "Other", "Windows", you'll find "Set a Text value to the Windows clipboard". I've simply entered my code in there, and then added one more action to paste it with Ctrl + V.
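As an illustration only, the kind of main-script boilerplate you might store in that clipboard action could look like the sketch below. The exact text in the profile may differ; the function name `main` and the placeholder print are assumptions made for this example.

```python
def main():
    """Entry point for the script; replace the body with real logic."""
    # Placeholder so the freshly pasted script runs without errors.
    print("Hello from main")


# Only run main() when the file is executed directly, not when imported.
if __name__ == "__main__":
    main()
```

Because the whole block arrives in a single paste, the editor's auto-indent has no chance to interfere the way it can when the text is typed out key by key.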

In my "start program" command, I'm just having it press Ctrl + Alt + N, which is the shortcut I have in VSCode for running a script. So you can combine the keyboard shortcut features of whatever editor you're using with VoiceAttack to do some pretty powerful things without much effort.

Now when picking your command words, there are a few things to keep in mind. Be careful using one-syllable words, or words that sound alike, because they can get misheard pretty easily. Also, think about what sounds you have trouble saying clearly. I have trouble with 'R's, so I try to avoid phrases that start with 'R'. For example, I use "start program" instead of "run program".

When I was creating my profile, I initially used "Up" for the up arrow, but "Up" got confused with "snap" all the time. So then I tried using "Go up", but that was hard for it to understand, too, so I settled on "upper" and "downer" for up and down.

Things begin to get complicated when you start using dictation. Dictation is where you want the computer to listen to you and type exactly what you're saying.

I've made a command called "say it" that will start dictation. You then need to pause for a moment before speaking the words you want, and then when you're done I've made a "done" command that will stop dictation, paste what was said, and then clear the dictation buffer. If you look at the Windows clipboard action for this command, you'll see you can get what is in the dictation buffer by using the token "{DICTATION}".

There are some options you can use to format your dictation too, like "{DICTATION:LOWERCASE}", or "{DICTATION:PERIOD:CAPITAL:NEWLINE}". You can find out more about those in the documentation.

Now this doesn't type anything out while you speak. It simply listens, and when you're done the whole thing gets pasted. If that bothers you, you'll want to incorporate a loop that outputs chunks of what you're saying while you talk. Here's a forum link that discusses how to do that: https://forum.voiceattack.com/smf/index.php?topic=1825.0

If you try out my "say it" command, the first thing you'll notice is: it doesn't work great. But there are things you can do to improve it. Go to the VoiceAttack options (the wrench in the bottom right corner), go to the "Recognition" tab, and then click "Utilities". In here, the first thing I did was go to "Microphone Setup" to make sure my microphone wasn't too "hot", because that will cause problems. After you've done that, come back and go to "Speech Engine Training". By doing this training, your computer will come to understand you better, so your commands will be picked up more consistently, and hopefully your dictation is more accurate. I did this training twice.

Let's go back to the profile and look at my "go to" command. What this one does is, it presses the keyboard shortcut to open the "Go To" window in VSCode, then it starts dictation and I give myself 2 seconds to say the line number I want to go to, then it types that in and presses enter. It works pretty well if you say a number like 15, but if you say a number that's lower than 10, it types it out in English, so the "Go To" doesn't work. I have a few ideas on how to fix this, but I haven't gotten around to it yet.
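One possible fix, sketched here in Python purely to show the logic, is to map the spelled-out small numbers back to digits before typing them into the "Go To" box. Inside VoiceAttack this would have to be ported to an inline function; the names `NUMBER_WORDS` and `normalize_line_number` are hypothetical, and the assumption is that the dictation buffer contains a single word such as "seven".

```python
# Hypothetical lookup for the small numbers that dictation spells out.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def normalize_line_number(dictated: str) -> str:
    """Return the dictated line number as a string of digits."""
    # Drop surrounding whitespace and a trailing period, ignore case.
    text = dictated.strip().rstrip(".").lower()
    if text.isdigit():
        return text  # already digits, e.g. "15"
    if text in NUMBER_WORDS:
        return str(NUMBER_WORDS[text])
    raise ValueError(f"not a recognizable line number: {dictated!r}")


print(normalize_line_number("seven"))  # prints 7
print(normalize_line_number("15"))     # prints 15
```

Raising an error for anything unexpected is a deliberate choice in this sketch: jumping to the wrong line silently would be more confusing than doing nothing.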

I've got one more cool thing I want to show you.

So when you use dictation, it obviously prints out what you say in normal English, with all the spaces and everything you'd expect. But it'd be great to be able to create variable or function names with this, too. And unfortunately the available dictation formatting options don't have the ability to do this. So I found a different way.

Let's look at my "call it" command. It allows me to dictate a name for 3 seconds, and eventually it puts the value in the clipboard and pastes it, like we've done before. But in the middle I've put an "Inline C# Function". You can find this under "Other", "Advanced", "Execute an Inline Function", and "C# code".

This is obviously super powerful, because what it allows us to do is to run any code we can write, with full access to all the variables and everything generated by the previous actions in the command sequence.

So I've written a small C# script that just takes the dictation value, makes it lowercase, and replaces the spaces with underscores. I then save that new value to a variable, and it's that variable that I'm copying to the clipboard in the next action step. It even allows us to write to the log so we can debug our script.
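The script in the profile is written in C#, but the transformation itself is simple enough to sketch in Python. This is an illustration of the logic described above, not the profile's actual code; the function name `to_snake_case` is an assumption, and unlike a plain replace it also collapses repeated spaces.

```python
def to_snake_case(dictated: str) -> str:
    """Turn a dictated phrase into a snake_case identifier."""
    # lower() makes everything lowercase; split() with no arguments
    # breaks on any run of whitespace, so extra spaces are dropped.
    return "_".join(dictated.lower().split())


print(to_snake_case("Player Health Bar"))  # prints player_health_bar
```

The same idea could be adapted to other naming styles, such as camelCase, by changing how the words are joined.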

So the possibilities are endless. We've even got the ability to run applications, under "Other", "Windows", "Run an Application". So anything you want to do, your imagination is the limit here.

Some final thoughts. You don't want to make commands for anything too dangerous when you're first getting started with this. Things like git commands, or file deletion, could get you in trouble if VoiceAttack misinterprets something it hears. 

You also don't want to start with too many commands, because you won't be able to remember them all. Maybe introduce 20 new commands at a time. Keeping your command list small will also reduce the chances for interference during your dictations.


Ben Johnson My name is Ben and I help people learn how to code by gaming. I believe in the power of project-based learning to foster a deep understanding and joy in the craft of software development. On this site I share programming tutorials, coding-game reviews, and project ideas for you to explore.