Easier Said Than Done

Andy Patrizio is a freelance journalist based in Los Angeles. He is a regular contributor to Wired News, Enterprise Systems Journal, and XML & Web Services Magazine, as well as BYTE.com.

Ever watch any of the Star Trek shows and wish your computer could operate on voice command like that? Speech recognition is one of those technologies that's been under development for years, and for all its advances, well, it still stinks.

Most of the problem is in the software, no question. A lot of very smart people have worked very hard on this, and it's still not perfect: not even close. Plus, most of these systems require you to wear a headset, because the microphones are so poor that you all but have to stick the mic in your mouth for it to pick up anything. Not to mention that those things are annoying to wear.

SoundMAX and Andrea Electronics have teamed up to alleviate some of this mess. SoundMAX, an Analog Devices company, makes audio chips for PC motherboards. Its CODECs offer Dolby Digital 5.1 sound quality, providing far better sound output than your standard AC97 audio chip. Andrea makes microphones and software designed to improve the performance of speech recognition and voice command applications.

Now, the SoundMAX chips offer beautiful digital sound. As most hardcore gamers know, if your motherboard comes with onboard audio, the first thing you do is throw in a Sound Blaster and disable the onboard audio in the BIOS, because that AC97 audio is terrible. SoundMAX Cadenza, by contrast, is every bit as rich in sound as the Sound Blaster Audigy, but what I really wanted to test out was the speech recognition.

Together, these two products make a very good team for speech input and voice control of a computer. After seeing them demonstrated at the previous Intel Developer Forum, I was sent a PC with the SoundMAX chip on the motherboard and an Andrea Superbeam Array microphone to try out.

I was impressed with how clearly it recognized voice commands, and with the fact that you don't need to wear a headset. The unit has two microphones built in and sits on top of your monitor. That puts it a good foot or two from your mouth, versus the inch or two between a headset microphone and your lips.

The microphone is nice, but the real value is its PureAudio noise cancellation technology, which lets the microphone sit two feet away and still work. I live in an apartment, so there's all manner of noise around here, and the microphone and software got a real test.

First off, you need to spend about 30 minutes tuning the software so it gets used to your voice. I used the speech recognition software in Microsoft's Plus! for Windows XP, but you can use any package, such as Dragon NaturallySpeaking; the microphone and chips are independent of the voice command software.

I did the tuning in silence. Then I tortured the system: I opened the door and window just as the landscapers showed up, turned on the radio, and it was show time.

Microsoft Office XP has some pretty comprehensive voice command and speech recognition technology, which is where I had the most fun. Even with a leaf blower and the radio going, the Andrea microphone worked beautifully with Office. Sitting a foot from the monitor, I only had to say "select all," "format," "paragraph," "line spacing," and "double," and voila, my Word document was double-spaced.

Despite the noise of man, nature and my radio, and despite its distance from my mouth, the microphone picked up almost everything, and I rarely had to repeat myself. That's a major improvement over anything I've used in the past.

I can't say the same for Microsoft Office's speech recognition, which is why I distinguish between voice command and speech recognition (dictation). A friend once joked that speech recognition software is tuned to a perfect Midwestern cadence, so unless you speak like Dan Rather, you're probably going to have a rough time using it. I found this to be true of dictation. My careful enunciation and attempts to suppress my northeastern accent notwithstanding, what came out was a bigger mangling of the English language than George W. Bush at his finest.

What was said: I wonder if I should save this and run it in my column to show how bad the speech recognition is in Word.
What came out: To book a bouquet this is said a should run this in my column show over the speech recognition is in word.

What was said: Oh my God this stinks
What came out: all my daddy stake

The sound of me laughing was recorded as "and and and and and and and and and."

It wasn't always a disaster, though.

What was said: The quick brown fox jumped over the lazy dogs
What came out: The quick brown fox jumped over the lazy dog's

What was said: Obviously Microsoft's voice recognition software needs more work
What came out: Obviously Microsoft's voice recognition software needs more work

Ironic, isn't it? A slag on Microsoft was recognized perfectly.

I've been looking for decent voice command software for a friend who is in the advanced stages of muscular dystrophy; his arms have almost completely failed him by now, and soon he won't be able to operate a computer by hand. He plays EverQuest, but there's no software that will let him control the game by voice, and unfortunately, the Superbeam microphone and SoundMAX chip won't help until EverQuest itself is voice-enabled.

I think a lot of companies have eschewed voice command technology because it's been so bad. Why spend the time and money adding voice commands to your software when they'll barely work because the audio input is so weak?

One thing is clear: the Andrea microphone and SoundMAX chip do a great job of capturing audio input for controlling applications like Office or Windows Media Player (WMP), which is also voice-command enabled. Despite the racket I allowed to infiltrate my apartment, I could operate Word and WMP without ever touching the keyboard.

Perhaps the improvements Andrea and SoundMAX have made in microphones and noise cancellation will help change some minds at ISVs (independent software vendors) and make them reconsider voice command functionality. I'm willing to bet that all these people need to see is that it can be done on the hardware side.