Tuned out cars
Input audio in a voice-enabled automobile is a behind-the-scenes phenomenon. The end user doesn’t usually notice the audio chain unless things go awry. It is similar to working as a stagehand on Broadway: a difficult and even thankless job, full of unexpected obstacles, that isn’t noticed until the curtain drops in the middle of Kristin Chenoweth’s big solo.
Audio has a long, sometimes difficult route in the connected car — traveling from your mouth all the way to the speech recognizer “hearing” what you said. In the short version of this journey, there are two halves:
The first half of the journey takes you through the interior of the vehicle – from your mouth to the car’s microphone. Unfortunately, cars can be very noisy environments. If you could hear the bumping, grunting, and shuffling of the stagehands, would you really enjoy the play? Think of everything that you can hear in the car: engine revs, potholes, tractor trailers passing on the right, kids playing in the backseat, windshield wipers, climate control noise… and then finally your voice.
Take potholes: a common condition on the Michigan highways that I frequent… You engage the VR system and say “Call Al”. Hit a pothole at just the wrong moment, and the VR system might actually hear “Call ” instead. Competing voices are another common pitfall in the voice-enabled car. While driving with your kids, you attempt to change the radio station by voice. A shout of “DADD…DYYY” from the backseat overlaps your command, and the VR system now has to interpret “Tune to DAD 100.3 FM”. Noise and interference like these can cause significant misrecognitions and other unwanted behavior from the VR system.
The second half of audio’s journey can be equally difficult. A correctly structured audio configuration within the infotainment system is critical to a successful user experience. During a voice recognition dialog, the system must know when to start and stop listening for the user (the “listening window”). Like stagehands opening and closing the curtain during scene changes, this has a significant impact on the user experience. If the curtain opens early, the audience sees what it shouldn’t. If it closes too quickly, the audience misses key elements of the plot. In the car’s VR, this is equivalent to the system capturing stray audio before “Dial 911”, or cutting the user off mid-command at “Dial 1-800-5”, respectively. Both situations could cause the user to get an unexpected result.
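The listening window can be sketched as a simple energy-based endpointer. This is a toy illustration only, not how a production recognizer does it: real systems use trained voice activity detectors, and the thresholds and `hangover` count below are made-up values.

```python
def rms(frame):
    """Root-mean-square level of one frame of audio samples."""
    return (sum(x * x for x in frame) / len(frame)) ** 0.5

def listening_window(frames, open_thresh=0.02, close_thresh=0.01, hangover=10):
    """Return (start, end) frame indices of the listening window.

    Opens when frame energy first exceeds open_thresh (speech onset);
    closes only after `hangover` consecutive quiet frames, so brief
    pauses inside an utterance don't cut the speaker off.
    """
    start, end, quiet = None, None, 0
    for i, frame in enumerate(frames):
        level = rms(frame)
        if start is None:
            if level >= open_thresh:
                start = i          # curtain opens: speech detected
        elif level < close_thresh:
            quiet += 1
            if quiet >= hangover:  # curtain closes: sustained silence
                end = i - hangover + 1
                break
        else:
            quiet = 0              # speech resumed, reset the hangover
    if start is not None and end is None:
        end = len(frames)          # speech ran to the end of the audio
    return start, end
```

Open too eagerly (a low `open_thresh`) and the window admits noise before the command; close too eagerly (a short `hangover`) and “Dial 1-800-5…” gets truncated mid-number.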
Other areas of audio configuration also present potential difficulties for the end user. A common reaction to a non-working VR system is to speak more loudly with each failure (as humans, we sometimes do this in conversation to make sure we are clearly heard). But what if the audio level in the voice recognizer is already configured too high? Yelling will only make the problem worse, frustrating the user with one failed recognition after another. This is why proper configuration and tuning are so important – a scene you can see play out in a demo of Dragon Drive.
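The “gain set too hot” failure is easy to check for. The sketch below (an illustrative helper, not part of any real VR stack) measures how many 16-bit PCM samples sit at or near full scale; a high ratio means louder speech will only clip harder.

```python
def clipping_ratio(samples, full_scale=32767, margin=0.99):
    """Fraction of 16-bit PCM samples at or near full scale.

    A high ratio suggests the capture gain is set too hot: speaking
    louder just drives more samples into clipping, making the
    recognizer's job harder, not easier.
    """
    if not samples:
        return 0.0
    limit = full_scale * margin
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)
```

A tuning pass might flag any capture path where this ratio exceeds a fraction of a percent during normal speech.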
The voice-enabled car of the future will selectively ignore driving and passenger noises, allowing a seamless and error-free experience for the operator. Luckily for us, the future is approaching quickly. Today, there are exciting new technologies aimed at addressing some of these common audio challenges. New developments in Digital Signal Processing allow both stationary (like road and fan noise) and non-stationary (like road bumps) noises to be well suppressed. Other new technologies allow the system to ignore interfering speakers (one variant is called “off-axis suppression”). With this enabled, passengers are able to hold side conversations while you speak voice recognition commands without worry.
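The stationary-noise idea can be caricatured in a few lines: learn the noise floor while nobody is speaking, then attenuate anything that never rises above it. Real DSP does this per frequency band (spectral subtraction, Wiener filtering) rather than per frame, and every name and threshold below is illustrative.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def noise_gate(frames, noise_frames=10, floor_margin=2.0, atten=0.1):
    """Crude stationary-noise gate.

    Estimates the noise floor from the first few frames (assumed to be
    speech-free), then attenuates any frame whose energy stays near
    that floor, passing likely-speech frames through untouched.
    """
    floor = sum(frame_energy(f) for f in frames[:noise_frames]) / noise_frames
    out = []
    for f in frames:
        if frame_energy(f) <= floor * floor_margin:
            out.append([x * atten for x in f])  # looks like noise: suppress
        else:
            out.append(list(f))                 # likely speech: keep
    return out
```

Because road and fan noise change slowly, the floor estimate stays valid; a pothole thump or a backseat shout is non-stationary and needs the more sophisticated techniques the article mentions.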