Alango’s Voice Activity Detection (VAD) technology reliably detects human speech in an acoustic signal. The technology is based on a proprietary, high-resolution spectral noise estimation algorithm operating in real time. VAD’s sensitivity level is adjustable, which helps minimize false positive detections in the presence of background noise and other non-speech sounds.
VAD consumes less than 2 MIPS of processor load, which means that a device with VAD implemented consumes very little power in standby mode. VAD runs on the first stage of the signal processing path and ensures that heavier signal processing tasks (e.g., acoustic echo cancellation, beamforming, noise suppression, and speech recognition) can remain “asleep” until voice is detected. When VAD detects voice activity, the system “wakes up” and full signal processing begins.
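The wake-on-voice behavior described above can be illustrated with a small gating state machine. This is a minimal sketch, not Alango's implementation: the class name `VadGate`, the `hangover_frames` parameter, and the idea of staying awake for a few quiet frames after speech ends are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VadGate:
    """Keeps the heavy pipeline asleep until a VAD flag arrives.

    hangover_frames is a hypothetical parameter: the number of consecutive
    non-speech frames after which the pipeline goes back to sleep.
    """
    hangover_frames: int = 30
    awake: bool = False
    _quiet: int = 0  # consecutive non-speech frames seen while awake

    def update(self, voice_detected: bool) -> bool:
        """Feed one per-frame VAD decision; return True if the pipeline should run."""
        if voice_detected:
            self.awake = True
            self._quiet = 0
        elif self.awake:
            self._quiet += 1
            if self._quiet >= self.hangover_frames:
                self.awake = False
                self._quiet = 0
        return self.awake


def run(decisions, gate):
    """Return the awake/asleep state after each VAD decision in the sequence."""
    return [gate.update(d) for d in decisions]
```

In a real device the returned flag would decide whether echo cancellation, beamforming, noise suppression, and speech recognition are powered up for the current frame. The hangover avoids putting the system back to sleep during short pauses between words.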
In portable, voice-enabled devices, VAD enables always-on voice detection at low standby power by allowing MIPS-heavy signal processing to “sleep” until voice is detected.
In smart speaker applications, VAD enables the creation of battery-powered smart speakers. On average, common always-on smart speakers consume roughly 3 watts during standby. Implementing VAD in a smart speaker can significantly reduce standby power, making battery-powered always-on devices practical.
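To see why standby power matters for a battery-powered speaker, the arithmetic can be sketched as follows. Only the 3 W standby figure comes from the text. The battery capacity and the VAD-gated standby power are hypothetical values chosen for illustration and are not measured Alango figures.

```python
def standby_hours(battery_wh: float, standby_watts: float) -> float:
    """Hours of standby a battery can sustain at a constant power draw."""
    if standby_watts <= 0:
        raise ValueError("standby_watts must be positive")
    return battery_wh / standby_watts


BATTERY_WH = 12.0        # hypothetical battery capacity, in watt-hours
ALWAYS_ON_WATTS = 3.0    # typical standby figure quoted in the text
VAD_GATED_WATTS = 0.25   # hypothetical standby power with VAD gating

baseline_hours = standby_hours(BATTERY_WH, ALWAYS_ON_WATTS)  # 4 hours
gated_hours = standby_hours(BATTERY_WH, VAD_GATED_WATTS)     # 48 hours
```

Under these assumed numbers, standby time grows in direct proportion to the reduction in standby power; the actual gain depends on the hardware and on how often voice wakes the system.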
In security system applications, VAD can be used to trigger the recording or transmission of audio/video feeds when voice is detected in a monitored area.
In headset and hearable applications, VAD enables always-on voice detection at low standby power by allowing MIPS-heavy signal processing to “sleep” until voice is detected.
Additionally, in hearable applications (such as smart earbuds), when used with an in-ear bone-conduction (motion) sensor, VAD can specifically detect the user’s voice activity, distinguishing it from other signals picked up by the motion sensor, such as the user’s jaw movements (e.g., chewing), walking, running, or tapping the device.
In smart appliance applications, VAD can be used to enable an always-on standby mode that is always ready for voice commands yet consumes little power, reducing user energy costs over time.
In automotive applications, VAD can be used to wake up a car when voice is detected, allowing the system to proceed to biometric signal processing. Used this way, VAD allows for always-on operation while ensuring that power-heavy processing is only done when voice is detected.
Although it can be used standalone, VAD can also be integrated with the Alango Voice Enhancement Package (VEP) or Voice Communication Package (VCP) for a complete preprocessing solution.
Voice Activity Detector (VAD) Block Diagram
Performance
- Small footprint and low-MIPS implementation (e.g., < 2 MCPS on ARM Cortex-M4)
- All sampling rates are supported (e.g., 32/24/16/8/4/2 kHz)
- Can be applied to acoustic microphone and low-bandwidth vibration sensor signals
- Flexibly configurable settings including sensitivity to harmonicity and SNR
- Low-latency reaction (typically < 50 ms)
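To make the idea of an SNR sensitivity setting concrete, here is a minimal sketch of a generic, textbook energy-based detector that compares each frame with a tracked noise floor. It is not Alango's proprietary spectral algorithm and does not model harmonicity. The class name `EnergyVad` and the parameters `snr_threshold_db` and `floor_alpha` are hypothetical.

```python
import math


class EnergyVad:
    """Toy frame-level VAD: compares frame energy with a tracked noise floor.

    Generic illustration only, not Alango's algorithm. snr_threshold_db
    plays the role of a sensitivity setting (hypothetical name).
    """

    def __init__(self, snr_threshold_db: float = 6.0, floor_alpha: float = 0.95):
        self.snr_threshold_db = snr_threshold_db
        self.floor_alpha = floor_alpha  # smoothing factor for the noise floor
        self.noise_floor = None         # mean-square energy of background noise

    def process(self, frame) -> bool:
        """Return True if the frame's estimated SNR exceeds the threshold."""
        energy = sum(s * s for s in frame) / len(frame) + 1e-12
        if self.noise_floor is None:
            # Assumption: the very first frame contains noise only.
            self.noise_floor = energy
            return False
        snr_db = 10.0 * math.log10(energy / self.noise_floor)
        is_speech = snr_db > self.snr_threshold_db
        if not is_speech:
            # Update the floor only on non-speech frames so speech does not inflate it.
            a = self.floor_alpha
            self.noise_floor = a * self.noise_floor + (1.0 - a) * energy
        return is_speech
```

With 160-sample frames at 16 kHz, each decision covers 10 ms of audio, so a detector of this kind reacts within one frame. Lowering `snr_threshold_db` makes it more sensitive at the cost of more false positives in noise.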
Availability
- ARM cores (all types)
- CEVA TeakLite-III, TeakLite-4
- Synopsys ARC cores
- Cadence (Tensilica) HiFi 2, HiFi 3
- Qualcomm 512X
- Porting to other platforms can be performed quickly.