Speech Separation for Underdetermined Scenarios
Signal Processing, Inc. (SPI) has developed a near real-time speech separation system that uses multiple microphones. "Underdetermined" means that there are more speakers than microphones. The technology was developed with support from the Navy. The idea is to exploit the sparsity of speech to separate the speech streams of multiple speakers. The system is applicable to microphone arrays with two or more mics.
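To illustrate the sparsity idea in general terms, here is a minimal toy sketch in Python. It is not SPI's algorithm. It assumes an idealized setting: two mics, three sources, instantaneous (non-reverberant) mixing, and synthetic coefficients in which exactly one source is active at each point, as a stand-in for time-frequency bins of real speech. Under those assumptions every two-mic observation lies along one of three directions, so clustering the directions and masking recovers all three sources. The function name `separate_sparse_mixture`, the mixing angles, and the simple angle clustering are all hypothetical choices made for this example.

```python
import numpy as np

def separate_sparse_mixture(X, n_sources, n_iter=50):
    """Toy sparsity-based separation of a 2-mic mixture.

    X is a (2, N) real array of mixture coefficients. Returns an
    (n_sources, N) array of estimates and the estimated mixing
    directions in radians, sorted by initial center order.
    """
    # Direction of each observation in the mic plane, folded to [0, pi)
    # so that the sign of the active source does not matter.
    theta = np.mod(np.arctan2(X[1], X[0]), np.pi)
    energy = np.sum(X ** 2, axis=0)

    def assign(centers):
        # Circular distance with period pi between points and centers.
        d = np.abs(theta[None, :] - centers[:, None])
        d = np.minimum(d, np.pi - d)
        return np.argmin(d, axis=0)

    # Evenly spaced initial directions, refined by a k-means-style loop.
    centers = (np.arange(n_sources) + 0.5) * np.pi / n_sources
    for _ in range(n_iter):
        labels = assign(centers)
        for k in range(n_sources):
            m = labels == k
            if m.any():
                # Energy-weighted circular mean (angles doubled: period pi).
                z = np.sum(energy[m] * np.exp(2j * theta[m]))
                centers[k] = np.mod(np.angle(z) / 2.0, np.pi)
    labels = assign(centers)

    # Binary masking: give each point to one source and project it
    # onto that source's unit-norm mixing direction.
    est = np.zeros((n_sources, X.shape[1]))
    for k in range(n_sources):
        a = np.array([np.cos(centers[k]), np.sin(centers[k])])
        m = labels == k
        est[k, m] = a @ X[:, m]
    return est, centers

# Synthetic underdetermined example: 3 sources, 2 mics (assumed values).
rng = np.random.default_rng(0)
n_bins = 2000
angles = np.deg2rad([20.0, 75.0, 130.0])
A = np.vstack([np.cos(angles), np.sin(angles)])   # 2 x 3 mixing matrix

# Perfectly sparse sources: exactly one source is active per point.
active = rng.integers(0, 3, size=n_bins)
S = np.zeros((3, n_bins))
S[active, np.arange(n_bins)] = rng.standard_normal(n_bins)

X = A @ S                                          # two-mic observations
S_hat, est_angles = separate_sparse_mixture(X, n_sources=3)
print("estimated directions (deg):", np.rad2deg(est_angles))
print("max reconstruction error:", np.max(np.abs(S_hat - S)))
```

Real speech is only approximately sparse, and real rooms add delays and reverberation, so a practical system would work on short-time Fourier transform bins and need a considerably more robust model than this sketch.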
Key features include:
- Near real-time processing. With a sufficiently fast processor, real-time operation can be achieved.
- The number of speakers can exceed the number of mics.
- Works in reverberant environments.
In theory, more mics provide more degrees of freedom and hence better separation of voices. In our experiments, we used real data recorded with a dual-mic recorder. With three people talking simultaneously in a small conference room, we were still able to separate their voices very accurately. Sound files of the original mic recordings and the separated voices are provided below.
Audio Demos: Original Mic1, Original Mic2, Separated Voice1, Separated Voice2, Separated Voice3