Researchers at Carnegie Mellon University in Pittsburgh, PA have developed a camera system that can ‘see sound vibrations with such precision and detail that it can reconstruct the music of a single instrument in a band or orchestra.’
The novel system, developed in the School of Computer Science’s Robotics Institute (RI), uses a pair of cameras and a laser to ‘sense high-speed, low-amplitude surface vibrations.’ Those vibrations are then used to reconstruct sound, capturing isolated audio without interference and without a microphone. Even highly directional mics struggle to reject nearby sound sources and ambient noise, and traditional mics can’t eliminate the effects of acoustics during capture.
‘We’ve invented a new way to see sound,’ said Mark Sheinin, a post-doctoral research associate at the Illumination and Imaging Laboratory (ILIM) in the Robotics Institute. ‘It’s a new type of camera system, a new imaging device, that is able to see something invisible to the naked eye.’
The research team has successfully demonstrated the new system, having ‘captured isolated audio of separate guitars playing at the same time and individual speakers playing different music simultaneously.’
CMU’s camera system isn’t the first of its kind; some of the first visual microphones were developed in 2014 by MIT researchers. CMU’s system improves on that earlier work in several ways, including practicality and cost. ‘We’ve made the optical microphone much more practical and usable,’ said Srinivasa Narasimhan, a professor in the RI and head of the ILIM. ‘We’ve made the quality better while bringing the cost down.’ CMU’s approach uses ordinary cameras, which are far less expensive than the high-speed cameras used in prior research.
The system analyzes differences in ‘speckle patterns’ between images captured with a rolling shutter and a global shutter. An algorithm computes the difference between the speckle patterns in the two video streams, and those differences are converted into vibrations to reconstruct the original sound.

CMU explains: ‘A speckle pattern refers to the way coherent light behaves in space after it is reflected off a rough surface. The team creates the speckle pattern by aiming a laser at the surface of the object producing the vibrations, like the body of a guitar. That speckle pattern changes as the surface vibrates. A rolling shutter captures an image by rapidly scanning it, usually from top to bottom, producing the image by stacking one row of pixels on top of another. A global shutter captures an image in a single instance all at once.’
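To make the idea more concrete, here is a minimal, hypothetical sketch (not the CMU team’s code) of how per-row speckle shifts in a rolling-shutter frame could be compared against a global-shutter reference to recover a vibration signal. The function names, the phase-correlation shift estimator, and the normalization are illustrative assumptions rather than details from the paper.

```python
# Hedged sketch (not CMU's implementation): estimate a vibration signal by
# comparing each rolling-shutter speckle row against a global-shutter reference.
import numpy as np


def row_shift(ref_row: np.ndarray, row: np.ndarray) -> float:
    """Estimate the horizontal shift between two speckle rows via phase
    correlation (an assumed stand-in for the paper's actual estimator)."""
    f_ref = np.fft.rfft(ref_row - ref_row.mean())
    f_row = np.fft.rfft(row - row.mean())
    cross = f_ref * np.conj(f_row)
    cross /= np.abs(cross) + 1e-12          # normalize to keep only phase
    corr = np.fft.irfft(cross, n=len(ref_row))
    peak = int(np.argmax(corr))
    # Wrap shifts larger than half the row length to negative values.
    return peak - len(ref_row) if peak > len(ref_row) // 2 else peak


def vibration_signal(global_frame: np.ndarray,
                     rolling_frames: list[np.ndarray]) -> np.ndarray:
    """Each rolling-shutter row is exposed at a slightly later time, so the
    per-row speckle shift relative to the global-shutter reference samples
    the surface vibration at the row rate rather than the frame rate."""
    samples = []
    for frame in rolling_frames:
        for ref_row, row in zip(global_frame, frame):
            samples.append(row_shift(ref_row, row))
    signal = np.asarray(samples, dtype=float)
    signal -= signal.mean()                               # remove DC offset
    return signal / (np.abs(signal).max() + 1e-12)        # scale to [-1, 1]
```

The key point the sketch illustrates is that every rolling-shutter row acts as a separate time sample of the speckle pattern, which is how an ordinary camera can sense vibrations far faster than its frame rate would suggest.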
Image caption: ‘Mark Sheinin (left) and Dorian Chan (right) were part of a CMU research team that developed a camera system that can see sound vibrations with such precision that it can capture isolated audio of separate guitars playing at the same time.’ Credit: Carnegie Mellon University
The research paper, ‘Dual-Shutter Optical Vibration Sensing,’ received a ‘Best Paper’ honorable mention at the recent 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) in New Orleans, LA. In case you missed it, NVIDIA also presented research on an AI tool that converts a series of 2D images into 3D models at the CVPR conference.
Practical uses for the optical vibration-sensing camera include letting sound engineers monitor individual instruments in a mix without interference from the rest of the band, and monitoring the vibrations of industrial machinery to survey its mechanical health and check for issues. To learn more about the research, visit the CMU Imaging website.