Learning-Based Attenuation of the Noise of a Speech Recording
Linear predictive coding (LPC) is a time-honored reversible decomposition of speech into two components. One component encodes formants, which describe the configuration of the speaker's vocal tract, possibly combined with the acoustics of the room. This component is also related to what is perceptually relevant to the human ear, particularly for the categorization of vowels. The other component, called the residue, encodes pitch and the transients of the speech signal, among them consonants.
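As an illustration of the decomposition described above, here is a minimal sketch of frame-wise LPC analysis and resynthesis. It assumes the autocorrelation method, a zero initial filter state, and a small diagonal regularization; the function names (`lpc_coefficients`, `lpc_residual`, `lpc_synthesize`) and the choice of order are hypothetical and not prescribed by the project.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate LPC coefficients a (with a[0] = 1) by the autocorrelation method."""
    n = len(frame)
    # Autocorrelation lags r[0..order] of the (non-silent) frame.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R c = -r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)  # assumption: tiny regularization
    coeffs = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], coeffs))

def lpc_residual(frame, a):
    """Analysis filter: e[n] = sum_k a[k] x[n-k], with zero initial state."""
    return np.convolve(frame, a)[:len(frame)]

def lpc_synthesize(residual, a):
    """Synthesis filter: x[n] = e[n] - sum_{k>=1} a[k] x[n-k]."""
    order = len(a) - 1
    x = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, min(order, n) + 1):
            acc -= a[k] * x[n - k]
        x[n] = acc
    return x
```

In this sketch, the coefficient vector `a` plays the role of the formant component (spectral envelope) and the output of `lpc_residual` plays the role of the residue. Feeding the residue back through `lpc_synthesize` with the same `a` recovers the frame up to floating-point error, which is the sense in which the decomposition is reversible.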
In this project, the student will first apply LPC to the clean section of a speech recording of a single speaker and then build two dictionaries, one for formants and one for residues. The recording also contains a dirty section that we want to clean up. To do so, the raw formants and residues of the dirty section will be replaced by their nearest entries in the dictionaries learned from the clean section. In doing so, we hope to improve the perceptual quality where needed.
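The replacement step described above can be sketched as a nearest-neighbor lookup. This sketch assumes that each dictionary is simply a matrix whose rows are feature vectors taken from clean frames, and that "nearest" means smallest Euclidean distance; both are illustrative assumptions, since the construction of the dictionaries and the choice of distance are part of the project itself. The names `nearest_entries` and `replace_with_dictionary` are hypothetical.

```python
import numpy as np

def nearest_entries(dictionary, queries):
    """Return, for each row of `queries`, the index of the closest row of `dictionary`."""
    # Squared Euclidean distances via ||q||^2 - 2 q.d + ||d||^2.
    d2 = (np.sum(queries ** 2, axis=1, keepdims=True)
          - 2.0 * queries @ dictionary.T
          + np.sum(dictionary ** 2, axis=1))
    return np.argmin(d2, axis=1)

def replace_with_dictionary(dictionary, queries):
    """Replace each query vector by its nearest dictionary entry."""
    return dictionary[nearest_entries(dictionary, queries)]

# Toy example: three "clean" entries and three "dirty" vectors.
clean = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
dirty = np.array([[0.1, -0.1], [4.0, 6.0], [0.9, 1.2]])
cleaned = replace_with_dictionary(clean, dirty)
```

The same lookup would be applied twice, once with the formant dictionary and once with the residue dictionary. A Euclidean distance on raw LPC coefficients is not necessarily perceptually meaningful, so a different representation or metric may turn out to be preferable.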
The prerequisites for this project are a good command of signal processing and linear algebra. The work will consist of theoretical and algorithmic developments, as well as their application to speech data.
- Supervisors
- Denis Fortun, denis.fortun@epfl.ch, 35136, BM 4.138
- Michael Unser, michael.unser@epfl.ch, 021 693 51 75, BM 4.136