Speech and Audio Compression: The Twain Shall Meet
Chandra Sekhar Seelamantula, BIG
Seminar • 18 July 2007 • BM 4.235
Abstract

Speech and music signals are nonstationary: their spectral properties, statistics, and perceptual attributes such as pitch change with time. A parsimonious representation of these signals is obtained by using nonstationary signal models instead of stationary ones. We consider the envelope-frequency model and propose a new zero-crossing technique for computing these parameters for a given signal in a multiband framework. The parameters are amenable to efficient quantization using psychoacoustic criteria. Preliminary experimental results show that the compression performance achieved in this way is comparable to that of standard coding techniques such as MP3 and MPEG-2 AAC. We also discuss issues related to blocking artifacts and dynamic auditory perception. The talk will be followed by an audio demonstration.

This talk is based on part of my Ph.D. thesis submitted to the Indian Institute of Science. The work was done in collaboration with Prof. T. V. Sreenivas, IISc.
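As a rough, hypothetical illustration of the kind of zero-crossing analysis the abstract alludes to (the function name, the half-period frequency estimate, and the peak-magnitude envelope estimate below are assumptions for illustration, not the speaker's actual algorithm), the following Python sketch estimates local frequency from successive zero-crossing intervals of a narrowband signal and samples the envelope at the peak magnitude between crossings. A multiband coder would apply such an analysis per subband after bandpass filtering and then quantize the envelope and frequency tracks; those steps are not shown.

    import numpy as np

    def zero_crossing_envelope_frequency(x, fs):
        """Illustrative sketch: estimate local frequency and envelope of a
        narrowband signal from its zero crossings (not the talk's method).

        Frequency is taken from successive zero-crossing intervals (half
        periods); the envelope is sampled at the peak magnitude between
        crossings.
        """
        # Indices where the sign changes, i.e. approximate zero crossings.
        crossings = np.where(np.diff(np.sign(x)) != 0)[0]

        times, freqs, envs = [], [], []
        for a, b in zip(crossings[:-1], crossings[1:]):
            half_period = (b - a) / fs               # time between crossings ~ T/2
            if half_period > 0:
                times.append((a + b) / (2.0 * fs))
                freqs.append(1.0 / (2.0 * half_period))    # local frequency estimate
                envs.append(np.max(np.abs(x[a:b + 1])))    # local envelope estimate
        return np.array(times), np.array(freqs), np.array(envs)

    if __name__ == "__main__":
        fs = 16000
        t = np.arange(0, 0.05, 1.0 / fs)
        # Toy AM signal: slowly varying envelope on a 1 kHz carrier.
        x = (1.0 + 0.5 * np.cos(2 * np.pi * 30 * t)) * np.sin(2 * np.pi * 1000 * t)
        times, freqs, envs = zero_crossing_envelope_frequency(x, fs)
        print(freqs[:5], envs[:5])

On the toy signal above, the frequency estimates cluster around 1000 Hz and the envelope samples trace the slow 30 Hz amplitude modulation, which is the kind of envelope-frequency parameterization the abstract describes.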