Academic year 2013-14
Speech Processing
Degree | Code | Type |
---|---|---|
Bachelor's Degree in Computer Science | 21480 | Optional subject |
Bachelor's Degree in Telematics Engineering | 21762 | Optional subject |
Bachelor's Degree in Audiovisual Systems Engineering | 21610 | Compulsory subject, 2nd year |
ECTS credits: | 4 | Workload: | 100 hours | Trimester: | 3rd |
Department: | Dept. of Information and Communication Technologies |
Coordinator: | Emilia Gómez |
Teaching staff: | Emilia Gómez, Mireia Farrús, Martí Umbert |
Language: | Catalan (explanations), English (material) |
Timetable: | |
Building: | Communication campus - Poblenou |
This is an intermediate course on digital sound signal processing, designed for students of Audiovisual Systems Engineering.
The course focuses on the main techniques for the analysis, description, synthesis and processing of voice signals.
The course builds on foundations from previous courses, especially Acoustic Engineering and Signals and Systems (second year, Audiovisual Systems Engineering).
Competences to be worked during the course according to the degree description:
Cross-disciplinary competences | Specific competences |
---|---|
Instrumental: G1. Analysis and synthesis; G2. Organization and planning; G3. Application of knowledge to novel problems and situations; G4. Information retrieval and management; G5. Decision making; Oral and written communication. Interpersonal: G8. Team work; Interdisciplinary contexts. Systemic: G11. Flexibility and creativity to apply knowledge to novel contexts; G12. Ability for self-training. |
Specific from basic training: B4-INF / B4-A. Complex variable functions; B7-INF / B7-A. Fourier transforms and the sampling theorem; B8-INF. Linear and Time-Invariant systems and related functions; B7-T. Probability; B9-A. Sound propagation, acoustics, and digital signal processing. Related to audiovisual systems: AU1. Telecommunication applications and systems; AU3. Sound and image systems and technical requirements; AU4. Audio processing techniques; AU5. Analysis, synthesis, coding and recognition of speech signals; AU6. Sound and music signal processing; AU22. Audio and music coding principles. |
The evaluation is split among the three main activities of the course: theoretical concepts (T), seminars (S) and labs (L), as follows:
Activity | Description | Timing | Recoverable |
---|---|---|---|
Written tests | Final exam (70% of T): covers all the conceptual material of the course, including questions related to the labs. | End of term | Yes |
Written products | Test (30% of T): partial exam of concepts, including questions related to the labs. | Middle of term | No |
Seminar activities (S) | | Along the term | No |
Practical work | Labs (L): submission of lab reports (35% of L), individually or in pairs, and a lab interview with the teacher along the term (5% of L). | Along the term | No |
Minimum requirements:
• T: concept evaluation. A minimum of 5/10 is required to pass the course.
• L: A minimum of 5/10 is required to pass the course.
The final mark is obtained through the following formula:
Final Mark = 0.5*T + 0.4*L + 0.1*S
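As a quick sanity check, the weighting above can be expressed directly (a minimal sketch; the function name is illustrative and the minimum-requirement rule from the previous section is only noted, not enforced):

```python
def final_mark(t, l, s):
    """Weighted final mark on a 0-10 scale, per the syllabus formula.

    t: theoretical concepts (T), l: labs (L), s: seminars (S).
    Note: passing additionally requires T >= 5 and L >= 5.
    """
    return 0.5 * t + 0.4 * l + 0.1 * s

# Example: T=7, L=6, S=9 gives 0.5*7 + 0.4*6 + 0.1*9 = 6.8
```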
This course is intended to provide the foundations of the techniques for voice analysis, recognition and synthesis, with a focus on speech signals.
• Acoustic, physiological and perceptual foundations of the human voice.
• Introduction to digital signal analysis.
• Methods for voice modeling and processing.
• Usage and implementation of voice processing algorithms.
These concepts are structured into the following blocks:
Block 1. Introduction:
• Voice generation/perception chain.
• Sound acoustics.
• Speech processing applications.
Block 2. Foundations:
• Acoustical foundations: voice production mechanisms, speech vs. singing, classification of speech sounds, introduction to phonetics.
• Perceptual basis: pitch, loudness, timbre.
Block 3. Short-time analysis of voice signals.
• STFT, multiresolution.
• Features: energy, ZCR, ST-ACF, pitch.
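The frame-based features listed above can be sketched as follows, here in Python/NumPy rather than the course's MATLAB/Octave (the frame and hop sizes are illustrative choices, e.g. 25 ms / 10 ms at 16 kHz, not values fixed by the course):

```python
import numpy as np

def short_time_features(x, frame_len=400, hop=160):
    """Frame-wise short-time energy and zero-crossing rate (ZCR)
    of a 1-D signal x. frame_len and hop are in samples."""
    frames = np.array([x[start:start + frame_len]
                       for start in range(0, len(x) - frame_len + 1, hop)])
    energy = np.sum(frames ** 2, axis=1)  # short-time energy per frame
    # ZCR: fraction of adjacent sample pairs whose sign changes
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return energy, zcr
```

Voiced speech typically shows high energy and low ZCR, while unvoiced (noise-like) sounds show the opposite, which is why these two features are often paired for voiced/unvoiced decisions.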
Block 4. Perceptual models.
• Physical vs perceptual vs formant-based models.
• Speech perception.
• Voice transformations.
Block 5. Production-based models. Linear Predictive Coding (LPC).
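A minimal sketch of the autocorrelation method for LPC, assuming NumPy (the prediction order 12 is a typical choice for 8-16 kHz speech, not a value prescribed by the course; production code would use the Levinson-Durbin recursion instead of a direct solve):

```python
import numpy as np

def lpc(x, order=12):
    """LPC coefficients via the autocorrelation (Yule-Walker) method.

    Returns a with a[0] = 1, so the prediction error is
    e[n] = sum_k a[k] * x[n-k].
    """
    # Autocorrelation at lags 0..order
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    # Toeplitz system of normal equations: R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate(([1.0], a))
```

Applied to a speech frame, 1/A(z) models the vocal-tract filter; the same coefficients drive both LPC-based coding and formant analysis.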
Block 6. Text-To-Speech Synthesis.
Block 7. Automatic Speech Recognition.
• Cepstral analysis.
• Hidden Markov Models.
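The cepstral analysis named above can be sketched in a few lines of NumPy (the window choice and the 1e-12 log floor are illustrative assumptions; speech recognizers usually go one step further to mel-frequency cepstral coefficients):

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of one frame: inverse DFT of the log magnitude
    spectrum. Low quefrencies capture the vocal-tract (spectral
    envelope); a peak at higher quefrency reveals the pitch period
    of voiced speech."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # floor avoids log(0)
    return np.fft.irfft(log_mag)
```

Because the log turns the source-filter product into a sum, envelope and excitation separate along the quefrency axis, which is the basis of cepstral pitch detection and liftering.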
For each topic, there is a theory session, a seminar and a lab. During the theory session, the main concepts are presented to the whole class. The student should then review them with the help of the material provided by the teacher.
After that, a seminar session is planned, where the student solves exercises and problems related to the theoretical concepts. This activity is carried out interactively in small groups.
Finally, a practical session with computers is scheduled in order to solve, in pairs, practical problems requiring algorithm implementations and practical work with sound and the proposed software.
Lessons | Large group (2h) | Small group (1h) | Mid-size group - Labs (2h) | Hours for personal work |
---|---|---|---|---|
1. Introduction / 2. Foundations | 2 | 1 | | 5 |
3. Spectral analysis | 1 | 1 | 1 | 8 |
4. Perceptual models | 1 | 1 | | 7 |
5. LPC | 1 | 1 | 1 | 7 |
6. Text-To-Speech Synthesis | 1 | 1 | | 7 (control) |
7. Cepstral analysis | 1 | 1 | 1 | 7 |
8. Recognition | 2 | 2 | 1 | 8 |
Summary | 1 | | | 8 |
Final exam preparation | | | | 7 |
Total | 18 | 8 | 10 | 64 |
Total: 100 hours
Theory: 18 hours (9 sessions, 2 hours each).
• Lesson 1: Introduction.
• Lesson 2: Speech production. Classification of speech sounds.
• Lesson 3: Spectral analysis.
• Lesson 4: Perceptual models.
• Lesson 5: Linear Predictive Coding (LPC).
• Lesson 6: Text-To-Speech Synthesis.
• Lesson 7: Cepstral analysis.
• Lessons 8-9: Speech Recognition: Hidden Markov Models.
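The core HMM computation behind these lessons, the forward algorithm for the likelihood P(observations | model), fits in a few lines. This is a toy discrete-observation sketch with NumPy (real recognizers emit continuous cepstral features via Gaussian mixtures, and work in the log domain for numerical stability):

```python
import numpy as np

def forward(pi, A, B, obs):
    """HMM forward algorithm: P(obs | model).

    pi: initial state probabilities, shape (N,)
    A:  state transition matrix, shape (N, N)
    B:  discrete emission probabilities, shape (N, M)
    obs: sequence of observed symbol indices
    """
    alpha = pi * B[:, obs[0]]          # initialize with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate one time step
    return alpha.sum()                 # sum over final states
```

Summed over all possible observation sequences of a given length, these likelihoods add to 1, which is a handy correctness check.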
Seminars: 8 sessions, 1 hour each.
• Seminar 1: Voice acoustics.
• Seminar 2: Spectral analysis.
• Seminar 3: LPC.
• Seminar 4: Test.
• Seminar 5: Cepstrum.
• Seminar 6: Speech recognition.
• Seminar 7: Voice transformations.
• Seminar 8: Review and summary.
Labs: 5 sessions, 2 hours each.
• Lab 1: Spectral analysis.
• Lab 2: Spectral models.
• Lab 3: Analysis and synthesis.
• Lab 4: Cepstrum.
• Lab 5: Speech recognition.
Main references
• Quatieri, T. F. 2001. Discrete-Time Speech Signal Processing: Principles and Practice. Prentice Hall.
• Rabiner, L. R. and R. W. Schafer. 2007. Introduction to Digital Speech Processing. Foundations and Trends in Signal Processing, Vol. 1, Nos. 1-2.
Complementary references
• Rabiner, L. R. and Schafer, R. W. 1978. Digital Signal Processing of Speech Signals. Prentice Hall.
• O'Shaughnessy, D. 1999. Speech Communications: Human and Machine. John Wiley & Sons.
• Rabiner, L. R. and B. H. Juang. 1993. Fundamentals of Speech Recognition. Prentice Hall.
• Park, Sung-won. Linear Predictive Speech Processing.
• Park, Sung-won. Discrete Wavelet Transform.
• Spanias, Andreas. 1994. "Speech Coding: A Tutorial Review". Proceedings of the IEEE.
• Pan, Davis. 1995. "A Tutorial on MPEG/Audio Compression". IEEE Multimedia Journal.
• Rabiner, Lawrence. 1989. "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition". Proceedings of the IEEE.
Teaching material
• Slides and notes.
• Seminar activities.
• Lab instructions.
Software
• PRAAT http://www.fon.hum.uva.nl/praat/
• Octave http://www.gnu.org/software/octave/
• MATLAB