Academic year 2013-14

Speech Processing

Degree (code, type):

• Bachelor's Degree in Computer Science (21480): optional subject

• Bachelor's Degree in Telematics Engineering (21762): optional subject

• Bachelor's Degree in Audiovisual Systems Engineering (21610): compulsory subject, 2nd year

 

ECTS credits: 4 Workload: 100 hours Trimester: 3rd

 

Department: Dept. of Information and Communication Technologies
Coordinator: Emilia Gómez
Teaching staff:

Emilia Gómez, Mireia Farrús, Martí Umbert

Language:

Catalan (explanations), English (material)

Timetable:
Building: Communication campus - Poblenou

 

Introduction

This is an intermediate course on digital sound signal processing, designed for students of Audiovisual Systems Engineering.

The course focuses on the main techniques for the analysis, description, synthesis and processing of voice signals.

 

Prerequisites

The course builds on foundations from previous courses, especially Acoustic Engineering and Signals and Systems (second year, Audiovisual Systems Engineering).

 

Associated competences

Competences to be developed during the course, according to the degree description:

Cross-disciplinary competences

Instrumental

G1. Analysis and synthesis

G2. Organization and planning

G3. Application of knowledge in novel problems and situations

G4. Information retrieval and management

G5. Decision making

Oral and written communication

Interpersonal

G8. Team work

Interdisciplinary contexts

Systemic

G11. Flexibility and creativity to apply knowledge to novel contexts

G12. Ability for self-training

Specific competences

Specific from basic training

B4 - INF / B4 - A. Complex variable functions

B7 - INF / B7 - A. Fourier transforms and sampling theorem

B8 - INF. Linear and Time-Invariant systems and related functions

B7 -T. Probability

B9 - A. Sound propagation, acoustics, and digital signal processing

Related to Audiovisual systems

AU1. Telecommunication applications and systems

AU3. Sound and image systems and technical requirements

AU4. Audio processing techniques

AU5. Analysis, synthesis, coding and recognition of speech signals

AU6. Sound and music signal processing

AU22. Audio and music coding principles

 

Assessment

The evaluation is split across the three main activities of the course: theoretical concepts (T), seminars (S) and labs (L), as follows:

 

• Written tests. Final exam (70% of T): covers all the conceptual material of the course, including questions related to the labs. Timing: end of term. Recoverable: yes.

• Written products. Test (30% of T): partial exam on concepts, including questions related to the labs. Timing: middle of term. Recoverable: no.

• Seminar activities (S). Timing: along the term. Recoverable: no.

• Practical work. Labs (L): submission of lab reports (35% of L), individually or in pairs, plus a lab interview with the teacher during the term (5% of L). Timing: along the term. Recoverable: no.

Minimum requirements:

• T: concept evaluation. A minimum of 5/10 is required to pass the course.

• L: A minimum of 5/10 is required to pass the course.

The final mark is obtained through the following formula:

Final Mark = 0.5*T + 0.4*L + 0.1*S
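As an illustration, the formula and the minimum requirements above can be combined in a short Python sketch (the function name and argument order are ours, not prescribed by the course):

```python
def final_mark(t, l, s):
    """Final mark = 0.5*T + 0.4*L + 0.1*S.

    A minimum of 5/10 in both T (concepts) and L (labs) is
    required to pass, regardless of the weighted mark."""
    mark = 0.5 * t + 0.4 * l + 0.1 * s
    passed = t >= 5.0 and l >= 5.0
    return mark, passed

# Example: T = 7, L = 6, S = 8 gives 0.5*7 + 0.4*6 + 0.1*8 = 6.7 (pass).
```

Note that a student can obtain a weighted mark above 5 and still fail if either T or L is below the 5/10 threshold.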

 

Contents

This course is intended to provide the foundations of techniques for voice analysis, recognition and synthesis, with a focus on speech signals.

• Acoustic, physiological and perceptual foundations of the human voice.

• Introduction to digital signal analysis.

• Methods for voice modeling and processing.

• Usage and implementation of voice processing algorithms.

Those concepts are structured into the following blocks:

Block 1. Introduction:

• Voice generation/perception chain.

• Sound acoustics.

• Speech processing applications.

Block 2. Foundations:

• Acoustical foundations: voice production mechanisms, speech vs. singing, classification of speech sounds, introduction to phonetics.

• Perceptual basis: pitch, loudness, timbre.

Block 3. Short-time analysis of voice signals.

• STFT, multiresolution.

• Features: energy, ZCR, ST-ACF, pitch.
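As a sketch of what these short-time features look like in practice, here is a minimal frame-by-frame computation of energy and zero-crossing rate in Python (frame length and hop size are arbitrary illustrative choices, not values prescribed by the course):

```python
import numpy as np

def short_time_features(x, frame_len=256, hop=128):
    """Short-time energy and zero-crossing rate (ZCR), frame by frame."""
    energy, zcr = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energy.append(float(np.sum(frame ** 2)))
        # A zero crossing is a sign change between consecutive samples.
        zcr.append(float(np.mean(np.abs(np.diff(np.sign(frame))) > 0)))
    return np.array(energy), np.array(zcr)
```

For a pure tone of frequency f sampled at fs, the ZCR per sample is roughly 2*f/fs, which is why ZCR serves as a crude voiced/unvoiced and pitch cue.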

Block 4. Perceptual models.

• Physical vs perceptual vs formant-based models.

• Speech perception.

• Voice transformations.

Block 5. Production-based models. Linear Predictive Coding (LPC).
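As a minimal illustration of the autocorrelation method behind LPC (a sketch under simplifying assumptions, not the course's reference implementation):

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients a[1..p] via the autocorrelation method
    (normal equations), so that x[n] ~ sum_k a[k] * x[n - k]."""
    # Autocorrelation estimates r[0..p].
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz system R a = r[1..p], solved directly here.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])
```

In practice the Toeplitz system is solved with the Levinson-Durbin recursion, which costs O(p^2) operations instead of the O(p^3) of a generic solver.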

Block 6. Text-To-Speech Synthesis.

Block 7. Automatic Speech Recognition.

• Cepstral analysis.

• Hidden Markov Models.
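A minimal sketch of the real cepstrum used in this block (the small epsilon guarding the logarithm is our addition):

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum.

    Low quefrencies capture the vocal-tract (spectral envelope)
    contribution, high quefrencies the excitation (pitch)."""
    spectrum = np.fft.fft(x)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # epsilon avoids log(0)
    return np.real(np.fft.ifft(log_mag))
```

The MFCC features commonly used in recognition follow the same idea, with a mel-spaced filter bank inserted before the logarithm and a DCT in place of the inverse DFT.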

 

Methodology

For each topic there is a theory session, a seminar and a lab. In the theory session, the main concepts are presented to the whole class. The students should then review them with the help of the material provided by the teacher.

After that, a seminar session is held, in which the students solve exercises and problems related to the theoretical concepts. This activity is carried out interactively in small groups.

Finally, a practical session with computers is scheduled in order to solve, in pairs, practical problems requiring algorithm implementations and hands-on work with sound and the proposed software.

 

Sessions (number per lesson) and hours for personal work:

Lesson                               Large group (2h)   Small group (1h)   Labs (2h)   Personal work (h)
1. Introduction and 2. Foundations          2                  1               -               5
3. Spectral analysis                        1                  1               1               8
4. Perceptual models                        1                  -               1               7
5. LPC                                      1                  1               1               7
6. Text-To-Speech Synthesis                 1                  1               -               7 (control)
7. Cepstral analysis                        1                  1               1               7
8. Recognition                              2                  2               1               8
Summary                                     -                  1               -               8
Final exam preparation                      -                  -               -               7
Total (hours)                              18                  8              10              64

Total: 100 hours

Theory: 18 hours (9 sessions, 2 hours each).

• Lesson 1: Introduction.

• Lesson 2: Speech production. Classification of speech sounds.

• Lesson 3: Spectral analysis.

• Lesson 4: Perceptual models.

• Lesson 5: Linear Predictive Coding (LPC).

• Lesson 6: Text-To-Speech Synthesis.

• Lesson 7: Cepstral analysis.

• Lesson 8-9: Speech Recognition: Hidden Markov Models.

 

Seminars: 8 sessions, 1 hour each.

• Seminar 1: Voice acoustics.

• Seminar 2: Spectral analysis.

• Seminar 3: LPC.

• Seminar 4: Test.

• Seminar 5: Cepstrum.

• Seminar 6: Speech recognition.

• Seminar 7: Voice transformations.

• Seminar 8: Review and summary.

 

Labs: 5 sessions, 2 hours each.

• Lab 1: Spectral analysis.

• Lab 2: Spectral models.

• Lab 3: Analysis and synthesis.

• Lab 4: Cepstrum.

• Lab 5: Speech recognition.

 

Resources

Main references

• Quatieri, T. F. 2001. Discrete-Time Speech Signal Processing: Principles and Practice. Prentice Hall.

• Rabiner, L. R. and Schafer, R. W. 2007. Introduction to Digital Speech Processing. Foundations and Trends in Signal Processing, Vol. 1, Nos. 1-2.

Complementary references

• Rabiner, L. R. and Schafer, R. W. 1978. Digital Processing of Speech Signals. Prentice Hall.

• O'Shaughnessy, D. 1999. Speech Communications: Human and Machine. John Wiley & Sons.

• Rabiner, L. R. and B. H. Juang. 1993. Fundamentals of Speech Recognition. Prentice Hall.

• Park, Sung-won. Linear Predictive Speech Processing.

• Park, Sung-won. Discrete Wavelet Transform.

• Spanias, Andreas. 1994. "Speech Coding: A Tutorial Review". Proceedings of the IEEE.

• Pan, Davis. 1995. "A Tutorial on MPEG/Audio Compression". IEEE Multimedia Journal.

• Rabiner, Lawrence. 1989. "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition". Proceedings of the IEEE.

 

Teaching material

• Slides and notes.

• Seminar activities.

• Lab instructions.

 

Software

• PRAAT http://www.fon.hum.uva.nl/praat/

• Octave http://www.gnu.org/software/octave/

• MATLAB