Computational Linguistics I (31395)

Master's programme: Màster en Lingüística Teòrica i Aplicada (Master in Theoretical and Applied Linguistics)
Year: 2014-15
Term: 1
Number of ECTS credits: 5
Student workload (hours): 125
Course type: Optional
Lecturer(s): Juan María Garrido
Language used: English

1. Course presentation

The goal of the course is twofold: to give students a general overview of the basic theoretical concepts related to the computational analysis and processing of speech, and to provide them with basic skills in the use of tools and resources for speech analysis and speech technology.

The course is aimed mainly at students with some background in Phonetics and Phonology who are interested in using computer techniques both for basic research in Phonetics and Phonology and for professional work as computational linguists in the Speech Technology field. For this reason, the course will focus on the concepts needed to use and develop computer speech processing tools as a computational linguist, rather than on the engineering, mathematical or programming knowledge behind these tools.

2. Objectives

Students are expected to acquire the following competences during the course:

Basic knowledge of the theoretical concepts of acoustic phonetics and computational speech processing.
Basic skills in the use of Praat and other speech processing tools.
Basic knowledge of the research procedures and methodologies used in the area of computational speech analysis.
Basic knowledge of the procedures and methodologies used in the development of commercial speech technologies.
Basic ability to carry out small projects related to computational speech processing.

3. Syllabus

1. Speech processing basics

1.1. Speech signals

Speech waves. Basic parameters: time, amplitude and frequency. Periodic and aperiodic signals. Simple and complex signals. Spectral analysis: the Fourier transform. Acoustic model of speech production: sources and filters. Types of sources in speech production. Filters and resonators. Basic acoustic features of speech signals: spectral envelope, formants, F0. Speech units: phones, suprasegmental phenomena.
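As an illustration of the "complex signals" and "spectral analysis" topics above, the following minimal sketch builds a complex periodic signal from two sine waves and recovers their frequencies with a plain discrete Fourier transform. The sample rate, signal length and component frequencies are assumptions chosen for the example, not course material, and speech analysis tools use much faster FFT implementations.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the magnitude of each DFT bin from 0 Hz up to the Nyquist frequency."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags

# Hypothetical test signal: a 100 Hz component plus a weaker 300 Hz component.
SAMPLE_RATE = 1000   # Hz (assumed)
N = 200              # 0.2 s of signal, so bins are 5 Hz apart
signal = [math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 300 * t / SAMPLE_RATE)
          for t in range(N)]

mags = dft_magnitudes(signal)
bin_hz = SAMPLE_RATE / N

# Pick the two strongest bins and convert their indices to frequencies.
strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
peak_freqs = [k * bin_hz for k in sorted(strongest)]
print(peak_freqs)  # [100.0, 300.0]
```

The two peaks correspond to the two simple signals that were added together, which is the sense in which the Fourier transform decomposes a complex wave into its components.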

1.2. Speech digitisation, coding and storage

Analogue-to-digital (A/D) conversion. Analogue and digital signals. Sampling. Sampling frequency. A/D converter resolution. Aliasing. Clipping. Speech coding: needs and applications. Speech coding methods. Storing speech: files and formats.
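The sampling, resolution, aliasing and clipping topics above can be illustrated with a small sketch. The function names and the numeric values below are hypothetical and chosen only for the example; the quantiser assumes input samples normalised to the range -1.0 to 1.0.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a pure tone of f_hz after sampling at fs_hz."""
    f = f_hz % fs_hz
    # Frequencies above the Nyquist limit (fs/2) fold back below it.
    return min(f, fs_hz - f)

def quantise(x, bits):
    """Clip x to [-1.0, 1.0] and round it to a signed integer of `bits` bits."""
    max_level = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, x))   # clipping: out-of-range peaks are flattened
    return round(clipped * max_level)

print(alias_frequency(3000, 8000))  # 3000: below the 4000 Hz Nyquist limit, unchanged
print(alias_frequency(7000, 8000))  # 1000: above the limit, folds back
print(quantise(1.7, 16))            # 32767: clipped to 16-bit full scale
print(quantise(0.0, 16))            # 0
```

With an 8000 Hz sampling frequency only components below 4000 Hz are represented faithfully, which is why the sampling frequency has to be chosen with the bandwidth of the signal in mind.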

2. Speech sounds: analysis and synthesis

2.1. Acoustic analysis of segmental units

Basic representation methods: spectra, spectrograms. Identifying phones in signals: basic acoustic features. Basic analysis methods: formant analysis. Making procedures automatic: use of scripts.

2.2. Synthesis of speech sounds

Concept of speech synthesis. Types: analysis-by-synthesis, natural speech modification, speech generation. Analysis-by-synthesis techniques: LPC, sinusoidal. Speech generation: formant synthesis, articulatory synthesis. The synthesis process in TTS systems.

3. Prosodic features: analysis and synthesis

3.1. Acoustic analysis of prosody

Prosodic phenomena: intonation, stress, tone, speech rate, rhythm. Prosodic units: syllables, intonation groups. Acoustic realisation of prosodic phenomena: F0, duration, energy. Intonation analysis and modelling methods. Automatic analysis and modelling procedures: Prosogram, MoMel-IntSint, MelAn.
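Since F0 is the main acoustic correlate of intonation, a minimal sketch of how F0 can be estimated from a signal may help. It uses simple autocorrelation on a synthetic periodic signal; this is a swapped-in textbook technique, not the method of Prosogram, MoMel-IntSint or MelAn, and the search range, sample rate and test signal are assumptions made for the example.

```python
import math

def estimate_f0(samples, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 in Hz as the lag with the highest autocorrelation.

    The search range f0_min..f0_max is an assumed default for the example.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Hypothetical voiced-like signal: 200 Hz fundamental plus its second harmonic.
SAMPLE_RATE = 8000  # Hz (assumed)
signal = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 400 * t / SAMPLE_RATE)
          for t in range(400)]

f0 = estimate_f0(signal, SAMPLE_RATE)
print(f0)  # 200.0
```

Real speech needs additional steps that this sketch leaves out, such as windowing, voicing decisions and correction of octave errors.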

3.2. Synthesis of prosodic parameters

Prosody prediction: F0 and duration models. Automatic building of prosodic models. Computational methods for prosody modification and synthesis. Prosody modification techniques: LPC, Overlap-Add. Prosody in TTS.

4. Text processing for speech applications

Automatic phonetic transcription: statistical and rule-based approaches. Pronunciation dictionaries. Stress prediction. Identifying prosodic structure: syllables, intonation groups, pauses.

5. Speech corpora

Definition and features; types and applications. Transcription and annotation levels: words, phones, syllables, intonation groups. Prosodic and linguistic annotation. Some examples: ALBAYZIN, C-ORAL-ROM, Glissando. Developing speech corpora for research and speech technologies: design and collection; recording; orthographic and phonetic transcription; annotation and time-alignment. Tools for speech transcription, segmentation and annotation.

4. Assessment

The final grade for the course will be calculated from the results of:

two practical exercises completed during the course (40% of the final grade)
one practical assignment, to be submitted at the end of the term (60% of the final grade).
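The weighting above can be written out as a short calculation. This is only a sketch: the syllabus does not say how the two exercises are combined or which marking scale is used, so equal weight per exercise and a 0-10 scale are assumptions, and the function name is hypothetical.

```python
def final_grade(exercise_1, exercise_2, final_work):
    """Combine marks (assumed to be on a 0-10 scale) into the course grade."""
    # Assumption: the two exercises count equally within their 40% share.
    exercises = (exercise_1 + exercise_2) / 2
    return 0.4 * exercises + 0.6 * final_work

grade = final_grade(8.0, 6.0, 7.0)
print(round(grade, 2))
```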

5. Methods and activities

Each session will be organised in two parts: the first devoted to theoretical issues, and the second to practical activities related to the theoretical content of the first part. Students will carry out these activities with the support of the teacher, as part of the learning process, but they will not be graded.

6. References

GOLD, B. - MORGAN, N. (2000).- Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Wiley. UPF

HARRINGTON, J. - CASSIDY, S. (1999).- Techniques in Speech Acoustics, Dordrecht, Kluwer Academic Publishers. UPF

O'SHAUGHNESSY, D. (1987).- Speech Communication: Human and Machine, Addison-Wesley Series in Electrical Engineering; 2nd edition, 2000. UPF

RABINER, L. R. - SCHAFER, R. W. (2007).- Introduction to Digital Speech Processing, Foundations and Trends® in Signal Processing, Vol. 1, Nos. 1-2, 1-194.

SCHROEDER, M. R. (1999).- Computer Speech: Recognition, Compression, Synthesis, Springer-Verlag. UPF