Computational Linguistics I (31395)

Juan María Garrido Almiñana

 

Goals

The goal of the course is twofold: to give students a general overview of the basic theoretical concepts in the computational analysis and processing of speech, and to provide them with basic skills in using tools and resources for speech analysis and speech technology.

The course is aimed mainly at students with some background in Linguistics who are interested in using computational techniques either for basic research in Phonetics and Phonology or for professional work as computational linguists in the Speech Technology field. For this reason, it focuses on the concepts needed to use and develop speech processing tools as a computational linguist, rather than on the engineering, mathematical or programming knowledge behind these tools.

 

Competences

1. Basic knowledge of the theoretical concepts of Acoustic Phonetics and computational speech processing.

2. Basic skills in the use of Praat and other speech processing tools.

3. Basic knowledge of the research procedures and methodologies used in computational speech analysis.

4. Basic knowledge of the procedures and methodologies used in the development of commercial speech technologies.

5. Basic ability to carry out small projects related to computational speech processing.

 

Contents

1. Speech signals

 

Speech waves. Basic parameters: time, amplitude and frequency. Periodic and aperiodic signals. Simple and complex signals. Spectral analysis: the Fourier transform. Acoustic model of speech production: sources and filters. Types of sources in speech production. Filters and resonators. Acoustic features of speech signals: spectral envelope, formants, F0.

 

2. Speech digitisation, coding and storage

 

Analogue-to-digital (A/D) conversion. Analogue and digital signals. Sampling. Sampling frequency. A/D converter resolution. Aliasing. Clipping. Speech coding: needs and applications. Speech coding methods. Storing speech: files and formats.

 

3. Speech analysis and modelling

 

Computational methods for the experimental analysis of speech. Speech analysis tools: Praat. Basic representation methods: spectra, spectrograms, F0 contours. Identifying acoustic features in speech. Automating procedures: scripts. Using large corpora in speech research. Modelling in speech research. Automatic modelling: MoMel-IntSint, MelAn.

 

4. Speech synthesis

 

4.1. Synthesis methods and techniques

Concept of speech synthesis. Types: analysis-by-synthesis, natural speech modification, speech generation. Analysis-by-synthesis techniques: LPC, sinusoidal. Speech modification: Overlap-Add techniques. Speech generation: formant synthesis, articulatory synthesis.

 

4.2. Text-to-speech systems

Definition. Typical structure of a text-to-speech (TTS) system. Linguistic processing in TTS: pre-processing, letter-to-sound, morpho-syntactic analysis, prosodic analysis. Speech wave generation: unit selection, F0 and duration prediction, speech signal modification. Developing TTS systems.

 

5. Speech recognition

 

5.1. Recognition methods and techniques

Concept of speech recognition. Steps in the recognition process: parametrisation, acoustic recognition. Parameters used in speech recognition. Recognition techniques: HMM, neural nets, linguistic rules.

 

5.2. Speech recognition systems

Definition and typical steps: parametrisation, acoustic recognition, linguistic post-processing. Developing speech recognition systems. Other types of recognition systems: speaker recognition, language recognition, emotion recognition.

 

6. Speech corpora

 

6.1. Definition and features

Types and applications. Transcription and annotation levels: words, phones, syllables, intonation groups. Prosodic and linguistic annotation. Some examples: ALBAYZIN, C-ORAL-ROM, Glissando.

 

6.2. Developing speech corpora

Design and collection. Recording. Orthographic and phonetic transcription. Annotation and time-alignment. Tools for speech transcription. Segmentation and annotation tools.

 

Course organisation

Each session will be organised in two parts: the first devoted to theoretical issues, and the second to practical activities related to the theoretical contents of the first part. Students will carry out these activities with the support of the teacher, as part of the learning process.

 

Evaluation

The grade for the course will be calculated considering the results of:

· two practical exercises done during the course (40% of the final grade)

· one practical project, proposed by the student according to their interests or chosen from a list provided by the teacher (60% of the final grade).

 

General readings

GOLD, B. - MORGAN, N. (2000).- Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Wiley. UPF

 

HARRINGTON, J. - CASSIDY, S. (1999).- Techniques in Speech Acoustics, Dordrecht, Kluwer Academic Publishers. UPF

 

O'SHAUGHNESSY, D. (1987).- Speech Communication. Human and Machine. Addison Wesley Series in Electrical Engineering, 2nd edition, 2000.

 

RABINER, L. R. - SCHAFER, R. W. (2007).- Introduction to Digital Speech Processing, Foundations and Trends® in Signal Processing, Vol. 1, Nos. 1-2 (2007), 1-194.

 

SCHROEDER, M. R. (1999).- Computer Speech. Recognition, Compression, Synthesis, Springer-Verlag. UPF