Lingüística computacional 2

Computational Linguistics 2 (31396)

Master's program: Màster en Lingüística Teòrica i Aplicada
Year: 2014-15
Term: 2nd
Number of ECTS credits: 5
Student workload (hours): 125
Course type:
Instructor(s): Toni Badia
Language used: English

 

1. Course presentation

The course gives a general presentation of the aspects of natural language processing related to the structure of words and sentences. It combines a theoretical presentation of the main topics in the area with a practical approach to the main strategies, and it covers both symbolic and statistical approaches to natural language processing.

The course suits students who want a general overview of the field as well as those who want a deeper understanding of the topics covered.

 

2. Objectives

The main objective of the course is to learn the techniques currently used in the computational treatment of morphology and syntax, that is, of words and sentences as strings of words.

After completing the course, the student will:

* know, write and use morphological processors

* know, write and use syntactic processors

* know, write and use morphosyntactic taggers

 

3. Syllabus

1. Regular expressions and Finite-State Automata

2. Computational morphology and Finite-State Transducers

   1. survey of basic aspects of morphology

   2. the lexicon and morphotactics

   3. orthographic rules

   4. morphological analysis with finite-state transducers

3. n-gram language models

   1. what we count in linguistic corpora, and how

   2. simple n-grams

   3. smoothing and other techniques for improving n-gram models

4. Part-of-Speech Tagging

   1. morphosyntactic tags

   2. morphosyntactic tagging

   3. general problems in morphosyntactic tagging

5. Formal Grammars

   1. survey of basic aspects of syntax

   2. Context Free Grammars

   3. Dependency Grammars

6. Syntactic Parsing

   1. parsing as search

   2. dynamic parsing methods

   3. partial parsing

7. Features and unification

   1. feature structures and unification

   2. feature structures in the grammar

   3. implementation of unification

   4. parsing with unification grammars

   5. types and inheritance

8. Statistical Parsing

   1. probabilistic CFGs

   2. problems with probabilistic CFGs

   3. probabilistic lexicalised CFGs

 

4. Assessment

The assessment of the course will be based on:

  • required readings

  • participation in class discussion

  • assigned practical exercises

  • either an essay or a final exam

 

5. Methods and activities

Each week of the course is organized as follows:

  • reading before the class

  • discussion of the reading in class

  • practical activities in class (that may be continued after class)

 

6. References

* Jurafsky, Daniel & Martin, James H. (2009), Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edition. Prentice Hall.

* Bird, Steven; Klein, Ewan & Loper, Edward (2009), Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media.

Other recommended readings:

* Allen, James (1994), Natural Language Understanding. 2nd edition. Addison Wesley.

* Coleman, John (2005), Introducing Speech and Language Processing. Cambridge University Press.

* Manning, Christopher D. & Schütze, Hinrich (1999), Foundations of Statistical Natural Language Processing. The MIT Press.