Unit I
Introduction - Rationalist and Empiricist Approaches to Language - Scientific Content - The Ambiguity of Language: Why NLP Is Difficult - Mathematical Foundations - Elementary Probability Theory - Essential Information Theory - Linguistic Essentials - Parts of Speech and Morphology - Phrase Structure - Semantics and Pragmatics.
Unit II
Collocations - Frequency - Mean and Variance - Hypothesis Testing - Mutual Information - The Notion of Collocation - Statistical Inference: n-gram Models over Sparse Data - Bins: Forming Equivalence Classes - Statistical Estimators - Combining Estimators - Lexical Acquisition - Evaluation Measures.
Unit III
Markov Models - Markov Models - Hidden Markov Models - The Three Fundamental Questions for HMMs - HMMs: Implementation, Properties, and Variants - Part-of-Speech Tagging - The Information Sources in Tagging - Markov Model Taggers - Hidden Markov Model Taggers.
Unit IV
Statistical Alignment and Machine Translation - Text Alignment - Word Alignment - Statistical Machine Translation - Clustering - Hierarchical Clustering - Non-Hierarchical Clustering.
Unit V
Topics in Information Retrieval - Some Background on Information Retrieval - The Vector Space Model - Term Distribution Models - Latent Semantic Indexing - Discourse Segmentation - Text Categorization - Decision Trees.
Text Book:
"Foundations of Statistical Natural Language Processing" - Christopher D. Manning and Hinrich Schütze - MIT Press - 1999.
Reference Books:
"Speech and Language Processing" - Daniel Jurafsky and James Martin - Pearson Education - 2008.

