Chapter 1 – Introduction: about the book and its contents.
Part I: Fundamentals
Chapter 2 – Handling Textual Data: introduction to the different types of variables used to manipulate text and some useful built-in functions.
Chapter 3 – Regular Expressions: in-depth exploration of regular expressions in the MATLAB programming environment.
Chapter 4 – Basic Operations with Strings: search, replacement, segmentation, concatenation and basic set operations with strings.
Chapter 5 – Reading and Writing Files: description of methods and tools for manipulating the most commonly used file formats.
Part II: Mathematical Models
Chapter 6 – Basic Corpus Statistics: illustration of the basic properties of natural language and introduction to some useful statistical definitions.
Chapter 7 – Statistical Models: introduction to fundamental concepts in the statistical approach to language modeling (word n-grams, discounting, interpolation, etc.).
Chapter 8 – Geometrical Models: introduction to fundamental concepts in the geometrical approach to language modeling (vector spaces, vector similarity, etc.).
Chapter 9 – Dimensionality Reduction: description of methods for dimensionality reduction in geometrical representations of language.
Part III: Methods and Applications
Chapter 10 – Document Categorization: unsupervised clustering, supervised classification and terminology extraction.
Chapter 11 – Document Search: binary search, vector-based search, evaluation metrics and other fundamental concepts in Information Retrieval.
Chapter 12 – Content Analysis: polarity and intensity estimation, and property extraction with pattern matching.