Syllabus

The syllabus consists of:

  • Weekly slides
  • Weekly exercises
  • Mandatory assignments
  • Readings

The detailed syllabus for each week is listed on the corresponding weekly page.

This is an overview of the mandatory readings. For recommended readings, exercises, etc., see the weekly pages.

Jurafsky and Martin, Speech and Language Processing, 3rd ed. (Jan. 2022 draft)

  • Ch. 2 Regular expressions, etc.
    • Sec. 2.0
    • Sec. 2.2 Words
    • Sec. 2.3 Corpora
    • Sec. 2.4 Normalization, except 2.4.3 and the technical details of 2.4.1
    • Sec. 2.5 Edit distance
  • Ch. 3, "N-gram Language Models"
    • Sections 3.0-3.5
  • Ch. 4, "Naïve Bayes Classification and Sentiment"
    • Except (for now) Sec. 4.9 Statistical significance testing
  • Ch. 5, "Logistic Regression", except
    • Scaling input features in Sec. 5.2.2
    • Sec. 5.7, Last paragraph, starting with "Both L1 and L2..."
  • Ch. 6, "Vector Semantics and Embeddings", except
    • Sec. 6.6 Pointwise Mutual Information (PMI)
  • Ch. 7 "Neural Networks and Neural Language Models"
  • Ch. 8 "Sequence labeling"
    • Except 8.4.5-8.4.6 "The Viterbi Algorithm"
  • Ch. 9 Deep Learning Architectures for Sequence Processing
    • Sec. 9.1-9.5
  • Ch. 10 Machine Translation and Encoder-Decoder Models
    • Sec. 10.0, 10.2-10.4
  • Ch. 17 Relation Extraction
    • Sec. 17.0-17.1
  • Ch. 18, "Word Senses and WordNet"
    • Sec. 18.0-18.3
  • Ch. 24, "Dialogue Systems and Chatbots"
    • Sections 24.1-24.6
  • Ch. 25, "Phonetics"
    • Sections 25.1-25.5 (excluding the details not discussed in class)
  • Ch. 26, "Speech Recognition and ASR"
    • Sections 26.1 and 26.5 (excluding the part on statistical significance)

NLTK Book

  • Ch. 3, sec. 6 Normalizing Text
  • Ch. 3, sec. 8 Segmentation
  • Ch. 5, sec. 1 Using a tagger
  • Ch. 5, sec. 2 Tagged corpora


Published Dec. 6, 2022 10:21 AM - Last modified Dec. 6, 2022 10:21 AM