These are brief notes on some rough Python scripts I am writing to help me understand the material in R. Harald Baayen's "Word Frequency Distributions". To some extent, the scripts replicate the tables and charts in the book.

A Python companion to

Word Frequency Distributions by R. Harald Baayen

Last modified: 2006-06-19 11:13:37 +0800 (Mon, 19 Jun 2006)

  1. Word Frequencies
    1. Introduction
    2. The frequency spectrum
    3. Zipf
    4. The quest for characteristic constants
    5. The lognormal distribution
    6. Discussion
    7. Bibliographical Comments
    8. Questions
  2. Non-parametric models
    1. Basic concepts
    2. The Urn model
    3. The Structural Type Distribution
    4. The LNRE zone
    5. Good-Turing Estimates
    6. Interpolation and Extrapolation
      1. Interpolation
      2. Extrapolation
    7. Discussion
    8. Bibliographical Comments
    9. Questions
  3. Parametric models
    1. Introduction
    2. LNRE models
      1. The Lognormal Structural Type Distribution
      2. The Generalized Inverse Gauss-Poisson Structural Type Distribution
      3. The Zipfian Family of LNRE Models
    3. Evaluating Goodness of Fit
    4. Parameter estimation
    5. A comparative study
    6. Comparing Lexical Measures Across Texts
    7. Discussion
    8. Bibliographical Comments
    9. Questions
  4. Mixture Distributions
    1. Introduction
    2. Expectations, variances, and covariances
    3. Examples of mixture distributions
      1. A text-level mixture model
      2. Morphological mixtures
    4. Morphological Productivity
    5. Discussion
    6. Bibliographical Comments
    7. Questions
  5. The Randomness Assumption
    1. The Randomness Assumption
      1. Non-randomness and lexical specialization
      2. Consequences of non-randomness
    2. Adjusted LNRE models
      1. Partition-based adjustment
      2. Parameter-based adjustment
    3. Discussion
    4. Bibliographical Comments
  6. Examples of Applications
    1. Distribution properties of the lexicon
      1. Word length and sample size
      2. Matching reliability across corpora
    2. Morphological productivity
      1. Global analyses
      2. Productivity and register
    3. Authorship and Style
    4. Beyond word frequency distributions
      1. Counts of filarial worms on mites on rats
      2. Year references
      3. CV-structures
      4. Word pairs
      5. Discussion
    5. Some practical guidelines
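As a taste of the kind of computation these notes build on, here is a minimal sketch of the frequency spectrum from section 1.2. It is not one of the companion scripts. It uses the book's notation: N tokens, V(N) types, and V(m, N), the number of types that occur exactly m times. It also computes the Good-Turing estimate V(1, N) / N of the probability mass of unseen types (section 2.5). The function names `frequency_spectrum` and `unseen_mass` are hypothetical, and splitting on whitespace is a simplifying assumption about tokenization.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Return (N, V, spectrum) for a sequence of tokens.

    N is the number of tokens, V the number of distinct types, and
    spectrum maps each frequency m to V(m, N), the number of types
    that occur exactly m times (sorted by m).
    """
    type_counts = Counter(tokens)              # type -> frequency
    spectrum = Counter(type_counts.values())   # m -> V(m, N)
    n_tokens = sum(type_counts.values())       # N
    n_types = len(type_counts)                 # V(N)
    return n_tokens, n_types, dict(sorted(spectrum.items()))


def unseen_mass(n_tokens, spectrum):
    """Good-Turing estimate of the probability of unseen types: V(1, N) / N."""
    if n_tokens == 0:
        return 0.0
    return spectrum.get(1, 0) / n_tokens


if __name__ == "__main__":
    # Hypothetical toy text; a real script would read a corpus file.
    text = "the cat sat on the mat and the dog sat too"
    N, V, spectrum = frequency_spectrum(text.split())
    print(N, V, spectrum)
    print(unseen_mass(N, spectrum))
```

For the toy sentence this gives N = 11, V = 8 and the spectrum {1: 6, 2: 1, 3: 1}. Two identities hold for any sample and make useful sanity checks: the V(m, N) sum to V(N), and the products m * V(m, N) sum to N.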