These are brief notes on some very rough Python scripts that I'm writing to help me grasp the material in R. Harald Baayen's "Word Frequency Distributions". To some extent, the scripts replicate the tables and charts in the text.
Problems? Suggestions for improvement? Email me (substitute 'heart' for the image of the heart):
A Python companion to
Word Frequency Distributions by R. Harald Baayen
Last modified: 2006-06-19 11:13:37 +0800 (Mon, 19 Jun 2006)
- Word Frequencies
  - Introduction
  - The frequency spectrum
  - Zipf
  - The quest for characteristic constants
  - The lognormal distribution
  - Discussion
  - Bibliographical Comments
  - Questions
- Non-parametric models
  - Basic concepts
  - The Urn model
  - The Structural Type Distribution
  - The LNRE zone
  - Good-Turing Estimates
  - Interpolation and Extrapolation
  - Interpolation
  - Extrapolation
  - Discussion
  - Bibliographical Comments
  - Questions
- Parametric models
  - Introduction
  - LNRE models
  - The Lognormal Structural Type Distribution
  - The Generalized Inverse Gauss-Poisson Structural Type Distribution
  - The Zipfian Family of LNRE Models
  - Evaluating Goodness of Fit
  - Parameter estimation
  - A comparative study
  - Comparing Lexical Measures Across Texts
  - Discussion
  - Bibliographical Comments
  - Questions
- Mixture Distributions
  - Introduction
  - Expectations, variances, and covariances
  - Examples of mixture distributions
  - A text-level mixture model
  - Morphological mixtures
  - Morphological Productivity
  - Discussion
  - Bibliographical Comments
  - Questions
- The Randomness Assumption
  - The Randomness Assumption
  - Non-randomness and lexical specialization
  - Consequences of non-randomness
  - Adjusted LNRE models
  - Partition-based adjustment
  - Parameter-based adjustment
  - Discussion
  - Bibliographical Comments
  - The Randomness Assumption
- Examples of Applications
  - Distribution properties of the lexicon
  - Word length and sample size
  - Matching reliability across corpora
  - Morphological productivity
  - Global analyses
  - Productivity and register
  - Authorship and Style
  - Beyond word frequency distributions
  - Counts of filarial worms on mites on rats
  - Year references
  - CV-structures
  - Word pairs
  - Discussion
  - Some practical guidelines
  - Distribution properties of the lexicon