# N-gram smoothing models¶

When dealing with n-gram models, smoothing refers to the practice of adjusting empirical probability estimates to account for insufficient data.

In the descriptions below, we use the notation \(w^{j}_{i}\), \(i < j\), to denote the (j - i)-gram \((w_{i}, w_{i+1}, \ldots, w_{j})\).

### Laplace Smoothing

Laplace smoothing is the assumption that each n-gram in a corpus occurs exactly one more time than it actually does.

where \(c(a)\) denotes the empirical count of the n-gram \(a\) in the corpus, and \(|V|\) corresponds to the number of unique n-grams in the corpus.

**Models**

### Additive/Lidstone Smoothing

Additive/Lidstone smoothing is a generalization of Laplace smoothing, where we assume that each n-gram in a corpus occurs k more times than it actually does (where k can be any non-negative value, but typically ranges between [0, 1]):

where \(c(a)\) denotes the empirical count of the n-gram \(a\) in the corpus, and \(|V|\) corresponds to the number of unique n-grams in the corpus.

**Models**

### Good-Turing Smoothing

Good-Turing smoothing is a more sophisticated technique which takes into account the identity of the particular n-gram when deciding the amount of smoothing to apply. It proceeds by allocating a portion of the probability space occupied by n-grams which occur with count r+1 and dividing it among the n-grams which occur with rate r.

where \(r^*\) is the adjusted count for an n-gram which occurs r times, g(x) is the number of n-grams in the corpus which occur x times, and N is the total number of n-grams in the corpus.

**Models**

**References**

[1] | Chen & Goodman (1998). “An empirical study of smoothing techniques
for language modeling”. Harvard Computer Science Group Technical Report
TR-10-98. |

[2] | Gale & Sampson (1995). “Good-Turing frequency estimation without
tears”. Journal of Quantitative Linguistics, 2(3), 217-237. |