N-grams are crucial in computational linguistics and natural language processing (NLP) for modeling the likelihood of sequences of words or phonemes occurring in a given language. They serve as the building blocks for various applications, such as text prediction, speech recognition, and statistical machine translation. An n-gram model predicts the occurrence of an element based on the previous n-1 elements, providing a simple but effective way to capture linguistic context and dependencies. The effectiveness of an n-gram model depends largely on the choice of n: larger values of n capture more context, but they also require more computational resources and larger datasets, because longer sequences occur more sparsely and their probabilities are harder to estimate reliably.
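The idea of predicting the next element from the previous n-1 can be sketched with a minimal bigram (n = 2) model over words. This is an illustrative toy, not a production implementation: the function names and the tiny corpus are invented for the example, and it simply counts which word most often follows a given word.

```python
from collections import defaultdict, Counter

def train_bigram_model(tokens):
    """For each word, count how often each other word follows it."""
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequent continuation of `word`, or None if unseen."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]

# Toy corpus: "cat" follows "the" twice, "mat" once.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # prints "cat"
```

Dividing a count by the total count for its context turns these raw frequencies into the conditional probabilities P(next | previous) that a full n-gram language model uses.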

Historical overview: The concept of n-grams entered computational linguistics through statistical studies of language in the late 1940s and 1950s, with significant adoption in language processing tools and models in the subsequent decades, particularly as computational power increased.

Key contributors: The development and use of n-grams in language modeling have been influenced by many researchers in computational linguistics. Notably, Claude Shannon used statistical models of letter and word combination frequencies to approximate and predict English text in his groundbreaking work on information theory in the late 1940s and 1950s, laying the foundations on which later n-gram approaches to language tasks were built.