Detailed Explanation: The MDL principle, rooted in information theory, posits that the best model for a dataset is the one that compresses the data the most effectively. It aims to balance model complexity and data fitting by minimizing the sum of the length of the description of the model and the length of the description of the data when encoded with the model. This approach prevents overfitting by penalizing overly complex models that might fit the noise in the data rather than the underlying structure. In practice, MDL is used in model selection and various machine learning tasks to ensure that chosen models generalize well to unseen data. It connects deeply with concepts such as Kolmogorov complexity and Bayesian inference, where shorter descriptions correspond to higher prior probabilities.

Historical Overview: The concept of MDL was first introduced by Jorma Rissanen in 1978, amidst growing interest in the intersection of statistics and information theory. It gained popularity in the 1980s and 1990s as computational resources grew, allowing for practical applications in model selection and data compression.

Key Contributors: Jorma Rissanen is the primary figure associated with the development of the MDL principle. His foundational work laid the groundwork for subsequent advances and applications in both theoretical and applied statistics, influencing fields such as machine learning, data mining, and statistical modeling.