
Coverage Bias
Uneven representation of groups or topics within a dataset, leading to skewed AI model outcomes.
Coverage bias occurs when the datasets used to train AI models fail to adequately represent all segments of the population or domain they are meant to cover, producing models that perform well for overrepresented segments and poorly for underrepresented ones. This imbalance can cause AI systems to perpetuate or exacerbate existing social disparities, since they are more likely to mispredict or produce erroneous outputs for data points from underrepresented groups. In ML, ensuring equitable coverage through diverse datasets is critical for fairness and accountability, because it directly affects the reliability and generalizability of models in real-world applications.
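The following is a minimal sketch, using hypothetical data, group labels, and population shares, of two checks that are commonly used to surface coverage bias: comparing each group's share of the training data against its assumed share of the target population, and measuring model accuracy separately per group.

```python
from collections import Counter

# Hypothetical training records, each tagged with a demographic group label.
train_groups = ["A"] * 800 + ["B"] * 150 + ["C"] * 50

# Assumed shares of each group in the population the model will serve.
population_share = {"A": 0.50, "B": 0.30, "C": 0.20}

# 1) Representation check: dataset share vs. population share per group.
counts = Counter(train_groups)
total = sum(counts.values())
for group, pop_share in population_share.items():
    data_share = counts.get(group, 0) / total
    print(f"group {group}: dataset share {data_share:.2f} vs population share {pop_share:.2f}")

# 2) Performance check: accuracy computed separately for each group,
# using illustrative placeholder labels and predictions.
def per_group_accuracy(y_true, y_pred, groups):
    correct, seen = Counter(), Counter()
    for t, p, g in zip(y_true, y_pred, groups):
        seen[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / seen[g] for g in seen}

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "B", "B", "C"]
print(per_group_accuracy(y_true, y_pred, groups))
```

A large gap between dataset share and population share, or a marked drop in accuracy for the smaller groups, is the typical signal that coverage bias may be affecting the model.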
The term 'coverage bias' first gained traction in AI and ML in the early 2000s, amid growing concern over social biases embedded in algorithmic decision-making systems. It became prominent in the late 2010s as part of the broader movement toward ethical AI, which identified coverage bias as a key factor in algorithmic fairness and trustworthiness.
Key contributors to the study of coverage bias include researchers in AI ethics and data science, such as Kate Crawford and Timnit Gebru, whose work has illuminated the implications of data bias in AI systems. Their insights have shaped the discourse around fairness in AI and pushed for more inclusive practices in data collection and model evaluation.