Abstract
Text datasets form the foundation of Natural Language Processing (NLP), powering applications from online search and recommendation engines to decision-critical systems in healthcare, law, and finance. However, these datasets often encode social, cultural, historical, and annotation-driven biases that, once learned by models, propagate and amplify, producing unfair outcomes. This paper presents a comprehensive, comparative review of bias detection and mitigation in text datasets. It surveys the major forms of bias (representational, stereotypical, sampling, annotation, cultural, and algorithmic) and analyzes state-of-the-art detection methods, including statistical audits, embedding association tests (e.g., WEAT, SEAT), explainable AI (XAI), and behavioral probing. It further examines mitigation strategies across the NLP pipeline: pre-processing (data balancing, counterfactual augmentation), in-processing (adversarial training, fairness-aware objectives), and post-processing (threshold calibration, output rewriting). The review critiques benchmark datasets such as StereoSet, CrowS-Pairs, Jigsaw Toxicity, BEADS, and domain-specific corpora. Persistent gaps remain in handling intersectionality, multilingual fairness, dataset documentation, and annotation bias, as well as in dynamic, real-time bias monitoring for deployed systems. The paper concludes with future research directions emphasizing living benchmarks, participatory AI, explainable fairness, and continuous bias auditing.