Abstract
The emergence and popularization of contemporary social media (SM), including Facebook, Twitter, and Instagram, have reshaped how politicians communicate with the electorate and run electoral campaigns. The inherent capabilities of SM, such as access to large amounts of real-time data, have led to a new research focus on using this data to predict election outcomes. Despite numerous studies conducted over the past decade, the results remain controversial and are frequently challenged. This article investigates and summarizes the evolution of research on predicting elections based on SM data, outlines the current state of the art and practice, and identifies research opportunities within this field. Methodologically, we conducted a systematic literature review in which we analyzed the quantity and quality of publications, the electoral contexts of the studies, the main approaches and characteristics of successful studies, and their strengths and challenges, and we compared our findings with previous reviews. We identified and analyzed 83 relevant studies, noting challenges in areas such as process, sampling, modeling, performance evaluation, and scientific rigor. Key findings include the low success rate of the most-used approach (volume and sentiment analysis on Twitter) and the better results obtained by newer approaches, such as regression methods trained with traditional polls.
Lastly, we discuss a vision for future research that emphasizes integrating advances in process definitions, modeling, and evaluation, and we call for closer investigation of how state-of-the-art machine learning approaches can be applied.