Colloquium - Discovering and Mitigating Social Data Bias

110 MLH
Fred Morstatter
Arizona State University | Computer Science

Researchers and practitioners use social media to extract actionable patterns about human behavior. However, the validity of these patterns hinges, in part, on leveraging a dataset that is representative of society. The data collected from social media is not always representative of the real world, and in many cases it is not even representative of social media itself. This talk will introduce ways in which the social data from which inferences are drawn differs from the underlying populations and trends in the real world. Furthermore, I will discuss the statistically significant differences between the data generated on social media and the social media data commonly used in research. These observations pave the way for the discovery and removal of bias within social media data. Next, I will introduce methodologies that extract clearer patterns from social media data by identifying and removing specific sources of bias. This has important implications for social media mining: the behavioral patterns and insights it extracts will be more representative of society. In turn, researchers and practitioners will be able to obtain more accurate measurements and findings from social media data.


Fred Morstatter is a PhD candidate in computer science at Arizona State University in Tempe, Arizona. His research focuses on finding and removing biases in big social data that can skew research results. Among his publications are an ICWSM paper that investigates the representativeness of Twitter's Streaming API, two WWW papers on automatically identifying periods of bias in streaming Twitter data, two KDD papers, and a book, Twitter Data Analytics. He won the World Wide Web conference's Best Poster Award in 2016. He is a 2016 Faculty Emeriti Fellow, Dean's Fellow, and University Graduate Fellow. He has served on the program committees of ICWSM 2014, 2016, and 2017 and the IEEE/CIC ICCC 2014 Symposium on Social Networks and Big Data, and he co-chaired the Grand Challenge organizing committee of the Social Computing, Behavioral-Cultural Modeling and Prediction Conference in 2014, 2015, and 2016. He has been a Visiting Scholar at Carnegie Mellon University and a Research Intern at Microsoft Research. He is the Principal Architect of TweetXplorer, a visual analytics system for Twitter data. A full list of publications can be found at Contact him at