Computational Social Science and Big Data

Sociological Research Methods

Digital trace data, text analysis, network analysis, Salganik's *Bit by Bit*, and the ethics of web-scraped data.

The Rise of Computational Social Science

In 2009, David Lazer and colleagues published "Computational Social Science" in *Science*, announcing that the digitization of everyday life had produced a new kind of empirical opportunity: vast digital records of human behavior (emails, clicks, calls, messages, transactions, movements) that could, in principle, let social scientists see patterns invisible to surveys or ethnography. A decade and a half later, computational social science has matured into a recognizable subfield with its own journals, curricula, and methodological conventions. Matthew Salganik's textbook *Bit by Bit: Social Research in the Digital Age* (2018) remains one of the best guides to what the field has learned.

Salganik organizes the new research around four activities enabled by digital technology: observing behavior (digital trace data), asking questions (online surveys and interviews), running experiments (large-scale online experiments), and creating mass collaboration (citizen science, crowdsourcing, collective intelligence). Each activity has its own strengths and distinctive pitfalls. Digital trace data are large, always-on (recorded in real time), non-reactive (measurement is unlikely to change behavior), and often cheap to obtain. However, they are frequently unrepresentative, algorithmically confounded (shaped by the platform's design), and produced for business purposes rather than research. Online experiments, whether recruited through MTurk or Prolific or run inside social networks themselves, allow treatment effects to be estimated at huge scale. They also raise questions about ecological validity and, for on-platform experiments, about the ethics of manipulating users without their knowledge. The 2014 Facebook emotional-contagion study by Kramer, Guillory, and Hancock showed both the reach and the ethical peril of on-platform experiments. It reported that users shown fewer positive posts in their News Feeds subsequently wrote fewer positive (and more negative) posts. The study provoked wide criticism because the roughly 700,000 affected users had not given informed consent.

Salganik's recurring theme is that the new data sources are found data rather than data designed for research. A Twitter firehose, a telecom call-detail record, and a Google search log were each produced for business operations, not for answering sociological questions. Turning found data into research data requires care about what is measured, who is included, and how platform design shapes the behavior being recorded. The best computational social science is less about exotic algorithms than about disciplined thinking: clear research questions, explicit measurement, awareness of bias, and integration with classical social-science theory.

High-profile computational studies include several well-known examples. Jon Kleinberg, Sendhil Mullainathan, and colleagues compared machine-learning predictions with judges' bail decisions (Kleinberg et al., 2018). Salganik, Dodds, and Watts ran the MusicLab experiments on inequality and unpredictability in artificial cultural markets (Salganik, Dodds, Watts, 2006). Gary King and colleagues used text analysis to study Chinese social-media censorship (King, Pan, Roberts, 2013). Bail et al. (2018) paid Twitter users to follow a bot that retweeted opposing political views; exposure increased rather than decreased polarization, most clearly among Republicans. Each illustrates how computational methods can produce causal and descriptive findings that would be difficult or impossible to obtain with traditional methods alone.