Can texting a friend get more people to vote? This data scientist wants to find out.

How has your experience at Columbia's Data Science Institute prepared you for Chicago's?

At the Institute, I was able to lead collaborations between social and political scientists on one side and statisticians and computer scientists on the other, and learn to do truly interdisciplinary research. I also helped organize interdisciplinary programs, such as the Distinguished Lecture Series. At UChicago, I will seek to collaborate with the many incredible social and political scientists on campus to bring modern datasets to issues of policy importance. I also look forward to helping grow their institute and defining data science as an emerging discipline.

What made you go from studying foreign policy, linguistics and even Farsi to data science?

My first exposure to data science was as an undergraduate student at UMass when I audited Hanna Wallach’s Computational Social Science Seminar. It was a thrill, but I didn’t have the formal training in machine learning to run with it. I then did an internship at a federal research lab studying public opinion in Iran via Persian blogs. The goal was to provide U.S. policymakers with nuanced insights into political thinking in Iran at a time when there was still hope for an easing in U.S.-Iranian relations. I worked on natural language processing methods to characterize sentiment and topics in these blogs. It got me excited about computer science and statistics, and set me on the path to eventually doing a PhD in computer science, which I did at UMass with Hanna Wallach.

Any advice for others struggling to find their academic orientation?

I’m probably the wrong person to ask. I’ve never been good at learning things that didn’t already interest me. (I’m still not.) I was lucky to find myself in a hot field, with jobs. I would like to say that following your curiosity is a good strategy, but I think that would only propagate survivorship bias. I’ll put in a plug for data science and statistics, though. As a profession, it allows you to move around. I focus on political science, but have collaborated with geneticists, economists, and neuroscientists. It’s hard to get bored as a methodologist!

Is texting a friend as effective as knocking on doors to get out the vote? Are there any caveats?

This is the question I asked with David Blei, Donald Green and others. We conducted large-scale randomized field experiments on Outvote, an app that lets Americans text friends to remind them to vote. We found that Outvote users had an effect of about eight percentage points on their friends' turnout during the 2018 midterms. That is a large effect compared with what has been measured for door-to-door canvassing, phone banking and other get-out-the-vote tactics. But we ran another experiment in the 2020 presidential election and found much weaker effects. This wasn’t unexpected, as nudges are generally less effective in presidential elections, but we’re waiting for the 2022 midterms to see if we can replicate our 2018 results. Stay tuned!

Any predictions for the next midterms?

I have predictions, but they’re probably no better than yours. Text your friends to remind them to register and vote.

You co-directed a popular workshop at NeurIPS, the world's top conference on machine learning, on beautiful ideas that don't work. Why?

Today, machine learning research is increasingly mechanized and competitive. Researchers are incentivized to produce new methods that beat baselines, rather than to understand the fundamentals or develop new approaches to problems. These workshops aim to promote negative results, highlight gaps between theory and practice, and solicit “nice” ideas that don’t necessarily “work” (yet).

You're the first postdoc I've met with a probability distribution named after you. What is Schein's distribution?

We introduced it in a 2019 paper at NeurIPS, Poisson-Randomized Gamma Dynamical Systems. We called it the “shifted confluent hypergeometric (SCH) distribution” because it is a variation of a previously known distribution. The characterization of this distribution was one of the ingredients of an algorithm for fitting a model to data on interactions between countries. These data are made up of micro-records of the form “country i took action a toward country j at time t,” and there are millions of such events. Ours was a time series model that could characterize uncertainty about unobserved or future events.

One of my co-authors, Scott Linderman, co-wrote a follow-up paper that builds a model for neuroscience data using a similar methodology, and they renamed it “Schein’s distribution”. My mother has a printout of the paper posted on her office door.

Does this often happen to data scientists?

Have a distribution named for them? Nope! It’s rare and awesome!

You were a political organizer growing up in Brookline, Mass. Has it shaped your work?

Not really, but it influenced my life at Columbia in a big way. I recently learned that a longtime friend and fellow campaigner for John Kerry in 2004 now owns the Hungarian Pastry Shop. We reconnected one day while I was ordering pastries.

What should everyone know about data science?

It is common these days to wonder which methods will become obsolete once we have collected enough data. But my point of view is that there is no such thing as “Big Data”. It depends on your questions; if your data is “big” relative to the questions you are asking, then maybe you should ask bigger questions! We will always need theory, domain knowledge and tailored methodology to answer these big questions.

How is working with social scientists different from working with physical scientists?

My sense is that social scientists emphasize theory to guide their empirical work and are generally less willing to take a purely inductive approach to science. This makes sense, given the traditional scarcity of social science data. But these data are getting richer and richer, so that may change. For now, I think computational social science means engaging with theory.

Sean N. Ayres