Big data, big science: students share their research on “big data” in a poster session

UNIVERSITY PARK, Pennsylvania — Today alone, enough data will be produced to fill 250,000 Libraries of Congress, according to a 2016 report by Mikal Khoso of Northeastern University. On a larger scale, estimates indicate that 4.4 zettabytes of data (or 4.4 trillion gigabytes) existed worldwide in 2013, an amount that is expected to increase tenfold by 2020.

This data comes in all shapes and sizes, from text tweets to satellite imagery, and its variety was on display during the Big Data Social Science (BDSS) poster session held April 14 on the University Park campus. Poster topics included identifying bullying tweets (Amy Zhang, statistics, and Diane Felmlee, sociology); social covariates of the HIV epidemic (Ben Sheng, Xun Cao and Le Bao, statistics); virtual reality and decision making (Mark Simpson and Alexander Klippel, geography); and the racial segregation of home and work environments (Robert Zuchowski and Stephen Matthews, sociology and demography).

Students Matthew Denny (political science), Cassie McMillan (sociology, demography) and Sayali Phadke (statistics) also presented during the poster session. As doctoral students in the National Science Foundation-funded BDSS Integrative Graduate Education and Research Training (IGERT) program, each strives to improve current analytical techniques and apply them to the exponential explosion of data on political, geographical and social networks produced every day.

The work presented by Denny takes a nuanced approach to network analysis by examining not only whether particular nodes are connected, but also the strength of those connections. To put this into context, imagine a celebrity’s connections on a social network like Facebook. Everyone they are friends with can be considered a tie, but distinguishing friends from fans requires more information. One way to do this is to consider the strength of those ties by looking at how often the celebrity and their “friends” like each other’s posts. Denny and his adviser, Bruce Desmarais, associate professor of political science at Penn State, recently published a model in Social Networks that deals with precisely this type of weighted network, applied to loan data from 17 countries.

“We think there is a big hole in the market for people trying to understand systemic risk and there are some really exciting applications in terms of improving risk management in the financial system by adopting these network analysis techniques,” Denny said of the potential consequences of his work. For example, he thinks their model could help them understand “how the relationship between banks, economies, or countries underlies the risk of financial collapse and how countries can respond.”
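The celebrity example can be made concrete with a minimal sketch. It is not the model Denny and Desmarais published; it only illustrates the idea of a weighted tie, under the assumption that tie strength is measured by reciprocated likes. The names, the like log and the function `mutual_like_weights` are all hypothetical.

```python
from collections import Counter

# Hypothetical like log: (person who liked, person whose post was liked).
likes = [
    ("celebrity", "old_friend"),
    ("old_friend", "celebrity"),
    ("old_friend", "celebrity"),
    ("celebrity", "old_friend"),
    ("fan", "celebrity"),
    ("fan", "celebrity"),
    ("fan", "celebrity"),
]

def mutual_like_weights(likes):
    """Weight each pair by the smaller of its two directional like counts.

    Assumption for illustration: a tie is only as strong as its weaker side,
    so one-way attention (a fan liking a celebrity) gets weight 0.
    """
    directed = Counter(likes)
    # One entry per unordered pair, ignoring likes of one's own posts.
    pairs = {tuple(sorted(edge)) for edge in directed if edge[0] != edge[1]}
    return {(a, b): min(directed[(a, b)], directed[(b, a)]) for a, b in pairs}

weights = mutual_like_weights(likes)
for pair, weight in sorted(weights.items()):
    print(pair, weight)
```

In an unweighted network both the old friend and the fan would simply count as connected to the celebrity. With weights, the pair that likes each other's posts in both directions stands apart from the pair where the likes flow only one way.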

Phadke, on the other hand, explores another direction: she examines how influence spreads through networks. To explain her work, Phadke appeals to a ubiquitous aspect of modern life: advertisements.

“Let’s say there’s a company that wants to study the effect of an advertisement,” she began. “In all classical statistical methods, you assume that two units [people who see the ad] are independent of each other, but as soon as you introduce a network, you are looking at units that communicate. So if you’re assuming that showing an ad to someone will affect only that person’s purchase outcome, you might be underestimating the effect of your ad and investing more money in it than you really need.”

However, the model developed by Phadke has more applications than just saving money for businesses. She suggested the model could be used to assess the effectiveness of public health initiatives or even international trade regulations.
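The advertising example can be illustrated with a toy calculation. This is not Phadke's model, only a sketch of why ignoring communication between units can understate an ad's effect. Every number, the friendship list and the helper names are assumptions made up for illustration.

```python
# All values are made-up percentage points of purchase probability.
BASELINE = 10   # chance of buying with no exposure at all
DIRECT = 20     # assumed lift for a person who sees the ad
SPILLOVER = 10  # assumed lift for a person with a friend who saw the ad

friendships = [(0, 1), (2, 3), (4, 5), (6, 7)]  # hypothetical friend pairs
saw_ad = {0, 2, 4}                               # people shown the ad
people = sorted({p for pair in friendships for p in pair})

def friends_of(person):
    """Return everyone who shares a friendship with this person."""
    return {b if a == person else a for a, b in friendships if person in (a, b)}

def purchase_chance(person):
    """Baseline plus the direct lift and/or the spillover lift, if they apply."""
    chance = BASELINE
    if person in saw_ad:
        chance += DIRECT
    if friends_of(person) & saw_ad:
        chance += SPILLOVER
    return chance

def mean(values):
    values = list(values)
    return sum(values) / len(values)

treated = [purchase_chance(p) for p in people if p in saw_ad]
untreated = [purchase_chance(p) for p in people if p not in saw_ad]

# Classical comparison: treats the two groups as independent of each other.
naive_effect = mean(treated) - mean(untreated)

# Lift actually generated per ad shown, counting what reached friends.
total_lift = sum(purchase_chance(p) - BASELINE for p in people)
lift_per_ad = total_lift / len(saw_ad)

print(naive_effect, lift_per_ad)
```

Under these assumptions the naive comparison reports 14 points per ad while each ad actually generates 30, because some of the people in the comparison group were influenced through their friends.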

While Phadke and Denny focus on improving statistical models, McMillan applies them to a common problem: bullying. McMillan used network analysis to assess the likelihood of bullying among students at two points in time. She found that, unlike the plot of many teen dramas, bullying is more common among students of similar social status.

“Our project has the potential to better inform school-based prevention and intervention programs that target adolescent bullying behaviors,” McMillan said. “Pop culture often characterizes victims of bullying as teenagers who are on the periphery of their social networks, while bullies are more popular peers with no other social relationships with their victims. While this characterizes some of the bullying behaviors observed in our sample, much bullying at school occurs between friends and between those who are similarly positioned in their social networks.

“When designing prevention and intervention programs, professionals should keep in mind that adolescents often bully each other in an effort to gain social status, and this is best achieved by picking on those who are most popular and occupy a similar position in the social hierarchy.”

To learn more about ongoing research and other information about Penn State’s BDSS-IGERT program, please visit http://bdss.psu.edu.

Sean N. Ayres