Big data, big science: students share their research on “big data” during a poster session

UNIVERSITY PARK, PA – Today alone, enough data will be produced to fill 250,000 Libraries of Congress, according to a 2016 report by Mikal Khoso of Northeastern University. On a larger scale, estimates indicate that 4.4 zettabytes of data (4.4 trillion gigabytes) existed worldwide in 2013, an amount that is expected to increase tenfold by 2020.
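Zettabyte-to-gigabyte conversions are easy to get wrong by a factor of ten. The short sketch below assumes decimal (SI) units, as storage estimates usually do: one zettabyte is 10^21 bytes and one gigabyte is 10^9 bytes, so one zettabyte equals 10^12 (one trillion) gigabytes. The names `GB_PER_ZB` and `zb_to_gb` are illustrative only.

```python
from decimal import Decimal

# Assumption: decimal (SI) units, 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes.
GB_PER_ZB = Decimal(10) ** 21 / Decimal(10) ** 9  # 10**12 gigabytes per zettabyte


def zb_to_gb(zettabytes: str) -> Decimal:
    """Convert a zettabyte figure (given as a string to avoid float rounding) to gigabytes."""
    return Decimal(zettabytes) * GB_PER_ZB


stock_2013 = zb_to_gb("4.4")      # estimated worldwide data in 2013
projected_2020 = stock_2013 * 10  # "tenfold by 2020"

print(f"2013: {stock_2013:,.0f} gigabytes")
print(f"2020 (projected): {projected_2020:,.0f} gigabytes")
```

Under these assumptions, 4.4 zettabytes works out to 4.4 trillion gigabytes, and a tenfold increase gives 44 zettabytes, or 44 trillion gigabytes.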

This data comes in all shapes and sizes, from text-based tweets to satellite images, and that variety was on display at the Big Data Social Science (BDSS) poster session held April 14 on the University Park campus. Poster topics included identifying bullying tweets (Amy Zhang, statistics, and Diane Felmlee, sociology); social covariates of the HIV epidemic (Ben Sheng, Xun Cao and Le Bao, statistics); virtual reality and decision making (Mark Simpson and Alexander Klippel, geography); and the racial segregation of domestic and professional environments (Robert Zuchowski and Stephen Matthews, sociology and demography).

Students Matthew Denny (political science), Cassie McMillan (sociology and demography) and Sayali Phadke (statistics) also presented at the poster session. As doctoral students in the National Science Foundation-funded BDSS Integrative Graduate Education and Research Training (IGERT) program, each strives to improve current analytical techniques and apply them to the exponentially growing volume of political, geographic and social media data produced every day.

The work presented by Denny takes a nuanced approach to network analysis by examining not only whether particular nodes are connected, but also the strength of those connections. To put this in context, imagine a celebrity’s connections on a social network like Facebook. Anyone they are friends with counts as a connection, but distinguishing friends from fans requires more information. One way to do this is to measure the strength of each connection by looking at how often the celebrity and their “friends” like each other’s posts. Denny and his advisor, Bruce Desmarais, associate professor of political science at Penn State, recently published a model in the journal Social Networks for precisely this type of weighted network and applied it to lending data from 17 countries.
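The celebrity example can be made concrete with a small sketch of tie strength. This is only an illustration of the idea of a weighted connection, not the statistical model Denny and Desmarais published: the “like” events, the user names and the `min_mutual` threshold are all hypothetical, and the rule “a friend is someone with whom likes flow both ways at least twice” is an assumption chosen for clarity.

```python
from collections import Counter

# Hypothetical "like" events: (who_liked, whose_post_was_liked)
likes = [
    ("celebrity", "alice"), ("alice", "celebrity"),
    ("celebrity", "alice"), ("alice", "celebrity"),
    ("bob", "celebrity"), ("bob", "celebrity"), ("bob", "celebrity"),
    ("carol", "celebrity"),
]


def tie_weights(events, ego):
    """Count likes in each direction between `ego` and every other user.

    Returns {other: (likes_from_ego, likes_to_ego)}.
    """
    given = Counter(target for src, target in events if src == ego)
    received = Counter(src for src, target in events if target == ego)
    others = set(given) | set(received)
    return {o: (given[o], received[o]) for o in others}


def classify(weights, min_mutual=2):
    """Hypothetical rule: 'friend' if likes flow both ways at least
    `min_mutual` times, otherwise 'fan'."""
    return {
        other: "friend" if min(out_w, in_w) >= min_mutual else "fan"
        for other, (out_w, in_w) in weights.items()
    }


labels = classify(tie_weights(likes, "celebrity"))
print(labels)
```

A plain yes/no network would treat Alice, Bob and Carol identically. Recording how many likes flow in each direction separates the reciprocated tie (Alice) from one-way admirers (Bob and Carol).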

“We believe that there is a big gap in the market for people trying to understand systemic risk, and that there are some really interesting applications for improving risk management in the financial system by adopting these network analysis techniques,” said Denny, explaining the potential implications of his work. For example, he thinks their model could help explain “how the relationship between banks, economies or countries underlies the risk of financial collapse and how countries may respond.”

Phadke, meanwhile, explores another direction: how influence spreads through networks. To explain her work, Phadke pointed to a ubiquitous aspect of modern life: advertisements.

“Let’s say there is a company that wants to study the effect of an advertisement,” she began. “In classical statistical methods, you assume that two units [people who see the ad] are independent of each other, but as soon as you have a network, you are looking at units that communicate with each other. So if you assume that showing someone an ad affects only that person’s purchase, you may be underestimating the effect of your ad and investing more money in it than you really need to.”
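Phadke’s point about independence can be illustrated with a toy simulation of network “spillover.” This is not the model she is developing; the ring-shaped network, the outcome formula and the constants `BASELINE`, `DIRECT` and `SPILLOVER` are hypothetical, chosen so the arithmetic is easy to follow. Each person’s purchase score rises if they see the ad, and also rises a little for every neighbor who sees it.

```python
# Toy illustration of network interference; not Phadke's model.
# Hypothetical outcome: baseline + DIRECT if you saw the ad
#                       + SPILLOVER for each neighbor who saw it.
BASELINE = 1.0
DIRECT = 2.0
SPILLOVER = 0.5


def ring_network(n):
    """Each person i is connected to i-1 and i+1, wrapping around."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def outcomes(network, treated):
    """Purchase score for every person, given who saw the ad (1) or not (0)."""
    return {
        i: BASELINE + DIRECT * treated[i] + SPILLOVER * sum(treated[j] for j in nbrs)
        for i, nbrs in network.items()
    }


def naive_effect(y, treated):
    """Treated-minus-untreated difference in means, which assumes independence."""
    t = [y[i] for i in y if treated[i]]
    c = [y[i] for i in y if not treated[i]]
    return sum(t) / len(t) - sum(c) / len(c)


n = 10
net = ring_network(n)
half = {i: int(i % 2 == 0) for i in range(n)}  # show the ad to every other person
naive = naive_effect(outcomes(net, half), half)

everyone = {i: 1 for i in range(n)}
nobody = {i: 0 for i in range(n)}
total = (sum(outcomes(net, everyone).values())
         - sum(outcomes(net, nobody).values())) / n

print(f"naive estimate: {naive:.2f}")
print(f"average effect of showing everyone the ad: {total:.2f}")
```

In this setup the naive comparison gives 1.00, while showing the ad to everyone raises the average score by 3.00. The untreated people are not a clean comparison group: they benefit from treated neighbors, so the independence assumption understates what the ad actually does.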

The model Phadke is developing, however, has more applications than just saving money for businesses. She suggested that the model could be used to assess the effectiveness of public health initiatives or even international trade regulations.

While Phadke and Denny have focused on improving statistical models, McMillan applies them to a common problem: bullying. McMillan used network analysis to estimate how likely students were to bully one another at two specific points in time. She found that, contrary to the plot of many teen dramas, bullying is more common among students of similar social status.

“Our project has the potential to better inform prevention and intervention programs in schools that target adolescent bullying behavior,” McMillan said. “Popular culture often portrays victims of bullying as teenagers on the periphery of their social networks, and bullies as more popular peers with no other social ties to their victims. While this describes some of the bullying behavior observed in our sample, a lot of school bullying occurs between friends and between students who are positioned similarly in their social networks.

“When designing prevention and intervention programs, professionals should keep in mind that adolescents often bully each other in an attempt to gain social status.”

To learn more about current research and Penn State’s BDSS-IGERT program, visit http://bdss.psu.edu.

Sean N. Ayres