Presidential Data Science Fellows Bring Data Research to Life, and Vice Versa

Since its inception three years ago, a primary goal of the Data Science Institute at the University of Virginia has been to help create opportunities for cross-field collaboration in “big data” research.

Through a grant from the Jefferson Trust, the institute, in conjunction with the Office of the Vice President for Research, has inspired graduate students from a variety of disciplines to work together on big ideas involving real-world problems and to tackle these problems with data-driven solutions.

Several projects have come to fruition through a competitive process that drives some of the most innovative ideas forward. Their leaders are appointed Presidential Fellows in Data Science, in recognition of the support of the UVA President’s office.

“This fellowship offers a truly rare and valuable opportunity,” said Don Brown, director of the Data Science Institute. “In very few universities would you find graduate students in systems engineering and psychology working on a better understanding of suicide risk, or graduate students in English and psychology exploring the language of climate change.”

This type of “impactful, cross-cutting work” is usually available only to those who earn postdoctoral appointments to research centers and institutes focused on particular problems, he said, adding that alumni of the one-year fellowship program produced research papers and helped generate new proposals from their fellowship activities.

“We are very proud of these students, their academic advisors and their accomplishments,” said Brown.

Below are descriptions of some of the more recent projects, started in 2015.

Data-Driven Motion and Sound Design

Led by Lin Bai, a Ph.D. candidate in electrical and computer engineering, and Jon Bellona, a Ph.D. candidate in music

Formal description of the project: Our team is working to improve the perceived variation in robotic movements. By capturing and analyzing human vocalizations created in response to simulated movements, the project will develop robotic movements synchronized with perceptually designed sonifications in order to make robotic movements more expressive, thereby increasing the level of human perception of the quality of robotic movement.

What is really going on: How do you know if someone is calm, excited, or sad? Humans communicate non-verbally through the expressive qualities of their movements. We also communicate through the non-verbal aspects of our voice. The pitch, volume and speed of our voices can tell listeners about our emotional state. What if robots could also move expressively? Some of our team and others are working on this issue; however, there are practical limitations. Our work, a collaboration between roboticists and musicians, aims to give robots an expressive ‘voice’ so that robots can better interact and work alongside humans in various contexts such as manufacturing, healthcare or the home.

The role of big data: We have recorded musicians making expressive sounds to match the qualities of expressive movements. Using signal processing tools and statistics, we analyze various qualities of these sounds in order to understand how the characteristics of sound correspond to the characteristics of movement. We will validate our results through a large study to test whether these mappings lead to a more precise perception of expressive qualities in robotic movement.
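As a rough illustration of what analyzing "qualities of these sounds" can involve, the sketch below measures two of the vocal qualities mentioned earlier, volume and pitch, on a synthetic tone. It is not the team's method: the function names, the autocorrelation approach to pitch, and the test tone are all assumptions chosen for illustration.

```python
import math


def sine_tone(freq, sample_rate, n_samples, amplitude=1.0):
    """Generate a synthetic sine tone (a stand-in for a recorded sound)."""
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
            for i in range(n_samples)]


def rms(signal):
    """Root-mean-square amplitude, a simple proxy for volume."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def estimate_pitch(signal, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency in Hz.

    Tries every lag between the shortest and longest plausible period and
    keeps the one where the signal best matches a shifted copy of itself
    (the autocorrelation peak).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(signal[i] * signal[i + lag]
                    for i in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Hypothetical example: a quarter-second, 200 Hz tone sampled at 8 kHz.
tone = sine_tone(200.0, 8000, 2000, amplitude=0.5)
print("volume (RMS):", round(rms(tone), 3))
print("pitch (Hz):", round(estimate_pitch(tone, 8000), 1))
```

Features like these, computed over many recordings, are the kind of sound characteristics that could then be compared statistically against movement characteristics.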

Applying Machine Learning to Text Communications to Model Suicide Risk in Real Time

Led by Jeffrey Glenn, a Ph.D. candidate in psychology, and Alicia Nobles, a Ph.D. candidate in systems and information engineering

Formal description of the project: Our team strives to improve objective assessments of suicide risk by examining the electronic communications of people with a history of suicidal thoughts and behavior to identify communication patterns indicating increased suicide risk.

What is really going on: Suicide is the second leading cause of death in young adults, but the challenges of suicide prevention are great because the signs are often invisible. Research has shown that clinicians cannot reliably predict when a person is most at risk. Our project asks the following question: “Can big data techniques help us see what we humans cannot?” And more specifically, can personal electronic communications, such as telephone text messages, tell us about the risk of suicide? This project is a direct response to the urgent need for new data-driven tools to objectively assess acute suicide risk.

The role of big data: Our study focuses on building a multimodal data set, including clinical interviews on mental health history, text messages, call history, email, social media data, and web browsing activity of individuals with a history of suicidal thoughts and behavior. Big data techniques, such as natural language processing, machine learning and data visualization, will be applied to identify unique communication patterns that occur prior to a suicide attempt. These techniques can increase the visibility of subtle cues in communication, indicating when someone is in a suicidal state and allowing clinicians to more objectively assess when patients are particularly susceptible to injury.

Partisan Speech and Climate Change: A Toolkit for Detecting Deliberate Diction

Led by James Ascher, a Ph.D. candidate in English, and Bommae Kim, a Ph.D. candidate in psychology

Formal description of the project: Our project aims to understand the political discourse regarding climate change using natural language processing and machine learning on a body of edited texts. In the process, however, we uncovered larger issues regarding partisan rhetoric, repeatability, and credibility of knowledge. By understanding the language used in Congress to oppose the growing scientific consensus regarding climate change, we begin to understand something much bigger: we have traced the diction and rhetorical patterns used by representatives, experts and senators to deny climate change, but these patterns turned out to be much older.

What is really going on: By the end of our project, we had developed a toolkit and tested a series of exercises with a first-year writing course that not only demonstrated the particular partisan discourse on climate change, but also recreated the process that developed this partisan language. Among other things, we developed a class technique of “paper numbering”, which used a class of thinkers in a manner parallel to formalized and computerized textual analysis. As we developed our techniques, we realized that the problem was access to knowledge and began to carefully document the tools to make them available to a smart student without supervision.

The role of big data: Our collaboration continues to develop this toolbox and generalize it. We can teach computers and freshmen how to spot climate change diction and language and explain how that language came to be through focus groups and numbering sessions. But we are working on documenting and packaging our tools so that the same techniques can be applied to any political controversy in an ethical and responsible manner. Our original work becomes an example and a case study for a broader method of analyzing partisan discourse that we plan to make available to any citizen-scientist who wishes to study how things are discussed.

Modeling and Multi-Agent Analysis of Large-Scale Brain Networks With a Large fMRI Dataset

Led by Marlen Gonzalez, a Ph.D. candidate in psychology, Shize Su, a Ph.D. candidate in electrical and computer engineering, and Qiannan Yin, a Ph.D. candidate in statistics

Formal description of the project: This project will analyze large-scale functional brain networks involved in the social regulation of emotions using both statistical methods and engineering tools. Using data from a functional neuroimaging study of social support, we will model the brain as a dynamic network with nodes representing different brain regions and edges representing interactions between each pair of brain areas.
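To make the network picture concrete, here is a minimal sketch of one common way to turn regional signals into a network: treat each region as a node and connect two regions when their time series are strongly correlated. The article does not say how the team measures interactions, so the use of Pearson correlation, the 0.5 threshold, the region names and the made-up numbers are all assumptions for illustration only.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def build_network(series_by_region, threshold=0.5):
    """Return {(region_a, region_b): r} for pairs with |r| >= threshold."""
    connections = {}
    for a, b in combinations(sorted(series_by_region), 2):
        r = pearson(series_by_region[a], series_by_region[b])
        if abs(r) >= threshold:
            connections[(a, b)] = r
    return connections


# Hypothetical signals for three unnamed regions (not real fMRI data).
signals = {
    "region_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "region_B": [2.0, 4.1, 5.9, 8.2, 9.9],
    "region_C": [3.0, 1.0, 4.0, 1.0, 5.0],
}
for pair, r in build_network(signals).items():
    print(pair, round(r, 3))
```

A dynamic network, as described above, would repeat this kind of calculation over successive time windows rather than once over the whole recording.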

What is really going on: The main objective is to identify and uncover important patterns in the brain’s complex functional network that are valid for psychological interpretation, through sophisticated analysis of large fMRI (functional magnetic resonance imaging) data collected from psychological experiments. This will deepen the understanding of the complexity of brain networks for improving human brain health, add to the general literature on the physiological effects of social support, and derive new methods for learning more from large pre-existing fMRI data sets.

The role of big data: A large fMRI dataset from more than 100 participants was collected, and reanalyzing the data involves around 40 billion brain interactions, which is huge. That figure is then multiplied by three experimental conditions and by 100 subjects, so the amount of data goes beyond what is usually meant by “big data”. We are developing various new and efficient techniques to drastically reduce the computational load at the cost of some minor information loss. Without such techniques, the computation for extracting patterns from such a large brain network would take many years, even on a powerful computing cluster, or would otherwise be impractical.
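The scale described above can be checked with simple arithmetic. The sketch below multiplies out the figures quoted in the article and, assuming each "interaction" is one unordered pair of nodes, works backward to the number of nodes that would produce roughly 40 billion pairs. That pairwise reading and the 4-bytes-per-value storage estimate are assumptions, not details from the project.

```python
import math

# Figures quoted in the article.
INTERACTIONS_PER_NETWORK = 40_000_000_000
CONDITIONS = 3
SUBJECTS = 100


def pair_count(n_nodes):
    """Number of unordered pairs among n_nodes nodes: n * (n - 1) / 2."""
    return n_nodes * (n_nodes - 1) // 2


def nodes_for_pairs(n_pairs):
    """Smallest node count whose pair count reaches n_pairs."""
    n = (1 + math.isqrt(1 + 8 * n_pairs)) // 2
    while pair_count(n) < n_pairs:
        n += 1
    return n


total_interactions = INTERACTIONS_PER_NETWORK * CONDITIONS * SUBJECTS

# Assumption: one 4-byte number stored per interaction.
total_terabytes = total_interactions * 4 / 1e12

print("total interactions:", f"{total_interactions:,}")
print("approx. storage (TB):", total_terabytes)
print("nodes implied by 40 billion pairs:",
      f"{nodes_for_pairs(INTERACTIONS_PER_NETWORK):,}")
```

Because the pair count grows with the square of the node count, even a modest reduction in the number of nodes cuts the workload sharply, which is one reason techniques that trade a little information for fewer computations pay off at this scale.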

Sean N. Ayres