5 Questions for Katherine Lin: Data Scientist in Training
This story sits at the intersection of challenge and opportunity. First, the challenge: the pandemic. A year of crisis was also a year of revealing numbers. Consider, for example, all the data generated during this unique moment in world history, data that can help us reflect on various outcomes and better prepare for comparable crises in the future.
This data, in part, helped Katherine Lin turn a COVID-related disappointment into an opportunity. Lin, currently a senior at Byram Hills High School in New York, was preparing last spring to apply to the Wharton Data Science Academy, a summer program that introduces high school students to machine learning and data science tools. When the program was canceled due to the pandemic, Katherine reached out to program manager Linda Zhao, a professor of statistics at Wharton, to explore possible mentoring opportunities.
That outreach led to an immersive data science experience for Katherine. She began by studying statistical machine learning models and the R programming language through Zhao’s “Modern Data Mining” online courses, then worked virtually alongside Zhao to conduct in-depth data research into COVID-19 deaths and the pandemic’s impact on counties with different socio-economic characteristics, and finally presented her findings in February 2021 at the remote Women in Data Science @ Penn conference. (See the Related Links tab for more information on Wharton’s analytics and statistics.)
The theme of this year’s conference, “This is what a data scientist looks like,” underscored the depth, breadth and diversity of data science, including one particularly well-informed student from Byram Hills High School.
Wharton Global Youth sat down with Katherine to learn more about her data discoveries. “What I take away most from this whole experience is that I want to pursue a career where I can do data science research, because it has been such a rewarding experience,” said Katherine, who heads to MIT in the fall. “I was able to get results that meant something and were really relevant. I want to continue this. I want to be able to help people while pursuing my passion for computing and data science.”
Curious about her research project and her academic collaboration, we asked Katherine for all the details. Here are our 5 questions for Katherine Lin:
Wharton Global Youth: What did you know about data science (a field that uses scientific methods, algorithms, and more to extract knowledge and ideas from structured and unstructured data) when you first contacted Professor Zhao?
Katherine: I took AP Computer Science my sophomore year, and now I am a teaching assistant in that class. I have experience with Python [a programming language] and with probability. I still had a lot to learn, so Professor Zhao sent me her lectures, which helped me a lot. The lectures taught machine learning alongside R, so I could learn both at the same time. There were examples with R code and examples with real data sets, where I could see the different machine learning methods in action. That helped me fully understand how each of the methods works. We also had short Zoom meetings so I could ask her questions. It took a few months.
Wharton Global Youth: What did your research process involve?
Katherine: After I finished learning, I was really excited to begin the analysis. I learned that it takes a lot of preparation first. I spent a lot of time discussing and cleaning up the data, but once we felt ready to take the next step, Professor Zhao helped me through each of the machine learning methods, including writing the code, running it and finding the results. That was my favorite part, being able to see the results. Finally came the writing. That was definitely the hardest part for me: putting everything we had into a cohesive report and finding new ways to display our data. It was also the stage where I got the most guidance from Professor Zhao. She gave me a lot of advice and help on how to format the report and write everything up.
Wharton Global Youth: What were some of the main research findings presented in your report, titled “Impact of COVID-19 on Counties with Different Socio-Economic Characteristics”?
Katherine: We tried to find the important factors affecting the COVID-19 death rate. For example, is one racial group more affected than others? And do income level and education level play an important role? There have been a lot of media reports about how certain groups have been disproportionately affected [by the pandemic]. I wasn’t sure they were completely reliable. After seeing this data, it’s definitely true. Some groups need more support, and more resources should be allocated to help these groups, especially during this pandemic, but also generally in times of crisis. Directing more resources to these groups could help the United States as a whole. (For more details on the report, watch Katherine’s Women in Data Science presentation, as well as research from other students, in the video at the end of this article.)
Wharton Global Youth: Do you remember a time during your research when everything fell into place for you?
Katherine: I had just finished one type of machine learning method and had moved on to a random forest, and it was yielding really good results. I was able to separate the different variables and see what was happening. I had that moment where I was like, ‘Oh my God, I can see what is affecting the spread of COVID-19, and I can see everything that was hidden before, and now it’s out in the open!’
Wharton Global Youth: What would you like other high school students to understand about data analysis?
Katherine: I wouldn’t say my research was the most technically complex, but having done it and having had this experience was the most important thing. Data is everywhere. If you have a solid foundation of analytical thinking and an interest in problem solving, I would get started right away. Email potential mentors, approach summer programs, or take a more exploratory approach by examining datasets on Kaggle. You don’t necessarily have to analyze them using all of these complicated techniques, but you can gain a basic understanding of how data works so that after high school you can go deeper and study it in college.
Katherine Lin seized an opportunity during the pandemic. Describe an opportunity that you have seized in the past year, related to research or otherwise. Share your experiences in the comments section of this article.
How are data and decision-making connected, and why is this particularly powerful in times of unexpected crisis, such as the pandemic?
After completing this article, explore the resources on the Women in Data Science @ Penn conference website, which you can find linked within the article, as well as the Related Links tab. Review another presentation from the conference and share what you learned about data science with your classmates.