Lessons from pandemic superstar data scientist Youyang Gu

“It has become clear that we are not going to achieve herd immunity in 2021, at least certainly not nationwide,” he said. “And I think it’s important, especially if you’re trying to instill confidence, that we lay out sane paths toward when we can get back to normal. We should not tie that to an unrealistic goal like achieving herd immunity. I remain cautiously optimistic that my initial forecast in February, of a return to normal by the summer, will hold.”

At the start of March, he wrapped things up; he figured he had done what he could. “I wanted to take a step back and let the other modelers and experts do their jobs,” he says. “I don’t want to muddy the space.”

He still keeps an eye on the data, researching and analyzing the variants, the vaccine rollout, and a fourth wave. “If I see something particularly disturbing that I think people aren’t talking about, I will definitely post about it,” he says. But for now he is focusing on other projects, such as “YOLO Stocks”, a stock-ticker analysis platform. His main pandemic-related work now is as a member of the World Health Organization’s Technical Advisory Group on Covid-19 Mortality Assessment, where he lends his expertise.

“I really learned a lot last year,” says Gu. “It was very revealing.”

Lesson 1: Focus on the basics

“From a data science perspective, my models have shown the importance of simplicity, which is often underestimated,” says Gu. His death-forecasting model was simple not only in design (an SEIR component with a machine-learning layer) but also in its very clean, “bottom-up” approach to input data. Bottom-up means “start from the bare minimum and add complexity as needed,” he says. “My model only uses past deaths to predict future deaths. It does not use any other real-world data source.”

Gu noted that other models relied on an eclectic variety of data: cases, hospitalizations, testing, mobility, mask use, comorbidities, age distribution, demographics, pneumonia seasonality, annual pneumonia death rates, population density, air pollution, altitude, smoking, self-reported contacts, air passenger traffic, point-of-interest visits, smartphone thermometers, Facebook posts, Google searches, and more.

“There is this belief that if you add more data to the model, or make it more sophisticated, then the model will perform better,” he says. “But in real-life situations like the pandemic, where the data is so noisy, you want to keep it as simple as possible.”

“I decided early on that past deaths were the best predictor of future deaths. It’s very simple: input, output. Adding more data sources would simply make it harder to extract the signal from the noise.”
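Gu’s actual code is not reproduced in this article, but the bottom-up idea he describes, a compartmental simulator whose free parameters are tuned against the reported death curve alone and then run forward, can be sketched in a few lines. Everything below (the toy SIR-style simulator, the grid-searched parameters, the function names) is an illustrative assumption, not Gu’s SEIR-plus-machine-learning model:

```python
import numpy as np

def simulate_deaths(days, population, r0, ifr,
                    infectious_days=5.0, seed_infected=100.0):
    """Toy SIR-style simulator that returns a daily-deaths curve.

    Purely illustrative -- not Gu's SEIR + machine-learning model."""
    s, i = population - seed_infected, seed_infected
    beta, gamma = r0 / infectious_days, 1.0 / infectious_days
    daily_deaths = []
    for _ in range(days):
        new_infections = beta * s * i / population
        resolved = gamma * i
        s -= new_infections
        i += new_infections - resolved
        daily_deaths.append(ifr * resolved)  # a fixed share of resolved cases die
    return np.array(daily_deaths)

def fit_to_past_deaths(observed_deaths, population):
    """Bottom-up fit: pick (r0, ifr) that best reproduce past deaths -- the
    only real-world input -- via a coarse grid search (an assumption here)."""
    best_params, best_err = None, np.inf
    for r0 in np.arange(0.8, 3.0, 0.05):
        for ifr in (0.004, 0.007, 0.01):
            sim = simulate_deaths(len(observed_deaths), population, r0, ifr)
            err = float(np.mean((sim - observed_deaths) ** 2))
            if err < best_err:
                best_params, best_err = (r0, ifr), err
    return best_params

# Usage sketch: fit on the observed death series, then run forward to project.
# observed = load_daily_deaths(...)          # past deaths are the only input
# r0, ifr = fit_to_past_deaths(observed, population=330_000_000)
# projection = simulate_deaths(len(observed) + 42, 330_000_000, r0, ifr)
```

The point of the sketch is the input discipline: the only observable the fit ever sees is the past-deaths series itself, which is the “input, output” simplicity Gu describes.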

Lesson 2: Minimize assumptions

Gu feels he had an advantage in approaching the problem with a blank slate. “My goal was just to follow the covid data to learn more about covid,” he says. “That’s one of the main advantages of an outsider’s point of view.”

But not being an epidemiologist, Gu also had to make sure he wasn’t making incorrect or inaccurate assumptions. “My role is to design the model so that it can learn the assumptions for me,” he says.

“When new data goes against our beliefs, we sometimes tend to dismiss it or ignore it, which can have repercussions down the road,” he notes. “I have certainly fallen victim to this myself, and I know a lot of other people have as well.”

“So being aware of and acknowledging the potential biases we have, and being able to adjust our priors, our beliefs, if new data disproves them, is really important, especially in a fast-moving environment like what we have seen with covid.”
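Gu does not give a formal recipe here, but “adjusting our priors” maps naturally onto a Bayesian update. The sketch below is a generic conjugate Beta-Binomial example with made-up numbers, not anything from Gu’s pipeline:

```python
# Hypothetical illustration of "adjusting priors": a conjugate Beta-Binomial
# update. None of these numbers come from Gu's work; they are invented.

# Prior belief about some unknown rate: Beta(3, 7), mean 3 / (3 + 7) = 0.30.
prior_alpha, prior_beta = 3.0, 7.0

# New (invented) evidence: 52 positives in 400 observations, about 13%.
successes, trials = 52, 400

# Beta prior + Binomial likelihood -> Beta posterior.
post_alpha = prior_alpha + successes
post_beta = prior_beta + (trials - successes)

prior_mean = prior_alpha / (prior_alpha + prior_beta)
post_mean = post_alpha / (post_alpha + post_beta)

print(f"prior mean:     {prior_mean:.3f}")   # 0.300
print(f"posterior mean: {post_mean:.3f}")    # ~0.134 -- the data moves the belief
```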

Lesson 3: Test your hypotheses

“What I’ve seen over the past few months is that anyone can make claims or manipulate data to fit the narrative they want to believe,” Gu said. This highlights the importance of making testable assumptions.

“For me, this is the whole basis of my projections and forecasts. I have a set of assumptions, and if those assumptions are true, then this is what we predict will happen in the future,” he says. “And if the assumptions end up being wrong, then of course we have to admit that the assumptions we made were not true and adjust accordingly. If you don’t make testable assumptions, then there is no way to show whether you are really right or wrong.”

Lesson 4: Learn from mistakes

“Not all of the projections I made were correct,” Gu said. In May 2020, he predicted 180,000 deaths in the United States by early August. “That was a lot more than what we ended up seeing,” he recalls (there were around 155,000 deaths). His testable hypothesis turned out to be wrong, “and that forced me to adjust my assumptions.”

At the time, Gu was using a fixed infection fatality rate (IFR) of around 1% as a constant in his SEIR simulator. When, over the summer, he lowered the IFR to around 0.4% (and later adjusted it to around 0.7%), his projections moved back into a more realistic range.
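The leverage of that single constant is easy to see with back-of-the-envelope arithmetic: for a given projected infection count, cumulative deaths scale linearly with the assumed IFR. The infection count below is made up purely for illustration:

```python
# Illustration only: deaths scale linearly with the assumed infection fatality
# rate (IFR) when the projected number of infections is held fixed.
projected_infections = 20_000_000   # hypothetical cumulative infections

for ifr in (0.01, 0.007, 0.004):    # 1%, 0.7%, 0.4% -- the values discussed above
    print(f"IFR {ifr:.1%}: ~{round(projected_infections * ifr):,} projected deaths")

# IFR 1.0%: ~200,000 projected deaths
# IFR 0.7%: ~140,000 projected deaths
# IFR 0.4%: ~80,000 projected deaths
```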

Lesson 5: Engage the critics

“Not everyone will agree with my ideas, and I welcome that,” said Gu, who has used Twitter to post his projections and analysis. “I try to respond to people as much as I can, to defend my position and debate with people. It forces you to think about your assumptions and why you think they are correct.”

“It goes back to confirmation bias,” he says. “If I’m not able to defend my position properly, then is it really the right claim, and should I be making it? Engaging with other people helps me understand how to think about these issues. When other people present evidence that contradicts my positions, I need to be able to recognize when some of my assumptions may be incorrect. And that has actually helped me improve my model a lot.”

Lesson 6: Show healthy skepticism

“I’m a lot more skeptical of science now, and that’s not a bad thing,” Gu says. “I think it’s important to always question results, but in a healthy way. It’s a fine line, because a lot of people reject science outright, and that’s not the right way to go either.”

“But I think it’s also important not to blindly trust science,” he continues. “Scientists are not perfect.” If something seems wrong, he says, ask questions and look for explanations. “It’s important to have different points of view. If there’s anything we’ve learned over the past year, it’s that no one is 100% right all the time.”

“I can’t speak for all scientists, but my job is to cut through all the noise and find the truth,” he says. “I’m not saying I’ve been perfect over the past year. I was wrong several times. But I think we can all learn to approach science as a method for finding the truth, rather than as the truth itself.”

Sean N. Ayres