How to empower the data scientist in the age of advanced computing and AI

Dan Warner, CEO and Co-Founder of LGN, Explains How Data Scientists Can Be Empowered in the Age of Edge Computing and AI

With ever-changing data, the scientists who manage it can’t do it alone.

The data scientist has been one of the most publicized roles in technology and, indeed, business for some time now. It’s not hard to see why – as businesses realize the seemingly limitless potential of their data, they have recognized that they need people who can extract, analyze and interpret vast amounts of it. The demand is such that there is constant talk of a shortage of data scientists, especially in more experienced management positions.

Yet despite all this attention, how effective are these data scientists, and how empowered do they really feel? This is a relevant question at a time when so much data is underutilized. Do companies, knowing that they need to make better use of their data, hire data scientists without fully understanding how best to deploy that talent?

Maybe a better way to look at it is this: do companies know how to best use their data? Do they hire data scientists and simply expect them to work wonders, or are they not only making sure they have the right talent, but also feeding those teams the right data?

Garbage in, garbage out

Many might think it’s the data scientist’s job to find the right data, but they are wrong. At the end of the day, data scientists can only work with what they’re given, in the same way that a salesperson can’t do much with a mediocre product, or a Formula 1 driver can’t do much with an average car.

What then is the right data? Obviously this varies from company to company, but there are a number of principles that good data will follow, regardless of the organization’s needs. First, it must be fresh – that means it must reflect the real world as it is right now. Everything changes so quickly that a lot of data rapidly becomes useless; the more it stagnates, the less valuable it is.

Thus, if a data scientist is working on old data when more recent information is available, the insights they can extract will be less relevant to the environment in which the company operates.

Second, it must be live data – it must come from the real world, not training data and not invented. Why? Because the real world is messy, throwing up anomalies that no one would ever have thought of, and creating obstacles that models – and, indeed, data scientists – brought up on sanitized training data will not be able to deal with.

In other words, if an organization is feeding its data scientists and their models outdated, offline data, then the best the business can hope for is limited and irrelevant insights.

Why the edge is the next frontier for data scientists

This means that companies must find a way to continuously provide their data scientists with live, real-time, scalable data from the real world. How do they do this? With edge computing.

Edge computing needs no introduction – with the explosion of Internet of Things devices in recent years, more and more data processing is taking place at the edge of networks. Sensors on everything from wind turbines and tractors to refrigerators and streetlights are capturing data all the time. It’s real, it’s live, it’s messy, and that’s exactly what data scientists need to work on.

Organizations need to empower their data scientists by providing them with training data and performance metrics from the edge. They can then use these to inform their AI models, which in turn are deployed to edge devices. These real-world environments give data scientists vital information on how their models stand up to anomalies and variations that cannot be recreated in labs or test environments. Models may well perform poorly, at least initially – and that’s a good thing, because it gives data scientists something to dig into, to figure out what is going on that they hadn’t thought of.

That said, whether the models perform well or poorly, the data should be accessed, cleaned up, annotated and ultimately fed back into the model for training on an ongoing basis. It is a feedback loop that must keep running so that the systems can improve and adapt. But it has to be smart data mining – no system can handle all of the data collected by the sensors, so having a way to identify and retrieve the most important data from the edge is essential.
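The feedback loop described here – collect live data, select only the most informative samples within a budget, retrain, redeploy – can be sketched in a few lines of Python. Everything below (the `EdgeDevice` and `Model` classes, the uncertainty-based selection) is a hypothetical toy illustration of the pattern, not a real edge or AI API:

```python
import random

# Toy sketch of the edge feedback loop described above.
# All classes and functions are illustrative placeholders.

class EdgeDevice:
    """Stand-in for a sensor at the edge producing live, messy samples."""
    def __init__(self, seed):
        self.rng = random.Random(seed)

    def collect(self, n=50):
        return [self.rng.gauss(0.0, 1.0) for _ in range(n)]

class Model:
    """Toy model: 'uncertainty' is distance from what it has already seen."""
    def __init__(self):
        self.seen_mean = 0.0
        self.trained_on = 0

    def uncertainty(self, sample):
        return abs(sample - self.seen_mean)

    def train(self, samples):
        self.seen_mean = sum(samples) / len(samples)
        self.trained_on += len(samples)

def feedback_loop(model, devices, budget=20):
    # 1. Gather live data from every edge device
    samples = [s for d in devices for s in d.collect()]
    # 2. Smart selection: no system can ingest everything, so keep
    #    only the samples the model is least certain about
    selected = sorted(samples, key=model.uncertainty, reverse=True)[:budget]
    # 3. (Cleaning and annotation would happen here) then retrain
    model.train(selected)
    # 4. The updated model would now be redeployed to the edge
    return model

model = feedback_loop(Model(), [EdgeDevice(1), EdgeDevice(2)])
print(model.trained_on)  # 20: only the budgeted, most-informative samples
```

The key design point is step 2: rather than shipping all sensor data back for training, only a budgeted subset that the model finds most surprising is retrieved and annotated.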

On top of that, data scientists need to be able to redeploy sensors and machines to investigate, re-image and analyze the data sources that are confusing their AI models. However the data was collected, and however automated the process was, at some point it was shaped by human thinking and assumptions. Those assumptions may have been reasonable given the data and evidence available at the time, but they may no longer be appropriate for capturing the data that is now needed. This is where being able to change what data is collected is essential if data scientists are to remain effective and work on the most relevant information.

A new active learning paradigm

Ultimately, all of this signals a shift away from the old paradigm of collecting large training datasets, segmenting them, training the model and seeing what happens, and towards a new paradigm – that of active learning, where AI models learn to cope with the real world and data scientists are empowered to work effectively. In doing so, they will be better equipped to gather the information and intelligence necessary to give their organizations a true competitive advantage in increasingly crowded, data-driven markets.

Written by Dan Warner, CEO and Co-Founder of LGN
