Citizen data scientist training for augmented analytics

Even as companies invest heavily in digital transformations and become more data-driven, there is a dramatic shortage of data scientists.

According to QuantHub, there was a shortage of 250,000 data scientists in 2020. In a report released earlier this month by the tech career site Dice, data scientist was one of the five fastest-growing positions this year, while LinkedIn’s 2021 jobs report found that hires for data scientist positions had increased by nearly 46% since 2019.

Augmented analytics bridges the gap

Augmented analytics tools hold the promise of filling some of these gaps by making the technology more accessible to non-data scientists.

For example, analytics is increasingly being integrated into apps that employees already use, such as Salesforce.

Low-code and no-code platforms such as Knime, SparkBeyond, DataRobot, RapidMiner, Alteryx, SAS Viya and many more have also become more widely available, said Amaresh Tripathy, global leader of analytics at Genpact.

“These platforms can automate the standard steps involved in traditional end-to-end data science projects,” he said. “However, there are two main areas where ‘humans in the loop’ are needed.”

Humans are needed to understand data in the context of a domain-specific application, he said, and to translate the information into something that can be used to make business decisions faster.

“These areas are where citizen data scientists play a central role,” he said.

But without the proper controls and training for these citizen data scientists in place, things can easily go wrong.

Take, for example, the question of correlation versus causation: the alarm clock rings, and the sun rises. Without an understanding of the underlying data set, and the domain expertise to know that the sun will rise whether or not the alarm goes off, someone might conclude that one causes the other, and therefore that changing the time the alarm is set would make the sun rise earlier or later.
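To make the alarm-clock example concrete, here is a small simulation sketch. It assumes a hypothetical sleeper who sets the alarm about 15 minutes before a sunrise that shifts with the seasons; the sunrise formula and every number in it are invented for illustration. In the observed data the two times are almost perfectly correlated, but when the alarm time is set at random (an intervention), the correlation vanishes, because the causal arrow runs from sunrise to alarm, not the other way around.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def sunrise_minutes(day):
    """Hypothetical sunrise time (minutes after midnight) that varies with
    the season. Note that the alarm plays no part in this formula."""
    return 390 + 90 * math.cos(2 * math.pi * day / 365)


def observe(days, rng):
    """Observational data: the sleeper sets the alarm roughly 15 minutes
    before sunrise, so alarm times track sunrise times."""
    sunrise = [sunrise_minutes(d) for d in days]
    alarm = [s - 15 + rng.gauss(0, 5) for s in sunrise]
    return alarm, sunrise


def intervene(days, rng):
    """Intervention: the alarm is set at a random time, independent of the
    season. Sunrise is computed exactly as before; the alarm cannot move it."""
    alarm = [rng.uniform(300, 480) for _ in days]
    sunrise = [sunrise_minutes(d) for d in days]
    return alarm, sunrise


rng = random.Random(0)
days = list(range(365))
obs_r = pearson(*observe(days, rng))
int_r = pearson(*intervene(days, rng))
print(f"observed correlation:  {obs_r:.2f}")
print(f"after intervention:    {int_r:.2f}")
```

A correlation computed from observational data alone cannot tell these two situations apart; only domain knowledge, or an experiment that actually changes the alarm time, reveals which way the dependence runs.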


“Even expert data scientists make these mistakes all the time,” Tripathy said. “But someone who isn’t as steeped in it is likely to make them more often. If you don’t understand the concept of causation, you could act on things that are only spuriously correlated, and that leads to bad business strategies.”

There are other areas where analysis can go wrong. If a company has traditionally hired only white men for technical positions, for example, a resume-screening algorithm trained on that history can downgrade equally qualified resumes from women or minority candidates.

Another example is a loan-application scoring algorithm that learns a preference based on race from historical trends. The easy way out, removing race from the dataset, can still leave a proxy variable such as zip code that has the same effect.
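One way to catch this problem is to check whether the features left in a dataset can reconstruct the attribute that was removed. The sketch below uses invented data in which a hypothetical protected attribute (groups "A" and "B") is strongly tied to zip code; the function name `proxy_accuracy`, the zip codes and the 90% figure are all assumptions for illustration, not a standard fairness tool.

```python
import random
from collections import Counter, defaultdict


def make_applicants(n, rng):
    """Synthetic, hypothetical applicants. Group membership is strongly tied
    to zip code, mimicking residential segregation."""
    zips_by_group = {"A": ["10001", "10002", "10003"], "B": ["20001", "20002", "20003"]}
    rows = []
    for _ in range(n):
        group = rng.choice(["A", "B"])
        # 90% of applicants live in a zip code dominated by their own group.
        home = group if rng.random() < 0.9 else ("B" if group == "A" else "A")
        rows.append({"group": group, "zip": rng.choice(zips_by_group[home])})
    return rows


def proxy_accuracy(rows, feature, protected):
    """How well does `feature` alone recover the `protected` attribute?
    Predicts the most common protected value seen for each feature value."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r[feature]][r[protected]] += 1
    majority = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    hits = sum(majority[r[feature]] == r[protected] for r in rows)
    return hits / len(rows)


rng = random.Random(42)
applicants = make_applicants(5000, rng)

# "The easy way out": drop the protected column before training...
training_rows = [{k: v for k, v in r.items() if k != "group"} for r in applicants]

# ...but zip code, still present in training_rows, recovers the dropped column.
acc = proxy_accuracy(applicants, "zip", "group")
print(f"group recovered from zip code alone: {acc:.0%}")
```

A majority-vote lookup is about the crudest proxy test there is; a team might instead train a classifier on all remaining features. The idea is the same either way: if the removed attribute is easy to predict from what is left, a model trained on the remaining data can still learn it.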

In either case, the company could get into trouble with regulators and face fines or a public relations disaster. Depending on how they use analytics, citizen data scientists may need training on basic concepts, or on privacy, security, or compliance issues.

Key data science skills for citizen data scientists

To use the tools successfully, citizen data scientists must understand which datasets are relevant to the problem they are addressing, the current trends and patterns relevant to that problem, and how to translate the platform’s analysis of the data into usable information.

This may require additional training, Tripathy said.

Companies can deliver this training through sessions led by platform vendors, webinars and hands-on training. To get the maximum impact, the training should be based on datasets that represent the real challenges the company faces, he added.

Genpact, a business transformation consultancy that emerged from GE in 2005, does just that.

To date, about 70,000 of its nearly 100,000 employees have completed some degree of data literacy training, he said.

The in-house training program offers small, personalized learning paths in more than 70 different skills. Employees are also encouraged to enroll in a machine learning incubator program, where they receive training in data science, augmented intelligence, and visual storytelling platforms.

The program was established two years ago and approximately 30,000 people from all walks of life have completed the full program.

“There are people who are supply chain planners, claims processors, call center operators, risk management professionals, marketers,” Tripathy said. “The program is designed for everyone. And the more diverse the background, the more interesting the ideas of how people are going to apply it.”

One benefit of Genpact’s data science training has been lower attrition.

“We have higher engagement, we develop skill sets, so we have higher retention,” he said.

There are also business advantages to upskilling existing employees when customers demand new skills, rather than trying to find new people to hire.

Finally, employees with good analytical skills can better serve customers.

“You are sharing more interesting information with customers, which increases the value of the service we provide,” he said.

How long does it take to create a citizen data scientist

In the first year that Genpact implemented its program, the company focused on introducing people to basic data science concepts.

The second year was about solidifying the content and applying it.

“And now it’s a matter of ‘are they connecting it to real projects and changing the work they’re doing?’” Tripathy said.

But companies shouldn’t focus on the length of particular training programs, Tripathy said.

“You have to have a culture of learning,” he said. “It’s not a matter of time. Yes, some of the courses we have are micro-learning courses, and in a week or 10 days you are going to make a lot of progress. But the real question is immersion, and how you connect it to your daily work.”

Foolproof augmented analytics

Depending on the context, some augmented analytics tools may not require any training.

For example, a tool that is integrated into an employee’s workflow and that operates under very narrow presets can be so easy to use that employees can just start using it.

“The idea is to expand the democratization of self-service using IT support,” said Doug Henschen, vice president and senior analyst at Constellation Research. “Only some of these features require training, and I wouldn’t call it extensive training.”


Intuitive add-ons to business intelligence and self-service analytics products can sometimes be mastered by experimentation, he said, or by reading the documentation and help menus, or through advice from analysts and experienced users.

“In many cases, vendors offer video tutorials and online training courses,” he added.

These might be appropriate for more sophisticated tools, like those that prepare data or are used for forecasting, he said.

Basic skills for a citizen data scientist training program

Anand Rao, partner and global leader in AI at PricewaterhouseCoopers, recommends that companies consider three levels of training in citizen data science.

The first is improving digital skills. This is high-level training on different types of digital assets, including data, analytics, automation and AI, and on how they relate to each other, he said.


PricewaterhouseCoopers began this journey more than three years ago, Rao said, in response to market trends, customer demands and employees wanting new skills in order to remain competitive themselves.

The next level is business analysis, where a business expert or subject matter expert receives training on the types of business problems that can be solved with analytics and on relevant data science solutions.

Finally, citizen data scientists need data storytelling skills, he said. Depending on education and previous experience, it takes three to six months to train an entry-level data scientist, he said, and six to 12 months to train one at an intermediate or advanced level.

“Citizen data scientists should learn to interpret the results of the different algorithms they will use in the platform,” Rao said. “In addition, they should learn how to tell a story using data, highlighting insights while at the same time explaining the evidence from the data.”

Sean N. Ayres