Resources and advice on what to learn

Data science is a complex field that requires its practitioners to think strategically. Day to day, the work draws on database administration and data analysis, as well as expertise in statistical modeling and even machine learning algorithms. As you might expect, it also takes a lot of training before you can embark on a career as a data scientist.

There are a variety of training options for data scientists at all stages of their career, from those just starting out to those looking to master the more advanced tools. Here are some platforms and training tips for all data scientists.

Are you just starting out? Consider these resources.

Kevin Young, senior data and analytics consultant at SPR, says many data scientists view Kaggle as a go-to learning resource. Kaggle is a Google-owned machine learning competition platform with a series of user-friendly courses to help beginners start their data science journey.

Topics covered range from Python to deep learning and more. “Once a beginner has a basic understanding of data science, they can engage in machine learning competitions in a collaborative community where people are willing to share their work with the community,” says Young.

In addition to Kaggle, there are many other online resources that data scientists (or aspiring data scientists) can use to deepen their knowledge in the field. Here are some free resources:

And here are a few paid options (although you’ll receive a certification or similar proof of completion when you finish):

This is only part of what exists, of course. Fortunately, the online education ecosystem for data science is large enough to accommodate all kinds of learning styles.

Become familiar with data structures, analysis

Seth Robinson, vice president of industry research at CompTIA, explains that people starting a career in data science will need to be familiar with data structures, database administration, and data analysis.

Database administration is the most established role in the field of data, and there are many resources teaching the basics of data management, the use of SQL to manipulate databases, and techniques for ensuring data quality. “Beyond traditional database administration, an individual might discover new techniques involving non-relational databases and unstructured data,” he adds.
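As a rough illustration of the kind of SQL-based data quality work described above, here is a minimal sketch using Python's built-in sqlite3 module. The customers table, its columns, and the sample rows are hypothetical and exist only for this example; real quality checks depend on the database and the rules an organization cares about.

```python
import sqlite3

# In-memory database; the table and rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO customers (email, country) VALUES (?, ?)",
    [
        ("ana@example.com", "BR"),
        (None, "US"),               # missing email
        ("ana@example.com", "BR"),  # duplicate email
        ("li@example.com", None),   # missing country
    ],
)

# Check 1: how many rows are missing an email address?
missing_email = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE email IS NULL"
).fetchone()[0]

# Check 2: which email addresses appear more than once?
duplicate_emails = conn.execute(
    "SELECT email, COUNT(*) FROM customers "
    "WHERE email IS NOT NULL "
    "GROUP BY email HAVING COUNT(*) > 1"
).fetchall()

print("Rows missing an email:", missing_email)
print("Duplicated emails:", duplicate_emails)
```

Checks like these (missing values, duplicates) are only two of many possible quality rules, but they show how a few SQL queries can surface problems before the data reaches an analysis or a model.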

Data analytics training is newer, but resources like CompTIA’s Data+ certification can add skills in data mining, visualization, and data governance. “From there, specific training around data science is even rarer, but resources exist to teach or certify advanced skills in statistical modeling or strategic data architecture,” Robinson says.

Two data science training groups

Young cites two main segments of data science training: model building and model implementation.

Model building training is the more academic side: applying statistical models to an engineered data set to create a predictive model. It is the training that most introductory data science courses would cover.

“This training provides the fundamental foundation for creating models that will provide predictive results,” he says. “Modeling training is typically taught in Python and covers engineering the dataset, creating a model, and evaluating that model.”
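The three stages Young mentions (engineering the dataset, creating a model, and evaluating that model) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is synthetic, the "hours studied versus exam score" scenario is hypothetical, and the model is a simple least-squares line written by hand rather than anything a particular course or library prescribes.

```python
import random


def make_dataset(n=200, seed=0):
    """Engineer a synthetic dataset of (hours_studied, exam_score) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hours = rng.uniform(0, 10)
        # Assumed relationship for the example: score = 50 + 4 * hours + noise.
        score = 50 + 4 * hours + rng.gauss(0, 2)
        rows.append((hours, score))
    return rows


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction for evaluation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Create the model: ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(model, rows):
    """Evaluate the model on rows it was not fitted on."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)


train, test = split(make_dataset())
model = fit_line(train)
mae = mean_absolute_error(model, test)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, test MAE={mae:.2f}")
```

In practice, courses typically use libraries such as pandas and scikit-learn for these steps; the hand-written version here just keeps the example self-contained and makes each stage visible.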

Model implementation training covers the stage after the model is created: putting it into production. This training is often vendor- or cloud-specific and focuses on getting the model to make predictions on incoming live data. “This type of training would be through cloud providers such as AWS offering in-person or virtual training on their machine learning services such as SageMaker,” says Young.

These cloud services provide the ability to take machine learning models produced on data scientists’ laptops and persist the model in the cloud, enabling ongoing analysis. “This type of training is vital because time and human capital are typically much more important in the model implementation phase than in the model creation phase,” says Young.

This is because models are often created from a smaller, cleaned dataset that a single data scientist can work with. When the model is implemented in production, engineering teams, DevOps engineers, and/or cloud engineers are often needed to build the underlying compute resources and automation around the solution.

“The more trained the data scientist is in these areas, the more likely the project is to succeed,” he says.

Distance training gains traction

Young says one of the lessons learned during the pandemic is that professionals in tech roles can be productive remotely. “It blurs the lines a bit on how boot camps differ from online classes, as many boot camps have moved to a remote model,” he says. “It emphasizes the ability to ask a subject matter expert questions, whether you’re in a boot camp or an online class.”

He adds that certifications can improve organizations’ standing with software and cloud vendors. “That means job candidates move up the resume stack if they have certifications that the company values,” Young says.

For aspiring data scientists choosing between boot camps and online courses, he says the most important point of comparison is probably the career resources offered. “A strong boot camp should have a dedicated resource to help graduates find jobs after boot camp,” he says.

A lifetime of learning—paid for by the organization

Robinson adds that it’s important to note that data science is a relatively advanced field.

“Not all tech jobs are created equal,” he explains. “Someone considering a career in data science should recognize that the learning path is likely to be more complex than it would be for a role such as network administration or software development.”

Young agrees, adding that data scientists need to work in a collaborative environment with other data scientists and subject matter experts reviewing their work. “Data science is a growing field,” he says. “Although the fundamental techniques do not change, the way these techniques are implemented changes as new libraries are written and integrated with the underlying software on which the models are built.”

From his perspective, a good data scientist is always learning, and any well-positioned company should offer reimbursement for credible training resources.

Robinson notes that internal resources vary from employer to employer, but points to a macro trend of organizations recognizing that workforce training needs to be a higher priority. “With so many organizations competing for so few resources, companies are finding that direct training or indirect skill-building assistance can be a more reliable option for developing the exact skills needed, while improving the experience of employees in a tight labor market,” he says.

Sean N. Ayres