Data scientist: education, training, maintenance
Your typical data scientist works with various forms of data to uncover insights and knowledge. Then, they develop products and services that support optimal decision-making.
Data can be structured (coming from a predefined data model and residing in relational databases) or unstructured (having no predefined format, such as text files or user-generated content).
A data scientist is responsible for understanding and aggregating these different sets of data, and using statistical and machine learning techniques to create analytics and predictive models. They work with data and application engineers to integrate these models into the product, improving user experience and engagement with the product. They also help identify opportunities to improve organizational efficiency and increase business value.
Data scientists often interact with people from multiple departments, such as business development, sales, product management, project management, UX/UI design, and software engineering teams.
“Data scientists can continue to develop their professional career as an individual contributor or follow a managerial path in data science,” said Seongjoon Koo, Chief Data Officer at JD Power. “Additionally, there is an opportunity to move into a product manager role by managing data science products and services.”
Vibha Srinivasan, director of data science at Spiceworks, explained that the career path of data scientists is actually similar to that of a software developer.
“At entry level, you have well-defined problems to work on, for example, building recommendation engines to drive product purchases,” she said. “As you move into senior and lead data scientist roles, you’ll need to look at business goals and see how data science can be used most effectively to help achieve those goals.”
This involves evaluating different approaches and making trade-offs between accuracy and speed of deployment.
“You would take the initiative to evaluate third-party data sources and external APIs for machine learning to see if they would add business value or help you deliver your product faster,” Srinivasan said. “You will also mentor and train junior data scientists within your team.”
Regardless of business use cases and career level, day-to-day work will involve a lot of cleaning, analysis, feature extraction, modeling, and data visualization tasks.
“You will also spend time reading and keeping up to date with industry trends, as this is a rapidly growing field,” Srinivasan noted.
Typical Data Scientist Job Posting
Srinivasan said tech professionals should look for job descriptions that clearly outline job responsibilities, as they can vary widely from company to company.
“The job posting should also detail the teams and departments the data scientist will be collaborating with, and some examples of the products they will focus on at the company,” Srinivasan said.
However, at companies that are just starting to build a data science team, the responsibilities section may be intentionally vague, as you will need to help assess how data science can help the business.
Education and formal training in data science, analytics, statistics, computer science, and electrical engineering, or closely related technical disciplines, are often preferred. Massive Open Online Courses (MOOCs) can help people from different backgrounds gain the necessary education and experience.
Koo said practical coding skills and experience in Python, R, and/or other programming languages are necessary for data scientists. The ability to quickly understand data and interpret the results for the business is also essential. Due to the collaborative nature of the work, good communication skills are preferred. Srinivasan agreed that a strong background in math and statistics is essential, along with good programming skills.
Experience with a range of data mining and machine learning techniques, such as classification, clustering, natural language processing, and neural networks, is highly desirable.
“Good SQL skills go a long way in extracting and analyzing structured data,” Srinivasan said. “A knowledge of basic statistics is necessary to evaluate your datasets and make reasonable assumptions.”
These skills can often be acquired through a bachelor’s degree (or higher) in math, statistics, computer science, or a related field, and through experience in the field.
“There are also several machine learning bootcamps and online courses,” Srinivasan said. “Participating in Kaggle data science competitions is also a great way to hone your skills.”
Typical Data Scientist Interview
Typically, interview questions focus on the following:
Ideally, the questions will be designed to reflect the nature of the work you will be doing in the business and the types of data you will be dealing with.
“For example, you might receive a file containing fictitious data about traffic to different landing pages on your website and be asked to create a model that predicts conversion rates,” Srinivasan said. “More than the solution itself, interviewers are looking to see if you ask clarifying questions about the data, state the assumptions you make, and explain your thought process as you solve the problem.”
Candidates will be asked to explain why they chose a particular approach and its advantages and disadvantages compared to other techniques. Some interviewers may ask candidates to explain the math behind machine learning, such as L1 versus L2 regularization, or concepts such as cross-validation.
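To make the two concepts named above concrete, here is a minimal pure-Python sketch. The function names (l1_penalty, l2_penalty, k_fold_indices) and the regularization strength lam are illustrative assumptions, not taken from any particular library or from the interviewees. It only shows the penalty terms and how data is split into folds; it does not train a model.

```python
def l1_penalty(weights, lam):
    """L1 (lasso-style) penalty: lam * sum of absolute weights.
    Tends to push some weights to exactly zero (sparse models)."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 (ridge-style) penalty: lam * sum of squared weights.
    Shrinks all weights smoothly without forcing exact zeros."""
    return lam * sum(w * w for w in weights)


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds.
    Each sample lands in exactly one validation fold."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


if __name__ == "__main__":
    weights = [1.0, -2.0, 0.5]
    print("L1 penalty:", l1_penalty(weights, lam=0.1))
    print("L2 penalty:", l2_penalty(weights, lam=0.1))
    for train_idx, val_idx in k_fold_indices(n_samples=10, k=3):
        print("validate on", val_idx, "train on", train_idx)
```

In an interview, the point to articulate is the trade-off: the L1 term grows linearly with each weight, while the L2 term penalizes large weights much more heavily than small ones, and cross-validation estimates how well either choice generalizes by validating on data held out from training.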
Since labeled data is often a luxury, you might be asked how to build a predictive model in the absence of labeled data (using unsupervised ML techniques or keyword-based approaches to generate labels).
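As a rough illustration of the keyword-based approach mentioned above, the sketch below assigns provisional labels to unlabeled text. The rule table RULES, the label names, and the function weak_label are all hypothetical; real keyword rules would come from domain knowledge, and the resulting labels are noisy and should be spot-checked before training on them.

```python
import re

# Hypothetical keyword -> label rules; earlier entries win when several match.
RULES = {
    "refund": "billing",
    "invoice": "billing",
    "crash": "bug",
    "error": "bug",
}


def weak_label(text, rules):
    """Return the label of the first rule whose keyword appears as a
    whole word in the text, or None if no rule matches."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    for keyword, label in rules.items():
        if keyword in tokens:
            return label
    return None


if __name__ == "__main__":
    samples = [
        "I want a refund now!",
        "The app shows an error on startup",
        "Hello there",
    ]
    for text in samples:
        print(repr(text), "->", weak_label(text, RULES))
```

Texts that match no rule stay unlabeled (None), which makes it easy to report how much of the data the rules actually cover.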
“When it comes to statistics, questions around Bayes’ theorem and conditional probabilities are interview favorites,” Srinivasan added. “As mentioned earlier, it’s important to clearly communicate your approach to technical (data scientists) and non-technical (product managers) audiences alike.”
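A typical question of this kind asks for the probability that a condition is truly present given a positive signal. The sketch below applies Bayes’ theorem to that setup. The function name posterior and the example numbers (1% prior, 95% true positive rate, 5% false positive rate) are assumptions chosen for illustration, not figures from the interviewees.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(condition | positive) via Bayes' theorem.

    prior               = P(condition)
    true_positive_rate  = P(positive | condition)
    false_positive_rate = P(positive | no condition)
    """
    # Total probability of a positive result, with or without the condition.
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1 - prior))
    return true_positive_rate * prior / evidence


if __name__ == "__main__":
    p = posterior(prior=0.01, true_positive_rate=0.95,
                  false_positive_rate=0.05)
    print(f"P(condition | positive) = {p:.3f}")
```

With these assumed inputs the result is roughly 0.16: because the condition is rare, most positive results are false positives, which is the counterintuitive point such questions usually probe.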
Koo also noted that hands-on coding exercises, with real data and interpretation of results, are gaining popularity as a way to test candidates’ true abilities. A deep understanding of algorithms, instead of just familiarity with certain machine learning libraries and packages, is often preferred.
What to Include on a Resume/Cover Letter
In addition to highlighting individual skills and experience, candidates should emphasize their proficiency with the tools and libraries data scientists use, such as natural language processing libraries (including Gensim and spaCy), deep learning frameworks (such as TensorFlow, Keras, and PyTorch), Big Data technologies (Hadoop and Spark), and query languages such as SQL.
As Srinivasan noted, it’s also important to include any personal projects you’ve worked on and data science competitions you’ve entered. Experienced candidates should elaborate on their current and past analytics and machine learning projects, as well as the business value their work has delivered.
If you evaluated additional data sources or alternative approaches that streamlined processes at your previous workplaces, that is something to highlight. And don’t forget: every bullet point in the “Experience” section of your resume should mention the positive impact of your actions (e.g. “Increased unit revenue by 25% after using data to streamline production process”), because above all, potential employers want to see how you can change an organization for the better.