What does a data scientist do? We spoke to one of them to find out more about this popular and lucrative domain.


Data scientists process and interpret what are typically huge amounts of information to provide insights across a wide variety of fields and disciplines, including marketing, social media, finance, sales, and healthcare.


Data science is a growing and lucrative field with a lot of potential. In fact, Glassdoor ranked data scientist as the best job in America for 2019 based on earning potential, job satisfaction, and number of openings. The average data scientist salary is around $91,000 in the United States.

A career in data science does not happen by chance; the field attracts certain candidates with specific skills and analysis-oriented backgrounds.

I spoke with one such data scientist, Sri Megha Vujjini, who works at Saggezza, a global managed services provider and technology consulting firm. She spent the first year of her career at Deloitte, then returned to school for her master’s degree in data science. Initially interested in telecommunications engineering, she turned to data science after building algorithms for robotics.

Scott Matteson: You said robotics sparked your interest in a career in data science. Can you tell us more about your work with algorithms for robotics and how it inspired you to get into data science?

Sri Megha Vujjini: One of the first things I did when I started working with robots was to automate the direction of a robot. You could say I was building a self-driving car, but a smaller, less risky version. The concept behind it was always the same – it must move if it’s safe and it must stop if it’s not – pretty much a black or white situation. It gets more complicated when you add more features to it, for example, which direction should it go? Can it go straight instead of stopping? In which circumstances? All of these scenarios push you to think outside the box because all possibilities and probabilities can affect the output.

As we expand the scale and apply it to a business case, we have a data science problem. For me, it was a bit like solving a puzzle: asking, “Why does this happen and how does it work?”, then replicating that in lines of code and optimizing that code. That’s what led me to this area.

Scott Matteson: Can you give examples of how you have focused on data mining, statistical modeling, pattern recognition, and visualization methods throughout your career (or in your work today)?

Sri Megha Vujjini: A simple example would be creating budgets for a business, regardless of industry. A budget is usually planned around the activities of the coming year, but it is also possible to use historical data in a statistical way.

I had the opportunity to solve a piece of a puzzle in this regard. I work with the retail industry and was able to create a time series model around sales, promotions and external economic factors that would essentially predict sales for the next few years. A multitude of decisions and operations were then based on this baseline. It was necessary to recognize the trends (for example, more sales in March, and not only in November because of the holidays), visualize them to better explain them to the company, then automate the whole solution so it could be used as needed.

In short, this career is about understanding the business, understanding its problems and pain points, and delivering a solution using data as the backbone.
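
To make the idea of a seasonal baseline concrete, here is a minimal Python sketch. It is an illustration only, not the model Vujjini built: the sales figures are made up, the function names (`seasonal_index`, `peak_months`, `forecast_month`) and the `promo_uplift` parameter are hypothetical, and external economic factors are left out entirely. It simply compares each calendar month's average sales with the overall average to find months that run above normal, then scales that monthly average by an assumed promotion effect.

```python
from collections import defaultdict
from statistics import mean


def seasonal_index(history):
    """Return {month: average sales for that month / overall average}.

    `history` is a list of (year, month, sales) tuples.
    """
    overall = mean(sales for _, _, sales in history)
    by_month = defaultdict(list)
    for _, month, sales in history:
        by_month[month].append(sales)
    return {month: mean(values) / overall for month, values in by_month.items()}


def peak_months(history, threshold=1.1):
    """Months whose seasonal index exceeds `threshold` (an assumed cutoff)."""
    index = seasonal_index(history)
    return sorted(month for month, value in index.items() if value > threshold)


def forecast_month(history, month, promo_uplift=0.0):
    """Baseline forecast: overall average * seasonal index * promotion effect.

    `promo_uplift` is a hypothetical fractional lift, e.g. 0.1 for +10%.
    """
    overall = mean(sales for _, _, sales in history)
    return overall * seasonal_index(history)[month] * (1 + promo_uplift)


# Made-up data: two years of monthly sales with bumps in March and November.
monthly_sales = {m: 100 for m in range(1, 13)}
monthly_sales[3] = 150
monthly_sales[11] = 180
history = [(year, m, s) for year in (2018, 2019) for m, s in monthly_sales.items()]

print(peak_months(history))
print(forecast_month(history, 3, promo_uplift=0.1))
```

A real forecasting model would also account for trend, promotions calendars and outside economic signals, but even this simple index surfaces the kind of pattern mentioned above, such as a sales peak in March as well as November.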

Scott Matteson: What makes data science unique? What kind of personality or character works best with it? What are the challenges?

Sri Megha Vujjini: Ironically, one unique thing about this domain is that it has no particular definition. It is a broad field with varying definitions across industry and academia. That’s because it’s a mix of math, statistics, computer science, analytics, artificial intelligence, and business. Data science is an elevated combination of all these fields.

I don’t want to discourage anyone, but certain traits and characteristics make it easier to work in this area: problem solving, whether in math, probability or even puzzles; always thinking about the big picture; thinking outside the box; and, sometimes, being organized. Data science sometimes presents chaotic problems, and the first step to solving them is usually to break them down and organize them into a waterfall structure.

The one challenge, and I hope everyone in this field will agree with me on this, is data. Data is never perfect: it is incomplete or does not meet your needs. It may be too small to give you any information, or too large for you to refine the solution. It’s always the data, but once we understand how it works and how to use it, we can use it in the best way to get all the information we want from it.

Scott Matteson: What are some of the problems solved by data science?

Sri Megha Vujjini: Not world peace, not yet at least. But within the industry, we now have improved customer experiences and recommender systems, faster deliveries, and smoother, better business operations in enterprises through some of the solutions provided by data science. If we look at Amazon’s growth as an online retailer, we can identify some of the improvements and tie them to the points I mentioned above.

But outside of industry, day by day, we are constantly improving Google/Apple Maps and doing cutting-edge research in medicine, physics, space, and even self-driving cars. All of these problems and subsets of these problems have been solved by data science.

Scott Matteson: What are the technological products or tools used in this field?

Sri Megha Vujjini: A tiny proportion of jobs, reserved for industry veterans, don’t require programming skills. Otherwise, it is always good to know Python, R and SQL because they make life easier. From a mathematical/statistical point of view, we can use SAS, MATLAB, Python, R and the rich libraries they offer. And since so much data is moving to the cloud, it would be helpful to know and understand cloud technologies. We have Azure, AWS, Google Cloud and Snowflake, all used in varying capacities across the industry. In some cases, visualizations are also important, and they can be done using Python and R. We can always go beyond that and use tools like Power BI or Tableau.

Sean N. Ayres