What is the difference between a data engineer, data analyst and data scientist?

If you want to make sure you don’t lose your job in finance (or anywhere else) in the next five years, you probably want to work in big data. But what do Big Data jobs involve?

Speaking at last week’s Women of Silicon Roundabout in London, Dr Rebecca Pope, head of data science and engineering at KPMG, said there is no need to be an excellent statistician or a high-level mathematician to work in Big Data. You also don’t need a lot of prior programming knowledge.

However, you do need an interest in statistics, you do need to be willing to learn to code, and you do need to know how to perform some high-level mathematical operations.

Pope herself did not study pure statistics (she is a neuroscientist). She didn’t study programming either. Instead, she learned to program after graduating and attended “endless hackathons.”

“I started learning R. But my advice would be that if you’re starting a career in data science, you should major in Python: make Python the first language you learn,” Pope said.

Data scientists aren’t just statisticians, Pope said. “A statistician is interested in building a model that establishes a relationship between a variable and an outcome.” A data scientist wants to do something more: predict. Data scientists train models on data so that the models can predict the future as accurately as possible.

Big Data work is done in stages. First, the commercial use case must be established and the raw data must be made fit for purpose (so-called “data wrangling”). Then the algorithms that analyze the data are written and tested on the available data and, as these are machine learning algorithms, they learn from the data and predict the future. Finally, visualizations and APIs must be created so that the business can interact with the resulting product.
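To make those stages concrete, here is a minimal Python sketch of wrangling, training, testing and predicting. It is an illustration only, not anything Pope presented: the toy spend-versus-sales rows, the function names and the choice of a simple least-squares line are all assumptions, and real projects would normally use a library rather than hand-written maths.

```python
# Hypothetical raw data: (advertising spend, sales) as strings, some rows dirty.
raw_rows = [
    ("1", "2.1"), ("2", "3.9"), ("3", None), ("4", "8.2"),
    ("bad", "5.0"), ("5", "9.8"), ("6", "12.1"),
]

def wrangle(rows):
    """Data wrangling: keep only rows where both fields parse as numbers."""
    clean = []
    for x, y in rows:
        try:
            clean.append((float(x), float(y)))
        except (TypeError, ValueError):
            continue  # drop rows with missing or non-numeric values
    return clean

def fit_line(points):
    """Training: ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    """Prediction: apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept

clean_rows = wrangle(raw_rows)
train, test = clean_rows[:-1], clean_rows[-1:]  # hold back the last row for testing
model = fit_line(train)
errors = [abs(predict(model, x) - y) for x, y in test]
print("held-out absolute error:", errors)
```

The held-out row stands in for “the future”: the model never sees it during training, so the error on it gives a rough idea of how well the predictions generalise.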

Different types of data professionals are engaged at different stages. Or, you can be a generalist data scientist operating across the spectrum.

What does a data engineer do?

Pope has compiled the following chart showing the skills data engineers need and the tasks they perform. Basically, it’s a lot of software engineering and data preparation.

The job of the data engineer is “the representation and movement of data so that it is consumable and usable,” Pope said. If you’re a data engineer, you need to take the raw data, clean it, move it into a database, label it, and generally make sure it’s ready for the next step in the process…
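As a rough sketch of that clean, move and label routine, the Python below uses the standard library’s sqlite3 module with an in-memory database. The records, table name, column names and the 100 threshold for the label are all hypothetical; a production pipeline would more likely be built on the platforms Pope lists for this role.

```python
import sqlite3

# Hypothetical raw records as they might arrive from an upstream system.
raw_records = [
    {"name": "  Alice ", "amount": "120.50"},
    {"name": "BOB", "amount": "n/a"},
    {"name": "carol", "amount": "75"},
    {"name": "", "amount": "10"},
]

def clean(record):
    """Normalise the name and parse the amount; return None for unusable rows."""
    name = record["name"].strip().title()
    if not name:
        return None
    try:
        amount = float(record["amount"])
    except ValueError:
        return None
    return name, amount

def label(amount):
    """Attach a simple category (the 100 threshold is an arbitrary assumption)."""
    return "large" if amount >= 100 else "small"

# Move the cleaned, labelled rows into a database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (name TEXT, amount REAL, size_label TEXT)")
cleaned = [row for row in map(clean, raw_records) if row is not None]
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [(name, amount, label(amount)) for name, amount in cleaned],
)
conn.commit()

loaded = conn.execute(
    "SELECT name, amount, size_label FROM payments ORDER BY name"
).fetchall()
print(loaded)
```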

Pope said the programming languages and platforms you will need for data engineering jobs are: Apache Spark, Scala, Docker, Java, Hadoop, and Kubernetes NiFi.


What does a data analyst do?

After the data engineer comes the data analyst. The table below shows where data analysts operate. The role is about interfacing with the business to know what is expected from the data and developing visualizations that allow the business to easily interpret what the data is saying.

The job of the data analyst is focused “on interpreting current information to make it useful to the business,” Pope said. There is not a lot of machine learning modeling or machine learning deployment in the data analyst role.

If you want to be a data analyst, Pope said it would help if you understood how to use the predictive analytics software RapidMiner and PostgreSQL, an open source relational database.

What does a data scientist do?

Finally, there is the “pure data scientist”. This is what most people imagine they will do if they work with data. Data scientists interact heavily with the business and work with data engineers. They train machine learning programs on specially prepared data to provide easy-to-use visualizations that meet business needs.

The role of the data scientist is to create models that can extrapolate from the data and make business-relevant suggestions, Pope said.

Data scientists need to understand statistics, but Pope said most machine learning algorithms are based on multivariate calculus and linear and nonlinear algebra. “That’s the level of math you need to know,” she added.
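To see why that maths matters, consider gradient descent, a common way of training models. The Python sketch below is an assumption-laden illustration rather than anything from Pope’s talk: it fits a straight line to made-up points by repeatedly stepping against the partial derivatives of the mean squared error, and the learning rate and step count are arbitrary choices.

```python
# Hypothetical data generated from y = 2x + 1, so the right answer is known.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

def gradients(w, b):
    """Partial derivatives of the mean squared error with respect to w and b."""
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = (2.0 / n) * sum(r * x for r, x in zip(residuals, xs))
    grad_b = (2.0 / n) * sum(residuals)
    return grad_w, grad_b

w, b = 0.0, 0.0        # start from an arbitrary guess
learning_rate = 0.05   # assumed step size, small enough to converge on this data
for _ in range(5000):
    grad_w, grad_b = gradients(w, b)
    w -= learning_rate * grad_w  # step downhill along each partial derivative
    b -= learning_rate * grad_b

print("fitted slope and intercept:", w, b)
```

The derivatives come from calculus in more than one variable, and in real models the sums over data points become matrix operations, which is where the algebra comes in.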

You’ll also need good data visualization and people skills so you can present your model and its results to the business – and encourage them to use it.

Find a job in big data

Pope recruits at KPMG, and she’s not just looking for high-performing doctoral and master’s students. Being a good data scientist means being the “Swiss army knife” that can operate across the spectrum of data engineers, data analysts and data scientists, she said.

When Pope recruits at KPMG, she says she’s “blind” to the qualifications applicants have earned: what matters most is their performance against the technical challenge set by the company. “I’m much more interested in what technology you can create and what you can drive for our customer base [than qualifications],” Pope said.

To that end, she suggested that instead of studying for an expensive master’s degree or higher qualification, you pursue internships and work experience and compete on platforms like Kaggle.

“It’s not about being a deep technical expert in Scala or Python. It’s about figuring out what you need to answer the questions posed by the business,” Pope concluded.

Sean N. Ayres