5 Most Important Tips Every Data Analyst Should Know

Castor (http://castordoc.com) – Bring confidence and visibility to your data

# 1 If your analysis is unbiased, then take another look

Problem definition

Bias is an inclination for or against an idea. Most of the time it is entirely unconscious, and it mainly creeps in when our results are exactly what we expected. We're all human: if we have expectations about something and, after digging into the data a bit, our first results are in line with them, we tend to stop there. When our results don't match our expectations, we keep digging until they do.

How to avoid this?

Think about what could be skewing the results of your analysis. I see two main drivers of such bias.

The scope of your analysis

Changing the date range, or even the data set used, may give you different results. The classic pitfalls are seasonality and mix effects. Beware of cohort effects, too.
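To make the mix effect concrete, here is a minimal sketch with made-up numbers (the segments, visit counts, and conversion counts are all hypothetical). The conversion rate improves in every segment, yet the overall rate drops, because traffic shifted toward the segment that converts less.

```python
# Hypothetical (visits, conversions) per segment for two months.
january = {"desktop": (1000, 100), "mobile": (1000, 20)}
february = {"desktop": (400, 44), "mobile": (1600, 40)}


def segment_rates(month):
    """Conversion rate for each segment taken on its own."""
    return {name: conv / visits for name, (visits, conv) in month.items()}


def overall_rate(month):
    """Conversion rate over all segments pooled together."""
    total_visits = sum(visits for visits, _ in month.values())
    total_conversions = sum(conv for _, conv in month.values())
    return total_conversions / total_visits


jan_rates = segment_rates(january)
feb_rates = segment_rates(february)

for name in january:
    print(f"{name}: {jan_rates[name]:.1%} -> {feb_rates[name]:.1%}")
print(f"overall: {overall_rate(january):.1%} -> {overall_rate(february):.1%}")
```

Looking only at the pooled number would suggest performance got worse; splitting by segment shows the opposite. Which view is the right one depends on the question being asked, so it is worth checking both.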

The methodology of your analysis

This one flirts with Stats 101. Now that you have the right time span and data points, think carefully about how you aggregate them to get results. Outliers should be taken into account, as should the choice of aggregation metric. Always check the mean against the median.
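As a small illustration of the mean-versus-median check, the sketch below uses invented order values with a single outlier. The 25% threshold is an arbitrary choice for the example, not a recommended rule.

```python
from statistics import mean, median

# Hypothetical order values; the last one is an outlier.
order_values = [20, 22, 19, 21, 23, 20, 500]

avg = mean(order_values)
med = median(order_values)
print(f"mean={avg:.1f} median={med}")

# A large gap between the two hints at outliers or a skewed distribution.
relative_gap = abs(avg - med) / med
if relative_gap > 0.25:  # arbitrary threshold, for illustration only
    print("Mean and median disagree: inspect outliers before reporting an average.")
```

Here one large order is enough to pull the mean far above what a typical order looks like, while the median barely moves.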

# 2 Most early drafts can be done in Excel

This title is a bit provocative. Yes, Python is powerful and lets you save and repeat your data processing. But it comes at a cost. First, it takes time, especially if you are not a Python enthusiast. Second, collaboration with non-technical users is more difficult. If you need people who don't code to work with you on your analysis, Python will slow them down.

As a data analyst, you'll want to do projects in Python, just to ramp up. But choose them carefully. If you have a very tight schedule and Excel gets the job done, go with Excel. You can migrate to Python later, as it's always easier to learn one thing at a time. It's hard to build a whole new analysis in a language you're not comfortable with. First do the analysis with a tool you know well, then migrate it to the new language.

# 3 Get a tool that keeps your query history

Have you ever received a data request similar to one you handled 3 months ago? This happens many times a year, and each time you wish you had a good history of all the queries you ran in the last 365 days.

Check out Castor for this, a tool built by me and my team.

# 4 Don’t fix the data, fix the process that creates it

Let’s start with a concrete example.

A data pipeline at one of my previous companies kept crashing because of a uniqueness issue: a table field was supposed to be a primary key, but it contained duplicates. This field was client_id, and normally a client was supposed to be in one and only one country.

So whenever we had this problem, we had to find the customer linked to multiple countries and fix the record. We also reminded the sales team of the “one country rule”.
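For illustration, the manual hunt for offending customers could look something like the sketch below. The rows and the `clients_with_multiple_countries` helper are hypothetical; only the client_id field and the one-country rule come from the example above. This is the symptomatic fix, not the cure.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from the source system.
rows = [
    {"client_id": "C001", "country": "FR"},
    {"client_id": "C002", "country": "US"},
    {"client_id": "C001", "country": "DE"},
    {"client_id": "C003", "country": "FR"},
    {"client_id": "C002", "country": "US"},
]


def clients_with_multiple_countries(rows):
    """Return {client_id: sorted countries} for clients breaking the one-country rule."""
    countries_by_client = defaultdict(set)
    for row in rows:
        countries_by_client[row["client_id"]].add(row["country"])
    return {
        client_id: sorted(countries)
        for client_id, countries in countries_by_client.items()
        if len(countries) > 1
    }


violations = clients_with_multiple_countries(rows)
print(violations)
```

With these sample rows, only C001 is flagged: C002 appears twice but always with the same country.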

Should we create an alert system dedicated to this specific issue? Should we add a transformation layer on top? Should we remove the uniqueness check? None of these. What we needed to do (and had not yet done) was simply enforce this rule where the data is created, at the source, i.e. in Salesforce, by the salespeople.

Where possible, identify the root cause of your data problems and let people know that good data requires processes designed for it. Processes exist first and foremost to serve the business, but to produce good data they must also take data dependencies into account.

# 5 Share your analysis as widely as possible

Too many data analysts wait until their analysis is perfect before sharing it. Share it now (with a “WIP” disclaimer at the top if you like). Don’t go more than a few days without having a peer review your work. It will give you perspective.

Conclusion

Yes, technical skills (Python, SQL, R …).

Glad to have a constructive debate in the comments.

Also posted at: https://www.castordoc.com/blog/the-5-things-every-data-analyst-should-know

Sean N. Ayres