Statistical Tools Every Data Scientist Should Know For Better CPG Analysis

Data is making its way into every industry and vertical today and has become a must-have for organizations on the path to success. In today’s economic climate, organizations must leverage all available data resources and base their decisions on in-depth analysis to stay on track. Data plays a different role in different industries, and organizations vary how extensively they use data across the enterprise depending on their product / service offerings.

In the CPG industry, building data into the overall business strategy is the real key to staying ahead of the competition and creating a cycle of continuous growth. The CPG market is volatile because consumer demands and market trends change quickly. Data science enables organizations to leverage consumer and organizational data, offering approaches to identify insights, forecast trends, and make informed business decisions based on data-driven forecasts.

Many data scientists lack the in-depth statistical knowledge that would let them extract deeper insights from their data. Yet statistical tools are essential for data analysis, especially in an industry like CPG that is so sensitive to external factors. They allow organizations to use quantitative methods to test data-driven theories against real-world outcomes. The latest AI-based techniques and predictive economic modeling tools help organizations systematically identify the economic factors that can influence their business decisions. Combining data with the quantitative application of statistical and mathematical models helps data scientists test existing hypotheses and predict future trends.

Data scientists in CPG industries can readily adopt econometrics, since it builds on the mathematics behind linear regression and panel data analysis that they already understand well. For example, leaders of CPG organizations can use econometrics to optimize promotional spend and marketing return on investment, using econometric and statistical tools to quantify the relationship between spend and sales and draw conclusions from it.

There are four broad categories of statistical tools. Which ones to use depends on the organization’s use case, its product / service offerings, and the type of information it aims to extract from its data; data scientists can use a single tool or combine several for best results. Let’s look at these tools in detail.

Descriptive statistics

Descriptive statistics summarize a dataset by measuring its central tendency, dispersion and distribution.

Data scientists can use this tool to summarize their dataset and strengthen their exploratory data analysis by describing the characteristics of the data. Descriptive statistics matter because they give data scientists a holistic understanding of the data they are dealing with. Learning about the variables involved and the potential relationships between them is the first step in using the data for analysis.

Measures of central tendency (mean, median and mode) locate the center of a dataset, showing data scientists where the bulk of the data lies. Descriptive statistics also measure how widely each variable is dispersed around these central values, using the range, interquartile range (IQR), standard deviation, variance, mean absolute deviation, coefficient of variation, Gini coefficient, etc.
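
The dispersion measures listed above can be computed with Python’s standard `statistics` module. This is a minimal sketch on hypothetical daily unit sales; the `gini` helper is an illustrative function (not a library call) that uses the mean-absolute-difference definition of the Gini coefficient.

```python
import statistics

# Hypothetical daily unit sales for one SKU at one store.
sales = [120, 135, 128, 150, 110, 142, 138, 125, 160, 131]

# Central tendency
mean = statistics.mean(sales)
median = statistics.median(sales)

# Dispersion around the center
data_range = max(sales) - min(sales)
q1, _, q3 = statistics.quantiles(sales, n=4)  # quartile cut points
iqr = q3 - q1
variance = statistics.variance(sales)  # sample variance
std_dev = statistics.stdev(sales)      # sample standard deviation
mad = statistics.mean([abs(x - mean) for x in sales])  # mean absolute deviation
cv = std_dev / mean                    # coefficient of variation (unitless)


def gini(values):
    """Hypothetical helper: Gini coefficient as the sum of absolute
    differences over all pairs, divided by 2 * n^2 * mean."""
    n = len(values)
    mu = statistics.mean(values)
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mu)


print(f"mean={mean}, median={median}, range={data_range}, IQR={iqr}")
print(f"sd={std_dev:.2f}, var={variance:.2f}, MAD={mad:.2f}, "
      f"CV={cv:.3f}, Gini={gini(sales):.3f}")
```

The coefficient of variation and the Gini coefficient are unitless, which makes them handy for comparing variability across SKUs with very different sales volumes.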

Finally, the data can be summarized with a statistical distribution, which can be used to estimate the probability that an event (for example, a given level of daily sales) will occur again.
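
One way to do this is to fit a distribution and read probabilities from it. The sketch below assumes, purely for illustration, that daily sales are roughly normal, fits the standard library’s `statistics.NormalDist` to hypothetical sales figures, and compares the model’s probability of exceeding a threshold with the empirical frequency. Real sales data are often skewed, so the normality assumption should be checked before relying on it.

```python
from statistics import NormalDist

# Hypothetical daily unit sales for one SKU.
sales = [120, 135, 128, 150, 110, 142, 138, 125, 160, 131]

# ASSUMPTION: daily sales are approximately normally distributed.
dist = NormalDist.from_samples(sales)  # estimates mean and sample std dev

threshold = 150
p_model = 1 - dist.cdf(threshold)      # modeled P(sales > 150)
p_empirical = sum(x > threshold for x in sales) / len(sales)

print(f"modeled P(sales > {threshold}) = {p_model:.3f}")
print(f"empirical frequency          = {p_empirical:.3f}")
```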

The CPG industry can be easily affected by something as immediate and minor as a weekend storm that keeps consumers from shopping. Descriptive statistics help businesses turn past data into timely insight about what may come next. CPG manufacturers can leverage their historical data to understand buyer behavior and use that understanding to generate real-time insights. For example, consider the COVID-19 pandemic and the fluctuating demand for disinfectants. Studying historical data may highlight a relationship between new waves of COVID-19 and the demand for disinfectants; CPG analysts can then forecast increases or decreases in demand and manufacture disinfectants accordingly.

Regression analysis

Along with descriptive analysis, data scientists can use regression analysis to study the relationship between dependent and independent variables. In CPG, the technique is often used to estimate how strongly one variable affects another, although interpreting the result as a causal effect requires controlling for confounding factors. Returning to the disinfectant example, data scientists can use regression techniques to determine the relationship between the increase in COVID-19 cases and the demand for disinfectants.

Linear regression techniques quantify the relationship between a dependent variable and one or more explanatory variables and, when several explanatory variables are included, adjust for confounding effects. Data scientists can opt for simple or multiple linear regression depending on the nature and number of explanatory variables involved in the problem. Regularization techniques such as lasso, ridge or elastic net can complement the analysis when there is a large set of predictor variables.
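
Ridge regression is the easiest of the three regularization techniques to sketch because it has a closed-form solution. The example below, assuming NumPy is available, fits ordinary least squares and ridge on synthetic data with 20 candidate predictors, only 3 of which truly matter, and shows the ridge penalty shrinking the coefficients. The penalty strength `alpha=10.0` is an arbitrary illustrative value; in practice it would be tuned by cross-validation. Lasso and elastic net need iterative solvers and are available in libraries such as scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 observations (e.g. store-weeks), 20 candidate drivers,
# only the first 3 of which actually affect sales.
n_obs, n_features = 50, 20
X = rng.normal(size=(n_obs, n_features))
true_beta = np.zeros(n_features)
true_beta[:3] = [3.0, -2.0, 1.5]
y = 10.0 + X @ true_beta + rng.normal(scale=0.5, size=n_obs)


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; alpha=0 gives ordinary least squares.
    Data are centered so the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    penalty = alpha * np.eye(X.shape[1])
    beta = np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


icpt_ols, beta_ols = ridge_fit(X, y, alpha=0.0)
icpt_ridge, beta_ridge = ridge_fit(X, y, alpha=10.0)

print("OLS coefficient norm:  ", round(float(np.linalg.norm(beta_ols)), 3))
print("ridge coefficient norm:", round(float(np.linalg.norm(beta_ridge)), 3))
```

Ridge shrinks all coefficients toward zero but rarely sets any exactly to zero; lasso can zero out irrelevant predictors, and elastic net blends the two penalties.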

Another recommended regression technique is the panel data model, which combines cross-sectional and time series data to model time-dependent observations. Panel data are multidimensional: the same units are measured repeatedly over a period of time. These units could include individuals, product choices, cities, household items, etc.

Essentially, panel data capture both how variables differ between units and how they change within each unit over time. Panel data models are estimated with techniques such as pooled OLS, the fixed effects model, and the random effects model.

Forecasting

One of the most prominent use cases for statistical tools is forecasting. Forecasting market trends and consumer demand is fundamental to CPG, and getting these forecasts right is important for a better return on business investment.

Benchmark forecasting uses simple methods to build forecasting intuition and to provide a baseline against which more complex layers can be judged. It uses techniques such as the drift, naive, seasonal, mean, seasonal naive, random walk, linear trend, and geometric random walk methods.

Smart forecasting is an essential tool for data scientists because it is adaptive. Whereas generic modeling tools are built for an industry as a whole, these forecasting tools can be fully tailored to the needs of the business. They use the company’s own historical data as the dependent variables within the model, so the resulting forecasts are highly company-specific.

It’s important to note that a one-size-fits-all approach to forecasting doesn’t work for CPG and retail businesses. Instead, executives and data scientists must create a forecasting method specific to a category, region, or product. A model that can easily be customized, extended, or modified stays flexible enough to maintain accuracy across the various aspects of the CPG business planning process.
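
One lightweight way to make the forecasting method specific to each category or region is to backtest a few candidate methods per series and keep the one with the lowest error. The sketch below does this with mean absolute error (MAE) on a 4-quarter holdout; the series, category names and the three candidate methods are hypothetical choices for illustration, not a prescribed procedure.

```python
import statistics


def naive(y, h):
    return [y[-1]] * h


def seasonal_naive(y, h, m=4):  # ASSUMES quarterly data (season length 4)
    return [y[len(y) - m + (i % m)] for i in range(h)]


def drift(y, h):
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + slope * (i + 1) for i in range(h)]


CANDIDATES = {"naive": naive, "seasonal_naive": seasonal_naive, "drift": drift}


def mae(actual, predicted):
    return statistics.mean(abs(a - p) for a, p in zip(actual, predicted))


def pick_method(series, holdout):
    """Backtest each candidate on the last `holdout` points; return the best."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: mae(test, f(train, holdout)) for name, f in CANDIDATES.items()}
    return min(scores, key=scores.get), scores


# Hypothetical quarterly unit sales for two category/region combinations.
history = {
    ("sunscreen", "north"): [10, 40, 15, 5, 12, 42, 17, 7, 14, 44, 19, 9],
    ("detergent", "north"): [50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72],
}

for key, series in history.items():
    best, scores = pick_method(series, holdout=4)
    print(key, "->", best, {k: round(v, 1) for k, v in scores.items()})
```

The strongly seasonal series ends up with a seasonal method and the steadily trending one with drift, which is the point of choosing per series rather than company-wide.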

For example, data scientists can use historical data on supply chain disruptions during past periods of rising oil prices to estimate how the next oil price hike may affect the supply chain, and then use that estimate to prepare in advance.

Hypothesis tests

So you’ve explored your data and made some what-if predictions based on it. The next step is to check whether your findings hold up statistically, so that manufacturers can act on them with confidence.

Hypothesis testing is an effective statistical tool that helps data scientists obtain evidence for or against their findings and conclusions. It weighs a claim against an accepted default assumption (the null hypothesis) about the population. The p-value indicates whether the data give grounds to reject a claim, while confidence intervals measure the degree of uncertainty in an estimate. CPG-focused data scientists can also use hypothesis testing to check whether an observed pattern of consumer behavior is likely to be genuine rather than due to chance.

They can do this using several methods, such as:

  • The t-test
  • ANOVA
  • The chi-square test
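
For ANOVA and the chi-square test, SciPy provides ready-made functions. This minimal sketch, assuming SciPy is installed, runs a one-way ANOVA on hypothetical weekly sales under three shelf placements and a chi-square test of independence on hypothetical pack-size choices by region. All numbers are invented for illustration.

```python
from scipy import stats

# One-way ANOVA: do mean weekly units differ across three shelf placements?
eye_level = [52, 55, 58, 54, 57]
end_cap = [60, 63, 61, 65, 62]
bottom_shelf = [45, 48, 44, 47, 46]
f_stat, p_anova = stats.f_oneway(eye_level, end_cap, bottom_shelf)

# Chi-square test of independence: is pack-size choice related to region?
#               small  large
contingency = [[120, 80],    # urban shoppers
               [90, 110]]    # rural shoppers
chi2, p_chi2, dof, expected = stats.chi2_contingency(contingency)

print(f"ANOVA: F = {f_stat:.1f}, p = {p_anova:.2g}")
print(f"chi-square: chi2 = {chi2:.2f}, dof = {dof}, p = {p_chi2:.3f}")
```

A small p-value in either test suggests the difference or association is unlikely to be due to chance alone; it does not say which placement differs (ANOVA) or why shoppers choose differently (chi-square). For 2x2 tables, `chi2_contingency` applies Yates’ continuity correction by default.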

For example, data scientists at a clothing retailer might hypothesize that tank top sales are high during the summer among girls and women aged 15 to 30. They would use a hypothesis test such as the t-test to check whether the data support this hypothesis; a test can reject or fail to reject a hypothesis, but it cannot prove it.
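
A minimal sketch of the tank top test, assuming SciPy 1.6 or later (for the `alternative` argument) and entirely hypothetical weekly sales figures. It uses Welch’s version of the two-sample t-test (`equal_var=False`), which does not assume the two seasons have equal variance, with a one-sided alternative that summer sales are higher.

```python
from scipy import stats

# Hypothetical weekly tank-top units sold to customers aged 15 to 30.
summer_weeks = [340, 365, 390, 372, 355, 401, 388, 379]
other_weeks = [210, 235, 198, 242, 225, 215, 230, 205]

# H0: mean summer sales <= mean sales in other weeks
# H1: mean summer sales are greater (one-sided Welch's t-test)
result = stats.ttest_ind(summer_weeks, other_weeks,
                         equal_var=False, alternative="greater")

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.2g}")
if result.pvalue < 0.05:
    print("Reject H0 at the 5% level: the data support higher summer sales.")
else:
    print("Fail to reject H0 at the 5% level.")
```

The 5% threshold is a conventional choice, not a rule; a significant result supports the hypothesis for this sample but does not prove it.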

As you will have noticed, these tools do not work in isolation. Used together, they help data scientists build a holistic view of industry data and its economic impact. Data scientists working on CPG and FMCG applications should consider how econometric tools connect with data analysis. Statistical modeling for forecasting and price analysis will become increasingly critical to CPG as data science continues to grow.

The views, thoughts and opinions expressed in this article belong solely to the author and do not reflect the views and opinions of the author’s employer, any other organization, committee or other group or individual. This article is written by a member of the AIM Leadership Council. The AIM Leaders Council is an invitation-only forum for senior executives in the data science and analytics industry.

Sean N. Ayres