Data Scientist Hackathon by TheMathCompany

MachineHack, in association with TheMathCompany, is launching a fortnight-long hiring hackathon for data scientists and machine learning practitioners from July 02 to 19, 2021. Winners will have the exclusive opportunity to build a rewarding analytics career at TheMathCompany.

TheMathCompany is a modern, hybrid consulting firm that builds custom AI applications for Fortune 500 companies and their equivalents. We enable analytical transformations by building the core capabilities businesses need to unlock immense value from data. Our comprehensive consulting model fills the pressing gaps that exist within conventional analytics service providers and off-the-shelf products. Our experts offer a variety of problem-solving capabilities, with rapid delivery, reuse, and scalability of applications tailored to business needs, powered by Co.dx, our proprietary AI master engine.

TheMathCompany has won numerous awards and is recognized as one of the world’s leading analytics companies.

Show your data science mettle by participating in the Hiring Hackathon and gain the exclusive opportunity to build a rewarding career in the analytics industry. The hackathon is open to data scientists, ML practitioners, analytics professionals and enthusiasts who wish to showcase their expertise.

The challenge starts July 02, 2021.

Problem statement and description

Customers who want to buy a new car expect the maximum return on their investment within their price range. However, the wide variety of cars with different capabilities and characteristics, such as model, make, mileage, year of production, category, fuel type, engine volume, color and accessories, makes it difficult for buyers to make an informed decision. To that end, MachineHack, in association with TheMathCompany, is calling on the data science community to develop a machine learning model that predicts the price of a car from its features, so that buyers can find the best features available within their budget.

To solve the car price problem, MachineHack created a training dataset of 19,237 rows with 18 columns, including ‘Price’ as the target variable, and a test dataset of 8,245 rows with 17 columns.

The hackathon requires some prerequisite skills such as multivariate regression, working with large datasets, understanding underfitting vs. overfitting, and the ability to optimize RMSLE to generalize well on unseen data.

Submission guidelines

Participants must submit a .csv/.xlsx file with exactly 8245 rows and a single column (namely, Price). The submission will return an “Invalid score” if additional columns or rows are present.

Sklearn models support the predict() method to generate predicted values.
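The submission rules above can be checked before uploading. The following is a minimal sketch, not an official MachineHack tool: the helper name `write_submission`, the output file name, and the variables `model` and `X_test` in the comments are assumptions made for illustration. It uses only the Python standard library and accepts any sequence of predictions, such as the output of a fitted model's predict() call.

```python
import csv

EXPECTED_ROWS = 8245  # number of rows in Test.csv, per the guidelines above


def write_submission(predictions, path, expected_rows=EXPECTED_ROWS):
    """Write predictions to a one-column CSV file with the header 'Price'.

    Raises ValueError if the number of predictions does not match the
    expected number of test rows, since such a file would be rejected.
    """
    rows = [float(p) for p in predictions]
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(rows)}")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Price"])                    # the only column
        writer.writerows([value] for value in rows)   # one prediction per row
    return len(rows)


# Typical use with a fitted scikit-learn regressor (hypothetical names):
#   predictions = model.predict(X_test)
#   write_submission(predictions, "submission.csv")
```

Failing early on a wrong row count avoids spending a submission attempt on a file that would only return an “Invalid score”.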

Evaluation criteria

The hackathon evaluation will be done using the RMSLE metric. We can use ‘np.sqrt(mean_squared_log_error(actual, predicted))’ to calculate the same.
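For readers who want to see what the metric computes, here is a minimal standard-library sketch of RMSLE. It follows the same definition as scikit-learn's mean_squared_log_error, which compares log(1 + value) for the actual and predicted values, so it should agree with the NumPy/scikit-learn expression above. The function name `rmsle` is an assumption made for illustration.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error of two equal-length sequences.

    Both sequences must contain non-negative numbers, because the metric
    takes log(1 + value) of every entry.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# A perfect prediction scores 0.0; lower is better.
perfect_score = rmsle([10000.0, 25000.0], [10000.0, 25000.0])
```

Because the error is measured on a log scale, being off by a factor of two costs roughly the same for a cheap car as for an expensive one.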

The hackathon will also support both private and public leaderboards. The public leaderboard will be evaluated on 70% of the test data, while the private leaderboard will be made available at the end of the hackathon and will be evaluated on 100% of the test data.

In addition to the challenge, participants must also complete a multiple-choice questionnaire to be shortlisted for an interview with TheMathCompany.

The final score will be the ‘Best score’ obtained on the public leaderboard.


TheMathCompany will select the top three (3) winners based on the criteria given. The cash prizes are for interested applicants who wish to be interviewed/hired by TheMathCompany.

First prize: 40,000 INR

Second prize: 20,000 INR

Third prize: 10,000 INR

The hackathon will end on July 19, 2021.

Description of the dataset:

  • Train.csv – 19237 rows x 18 columns (includes the ‘Price’ column as target)
  • Test.csv – 8245 rows x 17 columns
  • Sample submission.csv – Please check the “Evaluation” section for details on generating a valid submission.

Description of the attributes:

  • ID
  • Price
  • Levy
  • Maker
  • Model
  • Year of production
  • Category
  • Leather interior
  • Fuel type
  • Engine volume
  • Mileage
  • Cylinders
  • Gearbox type
  • Driving wheels
  • Doors
  • Wheel
  • Color
  • Airbags


Skills:

  • Multivariate regression
  • Large datasets, underfitting vs. overfitting
  • Optimizing the RMSLE score as a metric to generalize well on unseen data.
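One common way to optimize for RMSLE, offered here as an assumption rather than as the organizers' recommended method, is to work in log space: fit on log(1 + Price) and convert predictions back with exp(x) - 1. The sketch below illustrates why, using the simplest possible model, a single constant prediction. The prices are made-up values and the helper names are hypothetical.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error of two equal-length sequences."""
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared) / len(squared))


def best_constant_for_rmsle(prices):
    """Constant prediction that minimizes RMSLE: the mean taken in log space."""
    mean_log = sum(math.log1p(p) for p in prices) / len(prices)
    return math.expm1(mean_log)


# Made-up, skewed prices for illustration only (not taken from Train.csv).
prices = [500.0, 8000.0, 12000.0, 15000.0, 90000.0]

arithmetic_mean = sum(prices) / len(prices)
log_space_constant = best_constant_for_rmsle(prices)

# Score each constant baseline against the same prices.
score_arithmetic = rmsle(prices, [arithmetic_mean] * len(prices))
score_log_space = rmsle(prices, [log_space_constant] * len(prices))
```

On skewed data such as this, the log-space constant is smaller than the arithmetic mean and scores a lower RMSLE. The same reasoning motivates training a regressor on the log-transformed target.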

Sean N. Ayres