Solve the mysteries with a data scientist

Having a one-on-one over a cup of tea with our Data Science Manager, Edmond, was a brilliant start to the week. It was a perfect summer morning for a date with the person without whom most of my creations would be impossible. As I mentally ran through the many conversations we might have, the first question I blurted out upon meeting him was, “How do you think like a data scientist?”

Edmond, with his sunny disposition, replied, “I’m flattered you chose me for a data science reveal.” In short, an aspiring data scientist might be described as someone with a knack for math, statistics, probability, analytical thinking, and computer science. There is a wide choice of programming languages, depending on the level of implementation and the environment the data scientist works in. Python is the language of choice in Edmond’s case. For conventional machine learning algorithms, there are a variety of classification, regression, and clustering packages. A good data scientist doesn’t necessarily need to know all the packages (quite an impossible task, as he said), but it helps to be well versed in the documentation (including related blogs, code samples and, if necessary, the scientific papers the algorithms are based on) relevant to the specific problem to be solved.

My thoughts lingered on why some key industries, such as logistics, have yet to explore the power and possibilities of computer vision, machine learning, neural networks, and so on, even though they should in the future. Is it because the most underrated aspect of machine learning and data science solutions is that someone has to be able to know or measure whether, and how well, the ML algorithm worked, which means the expected result for a specific input must be known in advance?

Pat came the answer: “There have been compelling leaps in the directions of neuroscience and machine learning, with the latter mimicking the former.” An analogy cited by Edmond was that DNN (deep neural network) algorithms are like recipes, and their output is like the finished product of a cooking or baking job. Just as with food, where one tastes or samples the result and knows whether it is well cooked and seasoned or undercooked, with a DNN one should be able to “taste” the result and must have the analytical skills to know whether it meets expectations or the defined parameters.

I wondered where we would be in five years; I almost felt as if I had discovered the map to a maze in a treasure hunt. Nonchalantly, he smiled and said, “I think there are many areas being explored that could be game-changers: personalized health, personalized education, legal case studies, agriculture, transportation, retail, e-commerce, regulatory compliance analysis, drones, and so on. The list is literally endless.”

The morning turned out to be more interesting than I expected, and I remembered a quote from W. Edwards Deming: “Without data, you are just another person with an opinion.” Time was ticking and I felt like we were only at the tip of the iceberg. I greatly admire data scientists and couldn’t help but ask Edmond whom in the data science community he would single out for accolades.

Edmond, with an infectious burst of laughter, replied, “I suppose the answer wouldn’t necessarily be a specific individual, but rather the type of individual (usually a university professor) who can excel in various fields. Often they don’t just write one or two “pioneering” articles in a field or application and move on to something new; they make their mark and continue to grow. Some names that come to mind (globally) are, on the academic side, Vladimir Vapnik, Andrew Ng, Yann LeCun, Geoffrey Hinton, Yoshua Bengio, and Amnon Shashua. Cassie Kozyrkov inspires with amazing sessions on deep learning, as does Jason Brownlee, who has written many books, blogs, code examples and much more on a myriad of deep learning topics. Finally, my personal mentors when I was a student at MIT were Roz Picard, Neil Gershenfeld, Pattie Maes, Rodney Brooks, Bill Freeman, and Eric Grimson. Not to mention the many students and colleagues who have become pioneers in their respective fields.”

It was by far one of the most endearing conversations I’ve had in a long time, and it had to be written down (or typed, to put it correctly). Having a host of questions answered gave me some extra thoughts to ponder. A new direction may be waiting for us. After exploring the positives, I wanted to hear from Edmond whether there was anything he thought we could do better.

With his ever-endearing smile, he nodded and said, “Always beware of extreme cases, that is, outliers.” Any ML system is only as good as its training examples allow. He shared his favorite example of Joy Buolamwini, an MIT Media Lab student, who about 5 years ago discovered the limitations of facial recognition (software that, given a face, produces a vector of key points on an image). She found that while “on average” several of the major software packages were about 98% to 99% successful in detecting faces, those same packages were only 38% successful in detecting dark-skinned women. I appreciated this wisdom about how we as innovators should always err on the side of caution, and how society as a whole should set tolerance levels for mistakes, if any.
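To make Edmond’s point concrete, here is a minimal Python sketch of how an “on average” success rate can hide a failure on an under-represented group. Everything in it is an assumption for illustration only: the function name `detection_rates`, the group labels, and the counts are made up and are not taken from the study he mentioned.

```python
from collections import defaultdict


def detection_rates(records):
    """Return (overall_rate, per_group_rates) from (group, detected) pairs."""
    totals = defaultdict(int)  # number of faces seen per group
    hits = defaultdict(int)    # number of faces detected per group
    for group, detected in records:
        totals[group] += 1
        hits[group] += int(detected)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {group: hits[group] / totals[group] for group in totals}
    return overall, per_group


# Synthetic, made-up test set: group_a is heavily over-represented.
records = (
    [("group_a", True)] * 940 + [("group_a", False)] * 10
    + [("group_b", True)] * 20 + [("group_b", False)] * 30
)

overall, per_group = detection_rates(records)
print(f"overall: {overall:.1%}")
for group, rate in sorted(per_group.items()):
    print(f"{group}: {rate:.1%}")
```

With these made-up counts, the overall rate looks healthy because group_a dominates the test set, while the rate for group_b is far lower. Reporting results per group, and not only on average, is one simple way to catch this kind of outlier.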

This morning left me with so much valuable information, answered many questions and, like any thought-provoking conversation, raised so many other questions that I hope to discuss with Edmond when we next meet. However, I continued to laugh throughout the day as I remembered his vision of how, over time, DS/ML systems will become more intrusive yet passive: beyond smart checkouts in supermarkets (either when you fill your basket or when you leave the store), we could possibly have a smart fridge or kitchen that can convince us to eat healthily! I’m sure many of us are looking forward to this future!

(With contributions from Edmond Chalom, Data Science Lead, VideoVerse)



The opinions expressed above are those of the author.


Sean N. Ayres