There has been an ongoing debate about the distinctions and similarities between data analytics (“DA”), statistical modelling (“SM”) and machine learning (“ML”), and my classmates and I have been part of it.
People who do not fully understand these three terms (DA, SM, ML) and how they are applied often use them interchangeably. I think it is extremely important to get the terminology right and to keep it consistent across your team or organization. Despite the widely celebrated success of machine learning in improving business performance, its challenges have received far less attention than its successes, and it is important to address them if we want to increase adoption across companies, industries and countries.
One known challenge is miscommunication between the decision makers (i.e. senior management) and the data science team. A key part of this is getting buy-in from the decision makers: convincing them to trust the output produced by the data science team and to shift their mindset from intuition-based to data-driven decision making.
In my opinion, one of the most fundamental, if not the earliest, ways to address this challenge is to ensure that everyone in the team or organization is on the same page in using, and more importantly understanding, the terminology of the three areas: DA, SM and ML. I have always believed that whatever you do or learn, you should get the foundations right.
So what is ML, and how does it differ from DA and SM?
ML is a process in which a computer learns patterns, insights and key relationships from existing historical data and then uses what it has learned to make predictions on new, unseen data. The learning and predictive capabilities are equally important; together they make up what machine learning is about.
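To make the two halves of that definition concrete, here is a minimal sketch in plain Python, assuming a single numeric input and a straight-line relationship. The helpers `fit_line` and `predict` and the data are hypothetical, written only for illustration: the learning step estimates a slope and intercept from historical data by ordinary least squares, and the predicting step applies them to inputs the model has not seen.

```python
def fit_line(xs, ys):
    """Learning step: estimate slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predicting step: apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical historical data (e.g. advertising spend vs. sales),
# chosen to lie exactly on y = 2x + 1 so the result is easy to check.
history_x = [1, 2, 3, 4, 5]
history_y = [3, 5, 7, 9, 11]

model = fit_line(history_x, history_y)            # learn from the past
new_x = [6, 7]                                     # new, unseen inputs
predictions = [predict(model, x) for x in new_x]   # predict the future
print(model, predictions)
```

Real data is never this clean, but the structure stays the same: fit on what has already happened, then apply the fitted model to inputs it has never seen.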
DA consists of three different types of analytics: descriptive analytics (what has happened), predictive analytics (what is likely to happen) and prescriptive analytics (what should be done about it). It is therefore worth clarifying which type someone is referring to when they talk about DA; it could be one specific type or a combination of two or all three. DA is a tool for ML, and both predictive and prescriptive analytics fall under ML.
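A toy sketch of the three types of analytics, assuming hypothetical weekly sales figures. Descriptive analytics summarises the past with the mean, median and mode from Python's `statistics` module. Predictive analytics uses a 3-week moving average as a deliberately simple stand-in forecaster. Prescriptive analytics recommends an order quantity by scoring candidate quantities against past demand with a made-up cost function. The data, the cost figures and the `average_cost` helper are illustrative assumptions, not a recommended method.

```python
import statistics

# Hypothetical weekly unit sales, made up purely for illustration.
sales = [12, 15, 15, 14, 18, 20, 19]

# Descriptive analytics: summarise what has already happened.
summary = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "mode": statistics.mode(sales),
}

# Predictive analytics: estimate next week's sales. A 3-week moving
# average is used here only as a deliberately simple forecaster.
forecast = statistics.mean(sales[-3:])

# Prescriptive analytics: recommend an action. Each candidate order
# quantity is scored against every past week's demand using assumed costs.
HOLDING_COST = 1.0   # assumed cost per unsold unit
SHORTAGE_COST = 4.0  # assumed cost per unit of unmet demand


def average_cost(order_qty, demand_scenarios):
    """Average cost of one order quantity across past demand levels."""
    total = 0.0
    for demand in demand_scenarios:
        leftover = max(order_qty - demand, 0)
        shortfall = max(demand - order_qty, 0)
        total += HOLDING_COST * leftover + SHORTAGE_COST * shortfall
    return total / len(demand_scenarios)


candidates = range(min(sales), max(sales) + 1)
best_order = min(candidates, key=lambda q: average_cost(q, sales))

print(summary, forecast, best_order)
```

The three parts answer different questions about the same data: what happened, what is likely to happen next, and what we should do about it.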
SM, on the other hand, overlaps with the descriptive analytics of DA and the predictive analytics of ML. On the descriptive side, the simplest form is summarizing the data with measures such as the mean, median and mode to better understand the data we have, work that statisticians have been doing for many decades.

On the predictive side, linear regression is both the model most frequently used by statisticians and the most basic form of ML model, but its interpretation and application differ. Statisticians consider it important to understand the distribution of the data, for example whether it follows a normal, binomial or Poisson distribution. To perform ML, however, it is often not necessary to know the distribution of the data.

Also, because of the increasing volume of data, we are able to split the data into a training set (for learning) and a test set (for checking predictions on data the model has not seen). Statisticians, by contrast, traditionally worked with limited amounts of data, so their linear regression models were typically fitted and assessed on the same data without any such split. The model's performance is then judged only on historical data it has already seen, which sits uneasily with the principle that past performance does not guarantee future results.
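To illustrate the training/test split, here is a minimal sketch in plain Python, assuming a small hypothetical time-ordered series. The helpers `fit_line` and `mean_squared_error` are written for illustration only. A straight line is learned from the earliest periods and then scored both on the data it learned from and on the held-out later periods.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    """Average squared gap between observed values and the line's predictions."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical observations in time order (x = period, y = outcome).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2, 3, 5, 6, 8, 9, 12, 15, 19]

# Chronological split: the first ~70% of periods are used for learning,
# the most recent periods are held back as unseen test data.
split = int(len(xs) * 0.7)
train_x, train_y = xs[:split], ys[:split]
test_x, test_y = xs[split:], ys[split:]

model = fit_line(train_x, train_y)
train_mse = mean_squared_error(model, train_x, train_y)  # error on seen data
test_mse = mean_squared_error(model, test_x, test_y)     # error on unseen data

print(f"train MSE: {train_mse:.3f}")
print(f"test MSE:  {test_mse:.3f}")
```

On this made-up series the test error is much larger than the training error, because the later points curve upward in a way a line fitted to the early points cannot capture. An evaluation on the training data alone would hide that gap. For data without a natural time order, shuffling before splitting is the more usual choice.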
I hope this explanation suffices for now; I will expand on it in the near future.