“A breakthrough in machine learning would be worth ten Microsofts,” Bill Gates once said, emphasizing how machine learning and its algorithms can change today's world. The technologies around us are already booming because of it, whether it is the ongoing work on self-driving cars or Google's project, shown at Google I/O 2018, in which software acts as a customer service executive.
As machine learning is fused into more and more of the technologies that are coming in the future, we are all moving towards a higher standard of living. Making this happen has sent the demand for machine learning engineers soaring, and these engineers need knowledge from multiple domains, such as mathematics and statistics, along with the algorithms used to build the models behind machine learning solutions.
Now, let's dive into some of the major algorithms that the machine learning field usually requires, such as the well-known Linear Regression and Decision Trees. Broadly, there are three major types of algorithms that anyone considering a career in machine learning should know about: Supervised Learning, Unsupervised Learning and Reinforcement Learning.
Broad Classification of Machine Learning Algorithms
Supervised Learning algorithms work with a target or outcome variable (the dependent variable) that has to be predicted from a given set of predictors (the independent variables). Using this set of variables, we generate a function that maps inputs to the desired outputs. Major examples of Supervised Learning are Regression, Decision Tree and Random Forest.
Unsupervised Learning algorithms, by contrast, do not have any target or outcome variable to predict or estimate. They are used for tasks such as clustering a population into different groups. An example of Unsupervised Learning is K-means Clustering.
Reinforcement Learning algorithms train the machine to make specific decisions. The machine trains itself continually using trial and error: it learns from experience and tries to capture the best possible knowledge to make accurate business decisions. Reinforcement Learning problems are commonly modelled as a Markov Decision Process.
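To make the trial-and-error idea concrete, here is a minimal sketch using tabular Q-learning, which is one possible reinforcement learning method and not one named above. The five-cell corridor, the reward of 1 at the last cell, and all parameter values are assumptions chosen only for illustration.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values on a toy corridor (hypothetical environment)."""
    rng = random.Random(seed)
    moves = (-1, +1)                 # action 0 = left, action 1 = right
    goal = n_states - 1              # reaching the last cell pays a reward of 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _step in range(100):     # cap on steps per episode
            action = rng.randrange(2)  # pure random exploration (trial and error)
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # No future reward once the goal has been reached.
            future = 0.0 if next_state == goal else max(q[next_state])
            # Nudge the estimate towards reward + discounted future value.
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
            if state == goal:
                break
    return q

q = q_learning()
# Greedy policy for every non-goal cell, read off the learned values.
policy = ["left" if left > right else "right" for left, right in q[:-1]]
```

After training, the learned values should favour moving right in every cell, since that is the shortest way to the reward.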
Having discussed the major categories of ML algorithms, let's now discuss the algorithms within these categories that are in high demand for building machine learning models.
Starting with the Linear Regression algorithm: here the output is a continuous value computed from a linear combination of the input features. We establish a relationship between the independent and dependent variables by fitting a best-fit line of the form Y = m*x + c, where m is the slope and c is the intercept. It is mainly used to predict real values (cost of houses, number of calls, total sales etc.).
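As a rough illustration of fitting the line Y = m*x + c, the sketch below uses the ordinary least-squares formulas for a single feature. The function name fit_line and the house-price numbers are made up for this example.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point of means.
    c = mean_y - m * mean_x
    return m, c

# Hypothetical data: house size (in 100 sq ft) vs. price (in thousands)
sizes = [10, 15, 20, 25, 30]
prices = [200, 290, 410, 500, 600]
m, c = fit_line(sizes, prices)
predicted_price = m * 22 + c   # prediction for a house of size 22
```

Real projects would normally rely on a library for this, but the two formulas above are all that a single-feature fit needs.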
In the K-nearest neighbours algorithm, we store all available cases and classify a new case by a majority vote of its k neighbours: the case is assigned to the class that is most common among its k nearest neighbours, as measured by a distance function. The distance function can be Euclidean, Manhattan, Minkowski or Hamming distance. Because every prediction compares the new case against the stored cases, the algorithm requires a lot of computation.
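The majority vote can be sketched in a few lines. This version assumes Euclidean distance and k = 3, and the points and labels are invented for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    # Measure the distance from the query to every stored case;
    # this full scan is why the method is computation-heavy.
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote among the k closest cases.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training cases
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 5.5)]
labels = ["small", "small", "small", "large", "large", "large"]
result = knn_predict(points, labels, (1.2, 1.1))
```

Swapping euclidean for another distance function is the only change needed to try Manhattan or Hamming distance.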
Similarly, the Logistic Regression algorithm is used to estimate discrete values (like 0/1, yes/no, true/false) based on a given set of independent variables. It predicts the probability of occurrence of an event by fitting the data to a logistic function.
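A minimal sketch of fitting a logistic function to data follows. It assumes a single feature and plain gradient descent, which is only one of several ways to fit the model; the study-hours data, learning rate and epoch count are illustrative choices.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data: hours studied vs. exam passed (1) or failed (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(hours, passed)
p_low = sigmoid(w * 1.0 + b)    # estimated probability of passing after 1 hour
p_high = sigmoid(w * 4.0 + b)   # estimated probability of passing after 4 hours
```

The output is a probability, so a threshold such as 0.5 turns it into the discrete yes/no answer.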
In the SVM (Support Vector Machine) algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. The algorithm then looks for the hyperplane that separates the classes with the widest margin.
The Naive Bayes classification algorithm relies on Bayes' theorem with an assumption of independence between predictors, i.e. we assume that the presence of a particular feature in a class is unrelated to the presence of any other feature.
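The independence assumption means the per-feature probabilities can simply be multiplied together. The sketch below shows this for categorical features; the add-one (Laplace) smoothing and the weather data are assumptions added for the example.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count how often each feature value appears within each class."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)       # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values

def predict_naive_bayes(model, row):
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total               # prior P(class)
        for i, value in enumerate(row):
            # Smoothed P(feature value | class); features are treated as
            # independent, so the probabilities are multiplied.
            numerator = feature_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score *= numerator / denominator
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical data: (outlook, temperature) -> play outside?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
answer = predict_naive_bayes(model, ("sunny", "hot"))
```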
In the Decision Tree algorithm, we split the population into two or more homogeneous sets. The split is made on the most significant attributes (independent variables) so that the resulting groups are as distinct as possible.
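One way to judge how homogeneous a split is uses Gini impurity, which is an assumption here since no particular criterion is named above. The sketch finds the best single threshold on one numeric attribute; a full tree would repeat this step on each resulting group. The ages and purchase labels are invented.

```python
def gini(labels):
    """Gini impurity: 0.0 for a perfectly homogeneous group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try every threshold and keep the one with the purest two groups."""
    best_threshold, best_score = None, float("inf")
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        if not left or not right:
            continue
        # Weighted average impurity of the two groups.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical data: customer age vs. whether they bought the product
ages = [22, 25, 30, 35, 40, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, bought)
```

Here the rule "age <= 30" separates the two classes completely, so the weighted impurity of the split is zero.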
The K-means clustering algorithm groups a given data set into a certain number of clusters (say, k clusters). Data points inside a cluster are homogeneous with one another and heterogeneous to the points in other clusters.
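The grouping can be sketched as a loop that alternates between assigning points to the nearest centre and moving each centre to the mean of its points. For simplicity this version assumes the first k points serve as the starting centres, which is a naive choice; the sample points are made up.

```python
def kmeans(points, k, iterations=10):
    """Very small k-means: returns the final centroids and clusters."""
    centroids = list(points[:k])  # naive initialisation: first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
    return centroids, clusters

# Hypothetical 2-D data with two visible groups
points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
```

Points that end up in the same cluster lie close to each other, while the two centroids sit far apart.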
In the Random Forest algorithm, we use a collection of decision trees (i.e. a “forest”) to classify a new object based on its attributes. Each tree gives a classification, and we say the tree “votes” for that class. The forest chooses the classification having the most votes (over all the trees in the forest).
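The voting step on its own can be sketched as below. The three "trees" are hand-written stand-ins with hypothetical rules, not trained models; a real forest would grow each tree from a random sample of the data.

```python
from collections import Counter

# Hand-written stand-ins for trained decision trees (hypothetical rules).
def tree_a(sample):
    return "spam" if sample["links"] > 3 else "ham"

def tree_b(sample):
    return "spam" if sample["exclamations"] > 5 else "ham"

def tree_c(sample):
    return "spam" if sample["links"] > 1 and sample["length"] < 50 else "ham"

def forest_predict(trees, sample):
    """Collect one vote per tree and return the winning class with the tally."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0], dict(votes)

# Hypothetical email described by three attributes
email = {"links": 4, "exclamations": 2, "length": 40}
winner, tally = forest_predict([tree_a, tree_b, tree_c], email)
```

Two of the three stand-in trees vote "spam" for this sample, so the forest returns "spam" even though one tree disagrees.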
In addition to these, there is the Gradient Boosting algorithm, which combines many weak learners, typically shallow decision trees, each fitted to the errors of the ones before it. It is used when we deal with plenty of data and want a prediction with high predictive power. There is also CatBoost, which handles categorical variables directly without raising type conversion errors, so we can spend our time tuning the model rather than sorting out trivial errors. Other algorithms in the same family include XGBoost.
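The core idea of gradient boosting for a squared-error regression task can be sketched as follows: start from the average, then repeatedly fit a tiny model to the remaining errors and add a fraction of it. The one-split "stumps", the learning rate, the number of rounds and the data are all assumptions for illustration; libraries such as CatBoost and XGBoost are far more elaborate.

```python
def fit_stump(xs, residuals):
    """Find the single threshold whose two group means best fit the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:   # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def gradient_boost(xs, ys, rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)                 # first guess: the average
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        # Each new stump is fitted to what the model still gets wrong.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps

def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )

# Hypothetical data with a jump between x = 3 and x = 4
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = gradient_boost(xs, ys)
```

Each round only makes a small correction, and the sum of many small corrections is what gives the combined model its accuracy.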
The algorithms mentioned above are just a starting point for exploring machine learning techniques in this vast field of innovation and creativity. Read in detail about a few of them.