The Complete Guide to Decision Trees: Theory, Implementation and Hacks

Machine learning is not simple. Some algorithms are hard to understand and tricky to implement, so a beginner in machine learning or data science should start with the more straightforward models. The decision tree is a good first choice. This article gives you a basic idea of how decision trees work and how to implement them.

Basics of Decision Trees

A decision tree is a supervised learning model that can be used for both classification and regression. It works by making splits on different attributes. In other words, it asks questions of the training data and uses the answers to decide the value of the attribute to be predicted. For example, consider the following image:

In this figure, we are trying to determine whether a person is fit or not. We ask the questions as given in the tree and move along the path that the data leads us on. This is an amazingly simple model to make predictions about the data and is very effective if used properly.

Creating the model

The decision tree is built by repeatedly asking questions of the training data until the right questions are found. The algorithm counts as artificial intelligence because it determines the proper questions on its own. Finding those questions is the main challenge in building a reliable decision tree.

The right questions are chosen using the concept of entropy. Just as in physics or chemistry, entropy is a measure of randomness, here the randomness of the attribute in question. Our aim is to reduce the randomness in the data so that we arrive at a dependable decision. Entropy is calculated as the negative sum, over all cases, of the probability of each case multiplied by the logarithm of that probability. A good question reduces the entropy much more than a bad question. The lower the entropy goes, the closer we get to a correct decision.
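The entropy calculation described above can be sketched in a few lines of plain Python. This is only an illustration: the function name entropy, the toy labels, and the choice of a base-2 logarithm (which gives entropy in bits) are assumptions made for this example, not something fixed by the text.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels.

    Assumption: base-2 logarithm; any other base only rescales the result.
    """
    total = len(labels)
    counts = Counter(labels)
    # Sum of -p * log2(p) over every class that actually occurs
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical toy labels
mixed = ["fit", "unfit", "fit", "unfit"]  # evenly mixed: maximum randomness
pure = ["fit", "fit", "fit", "fit"]       # a single class: no randomness

print(entropy(mixed))  # 1.0
print(entropy(pure))   # 0.0
```

An evenly mixed set of two classes has the highest possible entropy (1 bit), while a set containing only one class has entropy 0, which is the state the tree tries to reach at its leaves.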

The reduction in the entropy of an attribute due to a question is called the information gain of that question. The higher the information gain, the better the question. For example, suppose you are playing a game in which someone thinks of a famous person and you have to work out who it is by asking no more than 10 questions. The first question you should ask is whether the person is living or dead. You should not start by asking whether the person is Rahul Dravid or Stephen Hawking; that would take days. There are many algorithms for building the model, such as ID3 and CART. You can learn them in detail here.
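To make the idea of information gain concrete, here is a minimal sketch that compares a good question with a bad one. The helper names entropy and information_gain and the toy labels are hypothetical, and the base-2 logarithm is an assumption; the gain is computed as the entropy before the question minus the size-weighted entropy of the groups the question produces.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy data: is a person fit?
labels = ["fit", "fit", "unfit", "unfit"]

# A good question separates the two classes completely.
good_split = [["fit", "fit"], ["unfit", "unfit"]]
# A bad question leaves each group as mixed as the original data.
bad_split = [["fit", "unfit"], ["fit", "unfit"]]

print(information_gain(labels, good_split))  # 1.0
print(information_gain(labels, bad_split))   # 0.0
```

The good question removes all of the randomness, so its gain equals the full entropy of the data; the bad question removes none.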


Here is a simple implementation of this algorithm using the scikit-learn package in Python. You can swap the dummy data for a real dataset; the Automobile dataset and the Mushroom dataset are good choices for this purpose.

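>>> from sklearn import tree
>>> # Two training samples with two features each, and their class labels
>>> X = [[0, 0], [1, 1]]
>>> Y = [0, 1]
>>> clf = tree.DecisionTreeClassifier()
>>> # Fit the tree to the training data
>>> clf = clf.fit(X, Y)
>>> # Predict the class of a new, unseen sample
>>> clf.predict([[2., 2.]])
array([1])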

You can also look up some applications of regression using decision trees here.
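For regression, scikit-learn offers DecisionTreeRegressor, which is used in much the same way as the classifier. The following is a minimal sketch on made-up data; the numbers and the max_depth=1 setting are assumptions chosen only to keep the example easy to follow.

```python
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: one feature, and a numeric target with two clear levels
X = [[1], [2], [3], [10], [11], [12]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]

# max_depth=1 allows a single question, which is enough to separate the two levels
reg = DecisionTreeRegressor(max_depth=1, random_state=0)
reg.fit(X, y)

# Each prediction is the mean target value of the leaf the sample falls into
predictions = reg.predict([[2.5], [11.5]])
print(predictions)
```

A regression tree predicts the average target of the training samples in the matching leaf, so the two queries above land on the low and the high level respectively.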


These are the basics of decision trees that almost everyone in machine learning knows. The real trick is knowing when to use them and what the best practices are. For that, I lay down the following 4 points:

1. Advantages

Decision trees are simple yet widely used. The reasons behind that are as follows:

  • They are easy to visualize, and the resulting diagrams can be understood even by non-experts.
  • They are computationally efficient: they train quickly and need fewer passes over the dataset than more complex models such as Support Vector Machines and Neural Networks.
  • They make no assumptions about the underlying distribution of the data when classifying.
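The point about readability can be seen with scikit-learn's export_text helper, which prints a fitted tree as nested rules. This is a small sketch on the same dummy data used earlier; the feature names are hypothetical labels invented for the illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Same dummy data as in the earlier example
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, Y)

# feature_names are hypothetical labels chosen for this illustration
rules = export_text(clf, feature_names=["feature_a", "feature_b"])
print(rules)
```

The printed rules read as a short series of questions and answers, which is what makes the model easy to explain. For a graphical version, scikit-learn also provides plot_tree, which needs matplotlib.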

2. Disadvantages

However, they are not always the right tool. They have some drawbacks that make them unsuitable in certain cases.

  • They have high error rates (addressed in point 4)
  • Unlike SVMs, they cannot model arbitrary decision boundaries
  • Not suitable for incremental or reinforcement learning

3. When to use or not use a Decision Tree

The two points above lead to one important inference: when to resort to a decision tree. A decision tree should not be used when there is a simple polynomial relationship between the known attributes and the attribute to be predicted. In that case, we use linear regression for regression and logistic regression for classification. Decision trees should mainly be used when nothing is known about the shape of the decision boundary in the n-dimensional attribute space. In other words, decision trees are most useful when we cannot formulate the classifier mathematically.
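One way to see why a tree is a poor fit for a simple relationship is to train both models on data that follows a straight line and then ask for a prediction outside the training range. This is a sketch under assumptions: the line y = 2x + 1 and the query point x = 20 are made up for the illustration.

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data that follows the straight line y = 2x + 1 for x = 0..9
X = [[x] for x in range(10)]
y = [2 * x + 1 for x in range(10)]

linear = LinearRegression().fit(X, y)
tree_reg = DecisionTreeRegressor(random_state=0).fit(X, y)

# Ask both models about a point far outside the training range
linear_pred = linear.predict([[20]])[0]
tree_pred = tree_reg.predict([[20]])[0]

print(linear_pred)  # close to 41, the value on the line
print(tree_pred)    # 19.0, the target of the last training point (x = 9)
```

The linear model recovers the relationship itself, while the tree can only return a value stored in one of its leaves, so it stays at the largest target it saw during training.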

4. Random forests

The high error rate may look like a deal-breaker, but it is not. Random forest is an ensemble learning technique that you can use to get better results from decision trees. It was formulated by the statistician Leo Breiman. In this model, we build multiple decision trees on different subsets of the data and combine the results from each tree to make better predictions. You can read more about them here.
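A random forest can be tried out in scikit-learn with RandomForestClassifier. The sketch below is only an illustration: the Iris dataset bundled with scikit-learn, the 70/30 split, and the choice of 50 trees are assumptions and are not taken from this article.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris dataset stands in for your own data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Each of the 50 trees is trained on a bootstrap sample of the training rows;
# the forest then combines the trees' predictions into a single answer
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

accuracy = forest.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The fitted trees are available in forest.estimators_, so you can inspect any single tree and compare its predictions with those of the whole forest.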

The decision tree is probably the first algorithm taught in any machine learning course on the internet. This blog teaches you that first chapter, with a head start on how to implement it and move forward on your machine learning journey. Newer types of trees, such as conditional inference trees, and metrics other than information gain, such as the Gini index, are driving better implementations of this model. Decision trees came into use in the late 1970s. To this day, they remain one of the most basic tools in machine learning and explain the goal of supervised machine learning as a whole to any layman looking to understand the trade. With newer tree structures and application-specific algorithms, decision trees are becoming more and more important in the intelligent-system-driven world of the near future.
