Definition of Logistic Regression
Let's learn how to implement logistic regression using scikit-learn. First of all, let's get into the definition of logistic regression. The logistic model (or logit model) is a statistical model that is usually applied to a binary dependent variable. In regression analysis, logistic regression (or logit regression) means estimating the parameters of a logistic model.
More formally, a logistic model is one where the log-odds of the probability of an event is a linear combination of independent (predictor) variables. The two possible dependent-variable values are often labelled "0" and "1", representing outcomes such as pass/fail, win/lose, alive/dead or healthy/sick. The binary logistic regression model can be generalized to more than two levels of the dependent variable: categorical outputs with more than two values are modelled by multinomial logistic regression, and if the multiple categories are ordered, by ordinal logistic regression (for example, the proportional-odds ordinal logistic model).
In simpler words
Logistic regression is borrowed from the field of statistics and, despite its name, it is not an algorithm for regression problems, where you want to predict a continuous outcome. Instead, it is a method for binary classification: it gives you a discrete outcome of 0 or 1. In simpler words, its output answers "will something happen or not", i.e. "yes or no".
An example to get an idea
A simple example to build intuition: classifying whether it will rain today, "yes or no". Here our input can be data from the past few days: temperature, humidity, wind speed and so on. We will call these the input features, and "yes or no" will be our output label.
Step by step working of Logistic Regression
Logistic regression measures the relationship between the dependent variable and one or more independent variables. It does so by estimating probabilities using the logistic function.
Here the answer to "will it rain today: yes or no" depends on factors such as temperature, wind speed and humidity. So our dependent variable is the output label, and the independent variables are our input features.
The probabilities must then be transformed into binary values in order to actually make a prediction. The logistic function used for this purpose is the sigmoid function. The sigmoid function takes any real-valued input and maps it into the interval (0, 1); applying a threshold (commonly 0.5) then yields the final 0 or 1 label. Labels of -1 and 1 are sometimes used instead.
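The mapping just described can be sketched in a few lines of NumPy. Note the 0.5 threshold and the function names here are illustrative choices, not anything fixed by scikit-learn:

```python
import numpy as np

def sigmoid(z):
    # Squash any real value into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_label(z, threshold=0.5):
    # Turn the probability into a binary 0/1 prediction
    return (sigmoid(z) >= threshold).astype(int)

print(sigmoid(0.0))                                # 0.5
print(predict_label(np.array([-2.0, 0.0, 3.0])))   # [0 1 1]
```

Large negative inputs map near 0, large positive inputs map near 1, and 0 maps to exactly 0.5.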
What’s actually happening…
We want to maximize the likelihood that a random data point is classified correctly; this is called Maximum Likelihood Estimation. Maximum Likelihood Estimation is a general approach to estimating parameters in statistical models. You can maximize the likelihood using an optimization algorithm. Newton's Method is one such algorithm and can be used to find the maximum (or minimum) of many different functions, including the likelihood function. Instead of Newton's Method, you could also use gradient descent, because it's simpler.
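The gradient-descent route can be sketched end to end. This is a minimal illustration on a tiny made-up dataset, not the source's implementation: the data, learning rate and iteration count are all arbitrary choices, and minimizing the negative log-likelihood is equivalent to maximizing the likelihood:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny illustrative dataset: one feature, binary labels
X = np.array([[0.5], [1.5], [3.0], [4.5]])
y = np.array([0, 0, 1, 1])
X_b = np.hstack([np.ones((len(X), 1)), X])  # prepend an intercept column

w = np.zeros(2)     # [intercept, slope]
lr = 0.1            # learning rate (arbitrary)
for _ in range(5000):
    p = sigmoid(X_b @ w)              # predicted probabilities
    grad = X_b.T @ (p - y) / len(y)   # gradient of the negative log-likelihood
    w -= lr * grad                    # step downhill, i.e. uphill on likelihood

preds = (sigmoid(X_b @ w) >= 0.5).astype(int)
print(preds)  # matches y on this separable toy data
```

Newton's Method would replace the fixed-step update with one that also uses the second derivative (the Hessian), typically converging in far fewer iterations at a higher per-step cost.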
That's enough to get started with what logistic regression is, though there is more to it than described here.
Now let's start with the implementation part. We will be using Python 3 here, so basic knowledge of Python is required.
Sklearn Logistic Regression on Digits Dataset
Loading the Data (Digits Dataset)
The digits dataset is one of the datasets scikit-learn ships with, so it does not require downloading any file from an external website. The code below loads the digits dataset.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

digits = load_digits()

# After loading the dataset, let's get familiar with what we have loaded in "digits".
print(type(digits.data))
print(type(digits.target))
print('Image Data Shape', digits.data.shape)
print("Label Data Shape", digits.target.shape)
<class 'numpy.ndarray'> <class 'numpy.ndarray'> Image Data Shape (1797, 64) Label Data Shape (1797,)
Let’s see what kind of images we are dealing with:
for i in range(5):
    image = digits.data[i]
    label = digits.target[i]
    plt.imshow(np.reshape(image, (8, 8)), cmap=plt.cm.gray)
    plt.title('label: %i\n' % label, fontsize=20)
    plt.show()
Splitting Data into Training and Test Sets
First we will split the data into training and test sets, to make sure that after we train our classification algorithm it is able to generalize well to new data. We create the split as follows:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)
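It can help to confirm the 80/20 split by checking the shapes; with `test_size=0.2`, scikit-learn rounds the test set up, so the 1,797 samples become 1,437 for training and 360 for testing:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

print(x_train.shape, x_test.shape)  # (1437, 64) (360, 64)
```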
The actual work starts from here.
# First of all we will import the logistic regression model provided in sklearn.
# It's noteworthy that in sklearn, all machine learning models are implemented as Python classes.

# Step 1:
from sklearn.linear_model import LogisticRegression

# Step 2: make an instance/object of the model, because our model is implemented as a class.
LR = LogisticRegression()

# Step 3: train the model on the training data.
LR.fit(x_train, y_train)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
Now let's predict labels for new data (the test data), using what the model learned during training:
predictions = LR.predict(x_test)
Checking the performance of our Logistic Regression model
There are many ways to check how well our model is performing. One simple way is the score method, which reports how accurately the labels are predicted. We use score as follows:
score = LR.score(x_test, y_test)
print("Accuracy is ", score * 100, "%")
Accuracy is 95.0 %
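Accuracy is only one of those ways. A confusion matrix additionally shows which digits are mistaken for which. The sketch below reproduces the pipeline from scratch so it runs on its own; `max_iter=5000` is an added assumption so that the solver in newer scikit-learn versions converges without warnings:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

# max_iter raised from the default so the optimizer converges on this data
LR = LogisticRegression(max_iter=5000)
LR.fit(x_train, y_train)
predictions = LR.predict(x_test)

cm = confusion_matrix(y_test, predictions)
print(cm)  # rows are true digits 0-9, columns are predicted digits
```

Large values on the diagonal mean correct predictions; off-diagonal entries show which pairs of digits the model confuses.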
With that score, this post comes to an end. I hope I was able to convey the basic meaning and use of logistic regression.
You can also read about some other cool stuff on machine learning here.