Regression and Classification | Supervised Machine Learning ~ Coding School

What Are Regression and Classification in Machine Learning?

Data scientists use many different kinds of machine learning algorithms to discover patterns in big data that lead to actionable insights. At a high level, these algorithms can be classified into two groups based on the way they "learn" about data to make predictions: supervised and unsupervised learning.

Supervised Machine Learning: The majority of practical machine learning uses supervised learning. Supervised learning is where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output, Y = f(X). The goal is to approximate the mapping function so well that when you have new input data (X), you can predict the output variable (Y) for that data.

Supervised machine learning techniques include linear and logistic regression, multi-class classification, decision trees and support vector machines. Supervised learning requires that the data used to train the algorithm is already labeled with the correct answers. For example, a classification algorithm will learn to identify animals after being trained on a dataset of images that are properly labeled with the species of the animal and some identifying characteristics.

Supervised learning problems can be further grouped into regression and classification problems. Both problems have as their goal the construction of a succinct model that can predict the value of the dependent attribute from the attribute variables. The difference between the two tasks is that the dependent attribute is numerical for regression and categorical for classification.

Regression

A regression problem is one where the output variable is a real or continuous value, such as "salary" or "weight". Many different models can be used; the simplest is linear regression, which tries to fit the data with the best hyperplane passing through the points.
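For a single input variable, the best-fit line has a closed-form least-squares solution. The following is a minimal sketch in plain Python, assuming made-up lot sizes and prices (the numbers are illustrative and are not taken from any real data set); the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data only, constructed so that price = 10 * size + 20000
sizes = [2000, 3000, 4000, 5000]
prices = [40000, 50000, 60000, 70000]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 6000 + intercept  # prediction for an unseen lot size
print(slope, intercept, predicted)
```

With more than one input variable the same idea generalizes from a line to a hyperplane, which is what a library implementation such as scikit-learn's LinearRegression fits.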

Types of Regression Models:

For example:

Which of the following is a regression task?

Predicting the age of a person

Predicting the nationality of a person

Predicting whether the stock price of a company will increase tomorrow

Predicting whether a document is related to a sighting of UFOs

Solution: Predicting the age of a person. Age is a real value, whereas nationality is categorical, and both whether the stock price will increase and whether a document is related to UFOs are discrete yes/no answers.

Let's take an example of linear regression. We have a Housing data set and we want to predict the price of a house from its lot size. The following is the Python code for it.

# Python code to illustrate
# regression using a data set
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import linear_model

# Load the CSV and select the feature and target columns.
# scikit-learn expects 2-D arrays, so each column is reshaped to (n_samples, 1).
df = pd.read_csv("Housing.csv")
Y = df['price'].values.reshape(-1, 1)
X = df['lotsize'].values.reshape(-1, 1)

# Split the data into training/testing sets
X_train = X[:-250]
X_test = X[-250:]

# Split the targets into training/testing sets
Y_train = Y[:-250]
Y_test = Y[-250:]

# Plot the test data
plt.scatter(X_test, Y_test, color='black')
plt.title('Test Data')
plt.xlabel('Size')
plt.ylabel('Price')
plt.xticks(())
plt.yticks(())

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(X_train, Y_train)

# Plot the fitted line over the test data
plt.plot(X_test, regr.predict(X_test), color='red', linewidth=3)
plt.show()

The above code produces a scatter plot of the test data (black points). The red line is the best-fit line for predicting the price. To make an individual prediction using the linear regression model:

# predict() expects a 2-D array: one row per sample, one column per feature
print(round(regr.predict([[5000]]).item()))
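The script above holds out the last 250 rows as a test set but only plots them. One common way to put a number on the quality of the fit is the mean squared error together with the coefficient of determination (R²). The following is a minimal sketch in plain Python; the values in `y_true` and `y_pred` are hypothetical and do not come from Housing.csv.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical test-set prices and model predictions
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(mse, r2)
```

scikit-learn provides equivalents as `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score`, which could be applied to `Y_test` and `regr.predict(X_test)` from the script above.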

Classification

A classification problem is one where the output variable is a category, such as "red" or "blue", or "disease" and "no disease". A classification model attempts to draw a conclusion from observed values. Given one or more inputs, a classification model will try to predict the value of one or more outcomes.

Examples include filtering emails as "spam" or "not spam", or labeling transaction data as "fraudulent" or "authorized". In short, classification either predicts categorical class labels or classifies data (constructs a model) based on the training set and the values (class labels) of the classifying attributes, and then uses that model to classify new data. There are a number of classification models, including logistic regression, decision tree, random forest, gradient-boosted tree, multilayer perceptron, one-vs-rest, and Naive Bayes.
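To make the idea of learning class labels from a training set concrete, here is a minimal sketch of a nearest-centroid classifier in plain Python. This technique is not one of the models listed above; it is used here only because it fits in a few lines. The two email features and all of the data are made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples, labels):
    """Compute the mean feature vector (centroid) of each class."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the class whose centroid is closest to the new sample."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical emails described by two made-up features:
# (number of links, number of exclamation marks)
train_x = [(8, 6), (7, 9), (9, 7), (0, 1), (1, 0), (2, 1)]
train_y = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

centroids = fit_centroids(train_x, train_y)
print(predict(centroids, (6, 8)))
print(predict(centroids, (1, 2)))
```

The "training" step summarizes the labeled examples, and the "prediction" step applies that summary to unseen data; the models named above follow the same two-step pattern with more sophisticated summaries.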

For example:

Which of the following is/are classification problem(s)?

Predicting the gender of a person by his/her handwriting style

Predicting house price based on area

Predicting whether monsoon will be normal next year

Predicting the number of copies of a music album that will be sold next month

Solution: Predicting the gender of a person and predicting whether the monsoon will be normal next year. The other two are regression.

We have discussed classification with some examples. Here is an example in which we classify the iris dataset using RandomForestClassifier in Python. The dataset is available from the UCI Machine Learning Repository; the URL appears in the code.

Dataset Description

Title: Iris Plants Database

Attribute Information:

1. sepal length in cm

2. sepal width in cm

3. petal length in cm

4. petal width in cm

5. class:

-- Iris Setosa

-- Iris Versicolour

-- Iris Virginica

Missing Attribute Values: None

Class Distribution: 33.3% for each of 3 classes

# Python code to illustrate
# classification using the data set

# Importing the required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

# Importing the dataset (the file has no header row)
dataset = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-' +
    'databases/iris/iris.data', sep=',', header=None)
data = dataset.iloc[:, :]

# Checking for null values
print("Sum of NULL values in each column. ")
print(data.isnull().sum())

# Separating the predicting column from the whole dataset
X = data.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

# Encoding the predicting variable (class names become integers 0, 1, 2)
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)

# Splitting the data into test and train datasets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Using the random forest classifier for the prediction
classifier = RandomForestClassifier()
classifier = classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)

# Printing the results
print('Confusion Matrix :')
print(confusion_matrix(y_test, predicted))
print('Accuracy Score :', accuracy_score(y_test, predicted))
print('Report : ')
print(classification_report(y_test, predicted))

Output:

Sum of NULL values in each column.
0    0
1    0
2    0
3    0
4    0

Confusion Matrix :
[[16  0  0]
 [ 0 17  1]
 [ 0  0 11]]

Accuracy Score : 0.9777777777777777

Report :
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        16
          1       1.00      0.94      0.97        18
          2       0.92      1.00      0.96        11

avg / total       0.98      0.98      0.98        45
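The figures in the report can be derived directly from the confusion matrix. The following sketch recomputes accuracy and the per-class precision, recall and F1 score in plain Python, assuming scikit-learn's convention that rows are true classes and columns are predicted classes; the variable names are illustrative.

```python
# Confusion matrix copied from the output above:
# rows are true classes, columns are predicted classes.
matrix = [
    [16, 0, 0],
    [0, 17, 1],
    [0, 0, 11],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(matrix)))
accuracy = correct / total  # fraction of test samples on the diagonal

report = []
for k in range(len(matrix)):
    predicted_as_k = sum(row[k] for row in matrix)  # column sum
    actually_k = sum(matrix[k])                     # row sum (support)
    precision = matrix[k][k] / predicted_as_k
    recall = matrix[k][k] / actually_k
    f1 = 2 * precision * recall / (precision + recall)
    report.append(
        (k, round(precision, 2), round(recall, 2), round(f1, 2), actually_k)
    )

for row in report:
    print(row)
print("accuracy:", round(accuracy, 3))
```

Here 44 of the 45 test samples lie on the diagonal, so the accuracy is 44/45, or about 0.978. The single off-diagonal entry (a class 1 flower predicted as class 2) is what lowers the recall of class 1 and the precision of class 2.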

Saturday, June 22, 2019
