What is Regression and
Classification in Machine Learning?
Data scientists use many different kinds of machine learning algorithms to
discover patterns in big data that lead to actionable insights. At a high
level, these algorithms can be classified into two groups based on the way
they "learn" about data to make predictions: supervised and unsupervised
learning.
Supervised Machine Learning: The majority of practical machine learning uses
supervised learning. Supervised learning is where you have input variables (X)
and an output variable (Y), and you use an algorithm to learn the mapping
function from the input to the output: Y = f(X). The goal is to approximate the
mapping function so well that when you have new input data (X), you can predict
the output variable (Y) for that data.
Supervised machine learning techniques include linear and logistic regression,
multi-class classification, decision trees and support vector machines.
Supervised learning requires that the data used to train the algorithm is
already labeled with correct answers. For example, a classification algorithm
will learn to identify animals after being trained on a dataset of images that
are properly labeled with the species of the animal and some identifying
characteristics.
Supervised learning problems can be further grouped into regression and
classification problems. Both have as their goal the construction of a succinct
model that can predict the value of the dependent attribute from the attribute
variables. The difference between the two tasks is that the dependent attribute
is numerical for regression and categorical for classification.
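To make the distinction concrete, here is a minimal sketch that trains two models on the same input but with two different kinds of target. The toy data (hours studied, exam score, pass/fail label) is entirely made up for illustration, and the choice of scikit-learn's LinearRegression and LogisticRegression is an assumption, not something the article prescribes.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: one input feature (hours studied)
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]

# Regression: the target is a number (exam score)
y_score = [52.0, 58.0, 65.0, 71.0, 78.0, 84.0]
reg = LinearRegression().fit(X, y_score)
predicted_score = reg.predict([[7.0]])[0]

# Classification: the target is a category (pass/fail label)
y_label = ["fail", "fail", "fail", "pass", "pass", "pass"]
clf = LogisticRegression().fit(X, y_label)
predicted_label = clf.predict([[7.0]])[0]

print(type(predicted_score).__name__, predicted_score)
print(type(predicted_label).__name__, predicted_label)
```

The regressor returns a floating-point number, while the classifier returns one of the labels it saw during training.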
Regression
A regression problem is one in which the output variable is a real or
continuous value, such as "salary" or "weight". Many different models can be
used; the simplest is linear regression. It tries to fit the data with the
best hyperplane that goes through the points.
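As a rough illustration of what "best hyperplane" means, the sketch below fits a plane to two input features by ordinary least squares, which is the usual criterion for linear regression. The data is synthetic and noise-free (generated from y = 3*x1 - 2*x2 + 5), and all variable names are assumptions made for this example.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 3*x1 - 2*x2 + 5
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

# Append a column of ones so the intercept is learned as an extra weight
A = np.hstack([X, np.ones((X.shape[0], 1))])

# Ordinary least squares: minimise the sum of squared residuals ||A w - y||^2
w, *_ = np.linalg.lstsq(A, y, rcond=None)
w1, w2, b = w

# Predict for a new point with the fitted hyperplane
x_new = np.array([6.0, 2.0])
y_new = w1 * x_new[0] + w2 * x_new[1] + b
print(w1, w2, b, y_new)
```

Because the data has no noise, the recovered weights should match the generating coefficients up to floating-point error; with real data the fit would only be approximate.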
Types of Regression Models:
For example:
Which of the following is a regression task?
Predicting the age of a person
Predicting the nationality of a person
Predicting whether the stock price of a company will increase tomorrow
Predicting whether a document is related to sightings of UFOs
Solution: Predicting the age of a person. Age is a real value, whereas
nationality is categorical, and both whether the stock price will increase and
whether a document is related to UFO sightings are discrete yes/no answers.
Let's take an example of linear regression. We have a Housing data set and we
want to predict the price of a house. The following is the Python code for it.
# Python code to illustrate
# regression using data set
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import linear_model
# Load CSV and columns
df = pd.read_csv("Housing.csv")
# scikit-learn expects 2-D arrays of shape (n_samples, n_features),
# so select the columns with double brackets before converting
X = df[['lotsize']].to_numpy()
Y = df[['price']].to_numpy()
# Split the data into training/testing sets
X_train = X[:-250]
X_test = X[-250:]
# Split the targets into training/testing sets
Y_train = Y[:-250]
Y_test = Y[-250:]
# Plot the test data
plt.scatter(X_test, Y_test, color='black')
plt.title('Test Data')
plt.xlabel('Size')
plt.ylabel('Price')
plt.xticks(())
plt.yticks(())
# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(X_train, Y_train)
# Plot the fitted regression line over the test data
plt.plot(X_test, regr.predict(X_test), color='red', linewidth=3)
plt.show()
The output of the above code will
be:
Here in this graph, we plot the
test data. The red line indicates the best fit line for predicting the price.
To make an individual prediction using the linear regression model:
# predict() expects a 2-D array of shape (n_samples, n_features)
print(round(regr.predict([[5000]]).item()))
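Under the hood, a linear regression prediction is simply the intercept plus the slope times the feature value. The sketch below checks this on made-up stand-in data (not the real Housing.csv); the names lot_size, price and model are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the housing data (not the real Housing.csv)
lot_size = np.array([[2000.0], [3000.0], [4000.0], [6000.0], [8000.0]])
price = np.array([40000.0, 52000.0, 61000.0, 85000.0, 104000.0])

model = LinearRegression().fit(lot_size, price)

# A prediction is just intercept + slope * feature value
manual = model.intercept_ + model.coef_[0] * 5000.0
from_predict = model.predict([[5000.0]])[0]
print(round(manual), round(from_predict))
```

Both values should agree, which shows that predict() adds no hidden steps for this model.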
Classification
A classification problem is one in which the output variable is a category,
such as "red" or "blue", or "disease" and "no disease". A classification model
attempts to draw some conclusion from observed values. Given one or more
inputs, a classification model will try to predict the value of one or more
outcomes.
For example, when filtering emails, the classes are "spam" or "not spam"; when
looking at transaction data, they are "fraudulent" or "authorized". In short,
classification either predicts categorical class labels or classifies data
(constructs a model) based on the training set and the values (class labels)
of the classifying attributes, and uses the model to classify new data. There
are a number of classification models, including logistic regression, decision
tree, random forest, gradient-boosted tree, multilayer perceptron, one-vs-rest,
and Naive Bayes.
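In scikit-learn, several of the models listed above share the same fit/predict interface, so they can be swapped with little effort. The following is a minimal sketch, assuming the copy of the iris data bundled with scikit-learn and an arbitrary choice of three models; the split parameters and max_iter value are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Load features and integer class labels
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Three of the model families named in the text
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
}

# Train each model and record its accuracy on the held-out test set
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(name, scores[name])
```

Which model works best depends on the data, so comparing a few candidates on a held-out set like this is a common first step.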
For example:
Which of the following is/are classification problem(s)?
Predicting the gender of a person by his/her handwriting style
Predicting house price based on area
Predicting whether monsoon will be normal next year
Predicting the number of copies of a music album that will be sold next month
Solution: Predicting the gender of a person and predicting whether monsoon
will be normal next year. The other two are regression.
We have discussed classification with some examples. Now here is an example of
classification in which we classify the iris dataset using
RandomForestClassifier in Python. The dataset is available from the UCI
Machine Learning Repository.
Dataset Description
Title: Iris Plants Database
Attribute Information:
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class:
-- Iris Setosa
-- Iris Versicolour
-- Iris Virginica
Missing Attribute Values: None
Class Distribution: 33.3% for
each of 3 classes
# Python code to illustrate
# classification using the data set
# Importing the required libraries
import pandas as pd
# train_test_split lives in sklearn.model_selection
# (the old sklearn.cross_validation module has been removed)
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
# Importing the dataset
dataset = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-'
    'databases/iris/iris.data', sep=',', header=None)
data = dataset.iloc[:, :]
# Checking for null values
print("Sum of NULL values in each column. ")
print(data.isnull().sum())
# Separating the predicting column from the whole dataset
X = data.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
# Encoding the predicting variable
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
# Splitting the data into test and train datasets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
# Using the random forest classifier for the prediction
classifier = RandomForestClassifier()
classifier = classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
# Printing the results
print('Confusion Matrix :')
print(confusion_matrix(y_test, predicted))
print('Accuracy Score :', accuracy_score(y_test, predicted))
print('Report : ')
print(classification_report(y_test, predicted))
Output:
Sum of NULL values in each column.
0    0
1    0
2    0
3    0
4    0
Confusion Matrix :
[[16  0  0]
 [ 0 17  1]
 [ 0  0 11]]
Accuracy Score : 97.7
Report :
             precision    recall  f1-score   support
          0       1.00      1.00      1.00        16
          1       1.00      0.94      0.97        18
          2       0.92      1.00      0.96        11
avg / total       0.98      0.98      0.98        45
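The accuracy, precision and recall figures in the report can be derived by hand from the confusion matrix. The sketch below repeats that arithmetic in plain Python, using the matrix printed above and scikit-learn's convention that rows are true classes and columns are predicted classes; the variable names are chosen for this example.

```python
# Confusion matrix copied from the output above:
# rows are true classes, columns are predicted classes
cm = [[16, 0, 0],
      [0, 17, 1],
      [0, 0, 11]]

n_classes = len(cm)
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n_classes))

# Accuracy: correctly classified samples over all samples
accuracy = correct / total

precision = []
recall = []
for k in range(n_classes):
    predicted_k = sum(cm[i][k] for i in range(n_classes))  # column sum
    actual_k = sum(cm[k])                                   # row sum
    precision.append(cm[k][k] / predicted_k)
    recall.append(cm[k][k] / actual_k)

print(round(accuracy, 3))
print([round(p, 2) for p in precision])
print([round(r, 2) for r in recall])
```

Here 44 of the 45 test samples lie on the diagonal, which gives an accuracy of about 0.978, and the single class 1 sample predicted as class 2 explains both the 0.94 recall for class 1 and the 0.92 precision for class 2.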