- Simple Linear Regression:
* Predictor (independent) variable: X
* Target (dependent) variable: Y
The Equation:
\[y = b_{0} + b_{1}X\]
b_{0}: intercept
b_{1}: slope
The Code: fitting a simple linear model estimator (X: predictor, Y: target)
# Import linear_model from scikit-learn
from sklearn.linear_model import LinearRegression
# Create a linear regression object using the constructor
lm = LinearRegression()
# Define the predictor and target variables
X = df[['highway-mpg']]
Y = df['price']
# Use lm.fit(X, Y) to fit the model
lm.fit(X, Y)
lm is now a fitted SLR (simple linear regression) estimator, and we can obtain a prediction.
# To view the intercept (b0): lm.intercept_
# To view the slope (b1): lm.coef_
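Putting these together, a minimal sketch (assuming the df, X, and Y defined above):
# Generate predictions for the training inputs
Yhat = lm.predict(X)
# Inspect the fitted parameters
print(lm.intercept_)   # b0
print(lm.coef_)        # b1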
- Multiple Linear Regression:
The equation:
\[y = b_{0} + b_{1}X_{1} + b_{2}X_{2} + b_{3}X_{3} + b_{4}X_{4}\]
b_{1} is the coefficient of X_{1}, and so on.
The Code: fitting a multiple linear regression:
# Extract the 4 predictor variables and store them in the variable z
z = df[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']]
# Train the model
lm.fit(z, df['price'])
# Obtain a prediction
Yhat = lm.predict(z)
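As with SLR, the fitted parameters can be inspected after training; a small sketch (assuming the z and lm above):
# b0: the intercept
print(lm.intercept_)
# b1..b4: one coefficient per predictor, in the column order of z
print(lm.coef_)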
Model Evaluation using Visualization
The main benefit of using a regression plot is that it gives a good estimate of:
- the relationship between two variables
- the strength of the correlation
- the direction of the relationship (positive or negative)
The horizontal axis is the independent variable, while the vertical axis is the dependent variable.
Regression Plot
The Code:
import matplotlib.pyplot as plt
import seaborn as sns
sns.regplot(x="highway-mpg", y="price", data=df)
plt.ylim(0,)
Polynomial Regression:
Quadratic-2nd order
\[y = b_{0} + b_{1}X_{1} + b_{2}X_{1}^{2}\]
Cubic-3rd order
\[y = b_{0} + b_{1}X_{1}+ b_{2}X_{1}^{2}+ b_{3}X_{1}^{3}\]
Higher order
\[y = b_{0} + b_{1}X_{1}+ b_{2}X_{1}^{2}+ b_{3}X_{1}^{3}+...\]
The Code:
- Calculate a polynomial fit of 3rd order:
import numpy as np
f = np.polyfit(x, y, 3)
p = np.poly1d(f)
- Print out the model:
print(p)
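The object p returned by np.poly1d is callable, so it can be evaluated directly to get predictions; a minimal sketch (assuming the x above):
# Evaluate the fitted 3rd-order polynomial at the observed x values
Yhat = p(x)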
- Polynomial Regression with more than one dimension:
from sklearn.preprocessing import PolynomialFeatures
pr = PolynomialFeatures(degree=2, include_bias=False)
x_poly = pr.fit_transform(x[['horsepower', 'curb-weight']])
# A small example on one sample, with and without the bias term:
pr = PolynomialFeatures(degree=2)
pr.fit_transform([[1, 2]])
pr = PolynomialFeatures(degree=2, include_bias=False)
pr.fit_transform([[1, 2]])
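For reference, with a sample [x1, x2] at degree 2 the generated features are 1, x1, x2, x1^2, x1*x2, x2^2 (the leading 1 is dropped when include_bias=False), so the two calls above return:
# With bias:    [[1. 1. 2. 1. 2. 4.]]
# Without bias: [[1. 2. 1. 2. 4.]]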
- Pre-processing
1- Normalize each feature:
from sklearn.preprocessing import StandardScaler
# Create the scaler object
SCALE = StandardScaler()
# Fit the scaler on the selected columns
SCALE.fit(x_data[['horsepower', 'highway-mpg']])
# Transform the data
x_scale = SCALE.transform(x_data[['horsepower', 'highway-mpg']])
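The two steps can also be combined with fit_transform; a minimal sketch (assuming the x_data above):
# Fit the scaler and transform the data in one call
x_scale = SCALE.fit_transform(x_data[['horsepower', 'highway-mpg']])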
Pipelines
There are many steps to getting a prediction:
[Normalization ----> Polynomial Transform] -----> Linear Regression
        transformations                              prediction
# Import everything the pipeline needs
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# Create the list of (name, estimator) pipeline steps
Input = [('scale', StandardScaler()), ('polynomial', PolynomialFeatures(degree=2)), ('model', LinearRegression())]
# Pipeline constructor
pipe = Pipeline(Input)
# Train the pipeline
pipe.fit(df[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']], y)
# Obtain predictions from the trained pipeline
yhat = pipe.predict(df[['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']])
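At both fit and predict time the pipeline applies each step in order, so scaling and the polynomial transform happen automatically before the regression; a quick sanity check (assuming the fit above):
# Inspect the first few predictions
print(yhat[0:4])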