In-sample evaluation tells us how well our model performs on the training dataset, but not how it performs on an unseen dataset (the testing dataset).
So we split our data into two parts: an in-sample dataset (training) and an out-of-sample dataset (testing).
Generalization Performance:
Generalization performance measures how well our model predicts on an unseen dataset.
    ...
    # split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, ...)
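To make the split concrete, here is a minimal sketch that compares in-sample and out-of-sample performance. The synthetic data, the 30% test size, and the use of `LinearRegression` are assumptions chosen for illustration; they do not come from the original notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y is linear in x plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=200)

# Hold out 30% of the rows as the out-of-sample (testing) set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# score() returns R^2: in-sample on the training set, out-of-sample on the test set.
r2_train = model.score(X_train, y_train)
r2_test = model.score(X_test, y_test)
print(f"in-sample R^2:     {r2_train:.3f}")
print(f"out-of-sample R^2: {r2_test:.3f}")
```

The out-of-sample score is the one that estimates generalization performance, because the model never saw those rows during fitting.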
Overfitting, Underfitting and Model Selection
Underfitting: when the model is too simple to fit the data.
Overfitting: when the model fits the noise rather than the underlying function.
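One way to see both effects is to fit polynomials of increasing degree to the same noisy data and compare training and testing error. The sketch below assumes a sine-shaped target, Gaussian noise, and the degrees 1, 4 and 15; all of these are illustrative choices, not values from the original notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

def make_data(n):
    # Hypothetical target function: sin(3x) on [-1, 1] plus Gaussian noise.
    x = rng.uniform(-1, 1, size=(n, 1))
    y = np.sin(3 * x[:, 0]) + rng.normal(0, 0.2, size=n)
    return x, y

X_train, y_train = make_data(30)
X_test, y_test = make_data(200)

train_mse, test_mse = {}, {}
for degree in (1, 4, 15):
    # Polynomial regression: expand x into powers, then fit ordinary least squares.
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse[degree] = mean_squared_error(y_train, model.predict(X_train))
    test_mse[degree] = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE {train_mse[degree]:.3f}, "
          f"test MSE {test_mse[degree]:.3f}")
```

The degree 1 model is too simple, so its error stays high on both sets (underfitting). Training error keeps falling as the degree grows, but a high-degree model will often do worse on the test set because it has started fitting the noise (overfitting).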
Ridge regression is a form of regression used in a multiple regression model when multicollinearity occurs. Multicollinearity is when there is a strong relationship among the independent variables. Ridge regression is also very commonly used with polynomial regression.
Ridge Regression:
It helps prevent overfitting (when you have multiple independent variables or features).
Ridge regression controls the magnitude of these polynomial coefficients by introducing the parameter alpha. Alpha is a parameter we select before fitting or training the model.
Each row in the following table represents an increasing value of alpha.
Let's see how different values of alpha change the model.
If alpha is too large, the coefficients will approach zero and underfit the data.
If alpha is zero, the overfitting is evident.
For alpha equal to 0.001, the overfitting begins to subside.
For alpha equal to 0.01, the estimated function tracks the actual function.
When alpha equals 1, we see the first signs of underfitting: the estimated function does not have enough flexibility.
At alpha equal to 10, we see extreme underfitting.
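The shrinking effect of alpha can be checked numerically. The following is a minimal sketch using scikit-learn's `Ridge` on degree 10 polynomial features; the synthetic data and the polynomial degree are assumptions made for illustration, and alpha = 0 is left out because it reduces to ordinary least squares.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): noisy samples of sin(3x).
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.2, size=40)

alphas = [0.001, 0.01, 1, 10]
coef_norms = []
for alpha in alphas:
    # alpha is fixed before fitting; it penalizes large polynomial coefficients.
    model = make_pipeline(
        PolynomialFeatures(degree=10, include_bias=False),
        Ridge(alpha=alpha),
    )
    model.fit(X, y)
    coefs = model.named_steps["ridge"].coef_
    coef_norms.append(float(np.linalg.norm(coefs)))
    print(f"alpha={alpha:<6} coefficient norm={coef_norms[-1]:.4f}")
```

The coefficient norm falls as alpha rises, which is why a very large alpha pushes the model toward underfitting.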
Grid Search
We use a validation dataset to pick the best value of a hyperparameter such as alpha.
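As a rough sketch of how this can look in scikit-learn, `GridSearchCV` tries each candidate alpha and scores it on held-out folds. Note the assumptions: cross-validation folds stand in for a single validation set here, and the synthetic data, the candidate alpha values, and `cv=4` are illustrative choices rather than values from the original notes.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data (an assumption for illustration): 5 features, linear target.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
true_coefs = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_coefs + rng.normal(0, 0.5, size=200)

# Keep a test set aside; it is never used while choosing alpha.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate values of alpha; each one is scored on validation folds.
param_grid = {"alpha": [0.001, 0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(Ridge(), param_grid, cv=4)
search.fit(X_train, y_train)

best_alpha = search.best_params_["alpha"]
test_r2 = search.score(X_test, y_test)  # R^2 of the refit best model
print(f"best alpha: {best_alpha}")
print(f"test R^2 with best alpha: {test_r2:.3f}")
```

Keeping the test set out of the search means the final score still estimates performance on unseen data.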
Identify over-fitting and under-fitting in a predictive model: Overfitting occurs when a function is fit too closely to the training data points and captures the noise in the data. Underfitting refers to a model that can't model the training data or capture the trend of the data.
Apply Ridge Regression to linear regression models: Ridge regression is a form of regression used in a multiple regression model when multicollinearity occurs.
Tune hyper-parameters of an estimator using Grid search: Grid search is a tuning technique that exhaustively evaluates an estimator over a specified set of hyperparameter values and selects the values that perform best.