Fatskills
Practice. Master. Repeat.
Study Guide: Data Science and Machine Learning 101: Machine Learning Core Supervised Learning Regression Linear Polynomial Regularization Evaluation MSE RMSE R²
Source: https://www.fatskills.com/introdution-to-engineering/chapter/data-science-and-machine-learning-data-science-and-machine-learning-machine-learning-core-supervised-learning-regression-linear-polynomial-regularization-evaluation-mse-rmse-r%C2%B2

Data Science and Machine Learning 101: Machine Learning Core Supervised Learning Regression Linear Polynomial Regularization Evaluation MSE RMSE R²

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.

What This Is

Supervised regression learns a mapping (f(\mathbf{x})) from input features (\mathbf{x}) to a continuous target (y) using labeled examples. It’s the workhorse when you need to predict a numeric quantity—e.g., estimating next‑month house prices from location, size, and age, or forecasting daily electricity demand from weather and calendar data. Because the target is continuous, the model’s error can be measured directly, making regression ideal for budgeting, capacity planning, and any “how much?” business question.

Key Terms & Formulas

Linear Regression – Model: (\hat{y}= \beta_0 + \sum_{j=1}^{p}\beta_j x_j). (\beta) are coefficients learned by minimizing squared error.
Ordinary Least Squares (OLS) – Objective: (\displaystyle \min_{\beta}\; \sum_{i=1}^{n}(y_i-\hat{y}_i)^2). Gives closed‑form (\beta = (X^\top X)^{-1}X^\top y) when (X^\top X) is invertible.
Polynomial Regression – Extends linear model with powers of features: (\hat{y}= \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_d x^d). Captures curvature while still fitting with OLS.
L1 Regularization (Lasso) – Penalty: (\displaystyle \lambda \sum_{j=1}^{p} |\beta_j|). Drives some coefficients exactly to 0 → built‑in feature selection.
L2 Regularization (Ridge) – Penalty: (\displaystyle \lambda \sum_{j=1}^{p} \beta_j^2). Shrinks coefficients toward 0 but never eliminates them; reduces variance.
Elastic Net – Combination: (\displaystyle \lambda_1\sum |\beta_j| + \lambda_2\sum \beta_j^2). Balances sparsity (L1) and stability (L2).
Mean Squared Error (MSE) – (\displaystyle \text{MSE}= \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2). Primary loss for regression; lower = better.
Root Mean Squared Error (RMSE) – (\displaystyle \text{RMSE}= \sqrt{\text{MSE}}). Same units as (y); easier to interpret.
R‑squared (R²) – (\displaystyle R^2 = 1 - \frac{\sum (y_i-\hat{y}_i)^2}{\sum (y_i-\bar{y})^2}). Proportion of variance explained; 1 → perfect fit, 0 → no better than predicting the mean, and negative values are possible on held‑out data.
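Train‑Test Split – Typical split: 70‑80 % train, 20‑30 % test. For time‑ordered data, split chronologically (e.g., with scikit‑learn's TimeSeriesSplit) instead of shuffling; the stratify argument of train_test_split is meant for classification labels, not continuous targets.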
Cross‑Validation (k‑fold) – Repeatedly train on (k-1) folds, validate on the held‑out fold; average metric gives a more robust estimate of generalization error.
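
To make the OLS closed form and the three metrics above concrete, here is a minimal NumPy sketch. The four data points are made up for illustration, and the normal equations are solved with `np.linalg.solve` rather than an explicit inverse (for ill‑conditioned problems `np.linalg.lstsq` is usually the safer choice).

```python
import numpy as np

# Tiny illustrative dataset (invented for this sketch): one feature, four samples.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 8.0])

# Design matrix with a leading column of ones for the intercept beta_0.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (X^T X) beta = X^T y instead of forming the inverse.
beta = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta
rss = np.sum((y - y_hat) ** 2)      # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)   # total sum of squares

mse = rss / len(y)
rmse = np.sqrt(mse)
r2 = 1.0 - rss / tss

print("beta:", beta)
print("MSE:", mse, "RMSE:", rmse, "R2:", r2)
```

On this data the fitted line is approximately ŷ = 0.8 + 2.3x, with MSE ≈ 0.075 and R² ≈ 0.989.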

Step‑by‑Step / Process Flow

Load & Inspect
```python
import pandas as pd

df = pd.read_csv('house_prices.csv')
print(df.head())      # first rows
print(df.describe())  # summary statistics of numeric columns
```
Clean & Engineer – Handle missing values, encode categoricals, create interaction/polynomial features (PolynomialFeatures), and scale numeric columns (StandardScaler).
Split –
```python
from sklearn.model_selection import train_test_split

# X: feature matrix, y: target column prepared in the previous step
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```
Baseline Model – Fit ordinary least‑squares linear regression.
```python
from sklearn.linear_model import LinearRegression

lin = LinearRegression().fit(X_train, y_train)
```
Evaluate – Compute MSE, RMSE, R² on the hold‑out set.
```python
from sklearn.metrics import mean_squared_error, r2_score

preds = lin.predict(X_test)
mse = mean_squared_error(y_test, preds)
rmse = mse ** 0.5  # square root of MSE, in the units of y
r2 = r2_score(y_test, preds)
```
Regularize & Tune – Use RidgeCV / LassoCV / ElasticNetCV to search over (\lambda) (or alpha) with cross‑validation, then re‑evaluate.
```python
from sklearn.linear_model import RidgeCV

ridge = RidgeCV(alphas=[0.1, 1.0, 10.0], cv=5).fit(X_train, y_train)
```
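
The steps above can also be chained into a single scikit‑learn pipeline, so that polynomial expansion and scaling are fitted on the training data only. This is a sketch under stated assumptions: the data is a synthetic noisy quadratic standing in for a real dataset, and the degree and alpha grid are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for a real dataset (assumption): y = 1 + 2x - 0.5x^2 + noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Every preprocessing step lives inside the pipeline, so calling fit()
# never lets the scaler or the feature expansion see the test rows.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    RidgeCV(alphas=[0.1, 1.0, 10.0], cv=5),
)
model.fit(X_train, y_train)

preds = model.predict(X_test)
rmse = mean_squared_error(y_test, preds) ** 0.5
r2 = r2_score(y_test, preds)
best_alpha = model.named_steps["ridgecv"].alpha_  # alpha chosen by cross-validation

print(f"alpha={best_alpha}, RMSE={rmse:.3f}, R2={r2:.3f}")
```

Keeping preprocessing inside the pipeline also means the same object can be passed to `cross_val_score` without refitting the scaler by hand in each fold.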

Common Mistakes

Mistake	Correction
Using MSE on a highly skewed target – large errors dominate, hiding systematic bias.	Transform the target (log, Box‑Cox) or report MAE alongside RMSE; MAE is less sensitive to a few large errors.
Fitting a high‑degree polynomial without regularization – overfits training data, terrible test performance.	Apply Ridge/Lasso or limit degree; use cross‑validation to pick the sweet spot.
Fitting the scaler before the train‑test split – data leakage because the scaler sees test data.	Split first, then fit the scaler on the training set only (`scaler.fit(X_train)`) and apply that fitted transformation to both train and test (`scaler.transform`).
Ignoring multicollinearity – OLS coefficients become unstable when features are highly correlated.	Detect with VIF; drop/reduce correlated columns or switch to Ridge (which handles collinearity).
Evaluating on the same data used for hyper‑parameter search – optimistic bias.	Reserve a final hold‑out set or use nested cross‑validation for model selection and evaluation.
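
The multicollinearity check mentioned in the table can be sketched without extra dependencies: the variance inflation factor of feature j is 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing feature j on the remaining features. The helper name `vif` and the synthetic columns below are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(X):
    """Return 1 / (1 - R_j^2) for each column j of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # R^2 of predicting column j from all other columns.
        r2 = LinearRegression().fit(others, target).score(others, target)
        # Note: a perfectly collinear column (r2 == 1) would divide by zero here.
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic features (assumption): x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)  # large values for x1 and x2, close to 1 for x3
```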

Data Science Interview / Practical Insights

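“When would you prefer Lasso over Ridge?” – Lasso when you need a sparse model (automatic feature selection), especially when predictors outnumber observations. In that regime Lasso keeps at most n features, so Elastic Net is often preferred when predictors are strongly correlated.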
“Explain why R² can be negative.” – If the model’s MSE is larger than the variance of the baseline (predicting the mean), the numerator exceeds the denominator, yielding a negative R²—signaling a worse‑than‑naïve model.
“How does polynomial regression differ from adding interaction terms manually?” – Polynomial features automatically generate all powers up to the specified degree, including cross‑terms; manual interaction may miss higher‑order combos.
“What’s the effect of the regularization strength λ on bias‑variance?” – Larger λ increases bias (risk of under‑fitting) and reduces variance (less over‑fitting); the sweet spot is found via CV.
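
A negative R² is easy to reproduce by hand. In this small sketch (the three target values and both prediction vectors are invented for illustration), predicting the mean gives R² = 0, while predictions that reverse the trend give RSS = 8 against TSS = 2, so R² = 1 − 8/2 = −3.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0])

# Baseline: always predict the mean of the target.
mean_preds = np.full_like(y_true, y_true.mean())

# A deliberately bad model whose predictions reverse the trend.
bad_preds = np.array([3.0, 2.0, 1.0])

r2_baseline = r2_score(y_true, mean_preds)  # RSS equals TSS, so R^2 = 0
r2_bad = r2_score(y_true, bad_preds)        # RSS = 8, TSS = 2, so R^2 = -3

print(r2_baseline, r2_bad)
```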

Quick Check Questions

Scenario: Your model’s training RMSE is 5, but test RMSE is 20.
Answer: The model is over‑fitting; increase regularization (e.g., raise λ in Ridge/Lasso) or reduce model complexity.
Scenario: You have 10,000 features but only 200 samples.
Answer: Use Lasso (or Elastic Net) to enforce sparsity, or first perform dimensionality reduction (PCA) before regression.
Scenario: After adding a quadratic term, R² improves from 0.70 to 0.71, but RMSE barely changes.
Answer: The extra term adds little predictive power; the small R² gain may be noise—prefer the simpler model to avoid unnecessary complexity.
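
The "more features than samples" scenario can be illustrated on a scaled‑down synthetic problem. This is only a sketch: the sizes (50 samples, 200 features), the five true coefficients, and the alpha values are arbitrary assumptions, not tuned settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): only the first 5 of 200 features matter.
rng = np.random.default_rng(0)
n_samples, n_features = 50, 200
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:5] = [3.0, -2.0, 1.5, 4.0, -3.5]
y = X @ true_coef + rng.normal(0, 0.1, size=n_samples)

# L1 penalty: many coefficients are set exactly to zero.
lasso = Lasso(alpha=0.1, max_iter=10_000).fit(X, y)
# L2 penalty: coefficients shrink but stay nonzero.
ridge = Ridge(alpha=1.0).fit(X, y)

n_lasso = int(np.sum(lasso.coef_ != 0))
n_ridge = int(np.sum(ridge.coef_ != 0))

print("nonzero Lasso coefficients:", n_lasso)
print("nonzero Ridge coefficients:", n_ridge)
```

In practice alpha would be chosen with LassoCV or ElasticNetCV rather than fixed by hand.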

Last‑Minute Cram Sheet (10 one‑liners)

OLS closed‑form: (\beta = (X^\top X)^{-1}X^\top y).
Ridge loss: (\text{MSE} + \lambda\|\beta\|_2^2); Lasso loss: (\text{MSE} + \lambda\|\beta\|_1).
RMSE = √MSE – same units as the target, easier to communicate to stakeholders.
R² = 1 – (RSS/TSS); negative R² ⇒ model worse than predicting the mean.
PolynomialFeatures(degree=d, include_bias=False) creates all combos up to (d).
Cross‑validation (k‑fold) reduces variance of the performance estimate compared to a single train‑test split.
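StandardScaler subtracts mean, divides by std; ⚠️ both mean and std are sensitive to outliers, and MinMaxScaler is too (it uses the min and max). For heavily skewed or outlier‑prone features, consider RobustScaler or a log transform.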
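Elastic Net in scikit‑learn: the penalty is alpha·l1_ratio·‖β‖₁ + 0.5·alpha·(1 − l1_ratio)·‖β‖₂², so a penalty λ₁‖β‖₁ + λ₂‖β‖₂² corresponds to alpha = λ₁ + 2λ₂ and l1_ratio = λ₁/(λ₁ + 2λ₂) (with the squared‑error term scaled by 1/(2n)).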
VIF > 5 signals problematic multicollinearity; consider dropping or regularizing.
Bias‑variance trade‑off: ↑λ → ↑bias, ↓variance; ↓λ → ↓bias, ↑variance.

Keep this guide handy; you now have the core theory, the practical workflow, and the interview‑ready nuggets to own any regression‑focused data‑science task. Happy modeling!

Without work one finishes nothing. - Ralph Waldo Emerson
© 2026 Fatskills.com

All trademarks, logos and brand names are the property of their respective owners. All company, product and service names used in this website are for identification purposes only. Use of these names, trademarks and brands does not imply endorsement.
