# Import the necessary libraries
import numpy as np
import pandas as pd
# Generate random data
n = 100
x = np.random.rand(n)
y = 2*x + np.random.normal(size=n)
# Create a DataFrame and save to CSV
df = pd.DataFrame({'x': x, 'y': y})
df.to_csv('your_dataset.csv', index=False)OLS regression
OLS regression is a method for estimating the parameters of a linear regression model. The goal is to find the line that best fits a set of data points. The line is represented by an equation of the form y = mx + b
where :
- y is the dependent variable,
- x is the independent variable,
- m is the slope of the line, and
- b is the y-intercept.
Generate random data
import numpy as np
from numpy import ndarray
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
# Step 1: Load the data and split into independent and dependent variables
data = pd.read_csv('your_dataset.csv')
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
# add a column of 1s to the X matrix for the intercept term
X = np.append(arr=np.ones((len(X), 1)), values=X, axis=1)
# calculate the coefficients using the OLS formula
beta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
y_pred = X.dot(beta)def mean_squared_error(y_true:ndarray, y_pred:ndarray):
    n = len(y_true)
    mse = sum([(y_true[i] - y_pred[i])**2 for i in range(n)]) / n
    return mse
def r2_score(y_true:ndarray, y_pred:ndarray):
    ssr = sum([(y_true[i] - y_pred[i])**2 for i in range(len(y_true))])
    sst = sum([(y_true[i] - np.mean(y_true))**2 for i in range(len(y_true))])
    r2 = 1 - (ssr / sst)
    return r2
rmse = np.sqrt(mean_squared_error(y, y_pred))
r2 = r2_score(y, y_pred)
print("RMSE: ", rmse)
print("R-squared: ", r2)RMSE:  0.9979255341660089
R-squared:  0.25357665389919926Citation
BibTeX citation:
@online{bochman2023,
  author = {Bochman, Oren},
  title = {OLS Regression {From} {Scratch}},
  date = {2023-02-01},
  url = {https://orenbochman.github.io/posts/2023/2023-02-01-ds-from-scratch/ols-regression-from-scratch.html},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2023. “OLS Regression From Scratch.”
February 1, 2023. https://orenbochman.github.io/posts/2023/2023-02-01-ds-from-scratch/ols-regression-from-scratch.html.
