Machine Learning - Linear Regression with Multiple Variables

Multiple Features

Used when more than one feature affects the output
When estimating a house price, for example, the size, the number of bedrooms, and the number of floors may all affect the price
Features are denoted x₁, x₂, ..., x_n
- n = # of features
- x⁽ⁱ⁾ = input of the i-th training example: a vector
- x_j⁽ⁱ⁾ = value of feature j in the i-th training example: a single value
Since there is more than one feature, the hypothesis gets longer so that it has one term per feature: h_θ(x) = θ₀ + θ₁x₁ + θ₂x₂ + ... + θ_nx_n
- θ represents the parameters: I think of these as the weights on each feature
- x₀ is assumed to be 1, so that θ₀ also has a feature to multiply
- x and θ are both (n+1)-dimensional vectors
- h_θ(x) = θ^Tx
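
To make the vector form concrete, here is a minimal sketch of the hypothesis in plain Python. The function name `hypothesis` and the parameter values are made up for illustration; only the x₀ = 1 convention and the θ^Tx form come from the notes above.

```python
def hypothesis(theta, features):
    """Return h_theta(x) = theta^T x, prepending x_0 = 1 to the features."""
    x = [1.0] + list(features)  # x_0 = 1 pairs with the intercept theta_0
    if len(theta) != len(x):
        raise ValueError("theta must have n + 1 entries")
    return sum(t * xj for t, xj in zip(theta, x))


# Hypothetical parameters: a base price of 50, plus 0.1 per unit of size,
# 20 per bedroom and 10 per floor (units are invented for this example).
theta = [50.0, 0.1, 20.0, 10.0]
house = [2000.0, 3.0, 2.0]  # size, bedrooms, floors
price = hypothesis(theta, house)
```

With these made-up numbers the prediction is 50 + 0.1·2000 + 20·3 + 10·2 = 330.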

Gradient descent is pretty much the same as in the single-variable case, but the update for each θ_j multiplies the error (h_θ(x⁽ⁱ⁾) - y⁽ⁱ⁾) by x_j⁽ⁱ⁾
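
A minimal sketch of one gradient descent update for several features, assuming the usual squared-error cost with a 1/m factor. The names `predict` and `gradient_step`, the toy data, and the learning rate of 0.1 are all assumptions chosen for illustration.

```python
def predict(theta, x):
    """h_theta(x) for a row x that already starts with x_0 = 1."""
    return sum(t * xj for t, xj in zip(theta, x))


def gradient_step(theta, X, y, alpha):
    """Return theta after one simultaneous update of every theta_j.

    X holds one row per training example, each row starting with x_0 = 1.
    """
    m = len(y)
    # Errors are computed once from the old theta, so the update is simultaneous.
    errors = [predict(theta, row) - target for row, target in zip(X, y)]
    return [
        theta_j - alpha / m * sum(err * row[j] for err, row in zip(errors, X))
        for j, theta_j in enumerate(theta)
    ]


# Toy data generated from y = 1 + 2*x1 + 3*x2 (made up for illustration).
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
y = [1, 3, 4, 6]
theta = [0.0, 0.0, 0.0]
for _ in range(5000):
    theta = gradient_step(theta, X, y, alpha=0.1)
```

On this toy data the parameters approach [1, 2, 3], the values used to generate y.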

Making features on a similar scale
- if the ranges differ a lot between features, the contour plot of the cost function can be skewed, which slows gradient descent down
- divide each feature by its largest possible value to make the contour plot look as much like a circle as possible
- this works because all the features then lie roughly within the -1 to 1 range
Mean Normalization
- subtract the mean of the feature from each value and divide the result by the largest value
- x_i = (x_i - μ_i) / largest value
- you could also divide the numerator by the range, i.e. the difference between the largest and the smallest value
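
The range-based variant can be sketched as follows. The function name `mean_normalize` and the sample values are assumptions for illustration, and mapping a constant feature to zeros is a choice made here, not something the notes specify.

```python
def mean_normalize(values):
    """Scale one feature column to (x - mean) / (max - min)."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    if spread == 0:
        # A constant feature has no range to divide by; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / spread for v in values]


sizes = [1000.0, 1500.0, 2000.0, 2500.0]  # hypothetical house sizes
bedrooms = [1.0, 2.0, 3.0, 5.0]           # hypothetical bedroom counts
scaled_sizes = mean_normalize(sizes)
scaled_bedrooms = mean_normalize(bedrooms)
```

After scaling, both columns are centered on zero and span a range of exactly 1, even though the raw sizes and bedroom counts differ by orders of magnitude.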

Tips to check that the learning rate is working correctly
- make sure that the cost function is decreasing as you iterate
- J(θ) should decrease after every iteration
- declare convergence if J(θ) decreases by less than 10^-3 in one iteration
Use a smaller learning rate if J(θ) is not converging
- a learning rate that is too large may overshoot the minimum and diverge instead of converging
- with a sufficiently small learning rate, J(θ) decreases on every iteration
- but if it is too small, it will take too long to converge
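
These checks can be sketched as a loop that records J(θ) after every iteration. Everything named here (`cost`, `run_gradient_descent`, the status strings, the toy data, and the two learning rates) is an assumption for illustration; only the 10^-3 threshold comes from the notes.

```python
def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum of squared errors."""
    m = len(y)
    total = 0.0
    for row, target in zip(X, y):
        prediction = sum(t * xj for t, xj in zip(theta, row))
        total += (prediction - target) ** 2
    return total / (2 * m)


def run_gradient_descent(X, y, alpha, max_iters=10000, tol=1e-3):
    """Return (theta, cost history, status) for batch gradient descent."""
    m = len(y)
    theta = [0.0] * len(X[0])
    history = [cost(theta, X, y)]
    for _ in range(max_iters):
        errors = [
            sum(t * xj for t, xj in zip(theta, row)) - target
            for row, target in zip(X, y)
        ]
        theta = [
            t - alpha / m * sum(e * row[j] for e, row in zip(errors, X))
            for j, t in enumerate(theta)
        ]
        history.append(cost(theta, X, y))
        drop = history[-2] - history[-1]
        if drop < 0:
            # J(theta) went up: the learning rate is too large.
            return theta, history, "diverging"
        if drop < tol:
            # J(theta) decreased by less than tol in one iteration.
            return theta, history, "converged"
    return theta, history, "max_iters"


# Toy data from y = 1 + 2*x (made up); each row starts with x_0 = 1.
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
y = [3, 5, 7, 9]
_, good_history, good_status = run_gradient_descent(X, y, alpha=0.1)
_, bad_history, bad_status = run_gradient_descent(X, y, alpha=0.5)
```

On this data the run with alpha=0.1 ends as "converged", while alpha=0.5 is flagged as "diverging" on the first iteration because the cost rises. Plotting `good_history` against the iteration number gives the usual decreasing curve.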

You can combine two features, such as frontage and depth, by multiplying them into a single feature, area
- defining a new feature this way reduces the number of parameters/features
- your hypothesis, and with it your cost function, may become simpler
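
As a small illustration, the frontage and depth example might look like this. The lot dimensions are invented for the sketch.

```python
# Hypothetical lots: (frontage, depth) pairs.
lots = [(50.0, 100.0), (40.0, 120.0), (60.0, 90.0)]

# Replace the two raw features with a single derived feature, area.
areas = [frontage * depth for frontage, depth in lots]

# Each input row is now [x_0, area] instead of [x_0, frontage, depth],
# so the hypothesis needs two parameters rather than three.
X = [[1.0, area] for area in areas]
```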

Normal equation: a method to solve for θ analytically instead of iterating
- J(θ) is a polynomial (a quadratic) in θ, so you can take its derivative with respect to θ
- take the partial derivatives with respect to each θ_j and set them to zero to find θ
θ = (X^TX)^-1 X^Ty
Slow if n is really large because you need to invert X^TX, an (n+1) × (n+1) matrix, which costs roughly O(n³)
won’t work if (X^TX) is not invertible
- you have to delete some redundant features or use regularization
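
A minimal sketch of the analytic solution in plain Python. Note one substitution: rather than forming the inverse explicitly, it solves the linear system (X^TX)θ = X^Ty by Gaussian elimination, which gives the same θ whenever X^TX is invertible. The function name `normal_equation`, the 1e-12 singularity threshold, and the toy data are assumptions for illustration.

```python
def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y by Gaussian elimination.

    X holds one row per training example, each starting with x_0 = 1.
    Raises ValueError when X^T X is (numerically) singular.
    """
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [float(sum(row[i] * row[j] for row in X)) for j in range(n)]
        + [float(sum(row[i] * target for row, target in zip(X, y)))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is not invertible")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution on the upper-triangular system.
    theta = [0.0] * n
    for i in reversed(range(n)):
        known = sum(A[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (A[i][n] - known) / A[i][i]
    return theta


# Toy data generated from y = 1 + 2*x1 + 3*x2 (made up for illustration).
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
y = [1, 3, 4, 6]
theta = normal_equation(X, y)
```

No learning rate or iteration count is needed here. If two columns of X are identical (a duplicated feature), X^TX is singular and the function raises, which is the case where you would drop the redundant feature or add regularization.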

Oct 30, 2020

Jason Kang, AI Enthusiast and a Software Engineer