Machine Learning: Linear Regression with Multiple Variables
Multiple Features
 When more than one feature affects the output
 In the example of estimating a house's price, its size, # of bedrooms, and # of floors may all affect the price

features are denoted x_{1}, x_{2}, ..., x_{n}
 n = # of features
 x^{(i)} = input (feature vector) of the i^{th} training example : vector
 x_{j}^{(i)} = value of feature j in the i^{th} training example : a scalar, e.g. x_{3}^{(2)} = # of floors of the 2nd house

Since there is more than one feature, the hypothesis gets longer, with one term per feature:
 h_{θ}(x) = θ_{0}x_{0} + θ_{1}x_{1} + ... + θ_{n}x_{n}
 θ represents a parameter; I think of it as a weight on its feature
 x_{0} is assumed to be 1
 x and θ are both (n+1)-dimensional vectors, so compactly: h_{θ}(x) = θ^{T}x
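
A minimal NumPy sketch of the vectorized hypothesis (the variable names and example values are mine, not from the lecture):

    import numpy as np

    def hypothesis(theta, x):
        # prepend x_0 = 1 so theta_0 acts as the intercept
        x = np.concatenate(([1.0], x))
        return theta @ x  # theta^T x

    theta = np.array([50.0, 0.1, 20.0])  # theta_0, theta_1, theta_2 (made up)
    x = np.array([2104.0, 3.0])          # size, # of bedrooms
    print(hypothesis(theta, x))          # 320.4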
Gradient Descent for Multiple Variables
 Pretty much the same as the single-variable case, except each update multiplies the error by x_{j}^{(i)}:
 θ_{j} := θ_{j} - α(1/m)Σ_{i=1}^{m}(h_{θ}(x^{(i)}) - y^{(i)})x_{j}^{(i)} (update all θ_{j} simultaneously, j = 0..n)
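
A vectorized sketch of one update step, assuming the design matrix X already includes the x_{0} = 1 column (the names are my own):

    import numpy as np

    def gradient_step(theta, X, y, alpha):
        m = len(y)             # number of training examples
        error = X @ theta - y  # h_theta(x^(i)) - y^(i) for all i
        # X.T @ error sums error * x_j^(i) over i, for every j at once
        return theta - (alpha / m) * (X.T @ error)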
Feature Scaling

Getting the features onto a similar scale
 if the ranges differ a lot between features, the contour plot of J(θ) gets skewed and elongated, and gradient descent converges slowly
 divide each feature by its largest possible value so the contour plot looks as close to circles as possible
 because all the features then lie within the -1 to 1 range
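
A sketch of this max-value scaling; it assumes the features are nonnegative, as size or # of bedrooms would be:

    import numpy as np

    def scale_by_max(X):
        # divide each column (feature) by its largest observed value
        return X / X.max(axis=0)

    X = np.array([[2104.0, 3.0],
                  [1416.0, 2.0],
                  [1534.0, 3.0]])
    print(scale_by_max(X))  # every entry now lies in [0, 1]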

Mean Normalization
 subtract the mean of the feature's values from each value, then divide by the largest value
 x_{i} := (x_{i} - μ_{i}) / s_{i}, where s_{i} is the largest value
 you could also use the range, i.e. the difference between the largest and the smallest values, as the divisor s_{i}
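
A minimal sketch of mean normalization, using the range as the divisor (the helper name is mine):

    import numpy as np

    def mean_normalize(X):
        mu = X.mean(axis=0)                # per-feature mean
        s = X.max(axis=0) - X.min(axis=0)  # per-feature range
        return (X - mu) / s

    X = np.array([[2104.0, 3.0],
                  [1416.0, 2.0],
                  [1534.0, 3.0]])
    print(mean_normalize(X))  # features now roughly centered around 0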
Learning Rate

Tips to check that the learning rate is working correctly
 make sure that the cost function is decreasing as you iterate
 J(θ) must decrease after every iteration
 declare convergence if J(θ) decreases by less than 10^{-3} in one iteration

Use a smaller learning rate if J(θ) is not converging
 a too-large learning rate may overshoot the minimum, so J(θ) may diverge instead of converging
 for a sufficiently small learning rate, J(θ) should decrease on every iteration
 but if it is too small, gradient descent will take too long to converge (see the sketch below)
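
A sketch of the full loop with the convergence test above; the α default, the 10^{-3} threshold, and the iteration cap are illustrative choices, not from the lecture:

    import numpy as np

    def gradient_descent(X, y, alpha=0.01, tol=1e-3, max_iters=10000):
        m = len(y)
        theta = np.zeros(X.shape[1])
        prev_cost = np.inf
        for _ in range(max_iters):
            error = X @ theta - y
            cost = (error @ error) / (2 * m)  # J(theta)
            if prev_cost - cost < tol:        # barely decreased (or rose: alpha too large)
                break
            prev_cost = cost
            theta -= (alpha / m) * (X.T @ error)  # simultaneous update of all theta_j
        return theta

In practice you would also record the costs and plot them against the iteration number to see the decrease, and try a few values of α (e.g. 0.001, 0.01, 0.1) to find one that decreases J(θ) quickly without diverging.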
Features and Polynomial Regression

combine two features, e.g. multiply frontage and depth to get a single area feature
 defining a new feature by combining others reduces the number of parameters/features
 your cost function may become simpler
 similarly, powers of a feature (x, x^{2}, x^{3}) can be treated as extra features, which is how polynomial regression fits into this linear framework; feature scaling then matters a lot
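
A sketch of building such derived features; the raw numbers and column layout are my own:

    import numpy as np

    frontage = np.array([60.0, 40.0, 80.0])
    depth = np.array([30.0, 35.0, 25.0])

    area = frontage * depth                        # one combined feature instead of two
    X = np.column_stack([area, area**2, area**3])  # polynomial terms as extra features
    X = X / X.max(axis=0)                          # scaling matters: the ranges differ wildly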
Normal Equation

Method to solve for θ analytically instead of doing iterations
 J(θ) is a quadratic polynomial in θ, so take its derivative with respect to θ
 i.e. take the partial derivatives with respect to each θ_{j}, set them to zero, and solve for the θ_{j}
 θ = (X^{T}X)^{-1}X^{T}y
 slow if n is really large because you need to invert the n×n matrix X^{T}X

won’t work if (X^{T}X) is not invertible, e.g. when features are redundant (linearly dependent) or there are more features than examples
 you have to delete some features or use regularization (see the sketch below)
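
A sketch of the normal equation in NumPy; np.linalg.pinv computes a pseudo-inverse, so it still returns a usable θ even when X^{T}X is singular:

    import numpy as np

    def normal_equation(X, y):
        # theta = (X^T X)^(-1) X^T y
        return np.linalg.pinv(X.T @ X) @ X.T @ y

    # toy data: a column of ones (x_0) plus one feature; the values are made up
    X = np.array([[1.0, 2104.0],
                  [1.0, 1416.0],
                  [1.0, 1534.0]])
    y = np.array([400.0, 232.0, 315.0])
    print(normal_equation(X, y))  # [theta_0, theta_1]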