Machine Learning - Logistic Regression

Logistic Regression

Classification answers a yes-or-no question, so the output y is discrete
- 0: negative class (benign tumor)
- 1: positive class (malignant tumor)
- there could also be multi-class classification, where there are more than two possible outputs
You could try linear regression with a threshold: h_θ(x) = θ^Tx
- if h_θ(x) = θ^Tx > 0.5, predict y = 1 (0.5 is the threshold)
- if h_θ(x) = θ^Tx < 0.5, predict y = 0
- but what happens if the input range increases? A few examples with very large x shift the fitted line; if the threshold remains the same, some cases land on the wrong side of it (a benign tumor predicted as malignant, or the reverse)
- also, the linear h_θ(x) can be greater than 1 or smaller than 0, even though y is always 0 or 1
- logistic regression ensures that 0 < h_θ(x) < 1
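
To make the problem with the thresholded linear hypothesis concrete, here is a minimal sketch. The parameter values and inputs are made up for illustration, and x[0] = 1 is assumed to be the intercept term.

```python
def linear_hypothesis(theta, x):
    """h_theta(x) = theta^T x, with no squashing applied."""
    return sum(t * x_j for t, x_j in zip(theta, x))


def classify(theta, x, threshold=0.5):
    """Predict 1 when the linear output is above the threshold, else 0."""
    return 1 if linear_hypothesis(theta, x) > threshold else 0


# Hypothetical parameters; x[0] = 1 is the intercept term.
theta = [-0.5, 0.25]

small_input = [1.0, 3.0]    # output stays inside [0, 1]
large_input = [1.0, 10.0]   # output goes above 1
zero_input = [1.0, 0.0]     # output goes below 0

for x in (small_input, large_input, zero_input):
    print(x[1], linear_hypothesis(theta, x), classify(theta, x))
```

The outputs for the large and zero inputs fall outside [0, 1], so they cannot be read as probabilities.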

The hypothesis is a function that satisfies 0 < h_θ(x) < 1
h_θ(x) = g(θ^Tx) where g(z) = 1 / (1 + e^{-z})
h_θ(x) = 1 / (1 + e^{-θ^Tx})
- g is called the sigmoid function or logistic function
- horizontal asymptotes at 0 and 1
- ensures that 0 < h_θ(x) < 1
h_θ(x) = estimated probability that y = 1
- h_θ(x) = P(y=1 | x;θ)
- P(y=0 | x;θ) + P(y=1 | x;θ) = 1
- P(y=0 | x;θ) = 1 - P(y=1 | x;θ)
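
A minimal sketch of the sigmoid hypothesis and its reading as a probability, using g(z) = 1 / (1 + e^{-z}). The parameter and input values are hypothetical, and x[0] = 1 is assumed to be the intercept term.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); output is strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x), read as P(y = 1 | x; theta)."""
    z = sum(t * x_j for t, x_j in zip(theta, x))
    return sigmoid(z)


# Hypothetical parameters and input; x[0] = 1 is the intercept term.
theta = [-1.0, 0.5]
x = [1.0, 4.0]

p_malignant = hypothesis(theta, x)   # P(y = 1 | x; theta)
p_benign = 1.0 - p_malignant         # P(y = 0 | x; theta)
print(p_malignant, p_benign)
```

Note that `math.exp(-z)` overflows for very negative z (roughly below -709), so a production version would need a numerically safer form.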

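Linear Regression:
- J(θ) = (1/m) Σ Cost(h_θ(x), y), with Cost(h_θ(x), y) = (1/2)(h_θ(x) - y)²
For Logistic Regression:
- the cost function ends up non-convex if the squared error is used, because h_θ(x) is now the nonlinear sigmoid
- many local optima may appear, so gradient descent is not guaranteed to reach the global minimum
- instead use Cost(h_θ(x), y) = -log(h_θ(x)) if y = 1
- Cost(h_θ(x), y) = -log(1 - h_θ(x)) if y = 0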
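
A minimal sketch of the logistic cost, using the standard convex form: -log(h) when y = 1 and -log(1 - h) when y = 0. Averaging over the training examples to get J(θ) is the usual convention and is assumed here. The tiny dataset is made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(h, y):
    """Per-example cost: -log(h) if y == 1, -log(1 - h) if y == 0."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)


def average_cost(theta, X, y):
    """J(theta): mean per-example cost over the training set (assumed convention)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * x_j for t, x_j in zip(theta, x_i)))
        total += cost(h, y_i)
    return total / len(y)


# Hypothetical data; the first column of ones is the intercept term.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]]
y = [0, 1, 1]

print(average_cost([0.0, 0.0], X, y))    # every prediction is 0.5
print(average_cost([-2.0, 2.0], X, y))   # parameters that fit the labels better
```

The cost goes to 0 as a prediction approaches the correct label and grows without bound as it approaches the wrong one, which is what penalizes confident mistakes.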

Other optimization algorithms for minimizing J(θ): conjugate gradient, BFGS, L-BFGS
- no need to manually pick a learning rate
- often faster than gradient descent, but more complex
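
As a sketch of what "no learning rate" looks like in practice, the snippet below hands the logistic cost and its gradient to SciPy's `scipy.optimize.minimize` with `method="BFGS"`. Using NumPy and SciPy is an assumption of this example rather than something these notes prescribe, and the toy dataset is made up (deliberately not linearly separable, so the minimum is finite).

```python
import numpy as np
from scipy.optimize import minimize


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost_and_grad(theta, X, y):
    """Return the average logistic cost J(theta) and its gradient."""
    h = sigmoid(X @ theta)
    h_safe = np.clip(h, 1e-12, 1.0 - 1e-12)   # keep log() away from 0
    cost = -np.mean(y * np.log(h_safe) + (1.0 - y) * np.log(1.0 - h_safe))
    grad = X.T @ (h - y) / len(y)
    return cost, grad


# Hypothetical data; the first column of ones is the intercept term.
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5],
              [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

# jac=True tells minimize that the function returns (cost, gradient).
result = minimize(cost_and_grad, x0=np.zeros(2), args=(X, y),
                  jac=True, method="BFGS")
theta = result.x
print(theta, result.fun)
```

Swapping in `method="CG"` or `method="L-BFGS-B"` selects SciPy's conjugate gradient or bounded L-BFGS variants; none of them takes a learning rate.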

underfitting
- does not fit the training set very well
- also called high bias
overfitting
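- the hypothesis contorts itself to pass through nearly every training example, so it fits the training set very well but fails to generalize to new examples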
- also called high variance
How to solve the overfitting problem
- reduce the number of features, either manually or with a model selection algorithm
- regularization: keep all the features but reduce the magnitudes of parameters

Small values for parameters
- simpler hypothesis
- less prone to overfitting
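- add a penalty on the parameter magnitudes to the cost function, weighted by lambda (the regularization parameter)
Regularization term which includes lambda
- too large a lambda shrinks the parameters toward 0 and results in underfitting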
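
A minimal sketch of a regularized logistic cost. The exact form of the regularization term is not spelled out in these notes, so this assumes the common L2 penalty (lambda / 2m) * Σ θ_j², with the intercept θ_0 left unpenalized. The dataset is made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def regularized_cost(theta, X, y, lam):
    """Average logistic cost plus an L2 penalty on theta[1:] (assumed form)."""
    m = len(y)
    data_term = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * x_j for t, x_j in zip(theta, x_i)))
        data_term += -math.log(h) if y_i == 1 else -math.log(1.0 - h)
    # theta[0] is the intercept and is conventionally not penalized.
    penalty = (lam / (2.0 * m)) * sum(t * t for t in theta[1:])
    return data_term / m + penalty


# Hypothetical data; the first column of ones is the intercept term.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]]
y = [0, 1, 1]

theta = [-2.0, 2.0]
for lam in (0.0, 1.0, 100.0):
    print(lam, regularized_cost(theta, X, y, lam))
```

With a very large lambda the penalty dominates the data term, so the minimizer is pushed toward θ_j ≈ 0 for j ≥ 1 and the hypothesis flattens out, which is the underfitting case.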

Nov 3, 2020

AI Enthusiast and a Software Engineer. Jason Kang on LinkedIn