Machine Learning - Introduction


  • Teaching a machine to learn its task without being explicitly programmed

    • The machine practices and learns on its own to improve its performance.

Supervised Learning

  1. Right answers are given for the data set

    • so the goal is to produce the right answer for a new input
  2. Types of supervised learning:

    • Regression: predict a continuous-valued output

      • e.g., house prices
    • Classification: predict discrete answers

      • either 0 or 1: true or false
      • could also be 0, 1, 2, 3 (more than two classes)
      • what is the probability that the input turns out to be true or false?
      • if you need to consider more than one feature, the plot may look different
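The two kinds of targets above can be made concrete with a toy labeled data set (the numbers here are made up for illustration):

```python
# Regression: the "right answer" y is a continuous value, e.g. house price
# for a given house size in square feet.
regression_set = [(2104, 460.0), (1416, 232.0), (1534, 315.0)]  # (size, price)

# Classification: y is a discrete label, e.g. 0 or 1 (false or true);
# it could also be one of several classes 0, 1, 2, 3.
classification_set = [(2.1, 0), (7.8, 1), (5.0, 1)]  # (feature, label)

# Every example comes with its right answer, so a supervised algorithm
# can compare its prediction against the label during training.
for x, y in classification_set:
    print(x, y)
```

The only structural difference is the type of `y`: a real number for regression, a class label for classification.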

Unsupervised Learning

  1. No right answers given

    • no labels for the data
    • you don't know whether an example is true or false
    • but there are clusters of some sort: you could organize the data into groups
  2. e.g., organizing news

    • it is not given how the articles are related, but Google somehow organizes them by headline
    • you have to find the relationships within the given data set
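One way to find such clusters without labels is k-means. A minimal 1-D sketch (the data and k=2 are made up for illustration):

```python
import random

def kmeans_1d(points, k=2, iters=20, seed=0):
    """Group unlabeled 1-D points into k clusters by alternating
    assignment and re-centering (basic k-means)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k random points
    for _ in range(iters):
        # assign each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: abs(p - centers[j]))].append(p)
        # move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.9, 10.1, 10.3]  # two obvious groups, no labels given
print(kmeans_1d(data))  # roughly [1.0, 10.1]
```

No "right answers" appear anywhere: the algorithm only uses distances between points to discover the grouping.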

Model Representation

  1. Notation

    • training set: the dataset used to train a model
    • m: number of training examples
    • x: input variable / features
    • y: output variable / target variable
    • (x^(i), y^(i)): the i-th training example (the i-th row of the training-set table)
  2. Training set -> Fed into learning algorithm -> hypothesis(h)

    • the hypothesis takes an input and produces an output
    • h maps from x's to y's
    • how do we represent h?
    • hθ(x) = h(x) = θ0 + θ1x (for a linear function; think of it like f(x))
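The linear hypothesis above is just a function of one input; the parameter values below are arbitrary, for illustration:

```python
def h(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x:
    predict y for input feature x given parameters theta0, theta1."""
    return theta0 + theta1 * x

# e.g. theta0 = 50, theta1 = 0.5: a house of size 1000 predicts 50 + 0.5*1000
print(h(50, 0.5, 1000))  # 550.0
```

Learning then means choosing theta0 and theta1 so that h's predictions match the training targets, which is what the cost function below measures.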

Cost Function

  1. Measures the performance of a machine learning model: the goal is to find the θ values that minimize the cost function

    • θ0 and θ1 are parameters
    • x1 and x2 are features
    • choose θ0 and θ1 so that hθ(x) is as close to y as possible
    • so minimize J(θ0, θ1) = (1/2m) * Σ (hθ(x^(i)) − y^(i))^2; this sum is the cost function
  2. Finding the θ1 that minimizes J(θ1) when θ0 = 0

    • this is where differentiation comes in
    • the minimum is where the derivative equals 0
  3. Contour plots are used to visualize cost functions of two parameters
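The squared-error cost J(θ0, θ1) from above can be computed directly; the toy data here is made up for illustration:

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) =
    (1/2m) * sum over i of (h(x^(i)) - y^(i))^2."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]  # generated by y = x exactly

print(cost(0.0, 1.0, xs, ys))  # 0.0: the hypothesis matches every target
print(cost(0.0, 0.5, xs, ys))  # > 0: the predictions miss the targets
```

Note that J is lowest exactly when the hypothesis fits the data best, which is why minimizing J is the training objective.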

Gradient Descent

  1. A way to minimize the cost function J
  2. Start at some initial θ0 and θ1 and keep changing them simultaneously to reduce J(θ0, θ1)

    • θj := θj − α * ∂/∂θj J(θ0, θ1)
    • α is called the learning rate
    • temp0 := θ0 update equation
    • temp1 := θ1 update equation
    • θ0 := temp0
    • θ1 := temp1
    • the same (old) θ values must be used when computing both temp values
  3. Intuition

    • repeat until convergence: until you reach the minimum
    • think about the update: θj := θj − α * ∂/∂θj J(θ0, θ1)
    • if the derivative is positive, θj decreases; if it is negative, θj increases, so θj always moves toward the minimum
    • if the learning rate is too small, gradient descent can be slow, because you need more iterations
    • if the learning rate is too large, gradient descent can overshoot the minimum and could even diverge
    • you don't need to decrease the learning rate over time, because the derivative term shrinks as you approach the minimum
  4. Batch

    • each step of gradient descent uses all the training examples
    • you compute the sum over all m examples to take the next step
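The steps above can be sketched as batch gradient descent for the linear hypothesis hθ(x) = θ0 + θ1x; the data, α, and iteration count are illustrative choices:

```python
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x,
    minimizing the squared-error cost J(theta0, theta1)."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # arbitrary starting point
    for _ in range(iters):
        # each step sums over ALL m training examples ("batch")
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        temp0 = theta0 - alpha * sum(errors) / m
        temp1 = theta1 - alpha * sum(e * x for e, x in zip(errors, xs)) / m
        theta0, theta1 = temp0, temp1  # simultaneous update

    return theta0, theta1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated by y = 1 + 2x
t0, t1 = gradient_descent(xs, ys)
print(t0, t1)  # close to (1, 2)
```

The temp0/temp1 pattern from the notes shows up as computing both gradients from the same old θ values before overwriting either one; here the tuple assignment does the same job.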