  1. $X$ is an $m \times n$ matrix ($m$ rows, $n$ columns) that represents the training set.
  2. $\theta$ is a $1 \times n$ vector that holds the hypothesis parameters.
  3. $y$ is an $m \times 1$ vector that holds the actual values of the training set.
  4. $\alpha$ is the learning rate, which sets the learning (descent) speed.

1. Hypothesis

Define the hypothesis for the pattern.
Since the output of a classification problem ranges from 0 to 1,
we use the sigmoid function $g(z) = \frac{1}{1 + e^{-z}}$, which maps any real number into that range.
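
As a minimal sketch of this step, the snippet below assumes the usual logistic regression form $h_\theta(x) = g(\theta x^T)$, where $g$ is the sigmoid function and $x$ is one row of $X$. The names `sigmoid` and `hypothesis` are illustrative, and plain Python lists stand in for the vectors.

```python
import math

def sigmoid(z):
    """Squash a real number into the range 0 to 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hypothesis(theta, x):
    """Predicted probability for one example.

    theta and x are equal-length lists (one entry per column of X).
    """
    z = sum(t * v for t, v in zip(theta, x))
    return sigmoid(z)

print(sigmoid(0.0))  # 0.5
```

With all parameters at zero, the hypothesis returns 0.5 for every input, which is the natural starting point before any learning has happened.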

2. Cost

Calculate the cost for a single training example.
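
The formula for this step is not shown here, so the sketch below assumes the standard cross-entropy (log loss) cost for logistic regression, $-y \log(h) - (1 - y) \log(1 - h)$, where $h$ is the hypothesis output for the example. The function name `cost` and the clipping constant are illustrative choices.

```python
import math

def cost(h, y):
    """Cross-entropy cost for one example.

    h is the predicted probability; y is the actual label, 0 or 1.
    """
    eps = 1e-12  # assumed clipping value, keeps log() away from 0
    h = min(max(h, eps), 1.0 - eps)
    return -y * math.log(h) - (1 - y) * math.log(1.0 - h)

print(cost(0.9, 1))  # small: confident and correct
print(cost(0.1, 1))  # large: confident and wrong
```

The cost grows without bound as a confident prediction moves away from the actual label, which is what pushes the parameters toward correct classifications.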

3. Cost function

Define the cost function by iterating over the whole training set.
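
One way to sketch this, assuming the cost function is the average of the per-example cross-entropy cost, $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y_i \log(h_\theta(x_i)) - (1 - y_i) \log(1 - h_\theta(x_i)) \right]$. The toy data, the leading column of ones in `X` (a bias term), and the name `cost_function` are assumptions made for illustration.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def cost_function(theta, X, y):
    """Average cross-entropy cost over all m training examples."""
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        h = min(max(h, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += -y_i * math.log(h) - (1 - y_i) * math.log(1.0 - h)
    return total / m

# Hypothetical training set: column 0 is a constant 1, column 1 is the feature.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

print(cost_function([0.0, 0.0], X, y))  # log(2), about 0.6931
print(cost_function([0.0, 1.0], X, y))  # lower: these parameters fit better
```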

4. Get optimized parameter

Learn from the training set to find the optimized parameters for the proposed algorithm.

Gradient Descent
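
A minimal sketch of batch gradient descent, assuming the standard update for logistic regression, $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i) x_{ij}$, applied to every $\theta_j$ simultaneously. The toy data, the default values of `alpha` and `iterations`, and the function names are assumptions for illustration.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(theta, x):
    """Hypothesis output for one example."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Run a fixed number of batch updates, starting from all-zero parameters."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # Prediction error for every training example.
        errors = [predict(theta, x_i) - y_i for x_i, y_i in zip(X, y)]
        # Average gradient of the cost with respect to each parameter.
        gradient = [
            sum(e * x_i[j] for e, x_i in zip(errors, X)) / m
            for j in range(n)
        ]
        # Simultaneous update of all parameters.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta

# Hypothetical training set: column 0 is a constant 1, column 1 is the feature.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

theta = gradient_descent(X, y)
print(theta)
print([round(predict(theta, x)) for x in X])
```

A larger $\alpha$ takes bigger steps and can overshoot the minimum, while a smaller one converges more slowly; a fixed iteration count is used here only to keep the sketch short.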


Other optimization algorithms that can be used in place of gradient descent:

  1. Conjugate gradient
  2. BFGS
  3. L-BFGS


17 July 2014