1. $X$ is an $m \times n$ matrix ($m$ rows, $n$ columns) that represents the training set.
2. $\theta$ is a $1 \times n$ vector of hypothesis parameters.
3. $y$ is an $m \times 1$ vector holding the real values of the training set.
4. $\alpha$ is the learning rate, which controls the learning (descent) speed.
5. $S(X_j)$ is the standard deviation of the $j$-th feature of the training set.

1. Hypothesis

Formulate a hypothesis that describes the pattern in the data.
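As a minimal sketch, assuming the pattern is linear (the notes do not state the form of the hypothesis), the prediction for one training example is the weighted sum $h_\theta(x) = \sum_j \theta_j x_j$. The function name `hypothesis` and the sample numbers are illustrative only.

```python
def hypothesis(theta, x):
    """Linear hypothesis (assumed form): weighted sum of one example's features.

    theta: list of n parameters; x: list of n feature values.
    Put a constant 1.0 in x[0] if an intercept term is wanted.
    """
    if len(theta) != len(x):
        raise ValueError("theta and x must have the same length")
    return sum(t * f for t, f in zip(theta, x))


# Illustrative values: intercept 1.0, slope 2.0, feature value 3.0.
prediction = hypothesis([1.0, 2.0], [1.0, 3.0])
```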

2. Cost

Calculate the cost for a single training point.
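One common choice, assumed here because the notes do not name the cost, is half the squared error between the prediction and the real value: $\frac{1}{2}(h_\theta(x) - y)^2$. The sketch below uses a linear hypothesis, and `point_cost` is a hypothetical name.

```python
def hypothesis(theta, x):
    """Linear hypothesis (assumed form): sum of theta_j * x_j."""
    return sum(t * f for t, f in zip(theta, x))


def point_cost(theta, x, y):
    """Half the squared error between the prediction for x and the real value y."""
    error = hypothesis(theta, x) - y
    return 0.5 * error ** 2


# Illustrative values: the prediction is 7.0 and the real value is 5.0.
cost = point_cost([1.0, 2.0], [1.0, 3.0], 5.0)
```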

3. Cost function

Define the cost function by accumulating the cost over the whole training set.
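Assuming the squared-error cost and a linear hypothesis (neither is spelled out in the notes), the cost function averages the per-point costs: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$. A rough sketch, with `cost_function` and the sample data as illustrative names and values:

```python
def cost_function(theta, X, y):
    """Average of the half squared errors over the whole training set.

    X: list of m rows, each a list of n features; y: list of m real values.
    """
    m = len(X)
    total = 0.0
    for row, target in zip(X, y):
        prediction = sum(t * f for t, f in zip(theta, row))
        total += (prediction - target) ** 2
    return total / (2 * m)


# Illustrative training set generated by y = 1 + 2 * x.
# The first column of X is the constant 1.0 for the intercept.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [3.0, 5.0, 7.0]
perfect_fit_cost = cost_function([1.0, 2.0], X, y)
```

A lower value means the parameters fit the training set better; the parameters that generated the data give a cost of zero.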

4. Get optimized parameters

Learn from the training set to find the optimized parameters for the proposed hypothesis.

Gradient descent

More complicated to implement.
Suitable for any scenario.
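A minimal sketch of batch gradient descent, assuming a linear hypothesis with squared-error cost. Each iteration applies $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_i (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$ to all parameters at once. The function name, the fixed iteration count, and the sample data are assumptions for illustration.

```python
def gradient_descent(X, y, alpha, iterations):
    """Batch gradient descent for a linear hypothesis with squared-error cost."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # Prediction error of every training example under the current theta.
        errors = [sum(t * f for t, f in zip(theta, row)) - target
                  for row, target in zip(X, y)]
        # Partial derivative of the cost with respect to each theta_j.
        gradient = [sum(e * row[j] for e, row in zip(errors, X)) / m
                    for j in range(n)]
        # Update all parameters simultaneously.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


# Illustrative training set generated by y = 1 + 2 * x.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [3.0, 5.0, 7.0]
theta = gradient_descent(X, y, alpha=0.1, iterations=5000)
```

On this toy data `theta` approaches `[1.0, 2.0]`. A learning rate that is too large makes the updates diverge, while one that is too small needs many more iterations.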

Normal equation

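Convenient, but performs poorly once the number of features $n$ grows beyond roughly 100,000, because inverting the $n \times n$ matrix $X^T X$ costs about $O(n^3)$.
Cannot be applied directly when $X^T X$ is non-invertible (singular).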
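The normal equation for linear regression is $\theta = (X^T X)^{-1} X^T y$. The sketch below is one possible pure-Python rendering that solves $(X^T X)\theta = X^T y$ by Gaussian elimination instead of forming the inverse. The function name and the `1e-12` singularity tolerance are arbitrary choices made for illustration.

```python
def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y by Gaussian elimination with partial pivoting."""
    m, n = len(X), len(X[0])
    # Build the n x n matrix A = X^T X and the vector b = X^T y.
    A = [[sum(X[i][r] * X[i][c] for i in range(m)) for c in range(n)]
         for r in range(n)]
    b = [sum(X[i][r] * y[i] for i in range(m)) for r in range(n)]

    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:  # arbitrary tolerance (assumption)
            raise ValueError("X^T X is not invertible")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        # Eliminate this column from the rows below.
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution on the upper-triangular system.
    theta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(A[r][c] * theta[c] for c in range(r + 1, n))
        theta[r] = (b[r] - tail) / A[r][r]
    return theta


# Illustrative training set generated by y = 1 + 2 * x.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [3.0, 5.0, 7.0]
theta = normal_equation(X, y)
```

Passing a training set with duplicated or linearly dependent columns raises the `ValueError`, which is the non-invertible case mentioned above.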

Feature scaling

Use feature scaling to bring the features of the training set onto a similar scale.
This makes gradient descent converge much faster.
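One way to scale, sketched under the assumption that each feature is also centred on its mean (the notes only mention the standard deviation), is $x_j := (x_j - \mu_j) / S(X_j)$. The name `scale_features` and the choice of the sample standard deviation are illustrative.

```python
import statistics


def scale_features(X):
    """Return a copy of X with every column shifted to zero mean and
    divided by its standard deviation S(X_j)."""
    columns = list(zip(*X))
    means = [statistics.mean(col) for col in columns]
    # Sample standard deviation; an assumption, the notes do not say which kind.
    stds = [statistics.stdev(col) for col in columns]
    return [[(value - mu) / s for value, mu, s in zip(row, means, stds)]
            for row in X]


# Illustrative features on very different scales.
X = [[1000.0, 1.0], [2000.0, 2.0], [3000.0, 3.0]]
scaled = scale_features(X)
```

A constant column, such as an intercept column of ones, has zero standard deviation and would cause a division by zero here, so scaling is best applied before such a column is added.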


15 July 2014