Notes for Machine Learning - Week 5

Neural Networks: Learning. Cost Function and Backpropagation.

Cost Function. Let's first define a few variables that we will need to use: $L$ = total number of layers in the network, $s_l$ = number of units (not counting the bias unit) in layer $l$, and $K$ = number of output units/classes. Recall that the cost function for regularized logistic regression was:

$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \left[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$

For neural networks it is slightly more complicated:

$J(\Theta) = - \frac{1}{m} \sum_{i=1}^m \sum_{k=1}^K \left[ y^{(i)}_k \log ((h_\Theta (x^{(i)}))_k) + (1 - y^{(i)}_k)\log (1 - (h_\Theta(x^{(i)}))_k) \right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} (\Theta_{j,i}^{(l)})^2$
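
As a minimal NumPy sketch (not from the original notes), the regularized cost can be computed from the output-layer activations, the one-hot labels, and the weight matrices; the names `h`, `Y`, `Thetas`, and `lam` are illustrative assumptions.

```python
import numpy as np

def nn_cost(h, Y, Thetas, lam):
    """Regularized neural-network cost J(Theta).

    h      : (m, K) array of output activations (h_Theta(x^(i)))_k
    Y      : (m, K) array of one-hot labels y^(i)_k
    Thetas : list of weight matrices Theta^(l), bias column first
    lam    : regularization parameter lambda
    """
    m = Y.shape[0]
    # Cross-entropy term, summed over examples i and output units k
    cost = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    # Regularization: sum of squared weights, excluding the bias columns
    reg = sum(np.sum(Theta[:, 1:] ** 2) for Theta in Thetas)
    return cost + lam / (2 * m) * reg
```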

Notes for Machine Learning - Week 4

Neural Networks: Representation. Motivations: Non-linear Hypotheses.

Performing linear regression on a complex data set with many features is very unwieldy. With 100 features, including all quadratic terms yields 5050 new features; in general, the number of new features from all quadratic terms grows as $\mathcal{O}(n^2/2)$. If we also wanted all cubic terms in the hypothesis, the number of features would grow asymptotically as $\mathcal{O}(n^3)$. These are very steep growth rates, so as the number of features increases, the number of quadratic or cubic features grows very rapidly and quickly becomes impractical to compute.
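
As a quick arithmetic check (a sketch, not part of the original notes), the 5050 figure for $n = 100$ is the count of all quadratic terms $x_i x_j$ with $i \le j$, i.e. $\binom{n}{2} + n = n(n+1)/2$:

```python
from math import comb

def num_quadratic_terms(n):
    # All products x_i * x_j with i <= j: C(n, 2) cross terms plus n squares
    return comb(n, 2) + n

print(num_quadratic_terms(100))  # 5050, roughly n^2 / 2 for large n
```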

Notes for Machine Learning - Week 3

Logistic Regression: Classification and Representation.

Classification. Classification problem: $y \in \{0, 1\}$, where 0 is the "Negative Class" (负类) and 1 is the "Positive Class" (正类). One method is to use linear regression and map all predictions greater than 0.5 to 1 and all predictions less than 0.5 to 0. This method doesn't work well, because classification is not actually a linear function. Logistic Regression (逻辑回归): $0\le h_\theta(x) \le 1$.

Hypothesis Representation. Logistic Regression Model: $h_\theta (x) = \frac{1}{1+e^{-\theta^T x}}$. We want $0\le h_\theta(x)\le 1$, so we set $h_\theta (x) = g(\theta^T x)$ with $g(z) = \frac{1}{1+e^{-z}}$, called the sigmoid function (or logistic function).
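
A minimal NumPy sketch of the sigmoid and the logistic-regression hypothesis (the function names are illustrative, not from the notes):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x); its value always lies in (0, 1)."""
    return sigmoid(theta @ x)
```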

Notes for Machine Learning - Week 2

Linear Regression with Multiple Variables: Multivariate Linear Regression.

Multiple features (variables): $n$ = number of features, $x^{(i)}$ = input (features) of the $i^{th}$ training example, $x^{(i)}_j$ = value of feature $j$ in the $i^{th}$ training example.

Hypothesis. Previously: $h_\theta (x) = \theta_0 + \theta_1 x$. Now: $h_\theta (x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$. For convenience of notation, define $x_0=1$, so that $x=\begin{bmatrix}x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \theta = \begin{bmatrix}\theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix}, h_\theta (x) = \theta^T x$.

Gradient Descent for Multiple Variables. Hypothesis: $h_\theta(x)=\theta^T x=\theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$.
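
A small NumPy sketch of the vectorized hypothesis and a single gradient-descent update for multiple variables (the names `X`, `y`, and `alpha` are assumptions; `X` is assumed to carry a leading column of ones for $x_0 = 1$):

```python
import numpy as np

def hypothesis(theta, X):
    """h_theta(x) = theta^T x for every row of X (first column of X is all ones)."""
    return X @ theta

def gradient_descent_step(theta, X, y, alpha):
    """One simultaneous update:
    theta_j := theta_j - alpha * (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)."""
    m = len(y)
    return theta - alpha / m * (X.T @ (hypothesis(theta, X) - y))
```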

Notes for Machine Learning - Week 1

Linear Regression with One Variable: Model and Cost Function.

Model Representation. Supervised Learning (监督学习): given the "right answer" for each example in the data. Regression Problem (回归问题): predict real-valued output. Classification Problem (分类问题): predict discrete-valued output.

Training set (训练集): $m$ = number of training examples, x's = "input" variable / features, y's = "output" / "target" variable, $(x, y)$ = one training example, $(x^{(i)}, y^{(i)})$ = the $i^{th}$ training example.

Training Set -> Learning Algorithm -> h (hypothesis, 假设). h is a function that maps from x (the input) to y (the predicted output).
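
A tiny sketch of the single-variable hypothesis as a function mapping x to a predicted y (the parameter values below are made up purely for illustration):

```python
def h(theta0, theta1, x):
    """Hypothesis for linear regression with one variable: h_theta(x) = theta_0 + theta_1 * x."""
    return theta0 + theta1 * x

# Example: with theta_0 = 1.0 and theta_1 = 2.0, the input x = 3 is mapped to 7.0
print(h(1.0, 2.0, 3))
```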