# Logistic Regression as a Neural Network
# Notation
- $(x, y)$: a single training example, where $x \in \mathbb{R}^{n_x}$ ($x$ is an $n_x$-dimensional feature vector) and $y \in \{0, 1\}$.
- $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$: a training set which contains $m$ training examples.
- $m_{\text{test}}$: the number of examples in the test set.
- $X = [x^{(1)} \; x^{(2)} \; \cdots \; x^{(m)}]$: the input training example matrix, which has $n_x$ rows and $m$ columns, i.e. $X \in \mathbb{R}^{n_x \times m}$.

WARNING
It is not a good idea to put the training examples as row vectors in the matrix $X$. It takes much more effort when computing the later implementation.

- $Y = [y^{(1)} \; y^{(2)} \; \cdots \; y^{(m)}]$: the output matrix, which has $1$ row and $m$ columns, i.e. $Y \in \mathbb{R}^{1 \times m}$.
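A minimal NumPy sketch of this convention (NumPy is an assumption here, since the notes' pseudocode is language-neutral; the array values are random placeholders):

```python
import numpy as np

n_x, m = 12288, 100                  # feature dimension and number of examples
X = np.random.randn(n_x, m)          # each column X[:, i] is one example x^(i)
Y = np.random.randint(0, 2, (1, m))  # labels y^(i) in {0, 1}

print(X.shape)  # (12288, 100): n_x rows, m columns
print(Y.shape)  # (1, 100): 1 row, m columns
```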
# Binary Classification
Logistic regression is an algorithm for binary classification (used for supervised learning problems in which the output label $y$ is either $0$ or $1$).
Scenario example
Take a picture as an input, and get a label as output to infer whether the picture is a cat. Define the output label $y = 1$ for the case that the picture is a cat picture, and $y = 0$ for the case that it is not a cat picture.
To Be Specific
Say the picture is $64 \times 64$ pixels. It can be divided into 3 matrices representing the red, green and blue channels.
Define the feature vector $x$ by unrolling all the pixel intensity values into one column, so its dimension is $n_x = 64 \times 64 \times 3 = 12288$.
Goal: a classifier that takes $x$ as input and predicts whether the output label is $1$ (cat) or $0$ (not cat).
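A small sketch of building $x$ from an image, assuming NumPy and with a random placeholder standing in for a real picture:

```python
import numpy as np

image = np.random.randint(0, 256, (64, 64, 3))  # placeholder for a 64x64 RGB picture
x = image.reshape(-1, 1)                        # unroll all pixel values into one column
print(x.shape)                                  # (12288, 1), i.e. n_x = 64 * 64 * 3
```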
# Logistic Regression
Example
Given an input feature vector $x \in \mathbb{R}^{n_x}$, want $\hat{y} = P(y = 1 \mid x)$, where $0 \le \hat{y} \le 1$.
Parameters
- $w \in \mathbb{R}^{n_x}$
- $b \in \mathbb{R}$
Output
$\hat{y} = \sigma(w^T x + b)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$
About Sigmoid Function
(graph of the sigmoid function: an S-shaped curve rising from $0$ to $1$)
Features
- If $z$ is a large positive number, then $\sigma(z) \approx 1$.
- If $z$ is a large negative number, then $\sigma(z) \approx 0$.
- $\sigma(0) = 0.5$.
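A minimal sketch of this forward computation, assuming NumPy (the zero initialization of $w$ and $b$ is only illustrative):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

n_x = 12288
w = np.zeros((n_x, 1))        # parameters, zero-initialized for illustration
b = 0.0
x = np.random.randn(n_x, 1)   # one input feature vector

y_hat = sigmoid(w.T @ x + b)  # predicted probability that y = 1
print(y_hat.item())           # 0.5 here: w = 0 and b = 0 give z = 0
```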
# Logistic Regression Cost Function
A cost function is used to train the parameters $w$ and $b$.
# Loss (Error) Function
A function used to measure how well the algorithm is doing. (For a single training example)
Square Error
$\mathcal{L}(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$
Not usually used: with it, the optimization problem becomes non-convex, so there is more than one local optimum, and gradient descent may fail to find the global optimum.
Cross-Entropy Loss Function
$\mathcal{L}(\hat{y}, y) = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right]$
The lower the value of the loss function is, the better the prediction of the algorithm is. [Analyse]
- If $y = 1$: $\mathcal{L}(\hat{y}, y) = -\log \hat{y}$. When $\mathcal{L}$ is small, $\log \hat{y}$ should be large, so $\hat{y}$ should be large as well, but no more than $1$.
- If $y = 0$: $\mathcal{L}(\hat{y}, y) = -\log(1 - \hat{y})$. When $\mathcal{L}$ is small, $\log(1 - \hat{y})$ should be large, so $\hat{y}$ should be small, but no less than $0$.
# Cost Function
A function used to measure how well the algorithm is doing. (For the whole training set)
$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]$
Still, the lower the value of the cost function is, the better the prediction of the algorithm is.
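A minimal sketch of both functions, assuming NumPy; the two printed values illustrate the analysis above (a confident correct prediction versus a confident wrong one):

```python
import numpy as np

def loss(y_hat, y):
    """Cross-entropy loss for a single training example."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def cost(A, Y):
    """Cost J: the average of the losses over the m columns of A and Y."""
    m = Y.shape[1]
    return np.sum(loss(A, Y)) / m

print(loss(0.99, 1))  # small loss: y_hat is close to y = 1
print(loss(0.99, 0))  # large loss: y_hat is far from y = 0
```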
# Gradient Descent
An algorithm to train (learn) the parameters $w$ and $b$.
[Analyse]
Only analyse the value $w$ (ignoring $b$ for the moment). Gradient descent repeats the update
$w := w - \alpha \frac{dJ(w)}{dw}$
where $\alpha$ is the learning rate.
*When writing code, dw represents $\frac{dJ(w)}{dw}$.
And for the real cost function $J(w, b)$, the gradient descent will be like:
$w := w - \alpha \frac{\partial J(w, b)}{\partial w}$
$b := b - \alpha \frac{\partial J(w, b)}{\partial b}$
*When writing code, dw represents $\frac{\partial J(w, b)}{\partial w}$, and db represents $\frac{\partial J(w, b)}{\partial b}$.
*Also, as a coding convention, dvar represents the derivative of the final output variable with respect to the intermediate quantity var.
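A self-contained sketch of the update rule, shown on a toy convex cost $J(w) = (w - 3)^2$ (the toy cost is an assumption for illustration, not the logistic regression cost):

```python
alpha = 0.1                # learning rate
w = 0.0                    # initial parameter value
for step in range(100):
    dw = 2 * (w - 3)       # dJ/dw for the toy cost J(w) = (w - 3)^2
    w = w - alpha * dw     # gradient descent update: w := w - alpha * dw
print(w)                   # converges towards the minimum at w = 3
```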
# Logistic Regression Gradient Descent
[Analyse]
For a single training example which has two features.
features: $x_1, x_2$
parameters: $w_1, w_2, b$
The forward computation is $z = w_1 x_1 + w_2 x_2 + b$, then $a = \hat{y} = \sigma(z)$, then the loss $\mathcal{L}(a, y)$. Going backwards through this computation gives the derivatives:
da: $\frac{d\mathcal{L}(a, y)}{da} = -\frac{y}{a} + \frac{1 - y}{1 - a}$
dz: $\frac{d\mathcal{L}}{dz} = \frac{d\mathcal{L}}{da} \cdot \frac{da}{dz} = \left( -\frac{y}{a} + \frac{1 - y}{1 - a} \right) a (1 - a) = a - y$
dw1: $\frac{\partial \mathcal{L}}{\partial w_1} = x_1 \, dz$
dw2: $\frac{\partial \mathcal{L}}{\partial w_2} = x_2 \, dz$
db: $\frac{\partial \mathcal{L}}{\partial b} = dz$
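A minimal sketch of one gradient descent step on this single example, assuming NumPy; the feature values, label and learning rate are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x1, x2, y = 1.0, 2.0, 1      # one training example: two features and a label
w1, w2, b = 0.0, 0.0, 0.0    # parameters
alpha = 0.1                  # learning rate

z = w1 * x1 + w2 * x2 + b    # forward pass
a = sigmoid(z)

dz = a - y                   # backward pass: dL/dz
dw1 = x1 * dz                # dL/dw1
dw2 = x2 * dz                # dL/dw2
db = dz                      # dL/db

w1 -= alpha * dw1            # parameter updates
w2 -= alpha * dw2
b -= alpha * db
```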
# Gradient Descent on m Examples Training Set
[Analyse]
For a training set with $m$ training examples, where each training example has two features.
features: $x_1, x_2$
parameters: $w_1, w_2, b$
The gradient of the overall training set cost with respect to $w_1$ is the average of the per-example gradients:
$\frac{\partial J(w, b)}{\partial w_1} = \frac{1}{m} \sum_{i=1}^{m} \frac{\partial \mathcal{L}(a^{(i)}, y^{(i)})}{\partial w_1}$
For a single training example, $dw_1^{(i)} = x_1^{(i)} dz^{(i)}$ as derived above, so one iteration of gradient descent accumulates these terms and then averages:
```
// Initialize
J = 0;
dw1 = 0; // an accumulator over the whole training set
dw2 = 0; // an accumulator over the whole training set
db = 0;  // an accumulator over the whole training set
// Add up
for (i = 1 to m) {
    z(i) = wT x(i) + b;
    a(i) = σ(z(i));
    J += -[y(i) log(a(i)) + (1 - y(i)) log(1 - a(i))];
    dz(i) = a(i) - y(i);
    dw1 += x1(i) dz(i);
    dw2 += x2(i) dz(i);
    db += dz(i);
}
// Get average
J /= m;
dw1 /= m; // getting the value of dJ/dw1
dw2 /= m; // getting the value of dJ/dw2
db /= m;  // getting the value of dJ/db
```
TIP
In the algorithm, there are two nested for-loops (the second for-loop is used for calculating every feature's dwj; with only two features here, it is written out explicitly). Explicit for-loops make the algorithm inefficient when the data is large; vectorization helps get rid of them, as in the sketch below.
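A vectorized sketch of the same iteration, assuming NumPy: both explicit for-loops disappear by operating on the whole $(n_x, m)$ matrix $X$ at once (the function name gradient_step is hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(w, b, X, Y, alpha):
    """One iteration of gradient descent over all m examples at once."""
    m = X.shape[1]
    Z = w.T @ X + b                 # (1, m): every z^(i) in one matrix product
    A = sigmoid(Z)                  # (1, m): every activation a^(i)
    J = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dZ = A - Y                      # (1, m): every dz^(i)
    dw = (X @ dZ.T) / m             # (n_x, 1): dJ/dw with no loop over features
    db = np.sum(dZ) / m             # scalar: dJ/db
    return w - alpha * dw, b - alpha * db, J
```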