Here are my solutions to exercise 4.
Implementing Our Network to Classify Digits
Part 1
Question
Write out $a' = \sigma(wa + b)$ in component form, and verify that it gives the same result as the rule, $\frac{1}{1 + \exp(-\sum_j w_j x_j - b)}$, for computing the output of a sigmoid neuron.
Solution
I should note up front that I have not yet taken linear algebra in college, so I have very limited experience with it.
Let us say that layer 2 has $2$ nodes and layer 1 has $3$ nodes.
The weights from layer 1 to layer 2 can be expressed as follows ($w_{ji}$, where $j$ is the neuron in the second layer and $i$ is the neuron in the first layer):
$$
w = \begin{bmatrix}
w_{11} & w_{12} & w_{13}\\
w_{21} & w_{22} & w_{23}
\end{bmatrix}
\qquad
a = \begin{bmatrix}
a_1\\
a_2\\
a_3
\end{bmatrix}
\qquad
b = \begin{bmatrix}
b_1\\
b_2
\end{bmatrix}
$$
$$
\begin{gathered}
wa = \begin{bmatrix}
w_{11} a_1 + w_{12} a_2 + w_{13} a_3\\
w_{21} a_1 + w_{22} a_2 + w_{23} a_3
\end{bmatrix}\\
wa + b = \begin{bmatrix}
(w_{11} a_1 + w_{12} a_2 + w_{13} a_3) + b_1\\
(w_{21} a_1 + w_{22} a_2 + w_{23} a_3) + b_2
\end{bmatrix}\\
a' = \sigma(wa + b) = \begin{bmatrix}
\sigma((w_{11} a_1 + w_{12} a_2 + w_{13} a_3) + b_1)\\
\sigma((w_{21} a_1 + w_{22} a_2 + w_{23} a_3) + b_2)
\end{bmatrix}
\end{gathered}
$$
Since $\sigma(z) = \frac{1}{1 + e^{-z}}$, each component above equals $\frac{1}{1 + \exp(-\sum_i w_{ji} a_i - b_j)}$. This is exactly the same as the rule $\frac{1}{1 + \exp(-\sum_j w_j x_j - b)}$, but computed for both neurons at once with matrices!
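To double-check this, here is a minimal NumPy sketch (the weights, activations, and biases are made-up numbers chosen just for illustration, and `sigmoid` is defined by hand) that computes $a'$ both ways and confirms they agree:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)), applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

# Made-up example: 3 neurons in layer 1, 2 neurons in layer 2.
w = np.array([[0.1, -0.4, 0.2],    # w[j, i]: weight into neuron j of layer 2
              [0.7,  0.3, -0.5]])  #          from neuron i of layer 1
a = np.array([0.5, 0.9, 0.1])      # layer-1 activations
b = np.array([0.05, -0.2])         # layer-2 biases

# Matrix form: a' = sigma(wa + b), both neurons at once.
a_prime_matrix = sigmoid(w @ a + b)

# Component form: one neuron at a time, summing over the inputs.
a_prime_components = np.array([
    sigmoid(sum(w[j, i] * a[i] for i in range(3)) + b[j])
    for j in range(2)
])

print(a_prime_matrix)
print(np.allclose(a_prime_matrix, a_prime_components))  # True
```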
Say we wanted to compute the output of the first sigmoid neuron in layer 2.
$$
a_1' = \sigma\Big(\sum_j w_j a_j + b\Big) = \sigma((w_1 a_1 + w_2 a_2 + w_3 a_3) + b)
$$
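The same single-neuron rule as a standalone sketch (again with made-up numbers; `w1` is the row of weights feeding the first layer-2 neuron):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w1 = np.array([0.1, -0.4, 0.2])  # weights into the first layer-2 neuron
a  = np.array([0.5, 0.9, 0.1])   # layer-1 activations
b1 = 0.05                        # bias of the first layer-2 neuron

# sigma(sum_j w_j a_j + b): the scalar rule for one neuron,
# which matches the first row of the matrix version above.
a1_prime = sigmoid(np.dot(w1, a) + b1)
print(a1_prime)
```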
Also, I wanted to note that I found a website called the ml cheatsheet, which has been really useful in explaining the mathematical concepts.
The header image was taken from Khan Academy.