Questions tagged [neural-networks]

For questions about the mathematics of artificial neural networks: their underlying multilayered graph object or their use as a data structure in machine learning algorithms. Consider also using the tags (machine-learning) or (graph-theory).

Neural networks (more fully, artificial neural networks, or ANNs) are used in computer science and engineering as part of machine learning and deep learning algorithms. Technically the term neural network refers to the entire data structure, but because of their ubiquity the term is often used to refer to the underlying weighted graph object too. Papers sometimes call this graph a multilayer graph instead (a special case of a multipartite graph).

Artificial Neural Network
Image sourced from extremetech.com

Essentially a neural network is a "chain" of complete bipartite graphs. The first layer of nodes in the chain is the input layer, the last is the output layer, and all the other nodes are in hidden layers. The big idea here is that a neural network is designed to simulate the way the human brain (a real neural network) recognizes patterns. Because of this, neural networks are used in many pattern-recognition algorithms, such as handwriting recognition (as when you deposit a check by taking a picture of it) or facial recognition (as when Facebook asks you to tag friends in a picture because it recognizes their faces).
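The "chain of complete bipartite graphs" description can be made concrete with a short sketch. The function name `layered_edges`, the node labelling `(layer, position)`, and the layer sizes are assumptions chosen for illustration only.

```python
from itertools import product


def layered_edges(layer_sizes):
    """Return the edge list of a fully connected layered graph.

    Nodes are labelled (layer_index, position). Each pair of consecutive
    layers is joined by a complete bipartite graph; no other edges exist.
    """
    edges = []
    for k in range(len(layer_sizes) - 1):
        left = [(k, i) for i in range(layer_sizes[k])]
        right = [(k + 1, j) for j in range(layer_sizes[k + 1])]
        # Every node on the left is joined to every node on the right.
        edges.extend(product(left, right))
    return edges


# Hypothetical sizes: 3 input nodes, 4 hidden nodes, 2 output nodes.
edges = layered_edges([3, 4, 2])
print(len(edges))  # 3*4 + 4*2 = 20
```

In a weighted version of this graph, each of these edges would carry one trainable weight.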


The common example used when explaining applications of ANNs is handwriting recognition. Adam Harley of Ryerson University created a beautiful online visualization of this example. Suppose someone writes a digit from 0 to 9 and your ANN has been trained to recognize which digit it is. First you encode the handwritten digit (convert it to a black-and-white image and, for each pixel, use a zero if the pixel is black and a one if it is white, or something like that). These values feed into the input layer as the weights of its nodes. They are then propagated through the ANN, being "operated on" by the weights of the edges and of the nodes in the hidden layers (lots of details glossed over here). When the values arrive in the output layer, that output is compared to the set of expected output values to decide what the handwritten digit was. Since there are ten possible digits, the output layer would probably be set up with ten nodes, and we'd say, for example, that a value of (0,0,0,1,0,0,0,0,0,0) would be the expected output if the digit was three.
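A minimal sketch of that forward pass, under several assumptions not stated above: an 8-by-8 image (64 pixels), a single hidden layer of 16 nodes, a sigmoid applied at every node, and random untrained weights. The prediction is therefore meaningless; only the data flow is illustrated.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, layers):
    """Propagate the input values x through a list of (weights, biases) layers."""
    a = x
    for W, b in layers:
        # Each node takes a weighted sum over its incoming edges, adds its
        # bias, and applies the sigmoid.
        a = [sigmoid(sum(w * ai for w, ai in zip(row, a)) + bi)
             for row, bi in zip(W, b)]
    return a


def one_hot(digit, n_classes=10):
    """Expected output vector for a digit, e.g. 3 -> (0,0,0,1,0,0,0,0,0,0)."""
    return [1 if k == digit else 0 for k in range(n_classes)]


rng = random.Random(0)


def random_layer(n_in, n_out):
    # Small random weights (assumption); a trained network would have learned these.
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


layers = [random_layer(64, 16), random_layer(16, 10)]
pixels = [rng.choice([0, 1]) for _ in range(64)]  # 0 = black, 1 = white
output = forward(pixels, layers)

# Decide the digit by picking the output node with the largest value.
predicted = max(range(10), key=lambda k: output[k])
print(predicted, one_hot(3))
```

Taking the largest output node is one simple way to compare the output with the ten expected one-hot vectors; other comparison rules are possible.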

And of course how accurate this ANN is depends completely on the weights in the hidden layers. Before you have a working ANN, you have to train it. To do this you get a large collection of input images (handwritten digits) for which you know what the output is supposed to be. Then you feed each input through the ANN, look at the output, and compare it to your expected output for that input. Using the differences between the output and the expected output, you back-propagate through the ANN and alter the weights to values that would have given a more accurate result. Changing these weights over a large collection of sample inputs essentially teaches the network to accurately recognize patterns.
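The training loop described above can be sketched on a toy problem. Everything specific here is an assumption made for illustration: four-pixel "images", two output classes, one hidden layer of three sigmoid nodes, a squared-error loss, and plain gradient descent with a learning rate of 0.5.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, W1, b1, W2, b2):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b) for row, b in zip(W2, b2)]
    return h, y


def loss(data, params):
    """Mean squared-error loss between outputs and expected outputs."""
    total = 0.0
    for x, t in data:
        _, y = forward(x, *params)
        total += 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))
    return total / len(data)


def train_step(x, t, W1, b1, W2, b2, lr):
    h, y = forward(x, W1, b1, W2, b2)
    # Output error: (output - expected), passed back through the sigmoid.
    d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, t)]
    # Back-propagate the error to the hidden nodes along the edge weights.
    d_hid = [hj * (1 - hj) * sum(d_out[k] * W2[k][j] for k in range(len(W2)))
             for j, hj in enumerate(h)]
    # Nudge every weight and bias against its gradient.
    for k, row in enumerate(W2):
        for j in range(len(row)):
            row[j] -= lr * d_out[k] * h[j]
        b2[k] -= lr * d_out[k]
    for j, row in enumerate(W1):
        for i in range(len(row)):
            row[i] -= lr * d_hid[j] * x[i]
        b1[j] -= lr * d_hid[j]


# Toy labelled data (hypothetical): left pixels lit -> class 0, right -> class 1.
data = [([1, 1, 0, 0], [1, 0]), ([1, 0, 0, 0], [1, 0]),
        ([0, 0, 1, 1], [0, 1]), ([0, 0, 0, 1], [0, 1])]

rng = random.Random(0)


def init(n_out, n_in):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


params = (init(3, 4), [0.0] * 3, init(2, 3), [0.0] * 2)
loss_before = loss(data, params)
for epoch in range(500):
    for x, t in data:
        train_step(x, t, *params, lr=0.5)
loss_after = loss(data, params)
print(loss_before, loss_after)
```

The loss after training should be lower than before, which is the sense in which repeated weight updates "teach" the network. Real digit recognizers use far more data and usually different loss functions and optimizers.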

836 questions
5
votes
1 answer

Why does modeling the joint distribution between many continuous random variables obtain generalization more easily?

In the paper "A Neural Probabilistic Language Model", by Yoshua Bengio et al., there is the following paragraph: A fundamental problem that makes language modeling and other learning problems difficult is the curse of dimensionality. It is…
3
votes
0 answers

Understanding the equations of Backpropagation

I was going through the equations of Backpropagation in Andrew Ng's Deep Learning course and I got this set of equations for a two-layer Neural Network: $dZ^{[2]} = A^{[2]} - y$ $dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}$ $dZ^{[1]} =…
Ashu
  • 31
3
votes
0 answers

Need help understanding LSTMs' backpropagation and carousel of error

I'm having a hard time trying to derive the maths behind LSTMs and vanishing gradients. I had a lot of help from LSTM forward and backward pass, but I got stuck on page 11 of LSTM forward and backward pass. Given the image: We can form system of…
2
votes
2 answers

In Wikipedia's Statement of The Universal Approximation Theorem, is it taking identity activation on the output layer with no bias?

See the Universal Approximation Theorem (arbitrary width) on Wikipedia or below. The universal approximation theorem (arbitrary width) is talking about a neural network with 1 hidden layer (input, hidden, output). In the case of a 3-layer network…
Mark
  • 1,339
2
votes
1 answer

Neural networks backpropagation

I was reading chapter 2 of Michael Nielsen's book on deep learning: Shouldn't the circled word read "forward" instead of "backward" because we are using the weighted input to layer l?
st4rgut
  • 123
2
votes
1 answer

Minimal neural network to learn quadratic function

It is common knowledge that, for any function $f: D \subset \mathbb{R} \to \mathbb{R}$, there is a regular neural network capable of approximating $f$ with an arbitrarily small error. Finding that network is a different story, though. My question…
David
  • 3,029
2
votes
1 answer

Backpropagation in recurrent neural networks

How do recurrent neural networks share weights? I have been reading about it online but I can't figure out how it does this. Particularly because during backpropagation, the hidden cell at e.g. t=2 would receive the gradients coming from t=3. So both now…
Kong
  • 884
2
votes
1 answer

derivative of cost function for Neural Network classifier

I am following Andrew Ng's Machine Learning course on Coursera. The cost function without regularization used in the Neural network course is: $J(\theta) = \frac{1}{m} \sum ^{m}_{i=1}\sum ^{K}_{k=1} [-y_{k}^{(i)}\log((h_{\theta}(x^{(i)}))_{k})…
Roland
  • 37
2
votes
0 answers

neural network resolving problem

I'm noticing something strange. I followed an example of coding a fully connected neural network in 3 layers. It uses backpropagation, and it works great. For example using Sweep optimization (i.e. sweeping through starting train variables) this…
Peter
  • 131
2
votes
0 answers

Jacobian matrix through backpropagation

I know that the backpropagation algorithm is useful for computing the gradient of the error function (a scalar function) of a neural network with respect to its weights. In a paper I read that the backward pass of a neural network could be used to compute the…
aleio1
  • 995
2
votes
1 answer

Find the interval over which this function is greater than 1

Consider the product $\left|w\sigma'(wa+b)\right|$, where $\sigma(x) = \frac{1}{1+e^{-x}}$. Suppose that $|w| \geq 4$, and $|w\sigma'(wa+b)|\geq 1$. Show that the set of $a$ satisfying that constraint can range over an interval no greater in width…
b_pcakes
  • 1,501
1
vote
0 answers

What is the mathematical relationship among hyper-parameters alpha, beta, and tau in Alpha neuron of Spiking neural network?

I have recently been reading the tutorial on spiking neural networks about Alpha neurons written by Jason K. Eshraghian. Here's the link to his tutorial. According to his tutorial, the direct implementation is given as: $U_{\rm mem}(t) = \sum_i W(\epsilon…
Lucas
  • 11
1
vote
1 answer

I am trying to understand this derivation

This might be a bit basic, but can someone tell me how the authors went from the first equation to the next? Each case in the transfer set contributes a cross-entropy gradient, $dC/dz_i$, with respect to each logit, $z_i$ of the distilled model.…
1
vote
1 answer

multiplication with neural nets

I have two functions $f(x)$ and $g(x)$, and each of them I can realize with a neural net $\phi_f$ and $\phi_g$. My question is, how can I write a neural net for $f(x)g(x)$? So for example if $g(x)$ is constant and equal to $c$ and $\phi_f =…
emily20
  • 155
1
vote
0 answers

Feedforward neural networks: how to obtain the gradient of the loss function with respect to the weight matrix of the second to last layer

I have recently run into this page. I am trying to obtain the equation labeled BP4 by the author. Particularly, I want to obtain BP4 for the case of a feedforward neural network consisting of three layers —i.e. input layer, one single hidden layer…
Werther
  • 85