Neural Network Glossary

Alex Sambvani
Mar 19, 2018


Here is a running list I will update periodically with key terms for neural networks / deep learning. I attempt to explain each concept in somewhat less technical terms than existing resources do.

  • Neuron. A neuron is an information-processing unit in a neural network. Each neuron combines its inputs into a weighted sum plus a bias, applies an “Activation Function” (defined below) to that sum, and serves the result as its output.
  • Perceptron. A perceptron is a neuron that takes binary inputs and produces a single binary output: 1 if the weighted sum of its inputs exceeds a threshold, 0 otherwise.
  • Activation Function. The function a neuron applies to the weighted sum of its inputs (plus bias) to produce its output. Non-linear activation functions are what let a network model non-linear relationships.
  • Sigmoid. A common activation function. S-shaped curve that ranges between 0 and 1. Neurons are sometimes referred to as “sigmoid neurons,” meaning they are neurons that use the sigmoid activation function.
  • Tanh. A common activation function. S-shaped curve that ranges between -1 and 1. It is a rescaled and shifted version of sigmoid. Tanh is often preferred over sigmoid in hidden layers because its output is centered on zero.
  • Rectified Linear Unit (ReLU). Activation function that is zero for negative x values and a straight line (y = x) for positive x values. ReLU is used more frequently than sigmoid and tanh in hidden layers because it is cheaper to compute and its gradient does not shrink toward zero for large positive inputs.
  • Tensor. A multi-dimensional array of numbers that generalizes scalars, vectors, and matrices. In deep learning frameworks, the inputs, weights, and outputs of a network are stored as tensors.
  • Cost Function (aka Loss or Objective Function). The function that is being minimized when training the network. This function measures the difference between the desired outcome and the outcome predicted by the network. The gradient of this function with respect to each parameter, together with the step size (learning rate), determines how much each weight and bias is changed with each iteration.
  • Mean Squared Error. A cost function: the average of the squared differences between the predicted and desired outputs, taken over the training examples.
  • Cross Entropy. A cost function that measures how far the predicted class probabilities are from the true labels. It is the usual choice for classification, and with sigmoid or softmax outputs it often leads to faster learning than mean squared error.
  • Layers. Stages of computation of the network (input, hidden, or output).
  • Input Layer. The first layer of a network that contains all input information. Each neuron should represent an input feature. The input layer does not have a bias.
  • Hidden layer. This is a layer that sits between the input and the output layers. It can have any number of neurons.
  • Output Layer. This is the last layer in a neural network. It uses some activation function (e.g. softmax) to produce the model’s output. In a classification problem, the number of outputs desired/required (typically one per class) determines the number of neurons in this layer.
  • Dense Layer (aka Fully Connected Layer). A layer in a neural network whose neurons connect to each of the neurons in the subsequent layer of the neural network.
  • Gradient Descent. An iterative method for minimizing the cost function: at each step, the weight and bias terms throughout the network are nudged in the direction that most reduces the cost (the negative gradient).
  • Learning Rate. The step size the model uses when changing weights and bias terms with each iteration. By increasing the learning rate, one increases the speed at which a model will learn but also increases the risk that a minimum will not be found (i.e., the risk that you oscillate on either side of the minimum because the step size is too large).
  • Stochastic Gradient Descent. Gradient descent that estimates the gradient from a random sample (or “batch”) of training examples during each iteration, rather than from the whole training set, in order to speed up learning.
  • Batch. The subset of the training set that is used in each iteration. The batch size is the number of training examples in that subset. Batches are usually drawn at random from the training set.
  • Weight. Each neuron has weights that multiply each of its inputs. The weighted inputs are summed together with the bias (i.e. w1x1 + w2x2 + b), and that sum goes into the activation function.
  • Bias. Constant added to the weighted sum of a neuron’s inputs before the activation function is applied.
  • Initialization. The initial weights and biases that are used to calculate the outputs of each neuron in the network before any training has taken place. Weights are commonly initialized to small random values.
  • Softmax. Softmax is typically used as the output layer activation function for classification. It is a proxy for probability: each output should be a proportion that approximates the probability of being a certain class, and all of the outputs should sum to 1.
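
To make a few of these terms more concrete, here is a minimal Python sketch of the activation functions, a single neuron, and a dense layer. It uses only the standard library, and the function names (neuron, dense_layer, and so on) are my own illustrative choices rather than the API of any particular framework.

```python
import math


def sigmoid(z):
    """S-shaped curve with outputs between 0 and 1."""
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tanh(z):
    """S-shaped curve with outputs between -1 and 1."""
    return math.tanh(z)


def relu(z):
    """Zero for negative inputs, the identity for positive inputs."""
    return max(0.0, z)


def softmax(scores):
    """Turn a list of scores into proportions that sum to 1."""
    top = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def dense_layer(inputs, weight_rows, biases, activation):
    """One neuron per row of weights; every neuron sees every input."""
    return [neuron(inputs, w, b, activation)
            for w, b in zip(weight_rows, biases)]
```

For example, neuron([1.0, 2.0], [0.5, 0.5], -0.5, relu) computes 0.5*1.0 + 0.5*2.0 - 0.5 = 1.0, and ReLU leaves that positive value unchanged.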

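The training-related terms (cost function, gradient descent, learning rate, batch, initialization) can be sketched in the same spirit. The snippet below is only a toy under stated assumptions: a single neuron with one input and no activation function, trained with mini-batch stochastic gradient descent on mean squared error. The helper names, the default hyperparameters, and the example data are all hypothetical.

```python
import math
import random


def mse(predictions, targets):
    """Mean squared error: average squared gap between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def cross_entropy(probs, true_index):
    """Cross entropy for one example, given predicted class probabilities."""
    return -math.log(probs[true_index])


def sgd_fit(xs, ys, learning_rate=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w*x + b with mini-batch stochastic gradient descent on MSE."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # initialization (zeros are fine for a single neuron)
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)  # new random batches on every pass over the data
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the batch MSE with respect to w and b.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Step against the gradient, scaled by the learning rate.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Toy data drawn from the line y = 2x + 1 (hypothetical example data).
xs = [i / 4 for i in range(-8, 9)]
ys = [2 * x + 1 for x in xs]
w, b = sgd_fit(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")  # should land close to w=2, b=1
```

Raising learning_rate makes each step larger, which speeds things up until the steps start overshooting the minimum. Lowering it makes training slower but steadier.
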
This list is far from comprehensive! See resources below for further information.


