Forward propagation refers to the sequential computation of outputs by applying:
- Linear transformations (matrix multiplication + bias)
- Non-linear activation functions
Layer by layer, the network computes:
output = f(Wx + b)
where:
- x: input vector
- W: weight matrix
- b: bias
- f: activation function
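The single-layer computation above can be sketched in a few lines of NumPy; ReLU is used here as an assumed example of the activation function f, and the dimensions are arbitrary:

```python
import numpy as np

# A minimal sketch of one layer's forward pass: output = f(Wx + b),
# with ReLU as an (assumed) example activation function f.
def relu(z):
    return np.maximum(0, z)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)        # input vector x (4 features)
W = rng.standard_normal((3, 4))   # weight matrix W (3 outputs, 4 inputs)
b = rng.standard_normal(3)        # bias vector b

output = relu(W @ x + b)          # f(Wx + b)
print(output.shape)               # (3,)
```

Stacking several such layers, each feeding its output to the next, is exactly the layer-by-layer flow described above.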
In LLMs:
- Tokens are converted to embeddings.
- Embeddings are passed through transformer layers (attention + feedforward + activation).
- A final output (e.g., a probability distribution over next-token predictions) is computed.
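The three LLM steps above can be sketched as a miniature forward pass. Everything here is illustrative, not a real model: the vocabulary size, dimensions, and random weights are hypothetical, and the block omits details such as masking, layer norm, and multiple heads:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 50, 8
E = rng.standard_normal((vocab, d)) * 0.1   # embedding table (step 1)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

tokens = np.array([3, 17, 42])              # input token ids
h = E[tokens]                               # (3, d) token embeddings

# Step 2: one simplified transformer block (attention + feedforward).
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
q, k, v = h @ Wq, h @ Wk, h @ Wv
h = h + softmax(q @ k.T / np.sqrt(d)) @ v   # self-attention + residual

W1 = rng.standard_normal((d, 4 * d)) * 0.1
W2 = rng.standard_normal((4 * d, d)) * 0.1
h = h + np.maximum(0, h @ W1) @ W2          # feedforward (ReLU) + residual

# Step 3: project to vocabulary and normalize to a probability distribution.
logits = h @ E.T                            # output projection tied to embeddings
probs = softmax(logits[-1])                 # next-token probabilities
print(probs.sum())                          # ≈ 1.0
```

The final `probs` vector is the probability distribution over next-token predictions that the forward pass produces.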
Purpose:
- Produces the model’s prediction.
- Feeds into loss calculation during training.
- Prepares data for backpropagation, which updates weights.
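The second point, feeding the forward pass's output into a loss, can be sketched with cross-entropy over a tiny hypothetical vocabulary (the logits and target index are made up for illustration):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])   # hypothetical forward-pass output (3-token vocab)
target = 0                            # index of the true next token
probs = softmax(logits)
loss = -np.log(probs[target])         # cross-entropy loss for this prediction
print(loss)
```

During training, backpropagation then differentiates this loss with respect to the weights to update them.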
Forward propagation is the step in which inputs flow through the model to generate predictions. It runs on its own at inference time, and during training it forms the first half of each learning step, followed by backpropagation.
