Forward Propagation

    Forward propagation refers to the sequential computation of outputs by applying:

    1. Linear transformations (matrix multiplication + bias)
    2. Non-linear activation functions

    Layer by layer, the network computes:

    output = f(Wx + b)

    where:

    • W is the weight matrix,
    • x is the input vector,
    • b is the bias vector,
    • f is a non-linear activation function (e.g., ReLU or GELU).
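    A single layer of this computation can be sketched in NumPy as follows. The shapes and the choice of ReLU as the activation are illustrative assumptions, not part of any particular model:

```python
import numpy as np

def relu(z):
    # Non-linear activation: elementwise max(0, z)
    return np.maximum(0.0, z)

def forward_layer(W, x, b):
    # Linear transformation followed by activation: f(Wx + b)
    return relu(W @ x + b)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # weight matrix: 3 inputs -> 4 outputs
b = np.zeros(4)               # bias vector
x = rng.normal(size=3)        # input vector

y = forward_layer(W, x, b)
print(y.shape)  # (4,)
```

    Stacking such layers, with the output of one feeding the input of the next, is exactly the layer-by-layer computation described above.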

    In LLMs:

    • Tokens are converted to embeddings.
    • Embeddings are passed through transformer layers (attention + feedforward + activation).
    • A final output (e.g., a probability distribution over next-token predictions) is computed.

    Purpose:

    • Produces the model’s prediction.
    • Feeds into loss calculation during training.
    • Prepares data for backpropagation, which updates weights.
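    A minimal sketch of how the three purposes connect, using a softmax classifier with cross-entropy loss (the shapes and the target index are illustrative assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 5))
x = rng.normal(size=5)
target = 2                          # true class index

# 1. Forward pass produces the model's prediction
probs = softmax(W @ x)

# 2. The prediction feeds into the loss calculation
loss = -np.log(probs[target])       # cross-entropy

# 3. The loss gradient w.r.t. the logits (softmax + cross-entropy
#    shortcut: probs - one_hot(target)) is where backprop starts
grad_logits = probs.copy()
grad_logits[target] -= 1.0
```

    Backpropagation would then push `grad_logits` backward through the layers to compute weight updates.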

    Forward propagation is the inference pass of a neural network: inputs flow through the model to generate predictions. During training, it is the first half of each learning step and is followed by backpropagation.