Tunable parameters in large language models (LLMs) are the learned weights that the model adjusts during training to minimize error and improve performance on language tasks.
- Also called trainable parameters.
- Include weights and biases in layers such as:
  - Attention layers (e.g., query, key, value, and output projection matrices)
  - Feedforward (MLP) layers
  - Embedding layers
- Typically number in the billions in modern LLMs (e.g., GPT-3 has 175 billion).
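The parameter count of a model is just the total number of entries in these weight matrices and bias vectors. As a minimal sketch, the snippet below builds a toy transformer-style block in PyTorch (the layer sizes are illustrative, not from any real model) and counts its trainable parameters:

```python
import torch.nn as nn

# Toy transformer-style block with the layer types named above.
# Sizes are illustrative assumptions, not from any real LLM.
d_model, vocab = 64, 1000
model = nn.ModuleDict({
    "embedding": nn.Embedding(vocab, d_model),      # embedding table (no bias)
    "attn_qkv": nn.Linear(d_model, 3 * d_model),    # query/key/value projections
    "attn_out": nn.Linear(d_model, d_model),        # attention output projection
    "mlp": nn.Sequential(nn.Linear(d_model, 4 * d_model),
                         nn.GELU(),
                         nn.Linear(4 * d_model, d_model)),
})

# Count every trainable (tunable) parameter: weights and biases combined.
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_params:,} trainable parameters")
```

Scaling `d_model` and the number of such blocks up by several orders of magnitude is what takes real LLMs into the billions of parameters.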
Tunable parameters store the knowledge the model gains from its training data. During training:
1. The model makes a prediction.
2. A loss function measures the error.
3. Backpropagation computes gradients, and an optimizer adjusts the tunable parameters to reduce that error.
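The training steps above can be sketched in a minimal PyTorch loop on a toy next-token task (the model, data, and hyperparameters here are illustrative assumptions, not a real LLM setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d_model = 100, 32

# Toy "language model": embed a token, predict logits for the next token.
model = nn.Sequential(nn.Embedding(vocab, d_model),
                      nn.Linear(d_model, vocab))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab, (16,))    # toy input token ids
targets = torch.randint(0, vocab, (16,))   # toy "next token" labels

losses = []
for step in range(20):
    logits = model(tokens)           # 1. the model makes a prediction
    loss = loss_fn(logits, targets)  # 2. a loss function measures the error
    opt.zero_grad()
    loss.backward()                  # 3. backpropagation computes gradients...
    opt.step()                       #    ...and the optimizer adjusts the parameters
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The loss falls as the loop repeats, which is exactly the sense in which the tunable parameters absorb knowledge from the training data.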
