Max Tokens
An inference parameter capping the length of the model's response, measured in tokens.
It limits output length only — not the input context size.
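
As an illustration, the cap can be pictured as a stopping condition on a generation loop. The sketch below is a toy, not any particular library's API: `toy_model` and `generate` are hypothetical names, and the "model" simply echoes the prompt in a cycle. It shows that the prompt passes through untouched while only the output is cut off at `max_tokens`.

```python
from typing import Iterator, List


def toy_model(prompt_tokens: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a model: cycles through the prompt tokens forever."""
    i = 0
    while True:
        yield prompt_tokens[i % len(prompt_tokens)]
        i += 1


def generate(prompt_tokens: List[str], max_tokens: int) -> List[str]:
    """Collect at most max_tokens output tokens; the prompt itself is never truncated."""
    output: List[str] = []
    for token in toy_model(prompt_tokens):
        if len(output) >= max_tokens:
            break  # the cap applies to generated tokens only
        output.append(token)
    return output


prompt = ["a", "b", "c", "d", "e"]      # 5 input tokens, all seen by the model
result = generate(prompt, max_tokens=3)  # output stops after 3 tokens
```

Here the input is longer than the cap, yet it is used in full; the limit only ends generation early, so a response that hits the cap may stop mid-thought.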