Temperature
Temperature is a sampling parameter that controls how random or predictable the model's generated text is. In MCP, it can be supplied as part of a sampling/createMessage request.
Effects on Output
- Low Temperature (e.g., 0.1 - 0.3): Makes the model more deterministic and focused. It tends to choose the most likely next token, leading to consistent but potentially repetitive results. Ideal for coding and data extraction.
- High Temperature (e.g., 0.7 - 1.0): Increases randomness and diversity. The model is more likely to choose less probable tokens, resulting in more creative or conversational output.
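Mechanically, temperature divides the model's logits before the softmax that produces token probabilities: values below 1 sharpen the distribution toward the most likely token, while values near or above 1 flatten it. A minimal sketch of that scaling (plain Python, not tied to any particular model):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide each logit by the temperature before the softmax.
    # T < 1 sharpens the distribution; T >= 1 leaves it flatter.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # top token gets almost all the mass
high = softmax_with_temperature(logits, 1.0)  # probabilities stay spread out
```

With these example logits, the top token's probability is far higher at T=0.2 than at T=1.0, which is exactly the "deterministic vs. diverse" behavior described above.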
MCP Implementation
A server can suggest a temperature to the host:
{
  "method": "sampling/createMessage",
  "params": {
    "messages": [...],
    "temperature": 0.2
  }
}
The host application is responsible for passing this value to the underlying LLM engine; the suggestion is advisory, and the host may adjust or ignore it.
Questions & Answers
What does "Temperature" control in an AI model's output?
Temperature is a parameter that controls the randomness of text generation. A low temperature makes the model more deterministic and focused, while a high temperature increases creativity and diversity in the output.
In which scenarios should a low temperature (0.1 - 0.3) be used?
Low temperature is ideal for tasks requiring high precision and consistency, such as coding, data extraction, or answering technical questions where factual accuracy is more important than creative flair.
How does an MCP server influence the temperature of an AI's response?
A server can suggest a specific temperature to the host application within a sampling/createMessage request. However, the host's inference engine is ultimately responsible for applying this value during the model call.
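To make the server side of that answer concrete, here is a sketch of assembling such a request as a plain dictionary; a real MCP server would send it through its session/transport rather than constructing raw JSON, and the exact required fields follow the MCP specification:

```python
def build_sampling_request(prompt_text, temperature=0.2):
    # Assemble a sampling/createMessage request body; field names follow
    # the MCP sampling shape, with a low temperature suited to extraction.
    return {
        "method": "sampling/createMessage",
        "params": {
            "messages": [
                {"role": "user", "content": {"type": "text", "text": prompt_text}}
            ],
            "temperature": temperature,
        },
    }

request = build_sampling_request("Extract the dates from this log.")
```

The host receiving this request decides whether to honor the 0.2 suggestion before invoking its model.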