5 Common LLM Parameters Explained with Examples
Large language models (LLMs) offer a number of parameters that let you fine-tune their behavior and control how they generate responses. If a model isn't producing the desired output, the issue often lies in how these parameters are configured. In this tutorial, we'll explore some of the most commonly used ones (max_completion_tokens, temperature, top_p, presence_penalty, and frequency_penalty) and understand how each one influences the model's output.
Installing the dependencies
pip install openai pandas matplotlib
Loading OpenAI API Key
import os
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')
Initializing the Model
from openai import OpenAI
model = "gpt-4.1"
client = OpenAI()
Max Tokens
Max tokens is the maximum number of tokens the model can generate during a run. The model will try to stay within this limit across all turns. If it exceeds the specified number, the run will stop and be marked as incomplete.
A smaller value (like 16) limits the model to very short answers, while a higher value (like 80) allows it to generate more detailed and complete responses. Increasing this parameter gives the model more room to elaborate, explain, or format its output more naturally.
immediate = "What is the most well-liked French cheese?"
for tokens in [16, 30, 80]:
print(f"n--- max_output_tokens = {tokens} ---")
response = consumer.chat.completions.create(
mannequin=mannequin,
messages=[
{"role": "developer", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt}
],
max_completion_tokens=tokens
)
print(response.selections[0].message.content material)
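To check whether a run was cut short by the limit, you can inspect finish_reason on the returned choice; a value of "length" means the model stopped because it hit max_completion_tokens:

# Inside the loop above, after printing the message content:
if response.choices[0].finish_reason == "length":
    print("(output truncated: max_completion_tokens reached)")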
Temperature
In large language models, the temperature parameter controls the diversity and randomness of generated outputs. Lower temperature values make the model more deterministic and focused on the most probable responses, which is ideal for tasks that require accuracy and consistency. Higher values, on the other hand, introduce creativity and variety by allowing the model to explore less likely options. Technically, temperature rescales the token logits before the softmax: increasing it flattens the probability distribution (more diverse outputs), while decreasing it sharpens the distribution (more predictable outputs).
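As a quick illustration of that scaling (a standalone NumPy sketch, not part of the OpenAI API), dividing the logits by the temperature before the softmax visibly flattens or sharpens the resulting distribution:

import numpy as np

def softmax_with_temperature(logits, temperature):
    # Scale the raw scores by the temperature, then normalize into probabilities
    scaled = np.array(logits) / temperature
    exps = np.exp(scaled - scaled.max())  # subtract the max for numerical stability
    return exps / exps.sum()

token_logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens
for t in [0.2, 1.0, 1.5]:
    print(t, np.round(softmax_with_temperature(token_logits, t), 3))
# Low temperature piles almost all probability on the top token;
# higher temperatures spread it across the alternatives.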
In this code, we prompt the LLM to produce 10 different responses (n_choices = 10) for the same question, "What is one intriguing place worth visiting?", across a range of temperature values. By doing this, we can observe how the diversity of answers changes with temperature. Lower temperatures will likely produce similar or repeated responses, while higher temperatures will show a broader and more varied distribution of destinations.
immediate = "What is one intriguing place price visiting? Give a single-word reply and suppose globally."
temperatures = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5]
n_choices = 10
outcomes = {}
for temp in temperatures:
response = consumer.chat.completions.create(
mannequin=mannequin,
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt}
],
temperature=temp,
n=n_choices
)
# Collect all n responses in a listing
outcomes[temp] = [response.choices[i].message.content material.strip() for i in vary(n_choices)]
# Display outcomes
for temp, responses in outcomes.gadgets():
print(f"n--- temperature = {temp} ---")
print(responses)
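Because pandas and matplotlib were installed earlier, one convenient way to inspect these results is to count how often each destination appears at each temperature. A minimal plotting sketch using the results dictionary built above:

import pandas as pd
import matplotlib.pyplot as plt

# Count how many times each answer appears at every temperature
counts = pd.DataFrame(
    {temp: pd.Series(responses).value_counts() for temp, responses in results.items()}
).fillna(0)

counts.T.plot(kind="bar", stacked=True, figsize=(10, 4))
plt.xlabel("temperature")
plt.ylabel("number of responses")
plt.title("Distribution of answers across temperatures")
plt.tight_layout()
plt.show()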

As we can see, once the temperature rises to 0.6, the responses become more diverse, moving beyond the repeated single answer "Petra." At a higher temperature of 1.5, the distribution shifts further, and we see answers like Kyoto and Machu Picchu as well.
Top P
Top P (also known as nucleus sampling) is a parameter that controls how many tokens the model considers based on a cumulative probability threshold. It helps the model focus on the most likely tokens, often improving coherence and output quality.
In the following example, we first set a temperature value and then apply Top P = 0.5 (50%), meaning only the top 50% of the probability mass is kept. Note that when temperature = 0, the output is deterministic, so Top P has no effect.
The generation process works as follows (a small sampling sketch appears after this list):
- Apply the temperature to adjust the token probabilities.
- Use Top P to retain only the most probable tokens that together make up 50% of the total probability mass.
- Renormalize the remaining probabilities before sampling.
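Here is a minimal NumPy sketch of those three steps, purely as an illustration of the idea, since the API performs this filtering internally:

import numpy as np

def nucleus_sample(logits, temperature=1.0, top_p=0.5, rng=np.random.default_rng(0)):
    # 1. Apply the temperature to the logits and convert them to probabilities
    probs = np.exp(np.array(logits) / temperature)
    probs /= probs.sum()
    # 2. Keep the smallest set of top tokens whose cumulative mass reaches top_p
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    kept = order[:cutoff]
    # 3. Renormalize the surviving probabilities and sample from them
    return rng.choice(kept, p=probs[kept] / probs[kept].sum())

token_logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens
print(nucleus_sample(token_logits, temperature=0.8, top_p=0.5))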
We'll observe how the distribution of responses changes across different temperature values for the question:
"What is one intriguing place worth visiting?"
immediate = "What is one intriguing place price visiting? Give a single-word reply and suppose globally."
temperatures = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5]
n_choices = 10
results_ = {}
for temp in temperatures:
response = consumer.chat.completions.create(
mannequin=mannequin,
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt}
],
temperature=temp,
n=n_choices,
top_p=0.5
)
# Collect all n responses in a listing
results_[temp] = [response.choices[i].message.content material.strip() for i in vary(n_choices)]
# Display outcomes
for temp, responses in results_.gadgets():
print(f"n--- temperature = {temp} ---")
print(responses)

Since Petra consistently accounted for more than 50% of the total response probability, applying Top P = 0.5 filters out all other options. As a result, the model selects "Petra" as the final output in every case.
Frequency Penalty
Frequency penalty controls how much the model avoids repeating the same words or phrases in its output.
- Range: -2 to 2
- Default: 0
When the frequency penalty is higher, the model is penalized for using words it has already used before. This encourages it to choose new and different words, making the text more varied and less repetitive.
In simple terms, a higher frequency penalty means less repetition and more creativity.
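Under the hood, the frequency penalty can be thought of as a deduction from a token's logit proportional to how many times that token has already been generated. A rough sketch of the idea (variable names are ours, not the API's):

def frequency_penalized_logit(logit, count_so_far, frequency_penalty):
    # Every prior occurrence of the token lowers its score a little more,
    # so heavily repeated tokens become progressively less likely.
    return logit - frequency_penalty * count_so_far

print(frequency_penalized_logit(2.5, count_so_far=0, frequency_penalty=1.0))  # 2.5, never used yet
print(frequency_penalized_logit(2.5, count_so_far=3, frequency_penalty=1.0))  # -0.5, already used three times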
We'll test this using the prompt:
"List 10 possible titles for a fantasy book. Give the titles only and each title on a new line."
immediate = "List 10 doable titles for a fantasy e book. Give the titles solely and every title on a brand new line."
frequency_penalties = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]
outcomes = {}
for fp in frequency_penalties:
response = consumer.chat.completions.create(
mannequin=mannequin,
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt}
],
frequency_penalty=fp,
temperature=0.2
)
textual content = response.selections[0].message.content material
gadgets = [line.strip("- ").strip() for line in text.split("n") if line.strip()]
outcomes[fp] = gadgets
# Display outcomes
for fp, gadgets in outcomes.gadgets():
print(f"n--- frequency_penalty = {fp} ---")
print(gadgets)

- Low frequency penalties (-2 to 0): Titles tend to repeat, with familiar patterns like "The Shadow Weaver's Oath", "Crown of Ember and Ice", and "The Last Dragon's Heir" appearing frequently.
- Moderate penalties (0.5 to 1.5): Some repetition remains, but the model begins producing more varied and creative titles.
- High penalty (2.0): The first three titles are still the same, but after that, the model produces diverse, unique, and imaginative book names (e.g., "Whisperwind Chronicles: Rise of the Phoenix Queen", "Ashes Beneath the Willow Tree").
Presence Penalty
Presence penalty controls how much the model avoids repeating words or phrases that have already appeared in the text.
- Range: -2 to 2
- Default: 0
A higher presence penalty encourages the model to use a wider variety of words, making the output more diverse and creative.
Unlike the frequency penalty, which accumulates with each repetition, the presence penalty is applied once to any word that has already appeared, reducing the chance that it will be repeated in the output. This helps the model produce text with more variety and originality.
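Extending the earlier frequency-penalty sketch, the presence penalty acts as a flat, one-time deduction that kicks in as soon as a token has appeared at all (again an illustration of the idea, not the API's internals):

def penalized_logit(logit, count_so_far, frequency_penalty=0.0, presence_penalty=0.0):
    frequency_term = frequency_penalty * count_so_far                   # grows with every repetition
    presence_term = presence_penalty * (1 if count_so_far > 0 else 0)   # flat once the token has appeared
    return logit - frequency_term - presence_term

# presence_penalty=1.0 charges the same whether the token appeared once or five times (2.5 -> 1.5);
# frequency_penalty=1.0 charges per occurrence (2.5 -> -2.5 after five occurrences).
print(penalized_logit(2.5, count_so_far=5, presence_penalty=1.0))
print(penalized_logit(2.5, count_so_far=5, frequency_penalty=1.0))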
immediate = "List 10 doable titles for a fantasy e book. Give the titles solely and every title on a brand new line."
presence_penalties = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]
outcomes = {}
for fp in frequency_penalties:
response = consumer.chat.completions.create(
mannequin=mannequin,
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt}
],
presence_penalty=fp,
temperature=0.2
)
textual content = response.selections[0].message.content material
gadgets = [line.strip("- ").strip() for line in text.split("n") if line.strip()]
outcomes[fp] = gadgets
# Display outcomes
for fp, gadgets in outcomes.gadgets():
print(f"n--- presence_penalties = {fp} ---")
print(gadgets)

- Low to moderate penalty (-2.0 to 0.5): Titles are somewhat varied, with some repetition of common fantasy patterns like "The Shadow Weaver's Oath", "The Last Dragon's Heir", "Crown of Ember and Ice".
- Medium penalty (1.0 to 1.5): The first few common titles remain, while later titles show more creativity and unique combinations. Examples: "Ashes of the Fallen Kingdom", "Secrets of the Starbound Forest", "Daughter of Storm and Stone".
- Maximum penalty (2.0): The top three titles stay the same, but the rest become highly varied and imaginative. Examples: "Moonfire and Thorn", "Veil of Starlit Ashes", "The Midnight Blade".
