Adding Memory to your Agent: Short-Term and Long-Term memory in practice
Introduction
In our previous tutorials, we built a code review agent that uses tools. But there’s a critical limitation: our agent has no memory between interactions. Every time we call `think()`, the agent starts fresh, with no knowledge of previous conversations or actions.
Imagine asking the agent to “review the last file I mentioned” or “compare this code to what you saw earlier”. Without memory it can’t do either. In this article, we’ll transform our stateless agent into one that remembers conversations, learns from interactions, and manages its memory efficiently.
We will cover:
- Why memory matters for agents
- Short-term memory
- Long-term memory
- Memory summarization techniques
- Context window management strategies
Why memory matters
Memory enables continuity. Without it agents can’t:
- Reference previous questions or answers
- Build on past interactions
- Learn user preferences
- Handle multi-turn workflows (e.g. “read the file, then analyze it, then write tests”)

Real-world conversations have context. Our agent needs memory to maintain that context and provide intelligent, contextual responses.
Short term memory: Conversation History
Short-term memory stores the recent conversation between user and agent. This is the foundation of a multi-turn dialogue.
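The mechanics can be sketched independently of any LLM call. Below is a minimal, hypothetical `MessageBuffer` (the name and API are illustrative, not part of the agent we build next): every turn is appended to a list, and the whole list is replayed to the model on each call.

```python
# Minimal sketch of short-term memory: a list of role/content dictionaries
# that is replayed to the LLM on every turn.
class MessageBuffer:
    def __init__(self):
        self.messages = []  # each entry: {"role": ..., "content": ...}

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def as_prompt(self, system_content: str):
        # The system message is prepended fresh on each call; history follows.
        return [{"role": "system", "content": system_content}] + self.messages


buffer = MessageBuffer()
buffer.add("user", "Review sample.py")
buffer.add("assistant", '{"tool": "read_file", "args": ["sample.py"]}')
buffer.add("user", "Now review the last file I mentioned")

prompt = buffer.as_prompt("You are a code assistant.")
# The third user turn is resolvable because the earlier turns travel with it.
```

Because the earlier turns are in the prompt, a reference like “the last file I mentioned” has something to resolve against.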
Implementation: Adding a Message Buffer
Let’s add a simple conversation history to our agent, `CodeReviewAgentWithTools`:
1. Initialize a list for conversation history:

```python
class CodeReviewAgentWithSTMemory:
    def __init__(self, tools_registry: ToolRegistry, model="gpt-4o-mini"):
        self.tools = tools_registry
        self.model = model
        self.conversation_history = []  # Short-term memory
```

2. Update `think()` to add user input and LLM responses to `conversation_history`, and include the conversation history in the prompt:

```python
def think(self, user_input: str):
    """LLM decides which tool to use with conversation context."""
    # Add user message to history
    self.conversation_history.append({"role": "user", "content": user_input})
    # Build prompt with system instructions
    messages = [
        {
            "role": "system",
            "content": """You are a code assistant with access to these tools:
- read_file(filepath)
- patch_file(filepath, content)
- print_review(review: str)
Decide which tool to use based on the conversation.
If a tool call is needed, reply ONLY with the tool call to make in JSON format,
e.g. {"tool": "read_file", "args": ["main.py"]}.
If the task is complete, respond with {"done": true, "summary": "..."},
where the summary is the reason why the task is complete.
"""
        }
    ] + self.conversation_history
    response = openai.responses.create(model=self.model, input=messages)
    decision = response.output_text
    # Add assistant's decision to conversation history
    self.conversation_history.append({"role": "assistant", "content": decision})
    return decision
```

3. Update `act()` to add tool call results to conversation history:

```python
def act(self, decision: str):
    """Execute the chosen tool and record the result."""
    try:
        parsed = json.loads(decision)
        tool_name = parsed["tool"]
        args = parsed.get("args", [])
        result = self.tools.call(tool_name, *args)
        # Store tool call result in conversation history
        self.conversation_history.append({"role": "system", "content": f"Tool result: {result}"})
        return result
    except Exception as e:
        error_msg = f"Error executing tool: {e}"
        self.conversation_history.append({"role": "system", "content": error_msg})
        return error_msg
```

What changed
- `conversation_history` list: stores all messages as dictionaries with `role` and `content`
- Messages passed to the LLM: instead of a single prompt string, we send the entire conversation
- Tool call result stored: after each action we append the result to history so the agent can reference it
Tool Setup
```python
import openai
import os
from typing import Callable, Dict
import json

def read_file(filepath: str) -> str:
    """Read contents of a Python file"""
    if not os.path.exists(filepath):
        return f"File not found: {filepath}"
    with open(filepath, "r") as f:
        return f.read()

def patch_file(filepath: str, content: str) -> str:
    """Writes the given content to a file, completely replacing its current content."""
    try:
        with open(filepath, "w") as f:
            f.write(content)
        return f"File successfully updated: {filepath}. New content written."
    except Exception as e:
        return f"Error writing to file {filepath}: {e}"

def print_review(review: str):
    print(f"Review: {review}")
    return f"Printed review: {review}"

class ToolRegistry:
    """Holds available tools and dispatches them by name."""
    def __init__(self):
        self.tools: Dict[str, Callable] = {}

    def register(self, name: str, func: Callable):
        self.tools[name] = func

    def call(self, name: str, *args, **kwargs):
        if name not in self.tools:
            return f"Unknown tool: {name}"
        return self.tools[name](*args, **kwargs)
```
```python
import json

class CodeReviewAgentWithSTMemory:
    def __init__(self, tools_registry: ToolRegistry, model="gpt-4o-mini"):
        self.tools = tools_registry
        self.model = model
        self.conversation_history = []  # Short-term memory

    def think(self, user_input: str):
        """LLM decides which tool to use with conversation context."""
        # Add user message to history
        self.conversation_history.append({"role": "user", "content": user_input})
        # Build prompt with system instructions
        messages = [
            {
                "role": "system",
                "content": """You are a code assistant with access to the tools below.
Available tools:
- read_file(filepath)
- patch_file(filepath, content)
- print_review(review: str)
Decide which tool is most appropriate based on the conversation.
If a tool call is needed, reply ONLY with the tool call to make in JSON format,
e.g. {"tool": "read_file", "args": ["main.py"]}.
If the task is complete, respond with {"done": true, "summary": "..."},
where the summary is the reason why the task is complete.
"""
            }
        ] + self.conversation_history
        response = openai.responses.create(model=self.model, input=messages)
        decision = response.output_text
        # Add assistant's decision to conversation history
        self.conversation_history.append({
            "role": "assistant",
            "content": decision
        })
        return decision

    def act(self, decision: str):
        """Execute the chosen tool and record the result."""
        print(f"Acting on decision: {decision}")
        try:
            parsed = json.loads(decision)
            tool_name = parsed["tool"]
            args = parsed.get("args", [])
            result = self.tools.call(tool_name, *args)
            # Store tool call result in conversation history
            self.conversation_history.append({
                "role": "system",
                "content": f"Tool result: {result}"
            })
            return result
        except Exception as e:
            error_msg = f"Error executing tool: {e}"
            self.conversation_history.append({
                "role": "system",
                "content": error_msg
            })
            return error_msg

    def run(self, user_input: str, max_steps: int = 3):
        original_input = user_input
        result = None
        for step in range(max_steps):
            print(f"Step: {step + 1} of {max_steps}")
            decision = self.think(user_input)
            try:
                decision_parsed = json.loads(decision)
            except json.JSONDecodeError as e:
                print(f"Could not parse decision: {decision}. Error: {e}")
                user_input = f"Your response was not valid JSON.\nOriginal user request: {original_input}"
                continue  # Ask again rather than acting on an unparseable decision
            if decision_parsed.get("done"):
                print(f"Task complete\nAssistant Response: {decision}")
                return decision_parsed.get("summary")
            result = self.act(decision)
            user_input = (f"Original user request: {original_input}\n"
                          f"Last assistant response: {decision}\n"
                          f"Last tool result: {result}. Continue with the original user request")
        print("Loop complete. (max steps reached)")
        return result
```
Let’s give it a try
```python
registry = ToolRegistry()
registry.register("read_file", read_file)
registry.register("patch_file", patch_file)
registry.register("print_review", print_review)

agent_with_st_memory = CodeReviewAgentWithSTMemory(registry)

# Multi-turn conversation
result = agent_with_st_memory.run("Review the code in sample.py and print the review", max_steps=5)
print(f"Agent Result: {result}")
print(f"Agent Chat History: {agent_with_st_memory.conversation_history}")
```
Key insight: The LLM sees the full conversation each time, allowing it to understand context and references like “that code” or “the last file”.
Long-term Memory: Persistent Knowledge
Short-term memory exists only during a session. Long-term memory persists across sessions and stores important information the agent should remember every time it performs a task.
Use cases for long term memory
- User preferences: “I prefer tests with pytest, not unittest”
- Project context: “this is a FastAPI web API with SQLAlchemy models”
- Learned patterns: “user often asks for SQL injection vulnerabilities”
- Important facts: File paths, project structure, common issues
Implementation: Adding a long term knowledge store
Let’s add long term memory to our agent
1. Add `long_term_memory` and `memory_file`, which we will implement as a simple key-value store persisted in a `.json` file:

```python
class CodeReviewAgentWithLTMemory:
    def __init__(self, tools_registry: ToolRegistry, model="gpt-4o-mini", memory_file="agent_memory.json"):
        self.memory_file = memory_file
        self.load_long_term_memory()  # Long-term memory (key-value store)
        # ...rest of init...
```

2. Add `remember()`, which adds/updates a key-value pair in the long-term memory:

```python
def remember(self, key: str, value: str):
    """Save information to long term memory."""
    self.long_term_memory[key] = value
    self.save_long_term_memory()
```

3. Add `recall()`, which retrieves a particular item from the long-term memory:

```python
def recall(self, key: str) -> str:
    """Retrieve information from long term memory"""
    return self.long_term_memory.get(key, "No memory found for this key.")
```

4. Add `get_relevant_memories()`, which gets and formats the long-term memories to include in the system message:

```python
def get_relevant_memories(self) -> str:
    """Format long term memories for inclusion in prompts."""
    if not self.long_term_memory:
        return "No stored memories"
    memories = "\n".join([f"- {k}: {v}" for k, v in self.long_term_memory.items()])
    return f"Relevant memories:\n{memories}"
```

5. Add `save_long_term_memory()` to persist long-term memory to disk. This makes sure it persists between agent sessions:

```python
def save_long_term_memory(self):
    """Persist long term memory to JSON file"""
    try:
        with open(self.memory_file, "w") as f:
            json.dump(self.long_term_memory, f, indent=2)
    except Exception as e:
        print(f"Warning: Could not save memory to {self.memory_file}: {e}")
```

6. Add `load_long_term_memory()` to load long-term memory when the agent initializes:

```python
def load_long_term_memory(self):
    """Load long term memory from JSON file"""
    if os.path.exists(self.memory_file):
        try:
            with open(self.memory_file, 'r') as f:
                self.long_term_memory = json.load(f)
            print(f"Loaded {len(self.long_term_memory)} memories from {self.memory_file}")
        except Exception as e:
            print(f"Warning: Could not load memory from {self.memory_file}: {e}")
            self.long_term_memory = {}  # Fall back to an empty store
    else:
        self.long_term_memory = {}
```
7. Update the prompt's system message to include the long term memory as `relevant_memories`
```python
def think(self, user_input: str):
    """LLM decides which tool to use with both short term and long term context."""
    # ...existing code...
    # Include long term memory in system context
    system_message_context = f"""You are a code assistant with access to these tools:
- read_file(filepath)
- patch_file(filepath, content)
- print_review(review: str)
{self.get_relevant_memories()}
Decide which tool to use based on the conversation and relevant memories.
If a tool call is needed, reply ONLY with the tool call to make in JSON format,
e.g. {{"tool": "read_file", "args": ["main.py"]}}.
If the task is complete, respond with {{"done": true, "summary": "..."}},
where the summary is the reason why the task is complete.
"""
    # ...existing code...
```
```python
import json
import os
from typing import Dict, Callable

class CodeReviewAgentWithLTMemory:
    def __init__(self, tools_registry: ToolRegistry, model="gpt-4o-mini", memory_file="agent_memory.json"):
        self.tools = tools_registry
        self.model = model
        self.conversation_history = []  # Short-term memory
        self.memory_file = memory_file
        self.load_long_term_memory()  # Long-term memory (key-value store)

    def remember(self, key: str, value: str):
        """Save information to long term memory."""
        self.long_term_memory[key] = value
        self.save_long_term_memory()

    def recall(self, key: str) -> str:
        """Retrieve information from long term memory"""
        return self.long_term_memory.get(key, "No memory found for this key.")

    def get_relevant_memories(self) -> str:
        """Format long term memories for inclusion in prompts."""
        if not self.long_term_memory:
            return "No stored memories"
        memories = "\n".join([f"- {k}: {v}" for k, v in self.long_term_memory.items()])
        return f"Relevant memories:\n{memories}"

    def save_long_term_memory(self):
        """Persist long term memory to JSON file"""
        try:
            with open(self.memory_file, "w") as f:
                json.dump(self.long_term_memory, f, indent=2)
        except Exception as e:
            print(f"Warning: Could not save memory to {self.memory_file}: {e}")

    def load_long_term_memory(self):
        """Load long term memory from JSON file"""
        if os.path.exists(self.memory_file):
            try:
                with open(self.memory_file, 'r') as f:
                    self.long_term_memory = json.load(f)
                print(f"Loaded {len(self.long_term_memory)} memories from {self.memory_file}")
            except Exception as e:
                print(f"Warning: Could not load memory from {self.memory_file}: {e}")
                self.long_term_memory = {}  # Fall back to an empty store
        else:
            self.long_term_memory = {}

    def think(self, user_input: str):
        """LLM decides which tool to use with both short term and long term context."""
        # Add user message to history
        self.conversation_history.append({"role": "user", "content": user_input})
        # Include long term memory in system context
        system_message_context = f"""You are a code assistant with access to the tools below.
Available tools:
- read_file(filepath)
- patch_file(filepath, content)
- print_review(review: str)
{self.get_relevant_memories()}
Decide which tool to use based on the conversation and relevant memories.
If a tool call is needed, reply ONLY with the tool call to make in JSON format,
e.g. {{"tool": "read_file", "args": ["main.py"]}}.
If the task is complete, respond with {{"done": true, "summary": "..."}},
where the summary is the reason why the task is complete.
"""
        # Build prompt with system instructions
        messages = [
            {
                "role": "system",
                "content": system_message_context
            }
        ] + self.conversation_history
        response = openai.responses.create(model=self.model, input=messages)
        decision = response.output_text
        # Add assistant's decision to conversation history
        self.conversation_history.append({
            "role": "assistant",
            "content": decision
        })
        return decision

    def act(self, decision: str):
        """Execute the chosen tool and record the result."""
        try:
            parsed = json.loads(decision)
            tool_name = parsed["tool"]
            args = parsed.get("args", [])
            result = self.tools.call(tool_name, *args)
            # Store tool call result in conversation history
            self.conversation_history.append({
                "role": "system",
                "content": f"Tool result: {result}"
            })
            return result
        except Exception as e:
            error_msg = f"Error executing tool: {e}"
            self.conversation_history.append({
                "role": "system",
                "content": error_msg
            })
            return error_msg

    def run(self, user_input: str, max_steps: int = 3):
        original_input = user_input
        result = None
        for step in range(max_steps):
            print(f"Step: {step + 1} of {max_steps}")
            decision = self.think(user_input)
            try:
                decision_parsed = json.loads(decision)
            except json.JSONDecodeError as e:
                print(f"Could not parse decision: {decision}. Error: {e}")
                user_input = f"Your response was not valid JSON.\nOriginal user request: {original_input}"
                continue  # Ask again rather than acting on an unparseable decision
            if decision_parsed.get("done"):
                print(f"Task complete\nAssistant Response: {decision}")
                return decision_parsed.get("summary")
            result = self.act(decision)
            user_input = (f"Original user request: {original_input}\n"
                          f"Last assistant response: {decision}\n"
                          f"Last tool result: {result}. Continue with the original user request")
        print("Loop complete. (max steps reached)")
        return result
```
Demo: Long term memory persists across agent sessions
Key insight: Long term memory provides persistent context that informs every interaction, enabling the agent to personalize its behaviour and remember important facts across sessions
```python
# Multi-turn conversation with long term memory
registry = ToolRegistry()
registry.register("read_file", read_file)
registry.register("print_review", print_review)
registry.register("patch_file", patch_file)

agent_with_lt_memory1 = CodeReviewAgentWithLTMemory(registry)
code_snippet = """
def divide(a, b):
    return a / b
"""
agent_with_lt_memory1.remember("documentation", "add comprehensive documentation and doc string to ALL code generated")
decision1_with_ltm1 = agent_with_lt_memory1.run(f"Review this code: {code_snippet}")
print(f"First Agent Long Term Memory: {agent_with_lt_memory1.long_term_memory}")
print(f"First Agent Conversation History: {agent_with_lt_memory1.conversation_history}")

# A new session still has the long term memory
agent_with_lt_memory2 = CodeReviewAgentWithLTMemory(registry)
print(f"Second Agent Long Term Memory: {agent_with_lt_memory2.long_term_memory}")
print(f"Second Agent Conversation History: {agent_with_lt_memory2.conversation_history}")
```
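The persistence mechanics can also be exercised without any LLM calls. Below is a minimal round-trip sketch, using a temporary file rather than the agent's `agent_memory.json` so it leaves no state behind:

```python
import json
import os
import tempfile

# Write a key-value memory to disk, then load it back as a "new session" would.
memories = {"documentation": "add docstrings to ALL generated code"}

with tempfile.TemporaryDirectory() as tmp:
    memory_file = os.path.join(tmp, "agent_memory.json")

    # Session 1: persist the store
    with open(memory_file, "w") as f:
        json.dump(memories, f, indent=2)

    # Session 2: a fresh agent would reload the same facts on __init__
    with open(memory_file) as f:
        reloaded = json.load(f)

assert reloaded == memories
```

This is exactly what `save_long_term_memory()` and `load_long_term_memory()` do, minus the error handling.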
Memory summarization: Keeping Context Compact
As conversations grow, so does the memory footprint. A 50-turn conversation might contain thousands of tokens. Summarization compresses old conversation turns into concise summaries, preserving essential information while reducing token usage.
When to summarize
- After a number of conversation turns
- When conversation history exceeds a token threshold
- When moving to a new topic or task
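The summarize-then-trim pattern is independent of which model produces the summary. Below is a sketch with a stub summarizer standing in for the LLM call (the names `stub_summarize` and `compact` are illustrative, not part of the agent):

```python
def stub_summarize(history):
    # Stand-in for an LLM request; the real agent asks the model for a summary.
    return f"Summary of {len(history)} earlier messages."

def compact(history, keep_last=4):
    """Collapse old turns into a summary; keep the most recent turns verbatim."""
    if len(history) <= keep_last:
        return "", history
    summary = stub_summarize(history[:-keep_last])
    return summary, history[-keep_last:]

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
summary, history = compact(history)
# history now holds only the last 4 turns; the rest survive as a summary string.
```

The summary then rides along in the system prompt, so older context is still visible to the model in compressed form.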
Let’s implement a simple periodic summarization where we use an LLM to generate a summary from the conversation history and trim the conversation to the last few turns.
1. Add a `summarize_after` parameter to agent initialization, to set after how many messages to summarize:

```python
class CodeReviewAgentWithSTMemorySummarization:
    def __init__(self, tools_registry: ToolRegistry, model="gpt-4o-mini",
                 memory_file="agent_memory.json", summarize_after=10):
        # ...existing init code...
        self.conversation_summary = ""  # Summarized conversation history
        self.summarize_after = summarize_after  # Number of conversation turns after which to summarize
        self.turns_since_summary = 0  # Track number of turns since last summary
```

2. Add `conversation_summary` to keep the conversation summary (initialized above).

3. Add `summarize_history()`: periodically use the LLM to summarize the conversation history when the `summarize_after` message limit is reached:

```python
def summarize_history(self):
    """Use LLM to summarize the conversation so far."""
    if len(self.conversation_history) < 3:
        return
    history_text = "\n".join([f"{msg['role']}: {msg['content']}" for msg in self.conversation_history])
    summary_prompt = f"""Summarize this conversation in 3-4 sentences,
preserving key facts, decisions, and actions taken:
{history_text}
Previous Summary: {self.conversation_summary or 'None'}
"""
    response = openai.responses.create(model=self.model, input=[{"role": "user", "content": summary_prompt}])
    self.conversation_summary = response.output_text
    # Keep only the last few turns + the summary
    recent_turns = self.conversation_history[-4:]  # Keep the last 4 messages (2 user/assistant exchanges)
    self.conversation_history = recent_turns
    self.turns_since_summary = 0
```
4. Include the `conversation_summary` in the system prompt
```python
def think(self, user_input: str):
    """LLM decides which tool to use with both short term and long term context."""
    # Add user message to history
    self.conversation_history.append({"role": "user", "content": user_input})
    self.turns_since_summary += 1
    # Check if we should summarize
    if self.turns_since_summary >= self.summarize_after:
        self.summarize_history()
    # Include long term memory & summary in system context
    system_message_context = f"""You are a code assistant with access to these tools:
- read_file(filepath)
- patch_file(filepath, content)
- print_review(review: str)
{self.get_relevant_memories()}
Conversation Summary: {self.conversation_summary or 'This is the start of the conversation'}
Decide which tool to use based on the conversation, conversation summary and relevant memories.
If a tool call is needed, reply ONLY with the tool call to make in JSON format,
e.g. {{"tool": "read_file", "args": ["main.py"]}}.
If the task is complete, respond with {{"done": true, "summary": "..."}},
where the summary is the reason why the task is complete.
"""
    # ...rest of think code...
```
```python
class CodeReviewAgentWithSTMemorySummarization:
    def __init__(self, tools_registry: ToolRegistry, model="gpt-4o-mini",
                 memory_file="agent_memory.json", summarize_after=10):
        self.tools = tools_registry
        self.model = model
        self.conversation_history = []  # Short-term memory
        self.memory_file = memory_file
        self.load_long_term_memory()  # Long-term memory (key-value store)
        self.conversation_summary = ""  # Summarized conversation history
        self.summarize_after = summarize_after  # Number of conversation turns after which to summarize
        self.turns_since_summary = 0  # Track number of turns since last summary

    def summarize_history(self):
        """Use LLM to summarize the conversation so far."""
        if len(self.conversation_history) < 3:
            return
        history_text = "\n".join([f"{msg['role']}: {msg['content']}" for msg in self.conversation_history])
        summary_prompt = f"""Summarize this conversation in 3-4 sentences,
preserving key facts, decisions, and actions taken:
{history_text}
Previous Summary: {self.conversation_summary or 'None'}
"""
        response = openai.responses.create(model=self.model, input=[{"role": "user", "content": summary_prompt}])
        self.conversation_summary = response.output_text
        # Keep only the last few turns + the summary
        recent_turns = self.conversation_history[-4:]  # Keep the last 4 messages (2 user/assistant exchanges)
        self.conversation_history = recent_turns
        self.turns_since_summary = 0

    def remember(self, key: str, value: str):
        """Save information to long term memory."""
        self.long_term_memory[key] = value
        self.save_long_term_memory()

    def recall(self, key: str) -> str:
        """Retrieve information from long term memory"""
        return self.long_term_memory.get(key, "No memory found for this key.")

    def get_relevant_memories(self) -> str:
        """Format long term memories for inclusion in prompts."""
        if not self.long_term_memory:
            return "No stored memories"
        memories = "\n".join([f"- {k}: {v}" for k, v in self.long_term_memory.items()])
        return f"Relevant memories:\n{memories}"

    def save_long_term_memory(self):
        """Persist long term memory to JSON file"""
        try:
            with open(self.memory_file, "w") as f:
                json.dump(self.long_term_memory, f, indent=2)
        except Exception as e:
            print(f"Warning: Could not save memory to {self.memory_file}: {e}")

    def load_long_term_memory(self):
        """Load long term memory from JSON file"""
        if os.path.exists(self.memory_file):
            try:
                with open(self.memory_file, 'r') as f:
                    self.long_term_memory = json.load(f)
                print(f"Loaded {len(self.long_term_memory)} memories from {self.memory_file}")
            except Exception as e:
                print(f"Warning: Could not load memory from {self.memory_file}: {e}")
                self.long_term_memory = {}  # Fall back to an empty store
        else:
            self.long_term_memory = {}

    def think(self, user_input: str):
        """LLM decides which tool to use with both short term and long term context."""
        # Add user message to history
        self.conversation_history.append({"role": "user", "content": user_input})
        self.turns_since_summary += 1
        # Check if we should summarize
        if self.turns_since_summary >= self.summarize_after:
            self.summarize_history()
        # Include long term memory & summary in system context
        system_message_context = f"""You are a code assistant with access to the tools below.
Available tools:
- read_file(filepath)
- patch_file(filepath, content)
- print_review(review: str)
{self.get_relevant_memories()}
Conversation Summary: {self.conversation_summary or 'This is the start of the conversation'}
Decide which tool to use based on the conversation, conversation summary and relevant memories.
If a tool call is needed, reply ONLY with the tool call to make in JSON format,
e.g. {{"tool": "read_file", "args": ["main.py"]}}.
If the task is complete, respond with {{"done": true, "summary": "..."}},
where the summary is the reason why the task is complete.
"""
        # Build prompt with system instructions
        messages = [
            {
                "role": "system",
                "content": system_message_context
            }
        ] + self.conversation_history
        response = openai.responses.create(model=self.model, input=messages)
        decision = response.output_text
        # Add assistant's decision to conversation history
        self.conversation_history.append({
            "role": "assistant",
            "content": decision
        })
        return decision

    def act(self, decision: str):
        """Execute the chosen tool and record the result."""
        try:
            parsed = json.loads(decision)
            tool_name = parsed["tool"]
            args = parsed.get("args", [])
            result = self.tools.call(tool_name, *args)
            # Store tool call result in conversation history
            self.conversation_history.append({
                "role": "system",
                "content": f"Tool result: {result}"
            })
            return result
        except Exception as e:
            error_msg = f"Error executing tool: {e}"
            self.conversation_history.append({
                "role": "system",
                "content": error_msg
            })
            return error_msg

    def run(self, user_input: str, max_steps: int = 3):
        original_input = user_input
        result = None
        for step in range(max_steps):
            print(f"Step: {step + 1} of {max_steps}")
            decision = self.think(user_input)
            try:
                decision_parsed = json.loads(decision)
            except json.JSONDecodeError as e:
                print(f"Could not parse decision: {decision}. Error: {e}")
                user_input = f"Your response was not valid JSON.\nOriginal user request: {original_input}"
                continue  # Ask again rather than acting on an unparseable decision
            if decision_parsed.get("done"):
                print(f"Task complete\nAssistant Response: {decision}")
                return decision_parsed.get("summary")
            result = self.act(decision)
            user_input = (f"Original user request: {original_input}\n"
                          f"Last assistant response: {decision}\n"
                          f"Last tool result: {result}. Continue with the original user request")
        print("Loop complete. (max steps reached)")
        return result
```
Context Window Management
Every LLM has a context window: a maximum number of tokens it can process at once. When the conversation history, long-term memory, prompt, and response together exceed this limit, the LLM call may fail or return an incomplete response. For this reason we need to manage the context window.
Strategies for Managing the Context Window
- Token Counting: Estimate or count tokens before sending to the LLM
- Trimming: Remove the oldest messages beyond a threshold
- Selective forgetting: Drop less important messages
- Hierarchical summarization: Summarize summaries for very long interactions
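Trimming can be prototyped without a tokenizer at all. Below is a rough sketch using the common heuristic of ~4 characters per token for English text; the implementation that follows swaps this estimate for exact `tiktoken` counts. The names `rough_tokens` and `trim_to_budget` are illustrative:

```python
def rough_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_to_budget(history, budget, reserved=0):
    """Drop the oldest messages until the estimated token count fits the budget."""
    total = reserved + sum(rough_tokens(m["content"]) for m in history)
    while total > budget and len(history) > 2:
        removed = history.pop(0)  # oldest message first
        total -= rough_tokens(removed["content"])
    return total

# Ten messages of ~100 estimated tokens each, trimmed to a 450-token budget.
history = [{"role": "user", "content": "x" * 400} for _ in range(10)]
remaining = trim_to_budget(history, budget=450)
```

Keeping at least the last two messages (the `len(history) > 2` guard) preserves the most recent user/assistant exchange even when the budget is very tight.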
Implement Token Aware Trimming
Below is a simple implementation of token aware trimming
1. Token counting: we use `tiktoken` to accurately count tokens.

```python
import tiktoken  # OpenAI token counting library
```

2. Add `trim_history_to_fit()`, which removes the oldest messages when over budget. This is called every time the agent calls `think()`:

```python
def trim_history_to_fit(self, system_message: str):
    """Remove old messages until we fit within the token budget"""
    # Count tokens in system message
    fixed_tokens = self.count_tokens(system_message)
    # Count tokens in conversation history
    history_tokens = sum(self.count_tokens(msg["content"]) for msg in self.conversation_history)
    total_tokens = fixed_tokens + history_tokens
    while total_tokens > self.max_context_tokens and len(self.conversation_history) > 2:
        removed_msg = self.conversation_history.pop(0)
        total_tokens -= self.count_tokens(removed_msg["content"])
    return total_tokens
```

3. Update `think()` to trim history:

```python
def think(self, user_input: str):
    """LLM decides which tool to use with both short term and long term context."""
    # Add user message to history
    self.conversation_history.append({"role": "user", "content": user_input})
    self.turns_since_summary += 1
    # Check if we should summarize
    if self.turns_since_summary >= self.summarize_after:
        self.summarize_history()
    # Include long term memory & summary in system context
    system_message_context = f"""You are a code assistant with access to the tools below.
Available tools:
- read_file(filepath)
- patch_file(filepath, content)
- print_review(review: str)
{self.get_relevant_memories()}
Conversation Summary: {self.conversation_summary or 'This is the start of the conversation'}
Decide which tool to use based on the conversation, conversation summary and relevant memories.
If a tool call is needed, reply ONLY with the tool call to make in JSON format,
e.g. {{"tool": "read_file", "args": ["main.py"]}}.
If the task is complete, respond with {{"done": true, "summary": "..."}},
where the summary is the reason why the task is complete.
"""
    self.trim_history_to_fit(system_message_context)
    # ...existing think code...
```

4. Add `max_context_tokens` to configure token limits:

```python
class CodeReviewAgentWithContext:
    def __init__(self, tools_registry: ToolRegistry, model="gpt-4o-mini",
                 memory_file="agent_memory.json", summarize_after=10, max_context_tokens=6000):
        # ...existing init code...
        self.max_context_tokens = max_context_tokens
```
Code review agent with memory and context management
import tiktoken # OpenAI token counting library
class CodeReviewAgentWithContext:
def __init__(self,tools_registry: ToolRegistry, model="gpt-4o-mini",memory_file="agent_memory.json",summarize_after=10,max_context_tokens=6000):
self.tools = tools_registry
self.model = model
self.conversation_history = [] # Short-term memory
self.memory_file = memory_file
self.load_long_term_memory() # Long-term memory (key-value store)
self.conversation_summary = "" # Summarized conversation history
self.summarize_after = summarize_after
self.turns_since_summary = 0
self.max_context_tokens = max_context_tokens
# Initialize tokenizer for the model
try:
self.tokenizer = tiktoken.encoding_for_model(model)
except:
self.tokenizer = tiktoken.get_encoding("cl100k_base")
def count_tokens(self, text:str) -> int:
"""Count tokens in a string"""
return len(self.tokenizer.encode(text))
def trim_history_to_fit(self, system_message:str):
"""Remove old messages until we fit within the token budget"""
# Count tokens in system message
fixed_tokens = self.count_tokens(system_message)
# Count tokens in conversation history
history_tokens = sum([self.count_tokens(msg["content"]) for msg in self.conversation_history])
total_tokens = fixed_tokens + history_tokens
while total_tokens > self.max_context_tokens and len(self.conversation_history) > 2:
removed_msg = self.conversation_history.pop(0)
total_tokens -= self.count_tokens(removed_msg["content"])
return total_tokens
def summarize_history(self):
"""Use LLM to summarize the conversation so far."""
if len(self.conversation_history) < 3:
return
history_text = "\n".join([f"{msg["role"]}:{msg["content"]}" for msg in self.conversation_history])
summary_prompt = f"""Summarize this conversation in 3-4 sentences,
preserving key fact, decisions, and actions taken:
{history_text}
Previous Summary: {self.conversation_summary or 'None'}
"""
response = openai.responses.create(model=self.model, input=[{"role":"user","content":summary_prompt}])
self.conversation_summary = response.output_text
# Keep only the last few turns + the summary
recent_turns = self.conversation_history[-4:] # Keep the last 4 messages (2 user/assistant exchanges)
self.conversation_history = recent_turns
self.turns_since_summary = 0
def remember(self, key:str, value: str):
"""Retrieve information from long term memory."""
self.long_term_memory[key] = value
self.save_long_term_memory()
def recall(self,key:str) -> str:
"""Retrieve information from long term memory"""
return self.long_term_memory.get(key,"No memory found for this key.")
def get_relevant_memories(self) -> str:
"""Format long term memories for inclusion in prompts."""
if not self.long_term_memory:
return "No stored memories"
memories = "\n".join([f"- {k}:{v}" for k, v in self.long_term_memory.items()])
return f"Relevant memories:\n{memories}"
    def save_long_term_memory(self):
        """Persist long-term memory to a JSON file."""
        try:
            with open(self.memory_file, "w") as f:
                json.dump(self.long_term_memory, f, indent=2)
        except Exception as e:
            print(f"Warning: Could not save memory to {self.memory_file}: {e}")

    def load_long_term_memory(self):
        """Load long-term memory from a JSON file."""
        if os.path.exists(self.memory_file):
            try:
                with open(self.memory_file, "r") as f:
                    self.long_term_memory = json.load(f)
                print(f"Loaded {len(self.long_term_memory)} memories from {self.memory_file}")
            except Exception as e:
                print(f"Warning: Could not load memory from {self.memory_file}: {e}")
        else:
            self.long_term_memory = {}
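The save/load pair is a plain JSON round trip. Here it is in isolation, using a temporary directory so it is safe to run anywhere (the file name is arbitrary):

```python
import json
import os
import tempfile

memories = {"project": "code-review-agent", "style": "PEP 8"}
path = os.path.join(tempfile.mkdtemp(), "agent_memory.json")

# Persist (mirrors save_long_term_memory)
with open(path, "w") as f:
    json.dump(memories, f, indent=2)

# Reload (mirrors load_long_term_memory)
with open(path) as f:
    loaded = json.load(f)

print(loaded == memories)  # True
```

One caveat of JSON persistence: keys and values come back as strings, so anything structured must be serialized deliberately.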
    def think(self, user_input: str):
        """LLM decides which tool to use with both short-term and long-term context."""
        # Add user message to history
        self.conversation_history.append({"role": "user", "content": user_input})
        self.turns_since_summary += 1
        # Check if we should summarize
        if self.turns_since_summary >= self.summarize_after:
            self.summarize_history()
        # Include long-term memory & summary in the system context
        system_message_context = f"""You are a code assistant with access to the tools below.
Available tools:
- read_file(filepath)
- patch_file(filepath, content)
- print_review(review: str)
{self.get_relevant_memories()}
Conversation summary: {self.conversation_summary or 'This is the start of the conversation'}
Decide which tool to use based on the conversation, the conversation summary, and the relevant memories.
If a tool call is needed, reply ONLY with the tool call to make, in JSON format: {{"tool": "<tool_call>"}}
Examples:
- read_file("main.py")
- patch_file(filepath, content)
- print_review(review: str)
If the task is complete, respond with JSON: {{"done": true, "summary": "<why the task is complete>"}}
"""
        self.trim_history_to_fit(system_message_context)
        # Build prompt with system instructions
        messages = [
            {
                "role": "system",
                "content": system_message_context,
            }
        ] + self.conversation_history
        response = openai.responses.create(model=self.model, input=messages)
        decision = response.output_text
        # Add the assistant's decision to conversation history
        self.conversation_history.append({
            "role": "assistant",
            "content": decision,
        })
        return decision
    def act(self, decision: str):
        """Execute the chosen tool and record the result."""
        try:
            if "(" in decision and ")" in decision:
                name, arg = decision.split("(", 1)
                arg = arg.strip(")'\"")
                result = self.tools.call(name.strip(), arg)
            else:
                result = self.tools.call(decision)
            # Store the tool call result in conversation history
            self.conversation_history.append({
                "role": "system",
                "content": f"Tool result: {result}",
            })
            return result
        except Exception as e:
            error_msg = f"Error executing tool: {e}"
            self.conversation_history.append({
                "role": "system",
                "content": error_msg,
            })
            return error_msg
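act() parses tool calls with a naive split on parentheses. Isolated as a standalone function (`parse_tool_call` is an illustrative name), the parsing looks like this — note it only supports a single positional argument:

```python
def parse_tool_call(decision: str):
    """Split a call like read_file("main.py") into (name, arg)."""
    if "(" in decision and ")" in decision:
        name, arg = decision.split("(", 1)
        # Strip the closing paren and surrounding quotes from the argument
        return name.strip(), arg.strip(")'\" ")
    return decision.strip(), None

parse_tool_call('read_file("main.py")')   # ('read_file', 'main.py')
parse_tool_call('list_tools')             # ('list_tools', None)
```

For anything beyond a toy agent, structured tool-call output (e.g. a JSON arguments object) is far more robust than string splitting.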
    def run(self, user_input: str, max_steps: int = 3):
        original_input = user_input
        for step in range(max_steps):
            print(f"Step {step + 1} of {max_steps}")
            decision = self.think(user_input)
            try:
                decision_parsed = json.loads(decision)
            except json.JSONDecodeError as e:
                print(f"Could not parse decision: {decision}. Error: {e}")
                user_input = f"Your response was not valid JSON.\nOriginal user request: {original_input}"
                continue  # Ask the model again rather than acting on unparsed output
            if decision_parsed.get("done"):
                print(f"Task complete\nAssistant response: {decision}")
                return decision_parsed.get("summary")
            result = self.act(decision)
            user_input = (f"Original user request: {original_input}\n"
                          f"Last assistant response: {decision}\n"
                          f"Last tool result: {result}. Continue with the original user request.")
        print("Loop complete (max steps reached).")
        return result
Notes on Memory
- We have shown storing long-term memory and retrieving all of it on every turn. In practice, with large memory sizes, it is more efficient to store memories in, for example, a vector store or database and use retrieval based on the user input to fetch only the long-term memories relevant to the agent's current task.
- In our example, conversation history lasts only for the session. It may be useful to also persist chat history for later reference. This stored conversation history would not be considered part of the agent's long-term memory to be used during task sessions.
- Context engineering: In this tutorial we have shown context management only in relation to context window size. However, window size is not the only reason to manage context. Context engineering refers to the strategies we use to decide what information our agent needs to do its job well. Even with today's large context windows, throwing everything in is not always the best approach: irrelevant or poorly organized context can confuse the model, slow things down, and drive up costs. We'll dive deeper into context engineering strategies in a future tutorial.
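As a toy illustration of the retrieval idea in the first note, a naive word-overlap scorer can pick which memories to inject for a given query. All names here are illustrative, and a real system would use embeddings and a vector store instead:

```python
def relevant_memories(memories: dict, query: str, top_k: int = 2):
    """Rank stored memories by word overlap with the query; keep the best."""
    q_words = set(query.lower().split())
    scored = []
    for key, value in memories.items():
        words = set(f"{key} {value}".lower().replace("_", " ").split())
        scored.append((len(q_words & words), key, value))
    scored.sort(reverse=True)
    return [(k, v) for score, k, v in scored[:top_k] if score > 0]

mems = {
    "preferred_style": "user prefers PEP 8 formatting",
    "last_file": "main.py was reviewed for bugs",
    "test_framework": "project uses pytest",
}
relevant_memories(mems, "review main.py again")
# [('last_file', 'main.py was reviewed for bugs')]
```

Only the matching memory is injected into the prompt, rather than the whole store — the same shape of API an embedding-based retriever would expose.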
What’s next
In the next part of the series we will look at more advanced patterns such as reasoning, planning, and multi-agent workflows.
We will also start to dive deeper into the practical considerations for deploying real-world agents, such as observability, agent evaluation, guardrails, and security.
Full Source Code Here: Agent Memory Jupyter Notebook