Fatskills
Study Guide: AI MCP and Tooling: Agent memory state and context management
Source: https://www.fatskills.com/ai-for-work/chapter/ai-mcp-and-tooling-agent-memory-state-and-context-management


By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.


Agent Memory, State, and Context Management

What This Is

Agent memory, state, and context management refer to how AI systems (such as LLM-based assistants or autonomous agents) retain, update, and use information across interactions so they can perform tasks consistently. This matters in real work because the underlying model is stateless between calls: anything the agent needs to "remember" (past inputs, user preferences, workflow steps) must be stored and supplied again. Without that, agents repeat work, fill gaps with hallucinated details, or break multi-step processes. Example: A customer support agent that tracks a user’s past complaints and product history can resolve issues faster, without asking the same questions repeatedly.


Key Facts & Principles

  • State: The current snapshot of an agent’s knowledge (e.g., conversation history, variables, or task progress). Example: A coding assistant tracking which files you’ve edited in a session to suggest relevant fixes.
  • Short-term memory (STM): Temporary storage (e.g., a chatbot’s conversation buffer) that resets after a session. Example: A Slack bot remembering your last 10 messages to answer follow-ups coherently.
  • Long-term memory (LTM): Persistent storage (e.g., vector databases, SQL tables) for facts, user profiles, or past actions. Example: A sales agent pulling up a client’s purchase history from a CRM to personalize recommendations.
  • Context window: The maximum number of tokens a model can process in a single request. Example: GPT-4 Turbo’s 128k-token window lets it analyze long documents, but anything beyond the limit has to be dropped or truncated, usually starting with the oldest context.
  • Retrieval-Augmented Generation (RAG): Retrieving relevant entries from LTM (e.g., a knowledge base) and inserting them into the prompt so the model’s response is grounded in them. Example: A legal assistant pulling clauses from a contract database to draft a compliant email.
  • State drift: When an agent’s internal state diverges from reality (e.g., outdated data or misaligned variables). Example: A scheduling bot double-booking a meeting because it didn’t sync with a calendar update.
  • Session isolation: Preventing data leakage between users or tasks (e.g., a healthcare agent keeping patient records separate). Example: A therapy chatbot ensuring User A’s notes aren’t visible to User B.
  • Token budgeting: Allocating context window space efficiently (e.g., prioritizing recent messages over old ones). Example: A summarization tool keeping only the last 5 pages of a 20-page report because the full report won’t fit in the context window.
  • State serialization: Converting an agent’s state into a storable format (e.g., JSON, database rows) for persistence. Example: Saving a user’s preferences (e.g., "use metric units") to reload in future sessions.
  • Feedback loops: Using user corrections or system logs to update state (e.g., learning from mistakes). Example: A writing assistant adjusting its tone suggestions based on user edits.
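To make state, session isolation, and serialization concrete, here is a minimal Python sketch. The AgentState fields, the module-level store dict, and the save/load helpers are hypothetical names chosen for illustration; a real agent would write the JSON to a database (e.g., PostgreSQL or Redis) rather than an in-process dict.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AgentState:
    """Hypothetical schema: what one agent session needs to remember."""
    user_id: str
    session_id: str
    current_task: Optional[str] = None
    edited_files: list[str] = field(default_factory=list)
    preferences: dict[str, str] = field(default_factory=dict)


def serialize(state: AgentState) -> str:
    """State serialization: turn the snapshot into JSON for persistent storage."""
    return json.dumps(asdict(state), sort_keys=True)


def deserialize(raw: str) -> AgentState:
    """Rebuild the snapshot from JSON in a later session."""
    return AgentState(**json.loads(raw))


# Session isolation: every snapshot is keyed by (user_id, session_id),
# so a lookup for one user can never return another user's state.
store: dict[tuple[str, str], str] = {}


def save(state: AgentState) -> None:
    store[(state.user_id, state.session_id)] = serialize(state)


def load(user_id: str, session_id: str) -> Optional[AgentState]:
    raw = store.get((user_id, session_id))
    return deserialize(raw) if raw is not None else None
```

Scoping the key to both IDs is one simple way to enforce isolation; the same idea carries over to a database as a composite primary key.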

Step-by-Step Application

  1. Define your state schema
     • List what the agent must remember (e.g., user ID, task progress, preferences).
     • Example: For a project management agent, track: { "user_id": "u123", "current_task": "draft_proposal", "dependencies": ["client_feedback"], "preferences": {"format": "bullet_points"} }

  2. Choose a memory backend
     • For STM: Use short-lived stores (e.g., Redis, session cookies) or the agent’s context window.
     • For LTM: Use vector DBs (e.g., Pinecone, Weaviate) for unstructured data or SQL/NoSQL databases for structured data.
     • Example: Store user preferences in PostgreSQL; store past conversations in a vector DB for semantic search.

  3. Implement context management
     • For short interactions: Pass the full conversation history in the prompt (e.g., messages: [user_msg1, agent_reply1, ...]).
     • For long interactions: Keep a short window of recent messages and use RAG to fetch only the relevant LTM (e.g., the last 3 user messages plus the most relevant documents).
     • Example: A support agent truncates chat history to the last 5 exchanges but pulls the user’s past tickets via RAG.

  4. Handle state updates
     • Use explicit commands (e.g., UPDATE user_preferences SET theme = 'dark' WHERE user_id = 'u123') or event-driven triggers (e.g., "When the user says ‘forget this,’ delete the last 5 messages").
     • Example: A coding agent updates its state whenever you save a file: if file_saved: state["edited_files"].append(file_path)

  5. Monitor for state drift
     • Log state changes and compare them to the ground truth (e.g., sync with a CRM or calendar).
     • Example: A scheduling agent checks Google Calendar every 5 minutes to reconcile its internal state.

  6. Optimize token usage
     • Compress context (e.g., summarize old messages) or use sliding windows (e.g., keep only the last 10 turns).
     • Example: A meeting assistant condenses a 1-hour transcript into 3 bullet points to fit the context window.
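The context-management and token-budgeting steps above can be sketched as a function that fills a fixed token budget with retrieved LTM snippets first, then with the most recent turns, dropping older turns once the budget or a sliding-window limit is reached. This is a minimal sketch under stated assumptions: count_tokens is a crude word count standing in for a real tokenizer, and build_context and its priority order are hypothetical choices, not any specific framework’s API.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def build_context(history: list[str], retrieved: list[str], budget: int,
                  max_turns: int = 10) -> list[str]:
    """Return prompt pieces that fit in `budget` tokens.

    Retrieved LTM snippets are added first (highest priority). Then the most
    recent turns are added newest-first, limited to the last `max_turns`
    (sliding window), and finally put back into chronological order.
    """
    context: list[str] = []
    used = 0
    for doc in retrieved:
        cost = count_tokens(doc)
        if used + cost > budget:
            break
        context.append(doc)
        used += cost

    recent: list[str] = []
    for turn in reversed(history[-max_turns:]):
        cost = count_tokens(turn)
        if used + cost > budget:
            break  # this turn and everything older is dropped (or could be summarized)
        recent.append(turn)
        used += cost
    recent.reverse()  # restore oldest-to-newest order
    return context + recent
```

Putting retrieved snippets first is only one possible policy; an agent that values conversational continuity more might reserve a fixed share of the budget for recent turns instead.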

Common Mistakes

  • Mistake: Assuming the agent’s context window is infinite. Correction: Budget tokens by prioritizing recent/relevant data. Why: A request that exceeds the window either fails or forces older context to be truncated, breaking coherence.

  • Mistake: Storing sensitive data in plaintext STM (e.g., session cookies). Correction: Encrypt STM or use secure backends (e.g., Redis with TLS). Why: Session data can be intercepted or leaked.

  • Mistake: Ignoring state drift (e.g., not syncing with external systems). Correction: Implement periodic reconciliation (e.g., hourly syncs with a database). Why: Drift causes errors like double-booking or outdated recommendations.

  • Mistake: Overloading LTM with irrelevant data (e.g., storing every user message). Correction: Filter LTM (e.g., store only "important" messages or summaries). Why: Noise degrades retrieval quality and increases costs.

  • Mistake: Not handling session isolation (e.g., sharing state between users). Correction: Use unique session IDs and scope all stored data to the user. Why: Shared state leaks one user’s data to another, which can violate privacy regulations such as GDPR.
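Periodic reconciliation, the fix for state drift, can be sketched as a diff against the external system that treats the external copy as authoritative and logs every correction. The reconcile function and the meeting-ID-to-time-slot data shape are illustrative assumptions; fetching the real data (for example, from a calendar API) and scheduling the check are left out.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.state")


def reconcile(internal: dict[str, str], source_of_truth: dict[str, str]) -> dict[str, str]:
    """Return a corrected copy of `internal`, logging every drifted entry.

    Both dicts map an item ID (e.g., a meeting ID) to its value (e.g., a time slot).
    The external system always wins; `internal` itself is not modified.
    """
    fixed = dict(internal)
    for key, truth in source_of_truth.items():
        if fixed.get(key) != truth:
            log.info("drift on %s: agent had %r, source has %r", key, fixed.get(key), truth)
            fixed[key] = truth
    # Items the agent still holds but the source no longer has were deleted upstream.
    for key in set(fixed) - set(source_of_truth):
        log.info("drift on %s: removed in source, dropping", key)
        del fixed[key]
    return fixed
```

The log lines double as the audit trail recommended under Practical Tips, which makes repeated drift on the same item easy to spot.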


Practical Tips

  • Use "memory tiers": Combine STM (fast, temporary) and LTM (slow, persistent) for efficiency. Example: A chatbot uses STM for the current conversation and LTM for user profiles.
  • Log state changes: Track updates to debug drift (e.g., "User changed preference from X to Y at 2:30 PM").
  • Test edge cases: Simulate long conversations or concurrent users to catch state corruption. Example: Run a load test with 100 users to check for session leaks.
  • Leverage frameworks: Use tools like LangChain’s memory classes (e.g., ConversationBufferMemory) or LlamaIndex’s ChatMemoryBuffer to abstract state management.
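The memory-tiers tip can be illustrated with a small class: a bounded deque acts as STM, so the oldest turns fall off automatically, and a plain list stands in for LTM. TieredMemory and its methods are hypothetical, and the keyword-overlap recall is a deliberately naive substitute for the embedding-based search a vector database would provide.

```python
from collections import deque


class TieredMemory:
    """Hypothetical two-tier memory: bounded STM buffer plus a persistent-style LTM list."""

    def __init__(self, stm_size: int = 10) -> None:
        self.stm: deque[str] = deque(maxlen=stm_size)  # oldest turns are evicted automatically
        self.ltm: list[str] = []  # stand-in for a database or vector store

    def add_turn(self, text: str) -> None:
        """Record a conversation turn in short-term memory."""
        self.stm.append(text)

    def remember(self, fact: str) -> None:
        """Promote an important fact to long-term memory."""
        self.ltm.append(fact)

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Rank LTM facts by shared words with the query; keep the top k with any overlap."""
        words = set(query.lower().split())
        scored = [(len(words & set(f.lower().split())), f) for f in self.ltm]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:k]]
```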

Quick Practice Scenario

Scenario: You’re building a travel planning agent. A user asks, "What’s the weather in Tokyo next week?" The agent checks its state and finds no trip planned for Tokyo. It replies, "I don’t see a Tokyo trip in your plans. Should I add one?" The user says, "Yes, for next Friday."

Question: What 3 state updates should the agent make to handle this correctly?

Answer:
1. Add Tokyo to the user’s destinations (destinations: ["Tokyo"]).
2. Set the trip date, resolving "next Friday" to a calendar date (e.g., trip_dates: {"Tokyo": "2024-11-01"}).
3. Flag the trip as "needs weather check" (pending_tasks: ["check_weather_Tokyo"]).

Explanation: The agent must persist the new trip, track its date, and queue follow-up actions to avoid dropping the task.
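The three updates from the scenario can be written as plain dictionary operations. This sketch assumes the state is a dict with destinations, trip_dates, and pending_tasks keys (the names from the answer) and interprets "next Friday" as the first Friday strictly after today; next_weekday and add_trip are hypothetical helpers.

```python
from datetime import date, timedelta


def next_weekday(today: date, weekday: int) -> date:
    """Next occurrence of `weekday` (Mon=0 ... Sun=6) strictly after `today`."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1  # always 1..7
    return today + timedelta(days=days_ahead)


def add_trip(state: dict, city: str, trip_date: date) -> None:
    """Apply the scenario's three state updates in place."""
    if city not in state["destinations"]:
        state["destinations"].append(city)                  # 1. persist the destination
    state["trip_dates"][city] = trip_date.isoformat()        # 2. record the resolved date
    state["pending_tasks"].append(f"check_weather_{city}")  # 3. queue the follow-up


# Usage: start from an empty trip state and add the Tokyo trip.
state = {"destinations": [], "trip_dates": {}, "pending_tasks": []}
add_trip(state, "Tokyo", next_weekday(date.today(), 4))  # 4 = Friday
```

Resolving relative dates at write time, rather than storing the phrase "next Friday", keeps the saved state correct when it is reloaded days later.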


Last-Minute Cram Sheet

  1. State = Agent’s current knowledge (e.g., variables, conversation history).
  2. STM = Temporary memory (e.g., chat history); LTM = Persistent memory (e.g., vector DB).
  3. Context window = Max tokens a model can process at once (e.g., 128k for GPT-4 Turbo).
  4. RAG = Retrieve relevant LTM and add it to the prompt to ground responses.
  5. State drift = Agent’s state diverges from reality (e.g., outdated data). Sync with external systems!
  6. Token budgeting = Prioritize recent/relevant data to avoid truncation.
  7. Session isolation = Keep user data separate (e.g., unique session IDs). Never share state between users!
  8. Feedback loops = Update state based on user corrections or logs.
  9. Memory tiers = Use STM for speed, LTM for persistence.
  10. Serialization = Convert state to a storable format (e.g., JSON, database rows). Encrypt sensitive data!