We needed to add a new feature: group discussions (discussionChat).
This should have been simple. We already had interviewChat—one-on-one conversations where users engage deeply with AI-simulated personas. Group discussion was just scaling from 1-to-1 to 1-to-many: 3-8 personas engaging simultaneously, perspectives colliding, insights emerging.
In theory, we just needed to:
The reality: We had to modify 12 files.
Worse, we discovered this:
Three nearly identical agent wrappers. Every new feature required copy-pasting across all three. Every bug fix meant changing it three times.
At that moment, we realized something was fundamentally wrong.
Not that our code was inelegant, or that we lacked abstraction. The problem was that we were building AI Agent systems with traditional software engineering thinking.
This article chronicles how we escaped this trap—through three architectural evolutions, rethinking how AI Agents should be built from first principles.
Before refactoring, we stopped to ask a fundamental question:
What's the essential difference between AI Agents and traditional software?
Traditional software is built on state machines:
This model's core assumptions:
This works beautifully for traditional software. But for AI Agents?
LLMs don't work this way:
Where's the "state" here?
There is no `state` field. The AI infers everything from conversation history:
This is a completely different paradigm.
From this observation, we derived three insights that shaped our architectural evolution.
Traditional approach: Maintain explicit state
AI-native approach: Infer state from conversation
Why is conversation superior to state machines?
How humans make decisions:
AI Agents should follow the same pattern:
Why separate?
Facing the "AI forgetfulness" problem, we could:
Option A: Vector DB + Semantic Search
Option B: Markdown Files + Full Loading
We chose Option B.
Why?
From these three insights, we distilled the core principles of our architecture:
1. Messages as Source of Truth
2. Configuration over Code
3. AI as State Manager
4. Simple, Transparent, Controllable
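The first and third principles can be made concrete with a small sketch. This is a hypothetical illustration (tool names like `interviewChat` and `generateReport` follow the article; `deriveStage` is invented for this example): there is no stored state field, and the current stage is a pure function of the message stream.

```typescript
// Messages as source of truth: no state field is stored anywhere.
type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "assistant"; toolCalls: { name: string }[] }
  | { role: "tool"; content: string };

type Stage = "planning" | "researching" | "reporting";

// The current stage is derived from conversation history on demand,
// so it can never drift out of sync with the conversation.
function deriveStage(messages: Message[]): Stage {
  const calls = messages.flatMap((m) =>
    "toolCalls" in m ? m.toolCalls.map((c) => c.name) : []
  );
  if (calls.includes("generateReport")) return "reporting";
  if (calls.some((n) => n === "interviewChat" || n === "discussionChat")) return "researching";
  return "planning";
}

const history: Message[] = [
  { role: "user", content: "Want to understand young people's coffee preferences" },
  { role: "assistant", toolCalls: [{ name: "interviewChat" }] },
  { role: "tool", content: "Interview summary: ..." },
];

console.log(deriveStage(history)); // → researching
```

There is nothing to synchronize: delete the derived value and it can always be recomputed from the messages.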
v2.2.0 - 2025-12-27
Initially, research data was scattered across three places:
Generating reports required stitching from three places:
Problems:
- `interviews.conclusion` and interview content in messages could diverge
- `discussionChat` requires a new table, a new tool, new queries

Even worse, tool outputs were inconsistent:
Agents couldn't handle this uniformly, leading to complex code.
Core idea: All research content flows into the message stream. Database only stores derived state.
Key changes:
- Removed 5 specialized save tools (`saveInterview`, `saveDiscussion`, `saveScoutTask`, ...)
- Unified tool output format (`plainText`)
- Generate studyLog on demand
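A sketch of what the unified output format enables: heterogeneous tool outputs normalized to a required `plainText` field that every agent can consume the same way. The `ResearchToolResult` shape is from the article; the `toResearchResult` helper is hypothetical.

```typescript
// Unified tool result: every research tool must return a human-readable
// plainText, plus optional structured fields (IDs, insights, ...).
interface ResearchToolResult {
  plainText: string;        // human-readable summary, required
  [key: string]: unknown;   // optional structured data
}

// Hypothetical normalizer: a legacy tool that returned only a DB reference
// gets a readable fallback, so agents never need a second query to see content.
function toResearchResult(raw: { plainText?: string; [k: string]: unknown }): ResearchToolResult {
  return {
    ...raw,
    plainText: raw.plainText ?? JSON.stringify(raw), // keep output readable either way
  };
}

const unified = toResearchResult({ interviewId: 123 });
console.log(unified.plainText); // → {"interviewId":123}
```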
Reasoning from first principles:
Conversation as context
LLMs excel at extraction
Shadow of Event Sourcing
Comparison with other approaches:
| Approach | Pros | Cons | Verdict |
|---|---|---|---|
| Messages as source | Data consistent, easy to extend | Requires extra LLM call to generate studyLog | ✅ Our choice |
| Traditional state management | Precise control | Complex state sync, hard to trace | Doesn't suit LLM non-determinism |
| Remove DB entirely | Extremely simple | Frontend queries difficult, history hard to manage | Need structured display |
| Event Sourcing | Complete history, replayable | High engineering complexity | Over-engineered for current scale |
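The "requires extra LLM call" cost in the first row is bounded by caching: the studyLog is generated once from messages, then persisted. A minimal sketch, with `summarize` standing in for the real LLM call and an in-memory `Map` standing in for the database:

```typescript
// On-demand generation with caching: derive the studyLog from messages
// only when it is missing, then persist it.
type Analyst = { id: number; studyLog: string | null };

const db = new Map<number, Analyst>([[1, { id: 1, studyLog: null }]]);
let llmCalls = 0;

// Stand-in for the real LLM summarization call (~$0.002 in production).
function summarize(messages: string[]): string {
  llmCalls++;
  return `Study log derived from ${messages.length} messages`;
}

function getStudyLog(id: number, messages: string[]): string {
  const analyst = db.get(id);
  if (!analyst) throw new Error(`unknown analyst ${id}`);
  if (!analyst.studyLog) {
    analyst.studyLog = summarize(messages); // generate once, from messages
  }
  return analyst.studyLog; // cached thereafter
}

console.log(getStudyLog(1, ["m1", "m2"])); // → Study log derived from 2 messages
getStudyLog(1, ["m1", "m2"]);
console.log(llmCalls); // → 1
```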
Code simplification:
Development efficiency:
Before:
After:
Cost trade-offs:
✅ Benefits:
❌ Costs:
✅ Mitigation:
v2.3.0 - 2026-01-06
After implementing message-driven architecture, adding features became simpler. But user experience wasn't good enough.
When creating research, users often say:
"Want to understand young people's coffee preferences"
This isn't specific enough:
Traditional approach: AI asks multiple questions
Problems:
While adding features became simpler, we discovered a bigger technical debt:
Three nearly identical agent wrappers, totaling 1,211 lines.
Code duplication mainly in:
Every new feature (like webhook integration) required changing all three places.
Our solution has two parts:
A separate agent dedicated to intent clarification:
Workflow:
Key design:
Merge three duplicate agent wrappers into one generic executor:
Agent routing:
Each agent only needs to define configuration:
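A hypothetical sketch of such a config, modeled on the Plan Mode agent described above. Tool behavior is stubbed out; only the shape matters — everything that differs between agents lives in the config, everything shared lives in the executor.

```typescript
// Minimal model of a per-agent configuration consumed by one shared executor.
type ToolFn = (input: string) => string;

interface AgentConfig {
  model: string;
  systemPrompt: string;
  tools: Record<string, ToolFn>;
  maxSteps?: number;
}

// Hypothetical Plan Mode config: clarify intent, then present one plan.
function createPlanModeAgentConfig(): AgentConfig {
  return {
    model: "claude-sonnet-4-5",
    systemPrompt: "You clarify research intent before execution...",
    tools: {
      requestInteraction: (q) => `ask user: ${q}`,   // interact with user
      makeStudyPlan: (goal) => `plan for: ${goal}`,  // display plan, one-click confirm
    },
    maxSteps: 5, // clarification should finish within 5 steps
  };
}

const config = createPlanModeAgentConfig();
console.log(Object.keys(config.tools)); // → [ 'requestInteraction', 'makeStudyPlan' ]
```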
Reasoning-execution separation rationale:
Matches cognitive model
Single responsibility
Messages as protocol
Unified executor rationale:
Extract, Don't Rebuild
Configuration over Inheritance
Plugin-based Lifecycle
- `customPrepareStep`: dynamic tool control
- `customOnStepFinish`: custom post-processing

Comparison with other approaches:
| Approach | Pros | Cons | Verdict |
|---|---|---|---|
| Plan Mode + baseAgentRequest | Remove duplicate code, separate reasoning-execution | One more abstraction layer | ✅ Our choice |
| Continue copy-pasting | Simple and direct | Tech debt accumulates, hard to maintain | Unsustainable long-term |
| Fully generic agent | Least code | Sacrifices specialization and control | Can't handle business differences |
| Microservices split | Independent deployment | Over-engineered, adds ops complexity | Unnecessary at current scale |
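The executor-plus-hooks idea can be sketched in a few lines. This is a simplified, synchronous model of the lifecycle (the real executor streams LLM output); the hook names `customPrepareStep` and `customOnStepFinish` follow the article, everything else is illustrative.

```typescript
// One generic executor; per-agent behavior enters only through config and hooks.
interface ExecutorConfig {
  tools: string[];
  maxSteps: number;
  customPrepareStep?: (step: number, tools: string[]) => string[]; // narrow the tool set
  customOnStepFinish?: (step: number) => void;                     // post-processing
}

// Each "step" here just records which tools were active.
function runAgent(config: ExecutorConfig): string[][] {
  const trace: string[][] = [];
  for (let step = 0; step < config.maxSteps; step++) {
    const active = config.customPrepareStep?.(step, config.tools) ?? config.tools;
    trace.push(active);
    config.customOnStepFinish?.(step);
  }
  return trace;
}

const finished: number[] = [];
const trace = runAgent({
  tools: ["webSearch", "interviewChat", "generateReport"],
  maxSteps: 2,
  // After the first step, pretend a report was generated and restrict tools.
  customPrepareStep: (step, tools) => (step > 0 ? ["generateReport"] : tools),
  customOnStepFinish: (step) => finished.push(step),
});
console.log(trace); // → [ [ 'webSearch', 'interviewChat', 'generateReport' ], [ 'generateReport' ] ]
```

The executor never branches on agent identity; adding a capability to the loop (like the webhook example later in the article) reaches every agent automatically.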
Code complexity:
But more importantly:
Development efficiency:
Before:
After:
User experience:
Before:
After:
Intent clarification: 3-5 conversation rounds → 1 confirmation
v2.3.0 - 2026-01-08
With intent clarification and unified architecture, the research workflow was smooth. But long-term users reported a problem:
"Why does the AI ask me what industry I'm in every single time?"
The AI doesn't remember users. Every conversation feels like the first meeting:
Users feel the AI is "forgetful," and the experience lacks personalization.
Root cause:
LLMs are stateless. Each conversation:
Although we have historical conversations in the DB:
We need a persistent memory system. But how to design it?
Inspired by Anthropic's CLAUDE.md approach:
We adopted a similar approach but added automatic update mechanisms.
Two-tier architecture:
Core Memory (core)
Working Memory (working)
Two-stage update:
Memory Update Agent (Haiku 4.5):
Memory Reorganize Agent (Sonnet 4.5):
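The two-stage flow can be sketched as a pure function. The thresholds (8,000 characters of core memory, 20 working-memory items) follow the article; the reorganize pass is stubbed out where Sonnet would run, and the cheap extraction stands in for Haiku.

```typescript
// Two-tier memory: consolidated Markdown core + raw working items.
interface Memory {
  core: string;                                 // consolidated Markdown
  working: { info: string; source: string }[];  // items awaiting consolidation
}

// Stage gate: reorganize only when thresholds from the article are exceeded.
function needsReorganize(m: Memory): boolean {
  return m.core.length > 8000 || m.working.length > 20;
}

function updateMemory(m: Memory, info: string, source: string): Memory {
  // Stage 1 (Sonnet in production): consolidate working items into core.
  const base: Memory = needsReorganize(m)
    ? { core: m.core /* reorganized by the stronger model */, working: [] }
    : m;
  // Stage 2 (Haiku in production): cheap extraction appends a new item.
  return { ...base, working: [...base.working, { info, source }] };
}

const m1 = updateMemory(
  { core: "# User Information", working: [] },
  "User recently focused on coffee market",
  "chat_123"
);
console.log(m1.working.length); // → 1
```

Running the expensive model only past a threshold is what keeps the average cost near the Haiku price.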
Why Markdown over Vector DB?
Context window is large enough
Simple and transparent
Avoid premature optimization
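Full loading in code: read the Markdown memory, wrap it in a `<UserMemory>` tag, and prepend it to the conversation. The tag convention follows the article; `injectMemory` is a hypothetical helper.

```typescript
// No retrieval step, no embeddings: the whole file is the memory.
interface ChatMessage { role: "user" | "assistant"; content: string }

function injectMemory(memory: string, messages: ChatMessage[]): ChatMessage[] {
  if (!memory) return messages; // no memory yet: pass through unchanged
  return [
    { role: "user", content: `<UserMemory>\n${memory}\n</UserMemory>` },
    ...messages,
  ];
}

const memory = "# User Information\n- Industry: Consumer goods product manager";
const out = injectMemory(memory, [
  { role: "user", content: "Want to do tea beverage research" },
]);
console.log(out.length, out[0].content.startsWith("<UserMemory>")); // → 2 true
```

Because the memory is plain Markdown, the user can open the file, read exactly what the model sees, and edit it.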
Comparison with mainstream approaches:
| Approach | Storage | Control | Retrieval | atypica choice rationale |
|---|---|---|---|---|
| Anthropic (CLAUDE.md) | File-based | User-driven | Full loading | ✅ Simple, transparent, effective with large context |
| OpenAI | Vector DB (speculated) | AI + user confirmation | Semantic retrieval | ❌ Black box, weak user control |
| Mem0 | Vector + Graph + KV | AI-driven | Hybrid retrieval | ❌ Over-engineered, high maintenance cost |
| MemGPT | OS-inspired tiered | AI self-managed | Tiered retrieval | ❌ Conceptually complex, utility unproven |
We chose Anthropic's simple approach because:
User experience:
Before:
After:
System cost:
Response time:
Low cost, fast response, completely acceptable.
Now let's step back and see how atypica's architecture differs from mainstream AI Agent frameworks.
| atypica | LangChain | Core Difference |
|---|---|---|
| Messages as source | ConversationBufferMemory | We believe conversation history is the best state |
| Generate studyLog on demand | Pre-compute summary | Avoid sync issues, traceable on failures |
| DB stores derived state | DB stores core state | Similar to Event Sourcing |
Why different?
LangChain's design is influenced by traditional software, believing "state should be explicitly stored and managed."
We believe, for LLMs:
| atypica | LangGraph | Core Difference |
|---|---|---|
| Configuration-driven | Graph-driven | We use configuration to express differences, code for commonalities |
| Single executor | Node orchestration | Avoid over-abstraction, good enough is enough |
| Messages as protocol | Explicit node communication | Loosely coupled without losing context |
Why different?
LangGraph pursues generality, using graph orchestration to express arbitrarily complex flows.
We believe, for our scenarios:
| atypica | Mem0 | Core Difference |
|---|---|---|
| Markdown files | Vector + Graph + KV | We choose simple and transparent over precise and complex |
| Full loading | Semantic retrieval | When context window is large enough, full text is better |
| User-editable | AI black box | User trust comes from transparency |
Why different?
Mem0 pursues precise retrieval, using multiple databases in hybrid.
We believe, for personal assistants:
atypica's choices:
Mainstream frameworks' choices:
Who's right or wrong?
Neither is wrong. They simply optimize for different contexts and scales.
Specific impact from three evolutions:
| Task | Before | After | Improvement |
|---|---|---|---|
| Add new research method | 12 files, 2-3 days | 3 files, 2-3 hours | 10x |
| Add new capability (MCP) | Modify 3 places, 1 day | Modify 1 place, 2 hours | 4x |
| Fix bug | Change 3 agents | Change 1 base | 3x |
The cost and performance impact is negligible.
What did we learn from three evolutions?
1. Incremental refactoring, not big bang
We didn't rewrite the entire system at once. Three evolutions, each step:
- kept backward compatibility (e.g., retaining the `analyst.studySummary` field)

This let us quickly validate ideas and reduce risk.
2. Start from real pain points
Don't pursue architectural perfection, instead:
- Adding `discussionChat` was too complex

Let problems drive design, not design drive problems.
3. Embrace LLM characteristics
Don't treat LLMs as traditional software:
Adapt to LLMs' capability boundaries rather than fighting them.
1. Learning curve for abstraction layer
Modifying `baseAgentRequest` requires understanding its hooks: `customPrepareStep` and `customOnStepFinish`.

But clear interfaces and documentation lowered the barrier.
2. Cost of on-demand generation
studyLog generation requires an extra LLM call (~$0.002 per call).
But:
3. Limitations of simple solutions
Markdown memory isn't suitable for:
But:
1. Confidence from type safety
During refactoring, the compiler caught 99% of issues.
2. Flexibility of configuration-driven
Adding webhook integration only requires:
All agents automatically gain the new capability; no config changes are needed.
3. Power of messages as protocol
Plan Mode and Study Agent communicate through messages:
This was an unexpected benefit.
Three evolutions brought atypica closer to general-purpose agents. But there's more to do.
1. Skills Library
2. Multi-Agent Collaboration
3. Evolve toward GEA
4. Self-Improving Agents
No matter how we evolve, we stick to:
Building AI Agent systems is not a simple extension of traditional software engineering.
We need to rethink:
atypica's three evolutions are essentially three cognitive upgrades:
From database thinking → data flow thinking
From code reuse → configuration-driven
From stateless → memory-enhanced
These choices may not be the most "advanced."
But they are:
And this, perhaps, is the key to building reliable AI systems.
Code listings referenced throughout the article:

```text
prisma/schema.prisma                        # New Discussion table
src/ai/tools/discussionChat.ts              # New tool
src/ai/tools/saveDiscussion.ts              # Save tool
src/app/(study)/agents/studyAgent.ts        # Add tool to agent
src/app/(study)/agents/fastInsightAgent.ts  # Add again
src/app/(study)/agents/productRnDAgent.ts   # And again
... 6 more files
```

```typescript
// studyAgentRequest.ts (493 lines)
export async function studyAgentRequest(context) {
  const result = await streamText({
    model: llm("claude-sonnet-4"),
    system: studySystem(),
    messages,
    tools: {
      webSearch, interview, scoutTask,
      saveAnalyst, generateReport
      // ... 15 tools
    },
    onStepFinish: async (step) => {
      // Save messages
      // Track tokens
      // Send notifications
      // ... 120 lines of logic
    }
  });
}

// fastInsightAgentRequest.ts (416 lines)
// 95% identical code

// productRnDAgentRequest.ts (302 lines)
// 95% identical code
```

```typescript
class ResearchSession {
  state: 'IDLE' | 'PLANNING' | 'RESEARCHING' | 'REPORTING';
  data: {
    interviews: Interview[];
    findings: Finding[];
    reports: Report[];
  };

  transition(event: Event) {
    switch (this.state) {
      case 'IDLE':
        if (event.type === 'START') this.state = 'PLANNING';
        break;
      case 'PLANNING':
        if (event.type === 'PLAN_COMPLETE') this.state = 'RESEARCHING';
        break;
      // ... more state transitions
    }
  }
}
```

```typescript
const messages = [
  { role: 'user', content: "Want to understand young people's coffee preferences" },
  { role: 'assistant', content: 'I can help you conduct user research...' },
  { role: 'assistant', toolCalls: [{ name: 'scoutTask', args: {...} }] },
  { role: 'tool', content: 'Observed 5 user segments...' },
  { role: 'assistant', content: 'Based on observations, I suggest interviewing 18-25 coffee enthusiasts...' },
  { role: 'assistant', toolCalls: [{ name: 'interviewChat', args: {...} }] },
  // ...
];
```

```typescript
// ❌ Traditional: Explicit state management
interface ResearchState {
  stage: 'planning' | 'researching' | 'reporting';
  completedInterviews: number;
  pendingTasks: Task[];
}
// Need synchronization: state and conversation history can diverge

// ✅ AI-native: Conversation is state
const messages = [...conversationHistory];
// AI infers state from history, no explicit sync needed
const result = await streamText({
  messages, // AI knows what to do
});
```

```text
Plan Mode: Understanding intent
"User says: want to understand young people's coffee preferences"
  → Analyze: needs qualitative research
  → Decide: use group discussion method
  → Output: complete research plan

Study Agent: Executing plan
"Received research plan"
  → Call discussionChat
  → Analyze discussion results
  → Generate insights report
```

```typescript
// Option A (Vector DB + semantic search): precise matching of relevant memories
const query_embedding = await embed(user_message);
const relevant_memories = await vectorDB.search(query_embedding, { topK: 5 });

// Option B (Markdown files + full loading): simple and transparent
const memory = await readFile(`memories/${userId}.md`);
const messages = [
  { role: 'user', content: `<UserMemory>\n${memory}\n</UserMemory>` },
  ...conversationMessages
];
```

```typescript
// Place 1: analyst table
const analyst = await prisma.analyst.findUnique({ where: { id } });
console.log(analyst.studySummary); // "Research summary..."

// Place 2: interviews table
const interviews = await prisma.interview.findMany({ where: { analystId: id } });
console.log(interviews.map(i => i.conclusion));
// ["Interview 1 conclusion", "Interview 2 conclusion"]

// Place 3: messages table
const messages = await prisma.chatMessage.findMany({ where: { userChatId } });
// webSearch results are here
```

```typescript
async function generateReport(analystId) {
  const analyst = await prisma.analyst.findUnique({
    where: { id: analystId },
    include: { interviews: true } // JOIN!
  });
  const messages = await prisma.chatMessage.findMany({
    where: { userChatId: analyst.studyUserChatId }
  });

  // Stitch data together
  const reportData = {
    summary: analyst.studySummary,                  // from analyst table
    interviewInsights: analyst.interviews.map(...), // from interviews table
    webResearch: extractFromMessages(messages)      // from messages table
  };
}
```

```typescript
// interviewChat: content in DB, returns reference
{
  toolName: 'interviewChat',
  output: { interviewId: 123 } // Need another DB query
}

// scoutTaskChat: content in return value
{
  toolName: 'scoutTaskChat',
  output: {
    plainText: "Observation results...", // Content directly returned
    insights: [...]
  }
}
```

```typescript
// ✅ New architecture: Unified output format
interface ResearchToolResult {
  plainText: string;   // Human-readable summary, required
  [key: string]: any;  // Optional structured data
}

// interviewChat also returns plainText
{
  toolName: 'interviewChat',
  output: {
    plainText: "Interview summary: User Zhang San mentioned...", // ← Full content here
    interviewId: 123 // Optional: DB reference
  }
}
```

```typescript
// Don't pre-save, generate when needed
if (!analyst.studyLog) {
  const messages = await loadMessages(studyUserChatId);
  const studyLog = await generateStudyLog(messages); // ← Generate from messages
  await prisma.analyst.update({ where: { id }, data: { studyLog } });
}
```

```text
Deleted files:
- src/ai/tools/saveInterview.ts
- src/ai/tools/saveDiscussion.ts
- src/ai/tools/saveScoutTask.ts
- src/ai/tools/savePersona.ts
- src/ai/tools/saveWebSearch.ts

Simplified files (28):
- Agent configs no longer need save tools
- generateReport doesn't need multi-table JOINs
```

```text
Before: adding discussionChat
1. Create Discussion table
2. Write discussionChat tool
3. Write saveDiscussion tool
4. Add both tools to 3 agents
5. Write discussion query logic
6. Modify generateReport query
Total: 12 files, 2-3 days

After: adding discussionChat
1. Write discussionChat tool (returns plainText)
2. Add tool to agent config
3. generateReport auto-supports (reads from messages)
Total: 3 files, 2-3 hours
```

```text
AI: "Which age group do you want to research?"
User: "18-25 I guess"
AI: "What method? Interviews or surveys?"
User: "Interviews"
AI: "How many people?"
User: "Around 10"
```

```shell
$ wc -l src/app/(study)/agents/*AgentRequest.ts
493 studyAgentRequest.ts
416 fastInsightAgentRequest.ts
302 productRnDAgentRequest.ts
```

```typescript
// src/app/(study)/agents/configs/planModeAgentConfig.ts
export async function createPlanModeAgentConfig() {
  return {
    model: "claude-sonnet-4-5",
    systemPrompt: planModeSystem({ locale }),
    tools: {
      requestInteraction, // Interact with user
      makeStudyPlan,      // Display complete plan, one-click confirm
    },
    maxSteps: 5, // Max 5 steps to complete clarification
  };
}
```

```typescript
// src/app/(study)/agents/baseAgentRequest.ts (577 lines)
interface AgentRequestConfig<TOOLS extends ToolSet> {
  model: LLMModelName;
  systemPrompt: string;
  tools: TOOLS;
  maxSteps?: number;
  specialHandlers?: {
    // Dynamically control which tools are available
    customPrepareStep?: (options) => { messages, activeTools?: (keyof TOOLS)[] };
    // Custom post-processing logic
    customOnStepFinish?: (step, context) => Promise<void>;
  };
}

async function executeBaseAgentRequest<TOOLS>(
  baseContext: BaseAgentContext,
  config: AgentRequestConfig<TOOLS>,
  streamWriter: UIMessageStreamWriter
) {
  // Phase 1: Initialization
  // Phase 2: Prepare Messages
  // Phase 3: Universal Attachment Processing
  // Phase 4: Universal MCP and Team System Prompt
  // Phase 5: Load Memory and Inject into Context
  // Phase 6: Main Streaming Loop
  // Phase 7: Universal Notifications
}
```

```typescript
// src/app/(study)/api/chat/route.ts
if (!analyst.kind) {
  // Plan Mode - intent clarification
  const config = await createPlanModeAgentConfig(agentContext);
  await executeBaseAgentRequest(agentContext, config, streamWriter);
} else if (analyst.kind === AnalystKind.productRnD) {
  // Product R&D Agent
  const config = await createProductRnDAgentConfig(agentContext);
  await executeBaseAgentRequest(agentContext, config, streamWriter);
} else {
  // Study Agent (comprehensive research, fast insights, testing, creative, etc.)
  const config = await createStudyAgentConfig(agentContext);
  await executeBaseAgentRequest(agentContext, config, streamWriter);
}
```

```typescript
// src/app/(study)/agents/configs/studyAgentConfig.ts
export async function createStudyAgentConfig(params) {
  return {
    model: "claude-sonnet-4",
    systemPrompt: studySystem({ locale }),
    tools: buildStudyTools(params), // ← Tools this agent needs
    specialHandlers: {
      // Custom tool control
      customPrepareStep: async ({ messages }) => {
        const toolUseCount = calculateToolUsage(messages);
        let activeTools = undefined;
        // After report generation, restrict available tools
        if ((toolUseCount[ToolName.generateReport] ?? 0) > 0) {
          activeTools = [
            ToolName.generateReport,
            ToolName.reasoningThinking,
            ToolName.toolCallError,
          ];
        }
        return { messages, activeTools };
      },
      // Custom post-processing
      customOnStepFinish: async (step) => {
        // After saving research intent, auto-generate title
        const saveAnalystTool = findTool(step, ToolName.saveAnalyst);
        if (saveAnalystTool) {
          await generateChatTitle(studyUserChatId);
        }
      },
    },
  };
}
```

```text
Deleted:
- studyAgentRequest.ts (493 lines)
- fastInsightAgentRequest.ts (416 lines)
- productRnDAgentRequest.ts (302 lines)
Total: -1,211 lines

Added:
+ baseAgentRequest.ts (577 lines)
+ planModeAgentConfig.ts (120 lines)
+ studyAgentConfig.ts (180 lines)
+ productRnDAgentConfig.ts (80 lines)
Total: +957 lines

Net reduction: -254 lines
```

```text
Before: adding MCP integration
1. Modify studyAgentRequest.ts
2. Modify fastInsightAgentRequest.ts
3. Modify productRnDAgentRequest.ts
4. Test three agents
Time: 2-3 days

After: adding MCP integration
1. Modify baseAgentRequest.ts
2. All agents automatically gain the new capability
Time: 2-3 hours
```

```text
Before:
User: "Want to understand young people's coffee preferences"
AI: "Which age group do you want to research?"
User: "18-25"
AI: "What method do you want to use?"
User: "Interviews I guess"
AI: "How many people?"
... (3-5 conversation rounds)

After:
User: "Want to understand young people's coffee preferences"
AI displays complete plan:
┌─────────────────────────────────────┐
│ 【Research Plan】                    │
│ Goal: Understand 18-25 coffee prefs │
│ Method: Group discussion (5-8 ppl)  │
│ Duration: ~40 minutes               │
│ Output: Consumer insights report    │
│                                     │
│ [Confirm Start] [Modify Plan]       │
└─────────────────────────────────────┘
```

```typescript
const result = await streamText({
  messages: currentConversation, // ← Only current conversation
  // No context from historical conversations
});
```

```prisma
model Memory {
  id          Int    @id @default(autoincrement())
  userId      Int?   // User-level memory
  teamId      Int?   // Team-level memory
  version     Int    // Version management

  // Two-tier architecture
  core        String @default("") @db.Text // Core memory (Markdown)
  working     Json   @default("[]")        // Working memory (JSON, to be consolidated)

  changeNotes String @db.Text // Update notes

  @@unique([userId, version])
  @@index([userId, version(sort: Desc)])
}
```

```markdown
# User Information
- Industry: Consumer goods product manager
- Focus: Young consumer preferences, emerging trends

# Research Style
- Prefers qualitative research (interviews, discussions)
- Values authentic user voices over statistics
```

```json
[
  { "info": "User recently focused on coffee market", "source": "chat_123" },
  { "info": "Prefers group discussion method", "source": "chat_124" }
]
```

```typescript
// src/app/(memory)/actions.ts
async function updateMemory({ userId, conversationContext }) {
  let memory = await loadLatestMemory(userId);

  // Step 1: Reorganize when threshold exceeded (Claude Sonnet 4.5)
  if (memory.core.length > 8000 || memory.working.length > 20) {
    memory = await reorganizeMemory(memory, conversationContext);
  }

  // Step 2: Extract new information (Claude Haiku 4.5)
  const newInfo = await extractMemoryUpdate(memory.core, conversationContext);
  if (newInfo) {
    // Step 3: Insert new information at specified location
    await insertMemoryInfo(memory, newInfo);
  }
}
```

```typescript
// src/app/(study)/agents/baseAgentRequest.ts

// Phase 5: Load Memory
const memory = await loadUserMemory(userId);
if (memory?.core) {
  // Inject at conversation start
  modelMessages = [
    { role: 'user', content: `<UserMemory>\n${memory.core}\n</UserMemory>` },
    ...modelMessages
  ];
}

// Phase 6: Streaming
const result = await streamText({
  messages: modelMessages, // ← Includes user memory
  // ...
});

// Phase 7: Non-blocking memory update
waitUntil(
  updateMemory({ userId, conversationContext: messages })
);
```

```text
Before:
First conversation:
User: "Want to do coffee research"
AI: "What industry are you in?"
User: "Consumer goods"
AI: "What dimensions do you care about?"
...

Second conversation (a week later):
User: "Want to do tea beverage research"
AI: "What industry are you in?" # ← Asks again

After:
First conversation:
User: "Want to do coffee research"
AI: "What industry are you in?"
User: "Consumer goods product manager"
# AI remembers

Second conversation (a week later):
User: "Want to do tea beverage research"
AI: "Based on your background as a consumer goods PM, I suggest..." # ← Remembers!
```

```text
Memory Update (per conversation):
- Model: Claude Haiku 4.5
- Tokens: ~5K
- Cost: ~$0.001

Memory Reorganize (every 20 conversations):
- Model: Claude Sonnet 4.5
- Tokens: ~15K
- Cost: ~$0.02

Average cost: ~$0.002/conversation

Memory loading: +50ms (non-blocking)
Memory update: background, doesn't affect response
```

```text
Duplicate code:
Before: 1,211 lines (three agent wrappers)
After: 0 lines
Reduction: 100%

Total lines of code:
Before: 1,211 lines (duplicates) + others
After: 577 lines (base) + 380 lines (configs) = 957 lines
Net reduction: 254 lines (21%)

Cyclomatic complexity:
Before: avg 12.3
After: avg 6.7
Reduction: 45%

Token consumption (with prompt cache):
- studyLog generation: ~2K tokens (~$0.002)
- Memory update: ~5K tokens (~$0.005)
- Average per conversation: +$0.007

Response time:
- Memory loading: +50ms (non-blocking)
- Plan Mode: +2s (one-time)
- studyLog generation: background, doesn't affect response

Intent clarification:
Before: average 3.2 conversation rounds
After: 1 plan display + 1 confirmation
Improvement: 3x efficiency

AI "memory":
Before: repetitive questions every conversation
After: auto-load user preferences
Improvement: personalized experience

Research startup time:
Before: ~5 minutes (multiple rounds of clarification)
After: ~1 minute (one-click confirm)
Improvement: 5x efficiency
```

```typescript
// Fully type-safe tool handling
const tool = step.toolResults.find(
  t => !t.dynamic && t.toolName === ToolName.generateReport
) as StaticToolResult<Pick<StudyToolSet, ToolName.generateReport>>;

if (tool?.output) {
  const token = tool.output.reportToken; // ← TypeScript knows this field exists
}
```

```typescript
// baseAgentRequest.ts
if (webhookUrl) {
  await sendWebhook(webhookUrl, step);
}
```