A client calls after a successful POC. “How much does it cost to go to production?” The POC cost $5,000 in consulting and $200 in tokens. Our production estimate: $60,000 the first year.
“How?” Here are the five line items that never appear in the initial quote.
1. Real token cost in production
During the POC, you process 100 documents. In production, you process 10,000 a month. With Claude Sonnet at around $3 per million input tokens, an average 5,000-token workflow × 10,000 documents = 50 million tokens, or about $150/month, counting input tokens only. With Opus, multiply by 5. Not catastrophic, but worth anticipating.
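The arithmetic above can be sketched in a few lines of Python. This is a rough estimate under the article's figures, not a pricing tool: the function name and parameters are illustrative, the $3 rate and the 5× Opus multiplier come from the text, and output tokens are deliberately left out.

```python
def monthly_input_token_cost(
    docs_per_month: int,
    tokens_per_doc: int,
    usd_per_million_tokens: float,
) -> float:
    """Estimate the monthly input-token bill (output tokens not included)."""
    total_tokens = docs_per_month * tokens_per_doc
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Figures from the text: 10,000 documents/month, 5,000 tokens each, ~$3 per million.
sonnet_monthly = monthly_input_token_cost(10_000, 5_000, 3.0)

# Assumption taken from the text: Opus costs about 5x as much.
opus_monthly = sonnet_monthly * 5

print(f"Sonnet: ${sonnet_monthly:,.0f}/month, ${sonnet_monthly * 12:,.0f}/year")
print(f"Opus:   ${opus_monthly:,.0f}/month, ${opus_monthly * 12:,.0f}/year")
```

Plugging in your own volumes and current list prices is the point: the rate changes over time, the formula does not.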
2. Observability and logging
In production, you need to know what’s happening: how many calls, which prompts, which failures, what cost per client. Helicone, LangSmith, or a homegrown system: budget $100 to $500/month depending on volume. Without it, you’re flying blind.
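For the homegrown option, a minimal sketch of what such a system has to track might look like the following. Everything here is hypothetical (the `CallRecord` fields, the `summarize_by_client` helper, the flat $3 per million input tokens); it is not the API of Helicone or LangSmith, only an illustration of the four questions above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One logged LLM call (hypothetical schema)."""
    client_id: str
    prompt_name: str
    input_tokens: int
    succeeded: bool


def summarize_by_client(records, usd_per_million_input_tokens=3.0):
    """Aggregate call count, failures, tokens, and input cost per client."""
    totals = defaultdict(lambda: {"calls": 0, "failures": 0, "input_tokens": 0})
    for record in records:
        entry = totals[record.client_id]
        entry["calls"] += 1
        if not record.succeeded:
            entry["failures"] += 1
        entry["input_tokens"] += record.input_tokens

    # Attach an estimated cost to each client's totals.
    return {
        client: {
            **entry,
            "cost_usd": entry["input_tokens"] / 1_000_000 * usd_per_million_input_tokens,
        }
        for client, entry in totals.items()
    }


sample_log = [
    CallRecord("acme", "extract_invoice", 5_000, True),
    CallRecord("acme", "extract_invoice", 5_000, False),
    CallRecord("globex", "classify_email", 2_000, True),
]
report = summarize_by_client(sample_log)
for client, stats in report.items():
    print(client, stats)
```

A real deployment would also persist these records and group by prompt name, which is where hosted tools start to earn their monthly fee.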
3. Prompt maintenance
Prompts that work in November break in March. New document types, model drift, new edge cases. Count 2 to 4 hours of work per month to adjust them, so $400 to $800 per month at roughly $200 an hour.
4. Ongoing compliance
Privacy law, quarterly audits, policy updates, new employee training. Not glamorous, but unavoidable. Plan for 1 consultant day per quarter, so $4,000 to $6,000 per year.
5. Team training and support
Teams using the tool need to learn how to debug, when to ignore the AI’s suggestion, and how to escalate. Initial training + 4 follow-up sessions: $8,000 to $12,000.
The realistic total
Tokens: $2,000/year. Observability: $4,000/year. Maintenance: $6,000/year. Compliance: $5,000/year. Training: $10,000/year. That is $27,000. Add a 20% margin for the unexpected and you’re at around $32,000 for year one, and that’s only if everything goes well. Double it if the project is complex, which is how an estimate lands in the $60,000 range.
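The same roll-up as a small Python sketch, so the line items can be swapped for your own. The amounts are the article's round figures; the function name, the `contingency_pct` parameter, and the `complexity_multiplier` parameter are illustrative choices.

```python
# Year-one line items in USD, using the round figures from the text.
LINE_ITEMS = {
    "tokens": 2_000,
    "observability": 4_000,
    "maintenance": 6_000,
    "compliance": 5_000,
    "training": 10_000,
}


def year_one_budget(line_items, contingency_pct=20, complexity_multiplier=1):
    """Sum the line items, add a contingency margin, then scale for complexity."""
    subtotal = sum(line_items.values())
    with_margin = subtotal * (100 + contingency_pct) / 100
    return with_margin * complexity_multiplier


baseline = year_one_budget(LINE_ITEMS)
complex_project = year_one_budget(LINE_ITEMS, complexity_multiplier=2)

print(f"Subtotal:        ${sum(LINE_ITEMS.values()):,}")
print(f"With 20% margin: ${baseline:,.0f}")
print(f"Complex project: ${complex_project:,.0f}")
```

The margin is expressed as a whole-number percentage to keep the arithmetic exact; the doubling for complex projects is the article's rule of thumb, not a measured factor.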
The POC is an investment. Production is a recurring expense. Budget both from the start, or get ready to explain the surprise to management next quarter.