API & Caching
#10 How a 6-Hour Cache Turned One Rate-Limit Error into a Persistent News Bug
· Build Log
A cache-window design mistake amplified temporary failures into long-lived stale UI states.
cache TTL design · stale failure state · last known good
1) TL;DR
- A 6-hour cache window kept invalid states alive long after the underlying failure had cleared.
- A single rate-limit incident could dominate the UI for hours.
- A last-known-good fallback policy and a separate, short TTL for failure states fixed the issue.
2) What I Tried
I stretched cache windows to six hours to cut API costs and protect quota.
3) What Broke
Temporary failure states persisted and looked like valid live content.
4) Root Cause
The cache policy had no validity gate before writes and no separate TTL for failure payloads, so a failed response was stored exactly like a successful one.
5) Before (Code Path)
cache layer
- one long TTL for mixed payload quality
- failure states cached like successful states
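To make the failure mode concrete, here is a minimal sketch of the "before" behavior. It is not the original worker code: the payload shape, the `SingleTtlCache` class, and the in-memory `Map` are assumptions for illustration. Only the `gemini_unavailable` label is taken from the commit log.

```typescript
// Hypothetical payload shape; only the "gemini_unavailable" label comes from the commit log.
type NewsPayload =
  | { status: "ok"; summary: string }
  | { status: "gemini_unavailable" };

interface Entry {
  payload: NewsPayload;
  expiresAt: number; // epoch milliseconds
}

const SIX_HOURS_MS = 6 * 60 * 60 * 1000;

// One TTL for every payload, with no check on what is being stored.
class SingleTtlCache {
  private entries = new Map<string, Entry>();

  write(key: string, payload: NewsPayload, now: number): void {
    // Bug: a failure payload gets the same 6-hour TTL as a success.
    this.entries.set(key, { payload, expiresAt: now + SIX_HOURS_MS });
  }

  read(key: string, now: number): NewsPayload | undefined {
    const hit = this.entries.get(key);
    return hit && hit.expiresAt > now ? hit.payload : undefined;
  }
}
```

With this shape, one rate-limited response written at time 0 is still served five hours later, even if the upstream recovered within seconds.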
6) After (Code Path)
cache layer
+ quality gate before cache write
+ short TTL for failure payloads
+ stale fallback only from last-known-good snapshots
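The three changes above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual fix: `QualityAwareCache`, `isValid`, `shouldFetch`, the payload shape, and the 60-second failure TTL are all hypothetical, and an in-memory `Map` stands in for the KV store.

```typescript
// Hypothetical payload shape; only the "gemini_unavailable" label comes from the commit log.
type OkPayload = { status: "ok"; summary: string };
type NewsPayload = OkPayload | { status: "gemini_unavailable" };

interface Entry {
  payload: NewsPayload;
  expiresAt: number; // epoch milliseconds
}

const OK_TTL_MS = 6 * 60 * 60 * 1000; // long TTL for validated payloads
const FAILURE_TTL_MS = 60 * 1000; // assumed short TTL for failure payloads

// Quality gate (assumed rule): a payload is valid if it is "ok" and non-empty.
function isValid(p: NewsPayload): p is OkPayload {
  return p.status === "ok" && p.summary.trim().length > 0;
}

class QualityAwareCache {
  private live = new Map<string, Entry>();
  private lastKnownGood = new Map<string, OkPayload>();

  write(key: string, payload: NewsPayload, now: number): void {
    if (isValid(payload)) {
      this.live.set(key, { payload, expiresAt: now + OK_TTL_MS });
      this.lastKnownGood.set(key, payload); // only validated payloads land here
    } else {
      // Anything that fails the gate gets the short TTL and never touches
      // the last-known-good snapshot.
      this.live.set(key, { payload, expiresAt: now + FAILURE_TTL_MS });
    }
  }

  // True when the live entry is missing or expired. While a failure entry is
  // live, this returns false, so the upstream is not hit again immediately.
  shouldFetch(key: string, now: number): boolean {
    const hit = this.live.get(key);
    return !hit || hit.expiresAt <= now;
  }

  // What to render: fresh valid payload, else last-known-good, else a
  // still-live failure payload, else nothing.
  read(key: string, now: number): NewsPayload | undefined {
    const hit = this.live.get(key);
    const isLive = hit !== undefined && hit.expiresAt > now;
    if (hit && isLive && isValid(hit.payload)) return hit.payload;
    const good = this.lastKnownGood.get(key);
    if (good) return good;
    return hit && isLive ? hit.payload : undefined;
  }
}
```

A caller would check `shouldFetch` first, write whatever the upstream returns, and then render the result of `read`. In this sketch a rate-limit error can only suppress refetching for the short failure window, and the UI keeps showing the last validated summary in the meantime.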
7) Evidence (Git History)
- d32dbbc fix: stop caching gemini_unavailable and add rate-limit protection.
- 113056d feat: full-response KV cache for quota protection.
- 8c73f47 fix(worker): preserve fallback without wiping last valid render.
8) What I Learned
TTL strategy must encode data quality, not only freshness timing.
9) Frequently Asked Questions
Should long TTL be avoided entirely?
No. Use long TTL for validated states and short TTL for failure states.
What is last-known-good in practice?
The most recent payload that passes schema and business checks.
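As a rough illustration of "schema and business checks", the sketch below validates an untyped payload in two steps. The field names, the `MIN_SUMMARY_CHARS` threshold, and every function name are assumptions for the example; the real checks depend on the actual payload.

```typescript
// Hypothetical shape of a successful payload.
interface OkPayload {
  status: "ok";
  summary: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Schema check: the expected fields exist and have the expected types.
function hasOkShape(value: unknown): value is OkPayload {
  return (
    isRecord(value) &&
    value["status"] === "ok" &&
    typeof value["summary"] === "string"
  );
}

const MIN_SUMMARY_CHARS = 20; // assumed threshold for a usable summary

// Business check: the content is usable, not merely well-typed.
function passesBusinessChecks(payload: OkPayload): boolean {
  const text = payload.summary.trim();
  // Reject summaries that are too short or that leaked a failure label.
  return text.length >= MIN_SUMMARY_CHARS && !text.includes("gemini_unavailable");
}

// A payload may replace the last-known-good snapshot only if both checks pass.
function qualifiesAsLastKnownGood(value: unknown): value is OkPayload {
  return hasOkShape(value) && passesBusinessChecks(value);
}
```

Keeping the two checks separate makes it easier to log why a payload was rejected: a schema failure usually points at the upstream API, while a business failure points at the content itself.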
Why is this SEO/GEO relevant?
If an invalid payload persists in the cache, search crawlers can index the broken summary and generative engines can cite it.