News & Sentiment
#8 Why Raw JSON Appeared in News Cards and How I Sanitized Gemini Responses
Build Log
Model-output drift leaked raw payload text into cards; strict extraction and sanitization fixed it.
Gemini JSON parse, news card sanitize, LLM guardrails
1) TL;DR
- Gemini output was not always strict JSON across retries.
- Raw fragments leaked into title/summary card fields.
- JSON mode plus extraction and sanitization removed the artifact.
2) What I Tried
I initially trusted the first-pass text extracted from the model response and parsed it directly.
3) What Broke
Cards occasionally showed malformed strings and wrapper artifacts.
4) Root Cause
LLM output variability was not fully constrained before the output was cached and rendered.
5) Before (Code Path)
analysis pipeline
- parse candidate text directly
- weak boundary extraction
- limited write-time sanitization
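To make the failure mode concrete, here is a minimal sketch of how a direct parse can leak raw text into a card. It is an illustration, not the original code: `naive_parse`, the fallback behavior, and the sample strings are all hypothetical, and it assumes the model sometimes wraps its JSON in a markdown code fence.

```python
import json

FENCE = "`" * 3  # markdown code-fence marker, built here to keep the sample readable


def naive_parse(candidate_text: str) -> dict:
    """Parse the model text directly (hypothetical 'before' behavior)."""
    try:
        return json.loads(candidate_text)
    except json.JSONDecodeError:
        # Weak fallback: the whole raw payload ends up in a card field.
        return {"title": "", "summary": candidate_text}


clean = '{"title": "Example headline", "summary": "Example summary."}'
fenced = FENCE + "json\n" + clean + "\n" + FENCE

print(naive_parse(clean)["summary"])   # the intended summary text
print(naive_parse(fenced)["summary"])  # the raw fenced payload, wrapper included
```

The second call is the case that matters: nothing raises, so the wrapper and the raw JSON flow straight through to whatever renders the summary.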
6) After (Code Path)
analysis pipeline
+ JSON mode where supported
+ robust extraction for candidate payload
+ sanitize fields before cache + before serve
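The extraction step above could look roughly like the sketch below. It assumes the payload is a single JSON object that may be surrounded by prose or fence markers; `extract_json_object` is a hypothetical name, and the scanning strategy (try every `{` until one decodes) is one reasonable choice rather than the approach used in the commits. JSON mode itself is a request-side setting and is not shown here.

```python
import json
from typing import Optional

_DECODER = json.JSONDecoder()


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first decodable JSON object embedded in text, or None."""
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start` and
            # ignores whatever trails it (closing fences, extra prose).
            value, _end = _DECODER.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # This brace did not open a valid object; try the next one.
        start = text.find("{", start + 1)
    return None


noisy = 'Here is the result:\n{"title": "Example headline", "summary": "Ok."}\nDone.'
print(extract_json_object(noisy))
print(extract_json_object("no payload here"))
```

Returning `None` instead of the raw text is the important design choice: the caller has to handle the failure explicitly, so an unparseable response can never be mistaken for card content.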
7) Evidence (Git History)
- 0c115a9 fix(news): enforce JSON mode with stable Gemini fallback.
- a91ce7f fix(ai-news): improve JSON parsing and prompt clarity.
- 3d427a3 fix: robust JSON extraction for Gemini news parsing.
8) What I Learned
Treat LLM output as untrusted input until schema validation passes.
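As a sketch of what "schema validation passes" can mean in practice, the check below accepts a parsed payload only if every expected field is a non-empty string. The field names `title` and `summary` come from the card fields mentioned earlier; `REQUIRED_FIELDS` and `validate_card` are hypothetical names, and a real project might use a schema library instead of this hand-rolled check.

```python
from typing import Any, Optional

# Hypothetical card schema: field name -> required Python type.
REQUIRED_FIELDS = {"title": str, "summary": str}


def validate_card(payload: Any) -> Optional[dict]:
    """Return a card with only the known fields, or None if the payload is invalid."""
    if not isinstance(payload, dict):
        return None
    card = {}
    for name, expected_type in REQUIRED_FIELDS.items():
        value = payload.get(name)
        # Reject missing fields, wrong types (e.g. nested objects), and blanks.
        if not isinstance(value, expected_type) or not value.strip():
            return None
        card[name] = value.strip()
    return card


print(validate_card({"title": " Example ", "summary": "Text", "extra": 1}))
print(validate_card({"title": "Example"}))
```

Unknown keys are dropped rather than passed through, so only fields the renderer expects can reach the cache.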
9) Frequently Asked Questions
Can prompt tuning alone solve this?
No, parser guardrails are still required.
Why sanitize twice?
To protect both new writes and legacy cached objects.
How does this improve GEO (generative engine optimization)?
Clean snippets are more reliable for AI citation and search previews.