01 The problem
Chat exports are memoirs. You cannot pipe one into an LLM and ask for a portfolio page.
I had three projects I wanted on this site. I also had about a year of Claude chat history to draw from. The natural move was to point a model at the whole export and ask for a writeup. I tried it. The first draft referenced my grandmother, the neighborhood I grew up in, a specific employer, and a handful of email threads that had nothing to do with the project.
The issue is structural. A Claude export is a single JSON file containing every conversation you have ever had. Code help sits next to tax questions. A deployment log sits next to a note about a family member in the hospital. The model does not know which parts are safe to republish because nothing in the export tells it.
So I stopped trying to make the generation safer and started building the thing that sits in front of it.
02 The approach
I wanted a single rule at the top of the skill: the raw chat export never leaves the machine. Everything downstream has to be compatible with that.
One layer of defense is a promise. Five layers is a posture. Each layer assumes the one before it missed something.
Layer 1 throws away whole conversations that do not look like project work. Most of the export is not about code. Dropping those chats early is the cheapest and largest win, and it narrows what later passes have to reason about.
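A minimal sketch of what a Layer 1 filter could look like, assuming each conversation is a dict with a `messages` list whose items carry a `text` field. Those field names, the hint words, and the threshold are all illustrative assumptions, not the real export schema or the skill's actual heuristic.

```python
import re

# Hypothetical signals that a conversation is about project work.
PROJECT_HINTS = re.compile(
    r"\b(traceback|pytest|git|deploy|dockerfile|stack trace)\b",
    re.IGNORECASE,
)
FENCE = "`" * 3  # a Markdown code fence, built this way to keep the literal short


def project_score(conversation):
    """Count code fences and project-flavored words in one conversation."""
    text = "\n".join(m.get("text", "") for m in conversation.get("messages", []))
    return text.count(FENCE) + len(PROJECT_HINTS.findall(text))


def filter_conversations(export, min_score=2):
    """Keep only conversations that score at least min_score; drop the rest whole."""
    return [c for c in export if project_score(c) >= min_score]
```

Dropping at the conversation level rather than the message level is the point: a chat that fails the check never reaches the entity passes at all.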
Layer 2 runs Presidio and a bank of regexes over what survives. Names, emails, phone numbers, street addresses, API tokens, local filesystem paths. Relational terms too: spouse, grandmother, colleague. Those often matter more than the proper nouns, and named entity recognition misses them.
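The regex half of this layer might look roughly like the sketch below. The Presidio pass is left out so the example stays self-contained, and every pattern here is an assumption for illustration (the `sk-` token prefix in particular is a guess at one common key format, not a complete list).

```python
import re

# Illustrative patterns only; a real bank would need many more cases.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "API_TOKEN": re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b"),
    "LOCAL_PATH": re.compile(r"(?:/Users|/home)/[^\s/]+(?:/[^\s]*)?"),
}


def scrub(text):
    """Replace each match with a <LABEL> placeholder and record what was found."""
    findings = []
    for label, pattern in PATTERNS.items():
        def _redact(match, label=label):
            findings.append((label, match.group(0)))
            return f"<{label}>"
        text = pattern.sub(_redact, text)
    return text, findings
```

Local paths earn a pattern of their own because the home directory usually contains a username.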
Layer 3 is the user’s config file. An allowlist for things to preserve verbatim (apazos, Claude, Presidio). A blocklist for things to hard-redact even if the earlier passes missed them. Blocklist wins.
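One way the precedence rule could be expressed, as a sketch. The config is shown as the dict a YAML loader would return; the key names `allowlist` and `blocklist`, the case-insensitive matching, and the `[REDACTED]` placeholder are assumptions, and "Initech" and "Dana" are made-up values.

```python
import re


def apply_config(text, detections, config):
    """Redact detected terms unless allowlisted; blocklisted terms always go.

    detections: strings flagged by the earlier passes.
    """
    allow = {t.lower() for t in config.get("allowlist", [])}
    block = {t.lower() for t in config.get("blocklist", [])}

    # A detection survives only if it is allowlisted AND not blocklisted.
    to_redact = {
        d for d in detections
        if d.lower() not in allow or d.lower() in block
    }
    # Blocklisted terms are redacted even if no earlier pass detected them.
    to_redact |= set(config.get("blocklist", []))

    # Longest first, so a short term never splits a longer one.
    for term in sorted(to_redact, key=len, reverse=True):
        text = re.sub(re.escape(term), "[REDACTED]", text, flags=re.IGNORECASE)
    return text
```

In this sketch a term that appears on both lists is redacted, which is what "blocklist wins" means in practice.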
Layer 4 is a human. The CLI produces a report and exits. You re-run with --confirm after you have read it. There is no way to skip this step.
Layer 5 runs the whole entity pass a second time, but on the LLM’s output. Models recombine fragments in surprising ways. This layer catches anything the model reassembles from pieces the earlier passes let through.
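A sketch of the output-side gate, assuming the detectors are a dict of compiled patterns shared with the input pass. The two default detectors and the `LeakError` name are hypothetical, and the real pass also involves Presidio.

```python
import re

# Stand-in detectors; the idea is to reuse whatever the input pass uses.
DEFAULT_DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "RELATION": re.compile(r"\bmy\s+(?:grandmother|wife|boss)\b", re.IGNORECASE),
}


class LeakError(Exception):
    """Raised when the generated article still contains flagged text."""

    def __init__(self, hits):
        super().__init__(f"{len(hits)} possible leak(s) in generated article")
        self.hits = hits


def scan_article(article, detectors=DEFAULT_DETECTORS):
    """Return (label, matched text, line number) for every hit."""
    hits = []
    for lineno, line in enumerate(article.splitlines(), start=1):
        for label, pattern in detectors.items():
            for match in pattern.finditer(line):
                hits.append((label, match.group(0), lineno))
    return hits


def gate(article, detectors=DEFAULT_DETECTORS):
    """Return the article unchanged if clean; otherwise refuse with LeakError."""
    hits = scan_article(article, detectors)
    if hits:
        raise LeakError(hits)
    return article
```

Reporting line numbers keeps the fix manual: the gate says where to look and leaves the edit to a person.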
03 The build
Python, Presidio, spaCy’s en_core_web_lg, a YAML config loader, pytest. About a thousand lines once the test suite settled. One CLI command with two modes: scrub a transcript, or scan an article.
The test suite came before the wiring. I wrote a synthetic chat export with planted PII across every category I could think of, and an expected scrubbed transcript next to it. Every layer got its own test before it was allowed into the pipeline. When a real export later surfaced a category I had not planted, that went into the fixtures and the test ran again.
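The fixture idea can be sketched as below. This is a simplified variation: it checks which planted categories survive instead of comparing against a full expected transcript, and the planted values, category names, and `toy_scrub` stand-in are all invented for the example.

```python
import re

# Planted PII, one value per category. All values are synthetic.
PLANTED = {
    "email": "jane.doe@example.com",
    "phone": "555-123-4567",
    "relation": "my grandmother",
}

SYNTHETIC_CHAT = (
    "User: send the logs to jane.doe@example.com\n"
    "User: or call 555-123-4567 after lunch\n"
    "User: my grandmother asked what the deploy script does\n"
)


def leaked_categories(scrubbed, planted=PLANTED):
    """Return the sorted categories whose planted value is still present."""
    lowered = scrubbed.lower()
    return sorted(cat for cat, value in planted.items() if value.lower() in lowered)


def toy_scrub(text):
    """Stand-in scrubber that handles emails and phones but not relations."""
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "<EMAIL>", text)
    text = re.sub(r"\b\d{3}-\d{3}-\d{4}\b", "<PHONE>", text)
    return text
```

Run against the stand-in scrubber, `leaked_categories(toy_scrub(SYNTHETIC_CHAT))` reports only `relation`, which is the kind of gap a planted fixture is there to expose. Adding a newly discovered category is one more entry in `PLANTED` and one more line in the synthetic chat.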
The hardest bug was not a bug in the code. It was a false sense of safety. The first version of Layer 2 caught proper nouns beautifully and missed relational terms completely. A draft came back clean of names and full of my grandmother, my wife, my boss. The fix was a hard-coded list of relational tokens, each mapped to a role replacement. Cheap. Obvious in hindsight. Named entity recognition is not the same as implicit entity recognition.
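A minimal sketch of that kind of fix. The token list, the role wording, and the choice to match only after "my" or "our" are assumptions made for the example; the actual list is not shown here.

```python
import re

# Hypothetical mapping from relational token to a neutral role.
RELATIONAL_ROLES = {
    "grandmother": "a family member",
    "wife": "a family member",
    "spouse": "a family member",
    "boss": "a manager",
    "colleague": "a coworker",
}

_RELATIONAL = re.compile(
    r"\b(?:my|our)\s+(" + "|".join(map(re.escape, RELATIONAL_ROLES)) + r")\b",
    re.IGNORECASE,
)


def replace_relational(text):
    """Swap 'my <relation>' for a role phrase that carries no identity."""
    return _RELATIONAL.sub(lambda m: RELATIONAL_ROLES[m.group(1).lower()], text)
```

Mapping to a role rather than deleting keeps the sentence readable: "I asked my boss" becomes "I asked a manager", and the narrative still makes sense.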
The CLI is the shape it is because of Layer 4. Scrubbing produces a report and exits zero without writing a transcript. You read the report. You re-run with --confirm. This is a small amount of friction and it was the right amount. Auto-confirm was on the table. I took it off.
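The two-step flow could be wired with `argparse` along these lines. Only `--confirm` comes from the description above; the program name, the `--out` option, and the injected `read_text`, `scrub`, and `write_text` callables are hypothetical, added so the sketch runs without touching real files.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="scrub")  # hypothetical program name
    parser.add_argument("export", help="path to the chat export")
    parser.add_argument("--confirm", action="store_true",
                        help="write the scrubbed transcript after reviewing the report")
    parser.add_argument("--out", default="transcript.txt")  # hypothetical option
    return parser


def run(argv, read_text, scrub, write_text, report=print):
    """Report by default; write the transcript only when --confirm is passed."""
    args = build_parser().parse_args(argv)
    scrubbed, findings = scrub(read_text(args.export))

    if not args.confirm:
        report(f"{len(findings)} redaction(s) proposed for {args.export}")
        for label, value in findings:
            report(f"  {label}: {value!r}")
        report("Re-run with --confirm to write the transcript.")
        return 0  # exit zero, nothing written

    write_text(args.out, scrubbed)
    return 0
```

Nothing in this sketch verifies that a report run came before `--confirm`; it only shows the default path producing a report and no file.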
04 The outcome
The skill shipped. It sits in ~/.claude/skills/case-study-builder on my machine. I have used it to generate three drafts so far. Two are already on this site. The third is the page you are reading.
The draft still needs editing. The template sets the shape, redaction keeps it safe, but the voice always needs a human pass. I do not mind. Editing a structured draft is an hour of work. Starting from a blank page and a chat export is a week of work that I would never actually do.
The lesson I take from this one is about where safety lives. The instinct with an LLM is to fix leaks at the prompt. The cheaper move is to fix them at the input boundary, and to make the boundary impossible to skip. The model is not the part of the system you should be asking to be careful.