A skill that writes case studies about the things I build, without leaking my private life.

Every Claude chat export is also a memoir. Taxes, family, the names of people whose stories are not mine to publish. I built a skill that reads those exports, scrubs them locally, and turns what remains into a portfolio article. This page is one of its outputs.

Role: Sole operator (design · build · ship)
Stack: Python + Presidio, Claude Skills
Scope: Personal tooling, local-only pipeline
Org: apazos.com, for this portfolio
Shipped: Q2 2026 (installed · in use)

redact.py --report (local · offline)
## Redaction report / case-study-builder

Input: claude-export.json (412 conversations)
Layer 1: kept 38 · dropped 374 (taxes, family, health, travel…)
Layer 2: 1,246 entities scrubbed
    PERSON 318 · EMAIL 47 · PHONE 12 · ADDRESS 9 · TOKEN 3 · PATH 411 · REL 84
Layer 3: allowlist (apazos, Claude, Presidio, Next.js) · blocklist held
Layer 5: output scan clean

01 The problem

Chat exports are memoirs. You cannot pipe one into an LLM and ask for a portfolio page.

I had three projects I wanted on this site. I also had about a year of Claude chat history to draw from. The natural move was to point a model at the whole export and ask for a writeup. I tried it. The first draft referenced my grandmother, the neighborhood I grew up in, a specific employer, and a handful of email threads that had nothing to do with the project.

The issue is structural. A Claude export is a single JSON file containing every conversation you have ever had. Code help sits next to tax questions. A deployment log sits next to a note about a family member in the hospital. The model does not know which parts are safe to republish because nothing in the export tells it.

The problem was never generation. It was deciding what the model was allowed to see in the first place.
— the realization

So I stopped trying to make the generation safer and started building the thing that sits in front of it.

[Figure: pipeline diagram · local only · no network]
claude export.json (raw · on disk)
Layer 01 · Structural filter: drop non-project chats at the conversation level
Layer 02 · Entity scrubber: Presidio + regex (names, paths, tokens)
Layer 03 · User config: allowlist + blocklist, blocklist wins
Layer 04 · Review gate: human must run --confirm to proceed
Layer 05 · Output scan: re-run rules on draft, exit non-zero on hit
→ scrubbed draft
Fig. 01. Five layers, each a different class of defense. The raw export never crosses the dashed boundary.

02 The approach

I wanted a single rule at the top of the skill: the raw chat export never leaves the machine. Everything downstream has to be compatible with that.

One layer of defense is a promise. Five layers is a posture. Each layer assumes the one before it missed something.

Layer 1 throws away whole conversations that do not look like project work. Most of the export is not about code. Dropping those chats early is the cheapest and largest win, and it narrows what later passes have to reason about.
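A minimal sketch of what a conversation-level filter like this could look like. The word lists and the export shape (a `messages` list whose items carry a `text` field) are assumptions made for illustration. The article does not say which signals the real filter uses.

```python
import re

# Hypothetical signal words; the real filter's criteria are not documented here.
PROJECT_WORDS = {"repo", "deploy", "traceback", "pytest", "commit"}
PERSONAL_WORDS = {"tax", "taxes", "hospital", "passport", "mortgage"}


def looks_like_project(conversation):
    """Keep a conversation only if it has project signals and no personal ones."""
    text = " ".join(m.get("text", "") for m in conversation.get("messages", []))
    # Whole-word matching, so "syntax" does not trip the "tax" rule.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & PROJECT_WORDS) and not (words & PERSONAL_WORDS)


def structural_filter(conversations):
    """Split conversations into (kept, dropped) before any entity pass runs."""
    kept, dropped = [], []
    for conversation in conversations:
        (kept if looks_like_project(conversation) else dropped).append(conversation)
    return kept, dropped
```

Dropping a mixed conversation outright, rather than trying to salvage the project half, is the conservative choice in this sketch: a false drop costs a little material, a false keep costs privacy.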

Layer 2 runs Presidio and a bank of regexes over what survives. Names, emails, phone numbers, street addresses, API tokens, local filesystem paths. Relational terms too: spouse, grandmother, colleague. Those often matter more than the proper nouns, and named entity recognition misses them.
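The regex half of a pass like this can be sketched with the standard library alone. The patterns below are illustrative guesses, not the skill's actual rules, and the Presidio analyzer step is left out so the example stays self-contained. The placeholder labels mirror the ones shown in Fig. 02.

```python
import re

# Illustrative patterns only; a real bank would be larger and stricter.
PATTERNS = [
    ("ANTHROPIC_KEY", re.compile(r"sk-ant-[A-Za-z0-9_\-]+")),
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    # Redact the user-specific prefix of a home directory, keep the rest.
    ("LOCAL_PATH", re.compile(r"[A-Za-z]:\\Users\\[^\\\s]+|/(?:home|Users)/[^/\s]+")),
]


def scrub(text):
    """Replace every match with a bracketed label and count hits per label."""
    counts = {}
    for label, pattern in PATTERNS:
        text, hits = pattern.subn(f"[{label}]", text)
        if hits:
            counts[label] = hits
    return text, counts
```

Returning the counts alongside the text is what makes a per-category report possible without ever printing the matched values.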

Layer 3 is the user’s config file. An allowlist for things to preserve verbatim (apazos, Claude, Presidio). A blocklist for things to hard-redact even if the earlier passes missed them. Blocklist wins.
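One way the precedence rule could be expressed, as a sketch. The function name, the `(surface, label)` detection format, and the `[REDACTED]` placeholder are all hypothetical. Only the rule itself, blocklist wins, comes from the text above.

```python
import re


def apply_user_config(text, detections, allowlist, blocklist):
    """Apply allowlist and blocklist to text.

    detections: (surface_text, label) pairs produced by the entity pass.
    """
    allow = {term.lower() for term in allowlist}
    block = {term.lower() for term in blocklist}

    # Redact detected entities unless allowlisted. A term that is also
    # blocklisted is never spared: blocklist wins.
    for surface, label in detections:
        key = surface.lower()
        if key in allow and key not in block:
            continue
        text = re.sub(re.escape(surface), f"[{label}]", text)

    # Hard-redact blocklisted terms even if the earlier passes missed them.
    for term in blocklist:
        text = re.sub(re.escape(term), "[REDACTED]", text, flags=re.IGNORECASE)
    return text
```

The sketch matches plain substrings for brevity. A stricter version would anchor on word boundaries so a short blocklist term cannot fire inside a longer word.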

Layer 4 is a human. The CLI produces a report and exits. You re-run with --confirm after you have read it. There is no way to skip this step.
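The two-step shape of the gate can be sketched with `argparse`. The `--confirm` flag and the `redact.py` name come from this page. The positional `export` argument and the injected `scrub` and `write` callables are assumptions that keep the example testable.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="redact.py")
    parser.add_argument("export", help="path to the chat export (assumed argument)")
    parser.add_argument(
        "--confirm",
        action="store_true",
        help="write the scrubbed transcript after the report has been read",
    )
    return parser


def run(argv, scrub, write):
    """scrub(path) -> (transcript, report); write(transcript) persists it."""
    args = build_parser().parse_args(argv)
    transcript, report = scrub(args.export)
    print(report)
    if not args.confirm:
        # First pass: report only. Nothing is written to disk.
        return 0
    write(transcript)
    return 0
```

Nothing in this sketch stops someone from passing `--confirm` on the very first run. How the real CLI closes that gap is not described here.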

Layer 5 runs the whole entity pass a second time, but on the LLM’s output. Models recombine fragments in surprising ways. This layer catches regressions.
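A sketch of an output scan along these lines, assuming the same kind of regex rules are reused on the generated draft. The two rules shown are placeholders. The non-zero exit on a hit follows the behavior labeled in Fig. 01.

```python
import re
import sys

# Placeholder rules; a real scan would reuse the full entity pass.
RULES = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ANTHROPIC_KEY": re.compile(r"sk-ant-[A-Za-z0-9_\-]+"),
}


def scan_article(text):
    """Return (line_number, label) for every rule that fires on the draft."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RULES.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def main(path):
    """Scan a draft file and return the process exit code."""
    with open(path, encoding="utf-8") as handle:
        hits = scan_article(handle.read())
    for lineno, label in hits:
        # Report location and category only; never echo the matched value.
        print(f"{path}:{lineno}: {label}", file=sys.stderr)
    return 1 if hits else 0
```

Wired up as `sys.exit(main(path))`, a single hit is enough to fail a publish script that checks the exit status.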

Raw · conversation 0241
user: my grandmother used to grow basil in her garden in
Almería, this project is partly for her. repo is at
github.com/apazos1985/case-study-builder, key
sk-ant-api03-aB7Kx… in C:\Users\andre\.env
ping me at aspazos1@gmail.com
Scrubbed · safe to send
user: [FAMILY_MEMBER] used to grow basil in her garden in
[CITY], this project is partly for her. repo is at
github.com/[GITHUB_USER]/case-study-builder, key
[ANTHROPIC_KEY] in [LOCAL_PATH]\.env
ping me at [EMAIL]
Fig. 02. Left: one paragraph of a real conversation. Right: the same paragraph after Layers 1 through 3. The story still reads.

03 The build

Python, Presidio, spaCy’s en_core_web_lg, a YAML config loader, pytest. About a thousand lines once the test suite settled. One CLI entry point with two modes: scrub a transcript, or scan an article.

The test suite came before the wiring. I wrote a synthetic chat export with planted PII across every category I could think of, and an expected scrubbed transcript next to it. Every layer got its own test before it was allowed into the pipeline. When a real export later surfaced a category I had not planted, that went into the fixtures and the test ran again.
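The planted-PII idea can be sketched as a small harness. This is a leak check, a narrower variant of the expected-transcript comparison described above, and every name and value in it is hypothetical.

```python
def build_synthetic_chat(planted):
    """One message per planted value, so a miss points at a single category."""
    return "\n".join(
        f"user: note about {value} for the project" for value in planted.values()
    )


def find_leaks(scrubbed, planted):
    """Return the labels whose planted value survived scrubbing verbatim."""
    return [label for label, value in planted.items() if value in scrubbed]


def check_scrubber(scrub, planted):
    """Run any scrub(text) -> text callable against the synthetic chat."""
    return find_leaks(scrub(build_synthetic_chat(planted)), planted)
```

A new category found in a real export becomes one more entry in `planted`, and the same check runs again unchanged.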

The hardest bug was not a bug in the code. It was a false sense of safety. The first version of Layer 2 caught proper nouns beautifully and missed relational terms completely. A draft came back clean of names and full of my grandmother, my wife, my boss. The fix was a hard-coded list of relational tokens, each mapped to a role replacement. Cheap. Obvious in hindsight. Named entity recognition is not the same as implicit entity recognition.
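The relational-token fix might look roughly like this. The token list and role names are invented for illustration, and requiring a leading "my" or "our" is an assumption made to limit false positives, not something the article specifies.

```python
import re

# Hypothetical mapping from relational token to role placeholder.
RELATIONAL_ROLES = {
    "grandmother": "FAMILY_MEMBER",
    "wife": "FAMILY_MEMBER",
    "spouse": "FAMILY_MEMBER",
    "boss": "COLLEAGUE",
    "colleague": "COLLEAGUE",
}

_RELATION_RE = re.compile(
    r"\b(?:my|our)\s+(" + "|".join(map(re.escape, RELATIONAL_ROLES)) + r")\b",
    re.IGNORECASE,
)


def scrub_relations(text):
    """Replace 'my <relation>' phrases with the mapped role placeholder."""
    return _RELATION_RE.sub(
        lambda match: f"[{RELATIONAL_ROLES[match.group(1).lower()]}]", text
    )
```

Replacing the whole phrase, possessive included, keeps the sentence readable: "my grandmother used to grow basil" becomes "[FAMILY_MEMBER] used to grow basil", matching the shape of Fig. 02.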

The CLI is the shape it is because of Layer 4. Scrubbing produces a report and exits zero without writing a transcript. You read the report. You re-run with --confirm. This is a small amount of friction and it was the right amount. Auto-confirm was on the table. I took it off.

04 The outcome

The skill shipped. It sits in ~/.claude/skills/case-study-builder on my machine. I have used it to generate three drafts so far. Two are already on this site. The third is the page you are reading.

The draft still needs editing. The template sets the shape, redaction keeps it safe, but the voice always needs a human pass. I do not mind. Editing a structured draft is an hour of work. Starting from a blank page and a chat export is a week of work that I would never actually do.

Case studies drafted
3
Two live, one is this page. Each under an hour to edit.
PII categories covered
11
Names, emails, phones, addresses, tokens, paths, relations, URLs, IPs, orgs, locations.
LLM calls with raw data
0
The raw export never crosses the local boundary. Enforced by the CLI, not by discipline.
Layers of defense
5
Each layer assumes the one before it missed something.

The lesson I take from this one is about where safety lives. The instinct with an LLM is to fix leaks at the prompt. The cheaper move is to fix them at the input boundary, and to make the boundary impossible to skip. The model is not the part of the system you should be asking to be careful.
