Klem HQ · AI Workflow Roadmap · v1

The Inbox Zero Maintenance Roadmap

A five-phase plan for knowledge workers buried under more than a hundred emails a day. Sort the inbound before you read it, batch your responses, and stop confusing presence in the inbox with progress.

For knowledge workers and operators whose inbox is the bottleneck · Reading time: 16 minutes · Phases: 5 · Verified against tool versions current as of 2026-05-15
[Diagram: A chaotic inbox sorted by AI into three lanes. Unread, scattered mail passes through a single AI categorisation step (intent · suggested action) into three labeled outbound lanes: act on now, archive immediately, defer to a later batch or task.]
Three lanes, two review windows, one calm system.
Roadmap overview · five phases
All five phases at a glance: 1 Audit · 2 Filters · 3 AI categorise · 4 Batched review · 5 Maintain

Before you start

If you receive more than a hundred emails a day and you have stopped expecting to be at zero, you are not lazy and you have not failed at time management. You have outgrown the assumption that an inbox is a list of tasks. The fix is not to read faster — it is to sort the inbound before you touch it, draft replies in batches, and keep yourself in the loop only where your judgement matters. This roadmap walks the five phases that get you there, with the AI doing the parts machines are good at and you doing the parts you are still better at.

What you will have at the end

  • A deterministic filter layer that quietly archives or deletes obvious noise before you ever see it.
  • Every remaining email tagged within seconds by intent — act, FYI, or wait.
  • Two daily review windows that process the inbox down to zero in under thirty minutes combined.
  • An honest weekly check that tells you when the system is drifting before it stops working.

What you need before phase 1

  • A working {your mail client} with filter or rule support (Gmail, Outlook, Fastmail, Superhuman, etc.) and at least 30 days of inbox history searchable.
  • A paid {your AI assistant} account (ChatGPT, Claude, or similar) with API access or a built-in mail integration.
  • A {your task manager} you already use (Todoist, Things, Notion, a paper list — anything you check at least daily).
  • Roughly four to five hours across one week. Two hours for the audit, the rest spread over the first three days of running the system.
[Diagram: Tool architecture for AI-assisted inbox triage. Mail flows from {your mail client} (Gmail · Outlook · Fastmail) through a native filter layer (archive · delete · skip), then via API call into {your AI assistant} (categorise · suggest action), and splits where required into {your task manager} (deferred actions) and {your calendar} (review windows · meetings).]
Four surfaces, one categorisation step — the filter layer kills the obvious before the AI is asked.
Before · Always-on inbox, no working memory

187 unread · average open count, end of week

  • Priorities reshuffle with every new arrival.
  • Newsletters, billing alerts, and real questions share the same lane.
  • Important threads slip past three notifications and re-surface late.

After · Two batches, three lanes, no surprises

0 unread · end of each batched review window

  • Noise is archived before you see it.
  • Remaining mail is pre-tagged by intent.
  • You read mail twice a day — and only twice a day.

Illustrative range, not benchmark — your numbers will vary by role, subscription mix, and how much of your work happens in mail versus elsewhere.

The roadmap

Five phases, sequenced so that each one is shippable on its own. If you stop after phase 2, the deterministic filter layer alone will quietly reduce daily inbound by 30 to 50 percent. Phases 3 through 5 layer in AI categorisation, batched review, and ongoing hygiene. The decision gates between phases are real stop points — if a phase is not working, do not paper over it with the next.

Phase 01

Audit the last 30 days by intent

Before any rule, any filter, any AI prompt — you need to know what your inbox actually is. Most people discover their real inbound looks nothing like their assumptions about it, and that two or three categories account for the majority of volume.

[Diagram: Phase 1 flow. A 30-day sample (sender + subject) is clustered by the AI into candidate buckets, a human relabels them by intent (act · FYI · noise · external), and the result is stored as an audit doc listing categories and their share of volume.]
Sample → cluster → label → store

Open {your mail client} and pull a representative slice — the last 30 days is usually enough, longer if your role is highly seasonal. Export sender, subject, and arrival timestamp into a spreadsheet, or paste the list directly into a chat with {your AI assistant}. Ask it to cluster the messages into candidate buckets based on sender pattern, subject pattern, and apparent intent. Do not let the AI name the buckets in its own register — give it your four-category target list up front: act (something is required of you), FYI (you should know but no action is needed), noise (automated mail, newsletters, marketing, status pings that do not require eyes), and external (anything from outside your company that needs human-quality attention).
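
A minimal clustering prompt sketch. The four bucket names come from this roadmap; the column layout and wording are assumptions to adapt to your own export:

    Below are rows of email metadata (sender, subject, timestamp) from the
    last 30 days. Cluster them into candidate buckets by sender pattern,
    subject pattern, and apparent intent. Assign every bucket to exactly
    one of these four categories, using these names verbatim:
      act: something is required of me
      FYI: I should know, but no action is needed
      noise: automated mail, newsletters, marketing, status pings
      external: from outside my company, needs human-quality attention
    For each bucket, report the category, a rough share of total volume,
    and two example senders. Do not invent new category names.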

Walk through the clusters one at a time. For each cluster, write down two facts: the share of total inbound it represents (rough percentage is fine) and one or two example senders. By the end of an hour you will have a single-page document that names every meaningful category in your inbox, with its share and its examples. This document is the source of truth for everything that follows. If a filter in phase 2 misclassifies mail, or a phase 3 prompt produces the wrong tag, the cause is almost always that the audit doc is incomplete or muddled.

Resist the urge to start fixing things as you go. The audit is for understanding. Fixing happens in phase 2.

Tools you will use
  • {your mail client} — for the 30-day export or in-app search.
  • {your AI assistant} — for clustering and surface-level pattern analysis.
  • A spreadsheet or notes page — for the final audit document.
Time + cost estimate
60 to 90 minutes. No additional cost beyond your existing mail and AI subscription.
What you ship at the end
A one-page audit document listing every meaningful category in your inbox, with its share of total inbound and two example senders per category. The four headline buckets — act, FYI, noise, external — appear at the top with sub-categories listed underneath where useful.
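For shape, a condensed example of what the headline rows can look like. The categories mirror this roadmap; the percentages and senders are invented for illustration:

    act      · 18% · jane@client.example, cfo@yourco.example
    FYI      · 24% · status@projecttool.example, hr@yourco.example
    noise    · 49% · newsletter@vendor.example, alerts@ci.example
    external ·  9% · counsel@lawfirm.example, rfp@prospect.example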
Common failure modes
  • Over-slicing the categories. Twenty sub-categories is not an audit, it is procrastination. If a cluster has fewer than three examples in 30 days, merge it. It is a one-off, not a category.
  • Calling noise "FYI". A weekly status digest you have not opened in three months is noise, not FYI. The honest test: if it stopped arriving tomorrow, would you notice within a week? If not, it is noise.
  • Treating senders and subjects as the same signal. A newsletter from a vendor is noise. A real reply from a human inside that vendor is external. Filter on both axes, not just sender domain.
  • Skipping the audit because "I know my inbox". You almost certainly do not. People who skip this phase tend to discover in phase 3 that the AI is correctly categorising mail they did not realise they were receiving.
Decision gate before phase 2 Read the audit document one day later, cold. If you can predict what each category contains from the name alone, and the percentages still feel right, proceed. If a category name is ambiguous or two categories overlap, rewrite — do not start building filters on top of muddled groundwork.
Phase 02

Build deterministic filters before any AI

The cheapest, fastest, most reliable categoriser in any inbox is a deterministic rule. Before you spend a single token on an AI categorisation, set up rules in {your mail client} that handle the unambiguous cases — automated alerts, vendor newsletters, calendar notifications, billing receipts. Done well, this kills 30 to 50 percent of your daily inbound without any LLM cost.

[Diagram: Phase 2 flow. A new email arrives and is checked for a rule match (sender · subject · header); matches get a label and skip the inbox (noise · FYI · receipt); everything else falls through to the AI step in phase 3.]
Match → label → skip inbox · or fall through

Start with the categories from your audit doc that are obviously deterministic. Newsletters almost always come from a predictable sender list and contain "unsubscribe" in the footer. Calendar invites carry a specific MIME header. Billing receipts come from a small set of known domains. CI alerts come from your own infrastructure. For each of these, write a rule in {your mail client} that does exactly one thing: applies a label and skips the inbox. Do not delete on the first pass — labels-and-archive is reversible, deletion is not. You will want a way to audit what the filter is catching for the first two weeks.
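
If your client supports Sieve (Fastmail does), the first two rules often look something like this sketch. The folder names and sender addresses are assumptions; the List-Unsubscribe test is a real header check, though it also catches some transactional mail, which is exactly why you archive rather than delete at first:

    require ["fileinto"];

    # Newsletters and bulk mail: a List-Unsubscribe header is a strong signal
    if exists "list-unsubscribe" {
        fileinto "Noise/Newsletters";   # filed out of the inbox, not deleted
        stop;
    }

    # Billing receipts from known senders (hypothetical addresses)
    if address :is "from" ["billing@vendor.example", "receipts@saas.example"] {
        fileinto "FYI/Receipts";
        stop;
    }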

Treat the filter set as a small system, not a sprawl. Eight to fifteen rules is normal for a typical knowledge-worker inbox; fifty is a sign you are slicing too thinly or trying to handle ambiguity with rules. Anything that cannot be matched on sender, exact subject pattern, or a single header field is not a deterministic case — it belongs to phase 3.

After the rules are in place, leave them for a full work week before tuning. The instinct to over-iterate in the first 48 hours destroys more good filter sets than any other failure mode. Watch where the filters miss; do not re-tune what catches correctly.

Tools you will use
  • {your mail client} native rules — Gmail filters, Outlook rules, Fastmail sieve scripts, etc.
  • The audit document from phase 1 — as the source list of categories you are filtering on.
  • Optional: a notes page logging which sender domain caused which rule, so future you remembers why the rule exists.
Time + cost estimate
45 to 90 minutes to set up. No ongoing cost — deterministic rules run natively inside your mail client.
What you ship at the end
A filter set of 8 to 15 rules that quietly route automated mail, newsletters, receipts, and routine notifications out of the inbox and into labeled folders. Your visible inbox immediately becomes 30 to 50 percent smaller. Mail that needs human judgement is now the only thing competing for your attention.
Common failure modes
  • Deleting instead of archiving. A wrongly-deleted email cannot be retrieved easily. Always archive on the first pass; promote rules to auto-delete only after two weeks of confirmed correctness.
  • Rules on ambiguous subjects. "Quarterly review" could be a calendar invite, a project update, or a vendor pitch. If a rule needs more than one signal to be safe, it is not deterministic — push it to phase 3.
  • Filtering things you actually need. Bank alerts, status-page incidents for tools you depend on, and signed-document notifications often get bucketed with noise. Read the rules' targets weekly for the first month, not just the inbox.
  • Letting the filter set sprawl. If you add a rule per problematic sender, you end up with 200 rules and no understanding. Stop and rebuild around three to five generic patterns instead.
Decision gate before phase 3 Run the filter set for five business days. At day five, open each labeled folder the rules created and scan for false positives. If you find fewer than three mistakes per folder over the week, proceed. If more, rewrite the offending rule before adding any AI on top.
Phase 03

Categorise the rest with AI by intent and suggested action

Now the AI handles what the rules cannot: ambiguous mail that needs to be read to be classified. Wire {your AI assistant} to tag every remaining email with two pieces of metadata — an intent label and a suggested action. No drafts yet, no auto-replies — just a labeled, sorted inbox.

[Diagram: Phase 3 flow. A webhook fires on each unfiltered email; a single AI call returns an intent label plus a suggested action; both are written back to the mail client as labels or custom fields; a human audit of 30 sampled emails targets at least 85% correct.]
Trigger → AI call → labels back → sample-audit

Connect {your AI assistant} to {your mail client}. The path matters: most modern mail clients now have first-party AI integrations (Gmail with Gemini, Outlook with Copilot, Superhuman AI) or accept third-party automation through Zapier or Make. Use whichever path your team can support without infrastructure work. The integration should be triggered on new mail that survived the phase 2 filter layer, send the first 1,500 characters of the message to the AI, and return two values: an intent label drawn from a closed list, and a suggested action drawn from a separate closed list.

A practical starting taxonomy: intents are action-required, question-for-you, FYI, scheduling, external-business, and uncertain. Suggested actions are reply-now, reply-in-batch, defer-to-task, archive, and escalate-human. Constrain the prompt to pick exactly one from each list, with a fallback of uncertain + escalate-human when nothing fits. The closed-list constraint is what stops the model from inventing fluent-sounding categories that drift from your audit doc.
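
A sketch of the categorisation step in Python. The closed lists and the fallback are the load-bearing parts; call_model is a hypothetical stand-in for whichever API or mail-client integration you wire in:

    import json

    INTENTS = ["action-required", "question-for-you", "FYI",
               "scheduling", "external-business", "uncertain"]
    ACTIONS = ["reply-now", "reply-in-batch", "defer-to-task",
               "archive", "escalate-human"]
    FALLBACK = {"intent": "uncertain", "action": "escalate-human"}

    PROMPT = """Classify the email below.
    Choose exactly one intent from: {intents}.
    Choose exactly one action from: {actions}.
    If none apply, use intent "uncertain" and action "escalate-human".
    Reply with JSON only, e.g. {{"intent": "FYI", "action": "archive"}}.

    From: {sender}
    Subject: {subject}
    Body (truncated): {body}"""

    def call_model(prompt: str) -> str:
        # Hypothetical stand-in: wire this to your provider's chat API
        # (or your mail client's AI hook) and return the raw text reply.
        raise NotImplementedError

    def categorise(sender: str, subject: str, body: str) -> dict:
        prompt = PROMPT.format(
            intents=", ".join(INTENTS), actions=", ".join(ACTIONS),
            sender=sender, subject=subject, body=body[:1500],
        )
        try:
            result = json.loads(call_model(prompt))
        except json.JSONDecodeError:
            return FALLBACK
        # Enforce the closed lists: a fluent invented label is still wrong.
        if result.get("intent") in INTENTS and result.get("action") in ACTIONS:
            return result
        return FALLBACK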

Run the system for five business days. Each evening, sample five labels at random and check them. Treat any mis-categorisation as either a category problem (return to phase 1 and refine), a prompt problem (tighten the closed list), or a context problem (the email referenced something the AI cannot see — these will always need human handling).

Tools you will use
  • {your AI assistant} — via API, native mail-client integration, or a Zapier/Make connector.
  • {your mail client} — to receive the labels back as a label, flag, or custom field.
  • A closed-list prompt template — intent list and action list, hard-coded with a fallback bucket.
Time + cost estimate
90 to 150 minutes to set up. Ongoing cost: roughly $0.001 to $0.004 per email categorised on the cheapest current model tier. An inbox of 120 unfiltered emails a day makes about 3,600 calls a month — roughly $4 to $15 at those rates.
What you ship at the end
Every email that passed the phase 2 filter arrives in your inbox already tagged with an intent and a suggested action. When you open the inbox at a review window, you see the labels first and the contents second — sort by label, batch by action, work through it in twenty minutes instead of an hour.
Common failure modes
  • The model invents labels. Without an explicit closed list and a fallback bucket, the AI will return fluent-sounding new categories. Constrain the prompt to "Choose exactly one of the following. If none apply, return uncertain." and audit uncertain weekly.
  • Thread-context loss. Long threads carry meaning across many messages. Pass the full thread or at least the last three replies to the AI for any email that arrives mid-thread; otherwise the categoriser will mis-read the intent.
  • Cost runaway. Running every inbound through a frontier model is unnecessary and expensive. Use the cheapest available model that hits 85 percent accuracy on your sample audit — measure first, choose model second.
  • Internal versus external mail mis-blended. Mail from your colleagues uses your team's shorthand; mail from outside does not. If accuracy drops sharply on one or the other, split the prompt by sender domain so each gets its own examples.
Decision gate before phase 4 After five business days, pull 30 randomly-tagged emails and audit them manually. If 26 or more (about 85 percent) carry a correct intent and a sensible suggested action, proceed. If fewer, the prompt or the category list needs another pass — fix it before introducing batched workflow on top of unreliable signal.
Phase 04

Move to batched processing with AI-drafted replies

The labeled inbox from phase 3 is now ready to be processed in batches. Stop checking mail continuously. Set two review windows — one in the morning, one near end of day — and process the inbox to zero each time. The AI prepares draft replies for routine threads; you approve, edit, or rewrite before sending.

[Diagram: Phase 4 flow. At each review window (09:30 · 16:30) the AI drafts replies for routine intents only; a human reviews (approve · edit · escalate); the batch is sent, the inbox hits zero, and mail stays closed until the next window.]
Window → drafts → human → batch send → close

Pick two times. Morning is usually 30 to 45 minutes after you start work — late enough that the morning's inbound has arrived, early enough that you are not already deep in something else. End of day is usually 30 to 60 minutes before you stop. Block these on {your calendar} and protect them. Outside these windows, the mail client is closed. If you find yourself opening it anyway, set a daily browser block or move the client off your primary device for a week to break the habit.

Inside each window, work top-down by intent. Reply-now threads from phase 3 are the only urgent batch — usually a small number. Reply-in-batch threads get an AI-drafted reply that you read, edit if needed, and send. Defer-to-task threads create a task in {your task manager} with a one-line summary and the original email linked — the email itself gets archived immediately. FYI gets scanned and archived. Escalate-human threads stay in the inbox for you to handle without AI help — usually executive correspondence, anything emotionally loaded, anything legally meaningful.

The AI draft step is narrow on purpose. Only generate drafts for reply-in-batch intents where the underlying answer is routine — meeting confirmations, document-link replies, brief status updates, vendor acknowledgements. Anything that requires judgement, position, or persuasion is faster to write yourself than to edit out of a draft. Aim for a drafted-reply edit rate under 30 percent; if you are rewriting more than that, narrow the categories that trigger drafting.
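
As a concrete guardrail, here is a sketch of saving a drafted reply via the Gmail API (google-api-python-client, with an authorised service object assumed; adapt for Outlook or others). The point is structural: drafts().create cannot send mail, so a misconfiguration cannot fire off fifty replies:

    import base64
    from email.message import EmailMessage

    def save_draft(service, thread_id: str, to_addr: str,
                   subject: str, body_text: str) -> str:
        """Save an AI-drafted reply as an unsent Gmail draft."""
        msg = EmailMessage()
        msg["To"] = to_addr
        msg["Subject"] = subject        # a full reply would also set the
        msg.set_content(body_text)      # In-Reply-To / References headers
        raw = base64.urlsafe_b64encode(msg.as_bytes()).decode()
        draft = service.users().drafts().create(
            userId="me",
            body={"message": {"raw": raw, "threadId": thread_id}},
        ).execute()
        # drafts().create never sends; sending is a separate drafts().send()
        # call that this workflow deliberately never makes.
        return draft["id"]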

Tools you will use
  • {your AI assistant} — for draft generation, using a short brand-voice or personal-voice document.
  • {your mail client} — to receive drafts as unsent replies, never auto-sent.
  • {your task manager} — for deferred actions extracted from email.
  • {your calendar} — for the two protected review windows.
Time + cost estimate
90 to 120 minutes to set up the draft prompt and calendar blocks. Ongoing cost: roughly $0.005 to $0.02 per drafted reply. Most users settle at $5 to $15 per month combined for phase 3 and phase 4 traffic.
What you ship at the end
A working two-window-a-day rhythm. Inbox is at zero at the end of each window. Routine replies are drafted and sent with light editing. Deferred work lives in your task manager, not in the inbox. Notifications outside review windows are off.
Common failure modes
  • Windows that drift. The morning window slides to noon, then to 2pm. Treat the windows as meetings you cannot cancel — book them on the calendar and decline conflicts that violate them.
  • Drafting for the wrong intents. If you find yourself rewriting more than three in ten drafts, the AI is drafting for categories where the underlying answer is not routine. Remove those categories from the draft list.
  • Tasks bouncing back to email. A deferred task that re-surfaces in the inbox three days later means the task system is failing, not the email system. Fix the task workflow — do not let mail become a backup queue.
  • Auto-send by accident. A single misconfigured integration can send 50 draft replies before you notice. Verify drafts land as unsent replies, never as sent. Test with two real messages before going live.
  • Notification leak. Push notifications, badge counts, and watch alerts undo the entire batch model. Turn them off on every device for the duration of phase 4 — re-enable only specific human senders if you must.
Decision gate before phase 5 Run phase 4 for two weeks. At week two, ask one honest question: did anything important get missed because you were not in the inbox continuously? If the answer is no, proceed. If the answer is yes, identify the specific category that broke the model and either add a real-time escalation rule for it or push it out of email entirely.
Phase 05

Maintain on a weekly, monthly, quarterly cadence

By the end of phase 4 you have a working system. The remaining work is to keep it working. Email subscriptions accumulate, senders change roles, AI providers update their models, and your own work mix shifts every quarter. Without explicit maintenance, the system silently degrades over six to twelve months until you are back where you started.

[Diagram: Phase 5 flow. A weekly hygiene pass (10 minutes, Friday), a monthly rule review (filter set + prompt), and a quarterly audit (subscriptions + roles) feed back into the phase 1 audit doc, re-baselining it when needed.]
Weekly hygiene → monthly review → quarterly audit → re-baseline

The weekly hygiene pass is ten minutes, ideally Friday afternoon. Open the labeled folders the phase 2 filters route to. Look for two things: messages that landed in the wrong folder (a real human reply caught by a noise rule) and messages that should have been filtered but were not. Adjust one or two rules. Then look at the AI "uncertain" bucket from phase 3 — anything sitting there suggests either a missing category or genuinely unusual mail.

The monthly rule review is 30 minutes. Pull the count of mail caught by each filter rule over the past month. Rules that catch zero mail are dead — delete them. Rules that catch hundreds and never produce false positives can be promoted from archive to auto-delete with confidence. Read five randomly-sampled AI categorisations end-to-end; if the edit rate on AI drafts has climbed above 40 percent for any category, that category needs a tighter prompt or removal from the draft list.
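
The count step is scriptable. A Gmail API sketch, assuming your phase 2 rules file into folders under Noise/ and FYI/ (adjust the prefixes to your own label scheme; resultSizeEstimate is approximate but plenty for spotting dead rules):

    def label_counts(service, prefixes=("Noise/", "FYI/")) -> dict:
        """Approximate per-label catch counts for the past 30 days."""
        labels = service.users().labels().list(userId="me").execute()["labels"]
        counts = {}
        for label in labels:
            if not label["name"].startswith(prefixes):  # only rule targets
                continue
            resp = service.users().messages().list(
                userId="me", labelIds=[label["id"]], q="newer_than:30d",
            ).execute()
            counts[label["name"]] = resp.get("resultSizeEstimate", 0)
        return counts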

The quarterly subscription audit is 45 to 60 minutes. Open {your mail client} and search for the word "unsubscribe" over the past 90 days. Unsubscribe from anything you have not opened in that window. Re-run the phase 1 clustering exercise against the most recent 30 days — if the category mix has shifted by more than 20 percent, your audit doc is stale and the filter set and prompts need to be re-baselined against the new mix. This is also the right moment to test any new AI model your provider has released; behaviour changes meaningfully across versions in ways release notes rarely capture.
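
In Gmail's search syntax the quarterly pass starts from two queries (Outlook and Fastmail have equivalents):

    unsubscribe newer_than:90d
    unsubscribe newer_than:90d is:unread

The first surfaces every message carrying an unsubscribe link this quarter; the second narrows to the ones you never even opened — the prime unsubscribe candidates.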

Tools you will use
  • {your mail client} — for rule counts, label folder scans, and unsubscribe searches.
  • {your AI assistant} — for the periodic re-clustering exercise.
  • A running log of changes — anywhere that records dates: a notes file, a single calendar entry, the bottom of the audit doc.
Time + cost estimate
10 minutes weekly, 30 minutes monthly, 45 to 60 minutes quarterly. Roughly 18 hours a year of upkeep on a system that saves multiple hours a week.
What you ship at the end
A maintenance cadence that catches drift before it becomes degradation. The filter set, the AI prompt, and the audit doc all stay current with how your inbox actually looks today, not how it looked a year ago. The system improves marginally with each cycle instead of decaying.
Common failure modes
  • Skipping the weekly pass "just this week". Three skipped weeks become a quarter, and the filter set starts catching the wrong mail. Calendar it as a recurring 10-minute block on Friday and treat it as work.
  • Reacting to the AI provider's model upgrade without re-auditing. A new model can shift accuracy in unpredictable directions. Always re-run the phase 3 audit after switching models, even if release notes claim improvement.
  • Letting subscriptions accumulate. Newsletters double every six months by default. The quarterly unsubscribe pass is the only reliable counterweight — schedule it on the calendar.
  • Updating the system without updating the audit doc. Future you will not remember why a particular filter exists. Every meaningful change goes in the running log, with a date and a one-line reason.
Ongoing decision gate If you skip two consecutive monthly reviews, treat the system as drifting and re-run the phase 1 audit before trusting AI categorisations again. The system is allowed to lose ground; pretending it still works costs more than admitting it does not.

What the AI cannot do

These are specific limits as of 2026-05. Treat them as the failure modes you would otherwise discover at the worst possible moment.

Honest limits

  • It cannot draft nuanced replies in your voice. The AI can produce a fluent reply in a generic professional register. It cannot reliably reproduce the small choices that make a message sound like you specifically — the way you open a difficult conversation, the joke you would never make, the sign-off that signals warmth without saying it. Routine replies are safe to draft; anything that carries personal voice is faster to write than to edit.
  • It cannot judge tone with executives or sensitive counterparties. A two-line email from your CEO, a board member, or a customer's general counsel reads literally to the AI. Subtext, signals of escalation, polite displeasure — none of these are reliably detected. Any thread with someone whose relationship matters to your role should be human-handled, not AI-drafted.
  • It cannot decide when to call instead of email. Some threads are stalling because the underlying issue cannot be resolved in writing. The AI will keep producing plausible replies that prolong the loop. Trust your own instinct: if a thread has been open for three back-and-forths without progress, switch channels.
  • It cannot handle legally-binding correspondence. Contract drafts, signed documents, regulatory notices, anything from a lawyer or a tax authority — the AI will summarise these confidently and incorrectly. Always read the original document yourself. The categoriser is allowed to flag these as escalate-human; nothing more.
  • It cannot remember the thread history reliably across messages. Each AI call starts from what you pass it. Unless you pass the full thread, the model treats each message as standalone — which mis-reads context-laden mid-thread replies. Either pass the full thread or accept that mid-thread mis-categorisation is a known failure mode of the system.
  • It cannot know what you have not told it. Off-thread context — a Slack message your colleague sent, a decision made in a meeting, an unstated policy — is invisible to the model. Drafts produced without that context will sound confidently wrong. When the context that matters is not in the email itself, the AI cannot help.
[Diagram: Decision tree. A categorised email passes three sequential checks: routine intent? personal voice required? drafted-reply edit rate under 30%? Passing all three sends it to AI drafting with human approval. A non-routine intent or a voice-required reply goes straight to a human, and a breached edit-rate threshold rolls the category back out of drafting.]
Three sequential checks; failing any one keeps a human in the loop.

After you finish

The system needs maintenance, not because AI is fragile but because your role, your network, and the senders who want your attention all change. The cadence below is what holds up over twelve months.

Maintenance cadence

  • Weekly — Ten-minute hygiene pass on Friday. Scan the labeled folders for false positives. Note any pattern in a running log.
  • Monthly — Thirty-minute rule review. Count what each filter rule caught. Delete rules that catch nothing. Tighten or remove draft categories where edit rate has climbed above 40 percent.
  • Quarterly — Forty-five-minute subscription and re-cluster audit. Unsubscribe from anything unopened in 90 days. Re-run the phase 1 clustering against the latest 30 days. Update the audit doc if the mix has shifted meaningfully.
  • On model upgrades — When {your AI assistant} releases a new model and you switch, re-run the phase 3 audit (30 emails) before trusting it on production traffic.
  • On role change — A new job, a new project, or a meaningful change in who you correspond with re-baselines the inbox entirely. Re-run phase 1 from scratch — the categories from the previous role rarely transfer.