Tim Trailor
Essay

"Email me when done": a persistent task runner with a delivery guarantee

Long-running tasks fail silently if the session dies before the result is ready. This is the runner I built to make "email me when done" actually mean that, plus the retry loop, fallback paths, and last-ditch file write that back it up.

One night I left the laptop running a long task, closed the lid, and assumed the agent would email me when it finished. In the morning there was no email. The task had probably completed, but the session had died somewhere in the middle and the completion message was sent to a dead terminal.

A session cannot keep an “I will email you” promise on its own, because the email has to be sent by something that outlives the session.

What I built to fix that is a skill called /autonomous, backed by a small Python script called autonomous_runner.py.

The trigger

The skill triggers when I say something that implies I am not going to watch the session to the end: “email me when done”, “ping me”, “let me know”, “I am stepping away”, “going to bed”, “back later”, “logging off”. The skill is also invocable as /autonomous.

When it triggers, the current session does three things:

  • Rewrites the request into a self-contained prompt that does not depend on any in-session context.
  • Launches autonomous_runner.py as a detached process via nohup.
  • Tells me the runner is launched and where the email will go.

The current session is then free to die, be cancelled, or be closed.

The rewritten prompt matters because the runner’s Claude invocation is a fresh session with no memory of the current one. Everything the task needs has to be in the prompt. This sometimes takes care: a task that depends on state held only in the current session has to serialise that state into the prompt first. The payoff is a task description that is portable, testable and re-runnable.

The retry loop

The runner runs claude -p "prompt" in a non-interactive subprocess. Each attempt is a fresh Claude session: same tools, same CLAUDE.md, same memory, no shared state.
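A single attempt can be sketched as follows. This is a minimal illustration, not the actual autonomous_runner.py: the function names, the error_markers parameter, and the way output is combined are assumptions. Only the claude -p call and the removal of ANTHROPIC_API_KEY from the environment come from this essay.

```python
import os
import subprocess


def clean_env(environ=None):
    """Copy the environment without ANTHROPIC_API_KEY."""
    env = dict(os.environ if environ is None else environ)
    env.pop("ANTHROPIC_API_KEY", None)
    return env


def run_attempt(prompt, timeout_s=600, command=("claude", "-p"),
                error_markers=()):
    """Run one fresh, non-interactive attempt and return (ok, output).

    error_markers is a hypothetical tuple of strings that mark a failed
    run even when the exit code is zero.
    """
    try:
        proc = subprocess.run(
            [*command, prompt],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            env=clean_env(),
        )
    except subprocess.TimeoutExpired:
        return False, f"attempt timed out after {timeout_s}s"
    output = proc.stdout + proc.stderr
    marked = any(marker in output for marker in error_markers)
    return proc.returncode == 0 and not marked, output
```

Because each call starts a new subprocess, nothing from one attempt carries over into the next except what is on disk.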

If the attempt fails (non-zero exit, timeout, or error marker) the runner retries. The backoff schedule is 30s, 60s, 120s, 240s, then 300s, with five retries by default. Between retries the runner logs the attempt output for later review.

Each retry is fresh by design. Not “continue where you left off” but “try again from scratch.” Transient failures clear on retry. Deterministic errors (broken prompt, missing dependency, credential issue) fail identically every time, so the runner exhausts its retries and moves to the failure path.
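The loop itself is small. The sketch below assumes one initial attempt followed by up to five retries, which matches the five backoff values. The attempt callable and the injectable sleep argument are illustrative names, not taken from the real runner.

```python
import time

BACKOFF_SECONDS = (30, 60, 120, 240, 300)  # schedule from the text


def run_with_retries(attempt, max_retries=5, backoff=BACKOFF_SECONDS,
                     sleep=time.sleep):
    """Call attempt() until it succeeds or the retries run out.

    attempt is any zero-argument callable returning (ok, output).
    Returns (ok, outputs), where outputs holds every attempt's output
    so the failure path can report all of them.
    """
    outputs = []
    for n in range(max_retries + 1):  # assumption: initial try + retries
        ok, output = attempt()
        outputs.append(output)
        if ok:
            return True, outputs
        if n < max_retries:
            # Reuse the last delay if retries outnumber schedule entries.
            sleep(backoff[min(n, len(backoff) - 1)])
    return False, outputs
```

Nothing is passed from one call of attempt() to the next, which is what keeps each retry a from-scratch run.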

Fresh-session-per-retry is also a security property. A retry cannot inherit compromised state from a previous one, so if the first attempt was tricked by prompt injection from fetched content, the second starts clean.

Success and failure paths

On success, the runner pulls the final output and emails it to me. The body contains the actual result, not a status message. “Task complete” is not a deliverable; the summary of what was done, with the specific outputs, is.

The email is sent via Simple Mail Transfer Protocol (SMTP) using a Gmail app password from the Mac Mini’s credentials file. The subject encodes task name and outcome.
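A minimal sketch of the send, using only the Python standard library. The subject format and function names are assumptions. The essay says only that the subject encodes task name and outcome, and that Gmail is used with an app password. smtp.gmail.com on port 465 is Gmail's standard SSL submission endpoint.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, task_name, succeeded, body):
    """Build the result email; the subject layout is a hypothetical one."""
    msg = EmailMessage()
    outcome = "SUCCESS" if succeeded else "FAILED"
    msg["Subject"] = f"[autonomous] {task_name}: {outcome}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)  # the actual result, not a status line
    return msg


def send_message(msg, username, app_password,
                 host="smtp.gmail.com", port=465):
    """Send over implicit TLS; raises smtplib or OSError on failure."""
    with smtplib.SMTP_SSL(host, port, timeout=30) as smtp:
        smtp.login(username, app_password)
        smtp.send_message(msg)
```

Keeping message construction separate from sending means the body can be reused unchanged by any fallback path when the send fails.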

If all retries are exhausted, the runner emails a failure report containing the prompt submitted, every attempt’s output (truncated if excessive), and the runner’s own log. The failure email is as informative as the success email: I should be able to read one email and know what happened.
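Assembling that report might look like the sketch below. The section labels, the 4000-character limit, and the head-plus-tail truncation strategy are assumptions chosen for illustration. The essay specifies only the three ingredients and that long output is truncated.

```python
def truncate(text, limit=4000):
    """Keep the head and tail of long output, noting what was dropped."""
    if len(text) <= limit:
        return text
    half = limit // 2
    dropped = len(text) - 2 * half
    head, tail = text[:half], text[len(text) - half:]
    return f"{head}\n[... {dropped} characters truncated ...]\n{tail}"


def build_failure_report(prompt, attempt_outputs, runner_log, limit=4000):
    """Combine prompt, per-attempt output and runner log into one body."""
    parts = ["PROMPT SUBMITTED", prompt, ""]
    for i, output in enumerate(attempt_outputs, start=1):
        parts += [f"ATTEMPT {i} OUTPUT", truncate(output, limit), ""]
    parts += ["RUNNER LOG", truncate(runner_log, limit)]
    return "\n".join(parts)
```

Keeping both ends of each output is a deliberate choice here: the start usually shows what the attempt set out to do, and the end usually shows the error.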

If the SMTP send itself fails (wrong password, network down, Google rate-limiting) the runner retries delivery up to five times with its own backoff. If email keeps failing, the runner writes the result to /tmp/autonomous_result.txt. The cost of “we thought the email went and it did not” is higher than the cost of a file that is never read.
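The delivery layer can be sketched as a second, independent retry loop with a file as the last resort. The delay values and function names are assumptions, since the essay does not give the delivery backoff. The sketch also reads "up to five times" as five send attempts in total.

```python
import time
from pathlib import Path

FALLBACK_PATH = Path("/tmp/autonomous_result.txt")  # path from the text


def deliver(body, send, attempts=5, delays=(5, 15, 45, 120),
            sleep=time.sleep, fallback_path=FALLBACK_PATH):
    """Try send(body) up to `attempts` times, then write a file.

    send is any callable that raises on failure.
    Returns "email" or "file" so the caller can log which path was used.
    """
    for n in range(attempts):
        try:
            send(body)
            return "email"
        except Exception:  # smtplib and network errors both land here
            if n < attempts - 1:
                sleep(delays[min(n, len(delays) - 1)])
    fallback_path.write_text(body, encoding="utf-8")
    return "file"
```

The return value matters: a runner that logs "file" has not pretended the email went out.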

If the runner itself crashes (out of memory, disk full, bug in the runner code) it tries to send a crash notification email: a few lines of traceback plus a pointer to the log.
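One way to get that behaviour is a top-level guard around the runner's entry point. The sketch below is an assumption about structure, not the real code: main and notify are placeholder callables, and the log path is the one named elsewhere in this essay.

```python
import traceback


def run_guarded(main, notify, log_path="/tmp/autonomous_runner.log",
                tail_lines=5):
    """Run main(); on an unexpected exception, notify and re-raise."""
    try:
        return main()
    except Exception:
        # Keep only the last few traceback lines for the email body.
        tail = traceback.format_exc().strip().splitlines()[-tail_lines:]
        notice = (
            "Runner crashed.\n\n"
            + "\n".join(tail)
            + f"\n\nFull log: {log_path}"
        )
        try:
            notify(notice)
        except Exception:
            pass  # best effort: the notification path may be what broke
        raise
```

A guard like this only covers failures that surface as Python exceptions. A process killed outright by the operating system gets no chance to run it, which is why the notification is best described as an attempt.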

Why a detached process

I considered building this inside the session with a persistent in-session daemon, and rejected the idea. A daemon hosted by a session is still part of that session, and a session can be killed by the OS, by running out of resources, by being cancelled, or by the terminal being closed.

A detached nohup process is not a session. It has its own process identifier, lifecycle, and logging, and inherits none of the session’s fragility.
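The same detachment can be had from Python without nohup. This is an alternative sketch rather than what the skill does: it relies on the POSIX-only start_new_session option of subprocess.Popen, and the function name is hypothetical.

```python
import subprocess


def launch_detached(argv, log_path):
    """Start argv in its own session, logging to log_path.

    The child becomes a session leader, so a hangup delivered to the
    parent's terminal is not delivered to it.
    """
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,   # no terminal input to block on
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,     # setsid() in the child (POSIX)
        )
    return proc
```

The parent can close its copy of the log file immediately, because the child holds its own inherited file descriptor.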

The cost is no interactive ergonomics. The runner cannot ask me questions mid-task or adapt to mid-flight feedback. It is fire-and-forget, which is what the trigger (“I am not going to watch this finish”) requires.

The rules when running autonomously

The Claude invocation the runner spawns is told, via its system prompt, that it is running in autonomous mode. The rules for that mode are stored with the memory system’s operational rules:

  • Never block on a non-critical question. If a decision has to be made, take the safer, simpler, more reversible option and document why.
  • Only block on truly irreversible actions: pushing to public repositories, sending messages to third parties, deleting data, sending commands to a printer during a live print.
  • If the task hits a dead end, try a different approach. If the different approach also fails, include that fact in the email. Do not give up silently.
  • Always send the email, even on failure. The email is the deliverable.
  • Document every autonomous decision in the email or in a memory update.

These rules exist because blocking on me for confirmation when I cannot respond is the failure mode that made the runner necessary. Asking a question I cannot answer produces no deliverable.

The runner can be bold about things that can be undone (a file write, a local command, a commit to a private repository). It cannot be bold about things that cannot (a public repository push, a message to another person, a destructive printer command).

Where the runner lives and how it is triggered

The runner code is at ~/.claude/skills/autonomous/autonomous_runner.py, the skill definition at ~/.claude/skills/autonomous/SKILL.md, the log at /tmp/autonomous_runner.log. An active-task marker at /tmp/autonomous_task_active contains the process identifier and start time so I can inspect what is currently running.
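A marker like that is easy to manage with a context manager. The JSON layout and the names below are assumptions. The essay says only that the marker holds the process identifier and start time.

```python
import json
import os
import time
from contextlib import contextmanager
from pathlib import Path

MARKER_PATH = Path("/tmp/autonomous_task_active")  # path from the text


@contextmanager
def active_task_marker(path=MARKER_PATH):
    """Write PID and start time on entry; remove the marker on exit."""
    info = {
        "pid": os.getpid(),
        "started": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
    }
    path.write_text(json.dumps(info) + "\n", encoding="utf-8")
    try:
        yield info
    finally:
        # Runs on success and on exceptions, so a normal exit never
        # leaves a marker behind.
        path.unlink(missing_ok=True)
```

If the process is killed outright the marker stays behind, so anything that reads it should check whether the recorded PID is still alive.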

The typical invocation is a single bash command inside a Claude Code session:

nohup python3 ~/.claude/skills/autonomous/autonomous_runner.py \
  --prompt "SELF-CONTAINED PROMPT" \
  --email "[email protected]" \
  --max-retries 5 \
  --timeout 600 \
  > /tmp/autonomous_runner_stdout.log 2>&1 &

Design decisions:

  • --timeout 600 is per-attempt, not total. Total wall time is up to max-retries * timeout + backoff-total. In practice tasks complete on the first or second attempt, or fail cleanly.
  • The runner strips ANTHROPIC_API_KEY from the environment before spawning Claude, so the invocation uses subscription auth rather than paid API credits.
  • SMTP credentials are loaded from the Mac Mini’s credentials.py at runtime: not hard-coded, not passed as arguments, not logged.
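The wall-time bound in the first bullet can be worked through numerically. The essay does not say whether --max-retries 5 means five attempts in total or one initial attempt plus five retries, so the sketch below takes the attempt count as a parameter and assumes a backoff wait only between attempts.

```python
BACKOFF_SECONDS = (30, 60, 120, 240, 300)  # schedule from the text


def wall_time_bound(attempts, timeout_s, backoff=BACKOFF_SECONDS):
    """Upper bound in seconds if every attempt runs to its timeout.

    Assumes one backoff wait between consecutive attempts and no more
    waits than there are entries in the schedule.
    """
    waits = backoff[: max(attempts - 1, 0)]
    return attempts * timeout_s + sum(waits)
```

With a 600-second timeout, wall_time_bound(6, 600) is 4350 seconds (72.5 minutes) for one initial attempt plus five retries, and wall_time_bound(5, 600) is 3450 seconds (57.5 minutes) for five attempts in total. Either way, the worst case is roughly an hour, not ten minutes.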

What it has actually caught

The runner has been live for a few weeks. Real value has shown up in three situations.

First, the overnight case that motivated it. I now routinely leave long-running tasks to complete while I sleep, and the emails arrive.

Second, mid-task network drops. A task pulling a large data set from an external API timed out on its first two attempts. The runner backed off and retried, a later attempt succeeded, and the email arrived with the completed data set.

Third, a task that was genuinely impossible. I had asked the runner to do something requiring credentials I had not provisioned. It hit authentication failure on every attempt, exhausted its retries, and sent a clear failure email. I fixed the credentials and re-ran.

The email that never arrives is the failure mode I most wanted to eliminate, and the runner has eliminated it.

What it does not do

Interactive tasks. Anything requiring me to answer questions or approve intermediate steps is the wrong fit; use a session for those.

Long-horizon planning. The runner runs a single Claude invocation per attempt, so anything requiring multi-step orchestration across hours is better expressed as a proper pipeline with its own scheduling.

Replacement for daemons. Recurring work (health checks, backups, monitoring) runs as LaunchAgents or cron jobs. The runner is ad-hoc execution.

What the runner treats as the deliverable

For this runner the email is the deliverable, not the computation. That is why it retries the email itself rather than just the task, writes a fallback file when email fails, and sends a crash notification when the runner itself dies. Each layer exists because, at some point, completion-without-delivery happened.

The runner will be published in the eventual control-plane repo. In the meantime, there is a contact form on the about page.