Agentic Engineering for All: Finding Repeatable Work in Any Job


For a long time, my use of AI to help with my newsletter looked like asking it questions when something occurred to me. Give me a summary of this article. Give me 5 titles based on the content. Generate the SEO meta description. If I let it draft anything the result always sounded like AI, so I’d rewrite it. It wasn’t helping me get started or finished faster.

The unlock wasn’t a better prompt or a smarter model. It was treating the work as a system, finding the parts that actually repeated, and only using AI on those. The writing and thinking are still mine, but the scaffolding around them is automated.

What changed when I treated the work as a system was that I stopped asking AI to do everything and started asking it to do specific things. Sort this batch of links by topic, pull themes that show up across them, give me a few thought starters for openers. Each of those is a small, well-defined task with a checkable output, all geared toward getting me started faster with information in hand. The writing still happens when I sit down with a clear set of inputs and think about what I actually want to say.

That distinction turns out to matter for almost any work that isn’t strictly assembly-line. Most of your job isn’t repeatable. But the scaffolding around the judgment parts almost always is. The trick is learning to see it.

Where repeatability hides

The interesting thing about repeatability is that it doesn’t usually live where people look for it. When someone asks “is my job automatable,” they’re picturing the whole job. Whole jobs are rarely repeatable. They’re a mix of judgment, context, taste, and a bunch of supporting work that gets dismissed as “the stuff around the job.”

That stuff is exactly where we should be looking.

Think about what surrounds the tasks that actually require your knowledge. There’s intake, where information comes in from email, Slack, tickets, meetings, files. There’s organizing, where you sort that information into something usable. There’s summarizing, where you compress it. Classifying, where you decide what kind of thing it is. Formatting, where you put it into a shape someone else can consume. QA, where you check it against a rubric. Packaging and handoff, where you move it on to the next person or system. Follow-up, where you make sure it landed.

Most of those steps repeat. They might repeat differently across roles, but inside any given role they tend to look pretty similar week to week. A weekly report uses the same structure every time. A meeting recap follows the same shape. A project brief has a known set of fields. A hotlist pulled from inbound email, Slack, and Jira tickets has a predictable intake-and-sort pattern even if the contents change.

The work that actually needs you sits in the middle of all that scaffolding. The scaffolding is where the leverage is.

Why small units matter

There’s a reason this scaffolding work is the right place to start with AI, and it has to do with how language models actually work.

LLMs are probabilistic. When you give one a task, it isn’t following a deterministic procedure the way a script or a spreadsheet formula would. It’s generating output one token at a time, picking each next token based on probability. That works fine for narrow tasks where the space of acceptable outputs is small. It works less fine for big tasks where the model has to make dozens of silent decisions along the way and each one compounds the chance of drift.

When I ask AI to “write the newsletter,” I’m asking it to make hundreds of small choices: tone, structure, what to include, how to weight things, what voice to use, how to open, how to close. Every one of those could vary slightly based on weighted odds. By the time it’s done, the cumulative drift from what I actually wanted could be huge. That’s why those drafts always read like AI wrote them. They were never going to read any other way.
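To put rough numbers on that compounding, here is a back-of-the-envelope sketch. It assumes every choice is independent and has the same odds of landing where you wanted, which is a simplification I am making for illustration and not how a model actually behaves. The function name and the 0.98 figure are made up.

```python
def chance_all_on_target(per_choice_odds: float, num_choices: int) -> float:
    """Chance that every choice lands where you wanted.

    Assumes the choices are independent and equally likely to be right,
    which is a simplification for illustration only.
    """
    if not 0.0 <= per_choice_odds <= 1.0:
        raise ValueError("per_choice_odds must be between 0 and 1")
    if num_choices < 0:
        raise ValueError("num_choices cannot be negative")
    return per_choice_odds ** num_choices


# Hypothetical odds: each individual choice is right 98% of the time.
narrow_task = chance_all_on_target(0.98, 5)    # e.g. sorting a batch of links
broad_task = chance_all_on_target(0.98, 100)   # e.g. "write the newsletter"

print(f"5 choices:   {narrow_task:.2f}")   # prints 0.90
print(f"100 choices: {broad_task:.2f}")    # prints 0.13
```

Even with generous per-choice odds, the broad task almost never comes out exactly the way I wanted, while the narrow one usually does.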

When I ask AI to sort thirty links into five topic clusters, the space of acceptable outputs is small enough that the probabilistic nature of the model becomes a feature rather than a liability. This is also where LLMs differ from the automation tools we’ve had before. Traditional automation needs structured, predictable input to do anything useful. Hand it a folder of messy notes, half-formed ideas, and links from five different sources and it falls over. An LLM is genuinely good at the fuzzy-judgment work of taking unstructured input and giving it shape. That’s what unlocks scaffolding work that used to be unautomatable. Chris Lema makes a similar point in a recent piece arguing that AI agents earn their keep in three narrow roles: generation, fuzzy judgment, and extraction. Everything else, he says, should probably be code.

The shorter and clearer the task you hand it, the more you stack the odds in your favor.
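One way to see what a "small space of acceptable outputs" buys you: the structural part of the link-sorting output can be checked by a few lines of ordinary code. This is a minimal sketch, assuming the model's answer has already been parsed into a dictionary of cluster name to links. The function name, the data shape, and the example URLs are all hypothetical.

```python
from collections import Counter


def check_clusters(links, clusters, max_clusters=5):
    """Return a list of structural problems with a link-sorting result.

    links:    the batch of links that was handed to the model
    clusters: assumed shape {cluster_name: [link, ...]}
    An empty list means the structure is fine. Whether each topic
    label is sensible is still a human check.
    """
    problems = []
    if len(clusters) > max_clusters:
        problems.append(
            f"expected at most {max_clusters} clusters, got {len(clusters)}"
        )

    # Count how many times each link shows up across all clusters.
    assigned = Counter(link for group in clusters.values() for link in group)
    known = set(links)

    for link in links:
        if assigned[link] == 0:
            problems.append(f"unassigned: {link}")
        elif assigned[link] > 1:
            problems.append(f"assigned {assigned[link]} times: {link}")

    # Anything the model returned that was never in the batch.
    for link in assigned:
        if link not in known:
            problems.append(f"not in the input batch: {link}")

    return problems


batch = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
model_output = {
    "agents": ["https://example.com/a"],
    "tooling": ["https://example.com/b", "https://example.com/a"],
}
for problem in check_clusters(batch, model_output):
    print(problem)
```

There is no equivalent check for "write the newsletter," which is the point.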

A tool for finding the repeatable parts

If small, well-defined units are where AI pays off, the practical question is how you find them in your own work. The simplest tool I’ve used is a 2×2.

Plot any task in your work on two axes. The horizontal axis is repeatability: does this thing happen often, or is it a one-off? The vertical axis is output structure: does this task have a defined output you’d recognize, or is the output different every time?

The four quadrants give you four different answers about how to use AI on the task.

The top-right is where you build first. High repeatability, high output structure. Sorting links into topic clusters lands here. So does a meeting recap, a hotlist intake from email and Slack and Jira, the outline portion of a project brief. The output is well-defined enough to check, and the task happens often enough that any time you save compounds. These are the tasks where the investment in a repeatable system pays back the fastest.

The top-left is workflow with strong review. The output is structured but the task doesn’t repeat that often, so the speed gain is smaller and the review burden is relatively higher. Worth doing, but not where you start.

The bottom-right is where you prompt or assist. The task repeats, but each instance varies enough that you’re not building toward a fixed output. A headline brainstorm. A first pass at a tagline. Open-ended ideation. AI is a useful thought partner here, but you’re not running a system. You’re having a better-quality conversation than you’d have alone.

The bottom-left is where you keep humans in charge. Strategic framing for a new piece of work. Original creative concepting. These don’t repeat in a useful way, the output isn’t structured enough to verify, and the value is in the judgment itself. Throwing AI at this corner is how you end up with generic output you’d rewrite from scratch anyway.

Most of what gets called “AI didn’t work for me” is someone trying to use AI on a bottom-left task. Most of the leverage is in the top-right, and it’s usually hiding in plain sight.
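If it helps to see the 2×2 as a lookup, here is a small sketch. It reduces each axis to a yes/no judgment you make yourself, which is cruder than plotting tasks on a continuum. The `Task` class, the `quadrant` function, and the top-left example task are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    repeats_often: bool       # horizontal axis: repeatability
    structured_output: bool   # vertical axis: output structure


def quadrant(task: Task) -> str:
    """Map a task to its quadrant and the matching way to use AI on it."""
    if task.structured_output and task.repeats_often:
        return "top-right: build first"
    if task.structured_output:
        return "top-left: workflow with strong review"
    if task.repeats_often:
        return "bottom-right: prompt or assist"
    return "bottom-left: keep humans in charge"


tasks = [
    Task("sort links into topic clusters", repeats_often=True, structured_output=True),
    Task("annual planning summary (hypothetical)", repeats_often=False, structured_output=True),
    Task("headline brainstorm", repeats_often=True, structured_output=False),
    Task("strategic framing for a new piece of work", repeats_often=False, structured_output=False),
]
for task in tasks:
    print(f"{task.name} -> {quadrant(task)}")
```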

Breaking it into smaller pieces

The 2×2 finds candidates. The next question is what to do with them. Picking a top-right task and handing it to AI as a single unit usually still produces mediocre output, because even a “narrow” task often has more decisions inside it than you realize. The actual work is taking the candidate and breaking it into smaller pieces, each with clear inputs and a checkable output, until each piece is small enough that AI can handle it reliably.

A small example. I have a skill that polishes a draft in my writing voice. It works because the skill has one job. Take a draft, return a version that sounds like me. The input is defined: the draft. The output is defined: the polished version. The check is fast: does this match my voice. If I tried to give that same skill a broader job, like “write me a piece in my voice from these notes,” it would fall apart. The narrow shape is what makes it work.

Bigger workflows are just chains of small pieces like that, handed off to each other. This post is an example. It came out of a content pipeline I’ve been building, called Kessel Run (what good is an AI product without a sci-fi property name?), where each stage of the writing process is its own skill: researching and anchoring the idea, picking a hook, mapping the structure, drafting the sections, personalizing the voice, polishing the result. Each stage has a defined output with human review and input before the next one starts. None of the stages tries to write the whole post on its own, and that’s exactly why the system works.

The pattern is the same whether you’re building one skill or a whole workflow. Take the task. Find the seams where it naturally breaks. Make each piece small enough that you’d recognize a good output when you saw it. That’s how you turn a candidate from the 2×2 into something that actually saves you time.
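The shape of a chain like that can be sketched in a few lines. This is an illustration of the pattern, not how Kessel Run is actually built. The stage functions below are plain placeholders standing in for model calls, and every name (`Stage`, `run_pipeline`, the `review` callback) is an assumption of mine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    name: str
    run: Callable[[str], str]  # takes the previous stage's approved output


def run_pipeline(
    stages: List[Stage],
    first_input: str,
    review: Callable[[str, str], Optional[str]],
) -> str:
    """Run stages in order, pausing for review after each one.

    review(stage_name, draft) returns the approved (possibly edited)
    text, or None to reject it and stop the chain.
    """
    current = first_input
    for stage in stages:
        draft = stage.run(current)
        approved = review(stage.name, draft)
        if approved is None:
            raise RuntimeError(f"stopped at stage '{stage.name}': output rejected")
        current = approved  # only reviewed output moves forward
    return current


# Placeholder stages; a real one would call a model with a narrow prompt.
stages = [
    Stage("hook", lambda notes: f"HOOK({notes})"),
    Stage("structure", lambda hook: f"OUTLINE({hook})"),
]


def approve_all(stage_name: str, draft: str) -> Optional[str]:
    # Stand-in for a human reading the draft and accepting it as is.
    return draft


print(run_pipeline(stages, "notes", approve_all))  # prints OUTLINE(HOOK(notes))
```

The design choice that matters is that nothing reaches the next stage without passing through `review`, so a bad output gets caught where it is cheapest to fix.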

Verifiable outputs are how the system improves

Doing the breakdown well also unlocks something the original task never could. Reviewability has a second job beyond catching errors. It’s where the system improves over time.

Think about the difference between two questions. “Did this newsletter issue work?” is hard to answer fast. You might know in a week from open rates or replies, you might know in a month from subscriber growth, you might never really know. “Did the link sorter put this Anthropic post in the right cluster?” is answerable in five seconds. The first question is too big to give you useful feedback. The second one closes the loop fast enough that you can fix the system the next time you run it.

When the units are small and the outputs are checkable, every run is a data point. You start to see which prompts hold up across inputs, which steps need a tighter rubric, which handoffs leak context. You can adjust. The system gets better.
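Treating every run as a data point can be as simple as keeping a tally. A minimal sketch, assuming you jot down a pass or fail for each step after you review it. The log format, the step names, and the `pass_rates` function are hypothetical.

```python
from collections import defaultdict


def pass_rates(run_log):
    """Summarize a log of (step_name, passed) entries as pass rates per step."""
    totals = defaultdict(lambda: [0, 0])  # step -> [passed count, total count]
    for step, passed in run_log:
        totals[step][1] += 1
        if passed:
            totals[step][0] += 1
    return {step: passed / total for step, (passed, total) in totals.items()}


# Hypothetical log from a few weekly runs.
run_log = [
    ("sort_links", True),
    ("sort_links", True),
    ("pull_themes", False),
    ("pull_themes", True),
    ("sort_links", True),
]
print(pass_rates(run_log))  # prints {'sort_links': 1.0, 'pull_themes': 0.5}
```

A step that keeps failing review is the one that needs a tighter rubric or a narrower job.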

Start with the simplest thing that works

There’s a reasonable objection to all of this, which is that it sounds like workflow design with extra steps. Engineers and good PMs have always broken work down. What’s different now is that each of those repeatable units can be partly assisted or automated in ways it couldn’t be before, which raises the payoff for doing the breakdown carefully in the first place.

The other reasonable objection is that nobody has time to map out every task this way. They don’t have to. The 2×2 is how you avoid over-engineering. You apply this thinking to the work that recurs, not the work that happens once. One-offs will always be one-offs, no matter how much we developers think we can automate them.

There’s a rough ladder for how AI gets used on a piece of work, from least to most coordinated: prompt, skill, workflow, agent. A prompt is a one-off ask. A skill packages a repeatable procedure with a defined output. A workflow chains skills together with handoffs between them. An agent adds autonomous decision-making and tool use on top. Each rung up adds power and adds failure modes. Agents are genuinely useful when the work underneath them has been broken into repeatable units with clear inputs, defined outputs, and a way to keep them on the rails. They fall over when they’re asked to do too much without enough guidance. That’s how you end up with an agent finding your production credentials and wiping the database to fix a bug. The complexity should match the work, not run ahead of it. Start with the simplest thing that holds up, and only climb when the work actually demands it.

What this gives back

The point of any of this isn’t to make AI do more of your job. The point is to clear out enough of the supporting work that you actually get to the parts that need you, faster and with a clearer head.

When I started using AI on the scaffolding around my newsletter instead of trying to use it on the writing, I noticed something I hadn’t expected. I started writing more. Not because the AI was helping me write, but because the friction between “I have something to say” and “I’m sitting down to say it” had dropped. The links were already sorted. The themes were already surfaced. The candidates were already shortlisted. By the time I got to the page, the work that’s actually mine was the only thing left to do.

That’s the trade. You spend some time finding the repeatable parts and building small systems around them. In return, you get back the time and attention you used to lose to setup, organizing, formatting, and packaging. The judgment work, the writing, the strategy, the actual thinking, those parts stay with you. They get more of your attention, not less.

If you want to try this on your own work, here’s the smallest possible experiment. Pick one task you do every week. Run it through the 2×2. Identify one piece of it that lives in the top-right corner. That’s your first candidate. Break it into smaller pieces with checkable outputs and build a small system around it. You don’t need an agent. You don’t need a pipeline. You just need to see the repeatable part clearly enough to give it shape.