The hackathon that started it all

In February 2026 we ran an internal hackathon. The brief was open: pick your own project and build something with AI that would be useful to the business. I picked the merchant-risk problem, which had been on my mind for the best part of a year, and some of the data scientists and fraud analysts I had been working on it with picked it too.

I was Chief Risk Officer at the time, and not anyone’s idea of a developer. My role had drifted into strategy and leadership over the previous fifteen years, and whatever technical skills I once had were properly out of date. I had not written more than a basic SQL script in five years and had never retrained out of SAS into Python, so when I say “we built a pipeline” I am very deliberately not claiming I personally typed it; a small group of us got it working by talking to a coding assistant and iterating against the outputs as they came back.

By the end of the day we had a pipeline that could read merchant websites at scale and score them for fraud indicators, and we had pointed it at enough of the portfolio that afternoon to be reasonably confident it would generalise to thousands of merchants without any architectural change. Not a line of it had been written by hand.

The problem itself was not new. Merchant-risk analysts manually review websites as part of onboarding and ongoing monitoring, and a good analyst can read a website and form a judgement by working through a fairly intuitive checklist. Is this a real business? Does the catalogue make sense? Are the compliance pages real or boilerplate? Does the contact address look like a corporate domain or a free email provider? Skilled work, and it really does not scale; a book of tens of thousands of merchants needs a model rather than a person.

We had already done the obvious things, by which I mean transactional signals, domain intelligence and external risk scores from the usual vendors. These are useful but they miss the thing a human reviewer catches, which is that the website itself is often the tell. A merchant with a Shopify hosted site, three SKUs, a Gmail contact address and a “terms and conditions” page copied verbatim from a template is rarely a £5m-a-year fashion retailer.

What we wanted was the analyst’s judgement, automated and applied to the whole book. The hackathon was the excuse to find out whether we could do it in a day.

The five fraud signals AutoAgent extracted from each merchant website: catalogue size, generic contact email, compliance pages, site depth and content, and custom build versus template site.

What we built

Two pieces, stitched together.

The first piece converted a merchant’s website into something a language model could read, and we used Jina AI’s reader API for this. You give it a URL and it returns clean markdown without the navigation, cookie banners and ads, which is cheap, fast and good enough for fraud analysis, and we made one call per merchant homepage and a few common internal paths (/about, /contact, /terms, /privacy).

The second piece was Claude reading that markdown with a structured prompt, and honestly the prompt was the bit that mattered most. We described what a merchant-risk analyst actually looks for and asked for specific, bounded outputs: catalogue size band, contact email type, compliance pages present, site depth, template versus custom build, and a handful of others. Nothing creative, nothing open-ended; we had taken the fraud analysts’ own mental checklist and forced it into a structured output we could rank.

We ran it as a small Python loop with one worker per website, rate-limited against Jina and running in parallel across the list of domains. By the end of the afternoon we had run it across enough of the portfolio to be confident it would scale to the full book of tens of thousands of merchants without any architectural change.

The output was a table, one row per merchant and one column per signal.

Why it mattered to me

Two things.

First, the output was useful on its own, not as a final risk score but as a shortlist of merchants whose websites had the analyst-visible tells of something not being right, which a human could then triage much faster than on a fresh website review.

Second, and this is the bit that has stayed with me, we built it together in a day and none of us had walked in that morning planning to write code. The work that would have taken an engineering team several weeks we did in an afternoon. Three things that usually happen on an engineering project did not happen during that afternoon: there was no translation layer between the subject-matter experts and the builders because they were the same people in the same room, there was no specification for the build to drift away from because there was no specification, and there was no “we built what you asked for, not what you wanted” moment three sprints later because the loop from idea to working artefact ran in minutes rather than weeks.

I do not want to overstate any of this; the code itself was rough, scripts on a laptop, nowhere near production-ready, and engineers would have needed to harden it properly before it went anywhere near regulated infrastructure. But the insight, the shape of the feature set and the proof that the idea worked were all there at the end of the day, and the scripts were just the artefact; the understanding they produced was the real output.

What it made me change

I do not think I would have spent the next two months building everything else on this site without that day. Before the hackathon I had thought of AI as a useful tool for individual tasks; after it I started thinking of AI as something you build a system around, and the binding constraint on what I could personally build was no longer my willingness to specify the work but my willingness to build the operating environment around the work, by which I mean the memory, the context, the skills, the hooks and the safety. A lot of what I have written here since traces back to that day in some way.

If you want the broader position on what AI-native means in practice, there is a separate post on what it means to be AI-native. This one is just where it started.

11 February 2026. One day, one pipeline, capable of scaling to the whole merchant book. The workplace and tool are both kept generic; the date is not.