I get it — I understand the appeal of handing over some chunk of your work (or home administrative tasks that feel like work) to an autonomous LLM that can handle it for you. The real question here is how we got to a point where this is even necessary — why are there so many tasks that feel like overhead?
Enshittification and the profit motive
My partner and I were recently talking about the way in which booking holidays and flights has changed radically from when we were kids. We both remember going to the local travel agent (remember them?!) with our parents, sitting down and discussing plans and what we wanted and then getting a quote for a holiday. Flights, accommodation, traveller’s cheques (shocked face from my younger millennial readers), all from one place. You definitely paid more for the experience than you would have done ringing up the airline yourself, booking with the local hotels etc, but it wasn’t even as if we considered doing that — the convenience you had through the travel agents made it a no-brainer. This is not to say that there weren’t some unscrupulous travel agents out there who charged a fortune for things, but on the whole, the market worked, and it felt like value for money. The key point was, you went and spoke to a human being who did their best to understand what you needed, and to provide it to you such that you had a nice holiday.
In the late 90s, we started to see the rise of the budget airline. These guys embraced the dot-com revolution, and set up websites that allowed you to book directly with them, rather than through the travel agents or by ringing up the airline (even then, young people weren’t so keen on phone calls), and most importantly they passed the savings of not having a middle man on to you. The problem is, that juicy margin in the middle is always tempting, and soon these companies started organising their websites like minefields. Saving money on flights went from being a simple matter of using the website front end to tiptoeing through unclear menus, having to uncheck boxes for extra insurance, and generally avoiding the “dark patterns” placed at every turn to catch you out. Michael O’Leary himself seemed to take pride in the hoop-jumping that Ryanair’s customers had to do to book their flights, and in general a lack of price transparency abounded (leading to the appearance of aggregator websites like Skyscanner, Expedia and others — you can guess how the loop goes here…).
As a result of the pivot to online, things like phone lines that allowed you to book went away — instead of the minor inconvenience of phoning a number to get things cheap (something that most people in the late 80s and early 90s still didn’t do because it was a pain), we now have the major inconvenience of 12 pages of dark patterns to wade through. The user experience went from “pay more for a nice extra service” to “tiptoe through the minefield and hope you got it right” — an altogether less pleasant one.
Putting users second
For a couple of years, I worked for a company called Kare Knowledgeware, who got acquired by Dialpad, the customer experience contact centre company. I worked as the machine learning engineer who was designing, building and deploying our neural search engine, back when those were the real bleeding edge in 2018.
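For readers who haven’t met one before, the core idea of a neural search engine can be sketched in a few lines — this is a toy illustration, not Kare’s actual system: queries and documents are embedded as vectors, and retrieval is nearest-neighbour search by similarity. The `embed` function below is a bag-of-words stand-in for what would really be a neural sentence encoder.

```python
import math
import re
from collections import Counter

def embed(text):
    # Stand-in for a neural sentence encoder: a bag-of-words count vector.
    # In a real system, a model (e.g. a sentence transformer) produces
    # dense vectors that capture meaning, not just word overlap.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, documents):
    # Rank documents by similarity of their embedding to the query's.
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)

docs = [
    "How do I reset my password?",
    "Opening hours for the London office",
    "Troubleshooting login problems",
]
print(search("I forgot my password", docs)[0])  # → "How do I reset my password?"
```

The interesting part is entirely inside `embed`: swap the word counts for a learned encoder and the same ranking loop becomes “neural” search.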
Our purpose was to allow people to find information they needed quickly and resolve their own issues fast, only escalating to human intervention when it was needed. Our CTO was very clear on where we played — our aim was to help with the repetitive tasks that customer contact agents dreaded (password resets etc), and free them up to deal with the “long tail” of complex tasks that required human intervention. I was shocked to discover that many of our competitors in the space prided themselves on a metric they called “deflection rate” — how many queries they resolved without human intervention.
Why was I shocked? Well, simply put — I realised I could write two lines of Python that would score higher on “deflection rate” than any of our neural search algorithms, and it would look like this:
def respond_to_user(user_input):
    return "F**k off, useless customer"
No escalation path, no human in the loop, no returning customers, no problem. Our aim was never to make the deflection rate as high as possible, it was to help our customers find information, and pass on to humans when we couldn’t do that. If we made our technology better, we would be able to go a little deeper into that “long tail”, but from business to business, the size and shape of that changes, which meant that we wouldn’t be able to quantify return on investment (ROI) precisely without first implementing the solution. I’m sure you can already see where this is going. You’ll be reassured that we never went down the dark path.
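To make the metric problem concrete, here is a hypothetical sketch (the field names are invented, not any real product’s schema) contrasting deflection rate with a metric that also asks whether the customer was actually helped:

```python
# Hypothetical interaction log: each record says whether the query reached
# a human, and whether the customer's problem was actually resolved.
interactions = [
    {"escalated": False, "resolved": True},
    {"escalated": False, "resolved": False},  # deflected, but not helped
    {"escalated": True,  "resolved": True},
    {"escalated": False, "resolved": False},  # deflected, but not helped
]

def deflection_rate(log):
    # The vanity metric: the share of queries that never reached a human.
    return sum(not i["escalated"] for i in log) / len(log)

def resolution_rate(log):
    # The metric that matters: the share of customers who got what they needed.
    return sum(i["resolved"] for i in log) / len(log)

print(deflection_rate(interactions))  # → 0.75, looks great on a dashboard
print(resolution_rate(interactions))  # → 0.5, tells a different story
```

Note that the sweary-bot above scores a perfect 1.0 on the first metric and 0.0 on the second — which is exactly why the first one should never stand alone.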
Trying to redress the balance
This over-reliance on metrics, and a focus only on the “business intelligence” (surely “business stupidity”? — Ed.) dashboard is what brought us to this point. Your ultimate aim as a corporate CEO is to increase shareholder value as much and as quickly as possible (Milton Friedman has entered the chat), so you then look for the places where you can do that, preferably the easiest and highest impact ones. After all, you need those stock options to be worth something soon so you can move on to the next gig with a well-feathered nest. Hence: share buybacks, mass layoffs, off- and near-shoring of core teams, and the SaaS-ification of everything. Note that these are all short-term moves, all directly linked to the share price, yet all with a neutral to negative impact on the experience of customers.
How do you combat this kind of approach? My preferred approach is a form of “Blindspotting” (see also the wonderful book by the same name from Kirstin Ferguson). There are two steps to this process — the first of which is trying to see from other points of view. When you define a bunch of metrics for your organisation, it’s always tempting to do this from your own point of view — that is to say, to focus on what will make your organisation a tangible “success” over a relatively short time horizon. This is normal human behaviour — we don’t know if we’re going to be a part of an organisation in five years, so better to front-load the rewards (see also the concept of the “time value of money”). However, of course, this leads to tunnel vision.
Instead, view the problem from other points of view — helpful personas include faithful “Bob”, the company lifer, who doesn’t care one iota for your bonus, but does care that the company will continue to exist, and “Simone”, your customer who wants to be able to continue to get value from what you do. What would they think of your proposed metrics?
The second step of blindspotting is to regularly ask yourself the question “what am I not seeing?”. This is actually a re-statement of the scientific method, whose core principle is:
No scientific theory can ever be “proved” — it can only be the best you currently have that has not been disproved.
This concept sits on top of a pyramid of deep suspicion — scientists should be out to try to disprove their pet theories at every turn if they are being true to their core subject, although I’m sad to report that this happens less commonly than you would hope. The same approach applies to a really precise approach to metrics — by all means be happy that your engagement score went up, but always stop to ask “why?”. If you don’t have a good answer, a causal explanation, then there’s every risk that you have created the sweary-bot above.
So what’s the Agents stuff?
Well, the core problem we face is that business leaders have been too used to driving blind, convinced by faulty metrics that what they were doing was great for the business. That overconfidence has intertwined with a Dunning-Kruger problem, where business leaders are far removed from the work that their employees actually do, and so fail to understand the nuance of how they make things work. A translator just converts words from one language to another, a customer service agent just looks up the answers to people’s questions in a big database. This makes them eminently replaceable by “AI agents” — tools that can make calls to APIs and collect data to accomplish a task with a defined, but actually impossible to measure, error rate. The next step of the logic is the most egregious, and where I cast a sceptical eye at a lot of my fellow practitioners. “I saw a demo from a vendor, and the platform did a perfect job of answering my queries / translating my document (which I had to check with Google Translate) / presenting a dashboard,” goes the business leader, “so it must be possible to replace the team doing this in our company with agents.”
Because of this cognitive failure, and if we’re honest, shameless overselling by a big chunk of the industry in which I work, agents are almost always doomed to be a disappointment from the start. The reason is simple — a translator’s job isn’t as simple as “switching words”, or even “switching words accounting for cultural norms”, just as a good customer service agent knows there’s a lot more to it than just answering questions. Human beings adapt to their environment by reading contextual clues that other human beings give out — we’re able to read the room. Any machine learning solution that exists today, including any agent system, struggles to do this meaningfully, which means that we have to monitor them constantly and make regular updates and changes to their configuration as more and more information comes in about how the job we’re trying to “replace” is harder than we think.
All that being said, if we’re careful and we scope well what the role is that an agent actually needs to do, rather than making it a human replacement, they can be useful for specific things.
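As a rough sketch of what “scoping well” might look like in code (all names here are hypothetical), the key design choice is an explicit whitelist of tasks the agent may attempt, with everything else escalating to a human rather than the agent guessing:

```python
# Hypothetical scoped agent: a whitelist of tasks it is allowed to attempt,
# and an explicit escalation path for everything else.
HANDLED_TASKS = {
    "password_reset": lambda req: f"Reset link sent to {req['email']}",
    "opening_hours": lambda req: "We're open 9-17 CET, Monday to Friday",
}

def handle(request):
    task = request.get("task")
    if task in HANDLED_TASKS:
        return {"status": "resolved", "reply": HANDLED_TASKS[task](request)}
    # Out of scope: hand over to a human instead of improvising an answer.
    return {"status": "escalated", "reply": "Passing you to a colleague"}

print(handle({"task": "password_reset", "email": "bob@example.com"}))
print(handle({"task": "refund_dispute"}))  # → escalated to a human
```

The point is the `else` branch: a well-scoped agent knows what it cannot do, and its success metric includes the queries it correctly refuses.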
Okay, and OpenClaw?
The interesting change of pace recently has been that people are increasingly finding the digital world overwhelming, and looking at taking this same “agentic” approach to their own personal lives. One of the memorable stories that came out in the early days of OpenClaw was of a non-technical person trying to use it to organise the photos on their desktop, only to watch the agent delete everything. Somewhat astonishingly, despite this experience, they went on to sing the praises of the project. I recently heard a podcaster refer to this as like “replacing your gas boiler with a nuclear reactor — in principle not a bad idea, but ridiculously dangerous for a normie” (I’m paraphrasing).
So what is it that is driving people to take this risk? Digital overwhelm. A recent experience comes to mind. In my previous job, I needed to join an ELT meeting at the last minute when I was away on holiday. Since I didn’t know this was going to be important when I left, I didn’t take my work computer, only my personal one. Turns out it’s not so easy to join Teams meetings from your personal machine when your outsourced IT provider has set everything to UK-only access, but that’s a different matter.
So, what did I do? I signed up for a personal Teams account, paid my 4.15 CHF, and tried to join the meeting. It didn’t work, and moreover it turned out that I had ended up in a netherworld where it was impossible to sign into the monthly-billed account I had just signed up for. It took multiple rounds of trying and failing to get through various automated systems and several recorded letters sent to the Swiss Microsoft headquarters before I finally got my confirmation of account cancellation this week. I signed up in July 2024…
I don’t mind admitting that the idea of having an agent deal with this for me is tempting, and I would probably even be tempted to look the other way on the glaring security issues that all personally-deployed agents have to get it done. I don’t consider myself an idiot, just digitally overwhelmed by how enshittified processes have become. The problem with this of course is that there’s no guarantee that your agent and your awful supplier won’t collude to cost you more money in the long run, as AI platforms also enshittify.
The 💩 is all around us. When you’re tempted to give in to the need for a personal agent to handle a workflow, remember how it used to be booking a holiday — you paid a bit more for someone to care. Now it feels like you pay a lot more for someone to fleece you anyway — so perhaps the problem isn’t even the technology, it’s the late stage capitalist incentives we have for number to go up.
I don’t really know how to end this article other than to say that all of the above isn’t why I got into tech. I do believe that technology platforms should be here to serve rather than to control, that they should get customers because they are good, not because they are the only game in town. Somehow, I get the impression that this point of view is almost quaint nowadays. The problem with the existing approach of treating customers like fools is that eventually human beings get tired of the abuse. I hope that one day my naive view will return to being the dominant one…
This post also appears on my Substack.