AI is kind of a big deal, we can all agree. But how revolutionary is it, really?
Discussions about AI tend toward the extremes – from utopian promises to doomsday scenarios. But let’s face it: we’re famously terrible at predicting the future, and there are plenty of pressing questions to tackle in the present.
At our inaugural AI customer service summit, Pioneer, we invited the renowned tech analyst Benedict Evans to separate the signal from the noise.
Evans takes us on a journey through the technological transformations of the past and how the biggest changes often come in unexpected ways.
He explores some big questions: How can we identify the artificial intelligence reality amid all the artificially inflated hype? How does the arrival of AI echo historical technological innovations, and how does it differ? Where do our predictive patterns fall down when faced with a technology as genuinely novel as LLMs?
After his presentation, Evans sits down with Intercom’s Co-founder and Chief Strategy Officer, Des Traynor, for a live Q&A, digging deeper into the question on everyone’s mind: what can AI actually deliver right now, and what’s next?
What follows is a lightly edited transcript of the episode.
Changing the work to fit the tool
Benedict Evans: I’m always interested to see how companies run events. If this were an American company, it would be, “Please put your hands together for world-renowned influencer and AI expert Benedict Evans.” But since it’s an Irish company, it’s, “Here’s Benedict. He’s an analyst.” I am, of course, a world-renowned influencer and expert! I should just state that for the record.
So, Des asked me to talk about AI and customer service, and I thought a good way to start talking about that would be to go back in time a little bit.
This is Jack Lemmon in 1960 in a movie called The Apartment, and he works as a clerk in an insurance company in a building in midtown Manhattan. He has a typewriter and a Rolodex and a thing called an electromechanical adding machine. And so, he gets given a calculation, he does the calculation on the machine, he puts it in a typewriter, he types it up, he puts it in an internal mail envelope – I’m actually old enough that I know what an internal mail envelope is; all the younger people in the room ask your older colleagues – and then it goes off to somebody else.
Everybody that you see here is basically a cell in a spreadsheet, and the whole building is a giant Excel file. And once a week, someone on the top floor presses F9 and the whole building kind of recalculates from top to bottom and generates new insurance prices. Meanwhile, the love interest is Shirley MacLaine, who’s an elevator attendant, which means she stands in the elevator, someone says “floor seven,” and she presses a button that says seven. That’s her entire job. So, this is a romance between a cell in a spreadsheet and a button.
In 1965 or so, that company bought, or rather, rented, one of these, an IBM mainframe, and they automated away all of those jobs. They didn’t have people sitting at desks with adding machines anymore. Instead, they used the mainframe to do that. And a couple of things happened when they did that. First, you just automate everything that you are already doing, but then, over time, you change how the insurance company runs because it becomes possible to do new things. And there’s a classic framing when we adopt new technologies. To begin with, you make the tool fit the way you work, and then, over time, you change the work to fit the tool.
So, every now and then, in big companies, there’s somebody who downloads a bunch of data from their internal system, some horrible clunky legacy system like Salesforce, they get a CSV, put the CSV into Excel and make charts, put the charts into PowerPoint, make slides and email it around the company. And then somebody says, “Hey, maybe you should do this in Google Slides, and then you wouldn’t be emailing a file around.” And then somebody else says, “Yeah, but maybe our management system should just make the charts and maybe there should be an AI system that should just tell us what’s changed.” To begin with, you make the new tool fit the way you already work, and then the new tool means that you change how you do your job.
Over the last half-century or so, since that mainframe, every 15 years or so, we’ve gone through one of these platform shifts that changes how we do our work, changes what the tools are, changes what tools can be built, and changes what kinds of companies and what kinds of products we all use.
From the mid-1960s to the late 1970s, that was mainframes. Then, from the late seventies through to the mid-nineties, it was the PC, and then it was the web, and then smartphones. And now, we have another shift, almost certainly, due to generative AI.
As you go through each of those, you can look at them at a kind of reductive level. Well, it’s a mainframe. It’s client server, it’s PCs and Oracle. It’s machine learning. But when you go through those, you change what kind of work you can do and what those things could be doing. Mainframes were doing record keeping. They were basically digital filing cabinets and they did data processing and they could generate your taxes. But when we went to PCs and SQL and databases, you could actually ask questions about your business. You could call it information management, business information management, or business intelligence. And now, as we go to machine learning, what should we think about in terms of what this is, what it’s going to be doing, and how it’s going to be changing how we do business?
What is this thing good for?
The one thing that everyone in tech does agree on is that this is kind of a big deal. This is a quote from Bill Gates from 18 months ago:
“In my lifetime, I’ve seen two demonstrations of tech that struck me as revolutionary: the GUI and ChatGPT.”
Bill Gates, March 2023
He said, in his career, he’d seen two revolutionary demos, the graphical user interface and ChatGPT. And it’s interesting that he doesn’t mention the internet, PCs, smartphones or the iPhone, or any of those things. It’s the GUI and ChatGPT. The GUI was a step change in who could use a computer and what you could do with software because you didn’t need to learn keyboard commands and type them in anymore. You could just click on what you wanted. This was an enormous change in what could be done with software and how many people could use it. We have something similar with large language models. Again, a fundamental change in what you can do with software and who can use it – you can just ask the computer to do something for you.
This is a young Sam Altman raising a round, now closed, of $6.6 billion at a valuation of $150 billion-plus, which is the largest venture capital round ever. And he’s probably going to have to raise another $50 or $100 billion in the next year or two to get him through to profitability, presuming that happens.
On the other hand, when you look outside of the tech industry, everyone thinks it’s very interesting, but a lot of people aren’t entirely sure what we are supposed to be doing with this. This is the CIO of Chevron, this summer, saying “The jury is still out on whether Copilot is useful enough to justify the cost.”
Why exactly should I double my Microsoft spend on 200,000 desktops? Why exactly should I give the cast and crew of The Office at Dunder Mifflin in Slough a copy of Copilot and tell them to use that instead of using SAP? What is it exactly that I’m supposed to be doing with this stuff?
I think we’ve got this kind of interesting moment now where this is extremely exciting and a huge technology breakthrough, but we’re kind of trying to work out what we do with that.
I think a good place to understand this is to go back to look at the last wave of machine learning about a decade ago, when image recognition started working. I would show slides like this to big companies and they would say, “Well done. That’s very clever. We’re happy for you. But we are a bank. We don’t have pictures of children petting dogs. In fact, we don’t have any pictures. We’re a large insurance company. We don’t have photographs. So what is it that this is useful for? Why are we interested? Why do we care?”
And it took us a while to work out that the right level of abstraction to understand machine learning was pattern recognition. And once you start thinking that this is pattern recognition, you can start thinking about what kind of patterns you might have and what kind of things you might be able to turn into patterns.
And so, for your bank, again, fraud detection is a pattern recognition problem. Customer service becomes a pattern recognition problem. We spent almost the last 10 years founding giant companies that turned problems into pattern recognition and unbundled them out of SAP or Excel. And indeed, some of them were turning problems into image recognition and working out that this is a problem that we can now solve with this new technology.
One of the ways I talk about this at a very conceptual level is that AI gives you infinite interns. Say you would like someone to listen to every call coming into the call center and tell you if the customer sounds nervous or if the customer service agent is being rude. You don’t need an expert to do that. You could get a 15-year-old to do that. In fact, you could probably get a dog to do that. One of my colleagues in the Valley used to say that AI will do anything you could train a dog to do. The advantage of this metaphor is that you can never actually be a hundred percent sure what you’ve trained the dog to do. The dog is generally doing what you’ve told it, but that may not be what you think you told it, which is kind of a common problem with AI systems. They’re doing what you told them, which is not necessarily what you think you told them.
And so, we have this whole class of things that you could not automate but any kind of mammal brain could do. And machine learning let us automate a very broad class of that. We now have the same conversation with generative AI. Again, we have very clever demos and it’s amazing, but we’re not quite sure what it’s for.
This is a photograph I took in an antique shop in Manhattan. I’m kind of a plane geek, and I couldn’t work out what aircraft it was. And here it is. I didn’t have a photograph of the actual aircraft. It’s a photograph of a model, and it’s never seen that model before, but it still worked out precisely which aircraft it was. It’s an early model of the Tu-22 that has a differently shaped engine intake. This is all very exciting. We’ve all seen demos of ChatGPT where you think, “Holy fucking shit, how did it do that? Computers can’t do that.”
Where do we go from here?
The base question to think about here is, well, how good exactly is this stuff going to get? Because two years ago, it didn’t really work or it was only good for crap demos, and it’s suddenly become very, very good by giving it huge amounts of data, huge amounts of compute, and, of course, huge amounts of money. So, it scaled and it got much better. The foundational question in the tech industry right now is: Will the scaling keep happening? We don’t have any theoretical model to tell us why large language models work so well. We don’t have a theoretical model that would tell us what would happen if we make them 10x bigger or 100x bigger. We also, incidentally, don’t have a good theoretical model for what human intelligence is, so we don’t really know if they’re heading towards that either.
What we do know is that, so far, it seems to have worked. There are two views here at a very reductive level. There’s one view that says the scaling is going to slow down. The laws of physics are going to kick in. What we’ve got now is more or less what we’re going to have. Maybe a bit better, maybe 10x better, but not a 100 or a 1000x better. And so, we should sort of be working on the basis of what we see today. But there are other people who say, “No, this stuff is going to scale a lot. It’s going to get a hundred, a thousand times better.” And in that case, these things might be able to do the whole thing that you use 50 different pieces of software to do today.
Now, the fun part is nobody knows. There’s a line in The Hitchhiker’s Guide to the Galaxy about somebody who spends a year dead for tax purposes, and you might just spend a year dead and come back and find out what happens. Microsoft’s CTO is saying it has scaled so far, and Sergey Brin is saying that doesn’t necessarily mean it’s going to scale another 1000x.
We are sort of looking at this at the moment and thinking, “What do we do with this?” There’s a very reductive answer, which is to say it predicts the next token, which is yes, very well done, very clever. And there’s a more useful, intermediate way of thinking about this, which is to say this does some kind of synthesis and summary and it does stuff that kind of looks like reasoning. It’s not really reasoning, but it can often look like reasoning. And it lets us automate a very broad class of things that we couldn’t automate before. Not everything, but it lets us automate stuff in the same way machine learning let us automate a broad class of things we couldn’t before.
There are some people who look at this and think, “This is amazing. I have to have this. This changes my life.” And there are other people who look at it and are not quite sure what to do with it.
VisiCalc was the first successful spreadsheet software. And as you can see from the outfit, this was the late 1970s. What you can’t see from the outfit is that that setup cost something over $10,000 adjusted for inflation. The fucking floppy drive was a thousand dollars, and a printer was even more. At this point, spreadsheets were something that you did on paper. You did a spreadsheet with a piece of grid paper, a pencil, and maybe a calculator. Dan Bricklin creates VisiCalc, shows it to accountants, and accountants look at it and think this thing does about two weeks of work in 20 minutes. You change the interest rate here and all the other numbers on the spreadsheet change. Today, you think “…yes?”, but if you were an accountant in 1980, that was weeks or months of work because you had to do all of those numbers one by one by hand. If you saw this and you were an accountant, you’d have to have it. But if you saw it and you were a lawyer, you would think, “Well, that’s great, and my accountant should see this, but I don’t do spreadsheets.”
Today, if you are a software developer, you look at this and think it is instantly useful and you have to have it. People in the software industry, in big companies, are already seeing 20, 30, 40% increases in efficiency in software development from using AI coding assistants. Something similar in marketing – a huge increase in the speed at which you can produce first drafts and ideas and product sketches and mockups. In both of these, either the errors are easy to see or there aren’t really wrong answers, which is something I’ll come back to in a moment. And of course, the other place where this has seen a huge amount of adoption is customer service.
The customer service conundrum
This is my favorite example of where you have to be careful. Air Canada used an LLM to build a customer support chatbot. The customer asked the chatbot what the returns policy was and the chatbot gave them a returns policy. It was a very good returns policy. Unfortunately, it wasn’t Air Canada’s returns policy. Air Canada refused to honor it, the customer went to court, and Air Canada said to the judge, “Well, the customer should have looked at another part of our website as well, not just that part of our website.” The judge did not find this a convincing argument, and so they had to pay. The conceptual challenge here is that these systems look like they’re answering questions, but they’re not. They’re saying, “What would a good answer to that look like? What would people probably say? What would the average person say in response to something that looks roughly like that?”
You can try this today. If you go and test these systems, the challenge is to test them on something you know a lot about. This is a study from Deloitte on people who’ve used these things. The people who’ve used these things are more likely to think that they always produce factually accurate answers. They don’t. Any scientist working on this will tell you they do not, and, by definition, cannot always produce factually accurate answers. But they look like they are. So, the test here is not to go to ChatGPT and ask it about something you don’t know anything about. The test is to ask it about something you know everything about.
How do we manage these things? How do we build products around them? Well, there are probably three answers to this. One of them is to make the models better. The models scale and the error rates go down. But there are also two kinds of product questions. One of them is to look for use cases that don’t have wrong answers or where the errors are easy to see, which is the marketing use case. The other is to build product around that so that you manage this and the user doesn’t see the errors. You abstract the prompt, you abstract the output, and you control how you design a product around this rather than just dumping the raw output at the user and telling them to trust it. You think about how you make things with this.
Now of course, once that works, it just disappears. This is a chart of elevator attendants in the USA.
Elevator attendant was a job, as I mentioned earlier. Then, Otis deployed something called the Autotronic elevator, the automatic elevator. And then, of course, the chart went back down again. But how many of you, in the last couple of weeks, have used an automatic elevator? When you get into the elevator and you press the button, do you say, “Ooh, it’s an automatic one, it’s an electronic elevator”? We don’t say that anymore. “It’s an electronic elevator. It’s got buttons. There isn’t a man with an accelerator and a brake.” When it works, it just disappears. We don’t see it anymore, it just becomes software. And we’re kind of going through that process now as we think, well, what do we do with this?
What do we build and what are we not expecting? What are we not seeing yet? And I think there are a couple of building blocks to think about in the context of customer service. Firstly, the thing everyone is looking at, chatbots. There’s a big conversation to be had around where that’s the right thing to deploy and where that’s the thing you are deploying because that’s what everybody has to deploy.
Coming at it more obliquely, I think it’s very interesting to think about what it means that basically any translation between any language pair will work now. You don’t have to have people in the Philippines speaking English. You can have people in Cairo answering customers who speak Dutch, and you can do that translation pair now as well. What does it mean that translation works for any language pair? What does that do to customer service?
There’s a much more general question of knowledge management. What does it mean if you help your users find things? What does it mean if you are helping your staff find things or you are helping that flow of information within the organization and outside it? But there’s also a more basic question, which is what is it that we’re not seeing yet? What is it that’s going to happen that we don’t predict? There is a famous quote about this from a guy called Yogi Berra:
“Predictions are hard, especially about the future.”
Yogi Berra
The big changes no one predicts
I found this article from 2010 on TechCrunch a couple of days ago, announcing the launch of something called UberCab, which allowed very rich tech bros to get limousines. Obviously, that’s not going to have a big impact on the world. But it was kind of difficult to predict what it was going to mean when you put GPS in everybody’s phone.
I was in the mobile industry in the early 2000s and we spent a lot of time talking about location. You’d go to conferences like this and someone would say, “You’ll walk past a Starbucks and your phone will know and you’ll get sent a coupon by SMS and the operator will charge a dollar for each of those coupons for the location lookup.” And that never happened. But on the other hand, none of us realized that location was going to change what taxis were. There are the sort of easy, obvious things you can see, but all the big changes will be the things that aren’t easy to see.
There’s a framework for thinking about that, about what you can’t know. With every new technology, the first thing is always that there are new features, that there’s incremental cost saving, the incumbents always try and make it a feature, so they hold a very large event with a rockstar in San Francisco and say they are incorporating it into their product, so you don’t need to talk to anybody else. You see Microsoft and Google spraying generative AI all over their products.
Then, you go a little bit further and people use this to create new products and new capabilities and new revenue lines. You do new things that you couldn’t do before this became available. And then, you get startups using it to unbundle. So startups unbundle Salesforce or Excel or Gmail. They pull one particular use case out of some big monolithic product and turn it into a new company, and then, every now and then, someone will come along and do an Uber or an Airbnb and actually redefine what the market is.
In other words, as you look at how you deploy this stuff, you can kind of ask, is this a CIO question or a CEO question? Is this bottom-line innovation or is this top-line innovation? The answer is yes to all of those depending on what use cases, what products, and what kind of companies you’re talking about.
Meanwhile, I think it’s kind of worth mentioning that all the stuff that we were excited about before ChatGPT launched is still there and it’s still happening. The tech industry is always very excited about stuff that’s going to happen in 2025 or 2030, which today means generative AI, maybe VR and AR. Some people were interested in crypto. Obviously, nobody in this room was ever interested in crypto, but other people thought crypto was a big deal. And so we think a lot about that. Meanwhile, most actual software companies today are building ideas from 2010 or 2015. Ideas like SaaS, cloud collaboration, workflow, unbundling, interconnection, and automation.
And then, the rest of the economy is being overturned by ideas from 2000. Ideas like maybe people will buy clothes on the internet. I actually was an internet analyst in 2000 and that was a really crazy dumb idea. No one’s ever going to buy clothes on the internet. They might buy books, but that’s probably not profitable. No one will buy clothes. And if we think about what’s happened since 2000, all of that stuff is still there and it’s all become really big.
It’s all about the industry
This chart of e-commerce is the most boring chart in the tech industry because it just went up one percentage point every year until we went through the pandemic, and then, of course it’s returned exactly back to the trend line that it was on before.
Slightly more dramatic, here in the UK, we are sort of back onto the trend line, but UK e-commerce, excluding grocery, is now about 40% of all retail.
And what does it mean if you are building a brand or running commercial real estate or trying to talk to a consumer if 40% of retail is online now? Even grocery doubled from 5% to 10%. And one answer is that you get completely new kinds of companies. So, Shein is now almost certainly the world’s largest apparel retailer. It’s double the size of Zara and H&M, and it’s built with a completely different logistics model, operating model – a different way of getting clothes made and taking them to the customer. Amazon, meanwhile, is now a $50 billion advertising business. It’s actually bigger than the entire global newspaper industry – not that that’s a particularly large number anymore – and it’s still growing.
The TV industry is also being completely overturned. Netflix now spends more commissioning TV shows than all free-to-air broadcasters in Europe. Disney is spending more commissioning TV shows than all broadcasters in Europe combined. And that always kind of used to be the case, except that European broadcasters could just buy TV shows from the Americans, and now the Americans are coming in and competing with them directly. So you have a kind of fundamental change in the competitive landscape.
And of course, the other place that there’s a big fundamental change in the competitive landscape is in cars, where it rather looks like the Chinese car industry is going to try and do, in the next 10 years, what the Japanese car industry did in the 1980s and completely overturn all sorts of other established industries, particularly in Germany.
Now, I’m not going to actually talk about all of those because I think the interesting thing here is that as each of these changes happens, the technology industry changes how something works, but then all the questions become questions for that industry. All the questions for Netflix are TV industry questions, and all the questions for Shein are really apparel industry questions. And I think the same thing applies to customer service. All the questions around what AI means for customer service aren’t really AI questions. They’re all customer service or retail or consumer questions. What is it that we do with this? How is it that this changes our industry?
I’m going to come back to one more quote about AI:
“‘Intelligence’ is whatever machines haven’t done yet.”
Larry Tesler, 1970
This is Larry Tesler back in 1970. Intelligence is whatever doesn’t work yet, because once it works, people say, “Well, that’s not AI, that’s just software.” Ten years ago, machine learning was AI, image recognition was AI. Now image recognition is just software. You take pictures of your kids with your phone and it recognizes them. And of course it does because that’s software. No one says that’s artificial intelligence anymore. Back in the 1970s, databases were AI. Now databases are just databases. You don’t use your bank and say, “Oh, I’m using AI now.” It’s just a database.
And I think maybe the structural way to think about this is that technology is also whatever just started working. Once it’s working, it becomes automation and then it becomes software and then it just becomes cars or media or customer service, and it’s back to that industry again. And with that, I will say thank you.
Q&A: Navigating the unknown
Des Traynor: It is a pleasure to be beside globally renowned analyst and influencer, Mr Benedict Evans. Give it up for Benedict.
I enjoyed that conversation. It is a good reminder that all the cool stuff that was already happening is still happening. If anything, it’s happening in the dark. Our world is one that forces us to think about the future, especially right now, because it’s clear enough that customer service is going to get rewritten in the era of AI at some level. We’re trying to push the boundaries of what’s possible there.
First question is from Brian from CE Expert. What advice would you offer to CS folks on adapting to this sort of change in knowledge work in their career? What should people be thinking about? If you’re worried, if you’re sitting there thinking, “Shit, all this stuff is happening,” what is a good sort of first thing to do? Get familiar with it, find a new job, take up farming?
Benedict: There’s a slide I often use on a more general AI presentation where I say I think all AI questions have one of two answers. One answer is “It will be just like every other platform shift”. How should we buy this? Should we buy from the big company or the small company? Should we unbundle it? Is it CapEx or OpEx? Can we put it in this year’s budget or next year’s budget? How did you do cloud? How did you do databases? How did you do PCs? How did you do mobile? Well, it’s like that. And the other answer is, “Oh, no one knows.” Is it going to scale? What happens to the error rates? How many models are there going to be? And so on. So the answer to that question is, well, how many of us today have the job that we would’ve had 20 years ago or 30 years ago or 10 years ago?
Des: Probably zero, I guess.
Benedict: With our grandparents, you’d get the job with the big company and then you’d stay there and get the defined benefit pension when you’re 65 and that’s it, you’re done. That’s not the world we live in anymore. I don’t think AI specifically is a different answer to this. It’s kind of like saying, “Well, you are a designer. How should you think about Macs and PCs and color printing?” Well, get curious and get pushing and think about what happens next.
Des: If I were to scratch at that a little bit, you wrote an essay about self-driving cars and this concept of second-order and third-order effects. If all cars are autonomous, does that mean the end of gas stations? Does that mean the end of certain cultures and micro-cultures that exist around motoring? When you think about AI, is it obvious to you that it’s bigger or smaller than any of these trends right now? You presented it as there was cloud, there was mobile, there were all these sort of things. A lot of us would say this seems bigger in terms of potential broad societal impact. Are you on the fence?
Benedict: This kind of comes back to the scaling question on one level. So, if these models get 10x and then a 100x and then a 1000x better, and let’s park for the moment what that would actually mean. There isn’t a straightforward linear scale to say, “Oh, this is 4.5x better.” But presuming these models get massively better, that’s one outcome, and then, does that go all the way to AGI? Again, that becomes another conversation.
But there’s also a view that says we’re going to flatten out, maybe not next year, maybe the year after, but this isn’t going to go all the way to something that’s basically indistinguishable from a person. I think the base question within tech is: How much better do these things get? And that’s very different to talking about databases or SQL or 5G or smartphones or something. You saw the original iPhone and obviously you knew it was going to get better, but you weren’t looking at this and thinking what happens if it can unroll and fill the whole wall, or what would happen if it could fly down the street after me. There was a relatively constrained set of possibilities for what an iPhone would be or what the web would be. We don’t have that same base constrained idea of what exactly these things are going to be.
The other way to answer this is to ask how you predict something that’s fundamentally unknowable. It almost becomes a theology question, which is why lots of people started dredging up their half-forgotten undergraduate philosophy – Plato’s Cave and Pascal’s Wager, and have another puff of the joint and say, “Well, maybe people aren’t self-aware either, we just think we are.” Yes, very clever. There are questions that don’t have answers and can’t be analyzed, and so aren’t particularly useful to talk about. And then, there are the tangible questions. What do these things do now, and what are they likely to be able to do in the next year or two?
Des: Right. That makes sense. There’s no clear endpoint here, so you can only really analyze from what we’re seeing today.
“That is right on the edge of engineering and science fiction right now”
Benedict: Yeah, but I'd go back to the Bill Gates quote. With the GUI, suddenly anyone can do customer service. You don't have to learn what to type into the command line. Anyone can see. You have this much broader envelope of things that could be turned into software, but someone still has to make the GUI, someone has to make the customer service thing, someone has to make the tax software, someone has to make the accounting software or the payroll or whatever it is that you're automating.
In theory, you could go to ChatGPT-7 and say, "Hey, can you buy this house for me?" And it would go and work out the escrow, work out that you need to hire a solicitor, and hire them. And it would work out that you need to tell your bank to do this… That is right on the edge of engineering and science fiction right now. I don't think we really know whether these things will be able to do that, but that's what we mean when we say these things might really scale, because then you would have way more stuff that could be done in software without people needing to write the individual apps. So, that would give you massive increases in how much stuff could be automated.
Different kinds of retail, different customer journeys
Des: We'll go back to a question from David from Times. Once AI CS agents become the norm and the likes of Temu can give Zappos-level service – Zappos being an online brand that was famous for high-quality service – how do you see that affecting the world of e-commerce or commerce in general?
Benedict: I always go back to my slide where I asked about imponderables. What is it that customer service is really trying to do? Is it that you want to get a return? Is it that you want to ask questions? Is it exception handling? What is the problem that we are trying to solve with customer service? And is it a problem that requires you to talk to a human, or is it the old line about Kaizen: Do you clean the factory or do you reduce the sources of dust? How much do these systems change the problem? What we get with Temu or Amazon is, well, what happens if, instead of having to try the thing on, I just order three sizes? You can change what those customer journeys look like in different ways from different directions.
Des: Zappos built a whole brand on "we treat our customers really well," and they have the famous stories where somebody would be on a call for two hours with a Zappos employee discussing their breakup or something like that. Ultimately, they managed to turn it into a brand. Are you saying basically it depends on how a business chooses to weaponize CS-
Benedict: Well, it depends on what it is. A practical personal example: I’m booking a flight for my family from New York to London. I’ve booked it with miles. The airline, British Airways, managed to lose the booking. So they’ve taken the miles, they’ve lost the booking, and I’ve spent 45 minutes on the phone with somebody this morning to get it sorted out. So, on one level, this is great customer service. There’s like a dozen people at BA trying to unfuck this. But wouldn’t better customer service be that the booking just hadn’t got lost?
Des: If it wasn’t fuckable in the first place. Yeah, that’s when you see customer service as a source of error or as a source of customer confusion. The challenge is that some brands live on the edge where actually talking to their customers is part of the thing, and it remains to be seen what AI will offer in that world.
“There are different kinds of retailing, and different kinds of retailing and merchandising models have different kinds of service”
Benedict: Another way to answer this is that one of the ways you could think about that e-commerce chart is that the last 25 years of e-commerce have been about, "What can you turn into Amazon? What can you turn into a purely commoditized, packetized system where the system does not know what the product is?" The whole point of Amazon is they do not know what any individual thing is, it's just a SKU. Everything gets treated in exactly the same way, which makes it incredibly efficient. But it also means that if you try and buy children's shoes on Amazon, you go one level in and then it breaks, because now you can't actually have this shoe but in a different color with a different size; that's just a different SKU. So you've got to go back to the beginning and work out what to search for.
And so, half the e-commerce journey has been about what you can turn into a SKU, and the other half has been about how to build new experiences that would let you buy that online instead of going to a store, that would give you experience, recommendation, discovery, and service, but on a website rather than in a physical store, which is what Zappos was trying to do. And it's what I would've said the online fashion industry, like Yoox and Net-a-Porter, has been trying to do, except they all appear to be a smoking hole in the ground. But that's the other side of this. Nobody is going to buy a $5,000 handbag on Amazon, just as they don't buy it in Walmart. Those are different customer journeys, which is kind of my point that this is retailing. This is not e-commerce, it's retailing. There are different kinds of retailing, and different kinds of retailing and merchandising models have different kinds of service.
Des: Have you considered the other angle of e-commerce, which is generative products? Like the idea of asking the AI to make the T-shirt, make the suit. Really try to recreate a very dynamic in-store thing where the customers can invent their own SKUs, in a sense.
“We’re starting to get AI-generated genre fiction. ‘Make me 30 vampire, fanfic, cowboy erotica stories’”
Benedict: We can already see this in music. You go to Spotify and ask for relaxing music, and you'll be able to go and say, "I want a punk rock song about cowboys in ancient Rome," and you'll just get it. You can already do that now. And that's just product building. That's not even science or engineering. You can have that with children's books. We're starting to get that with genre fiction. "Make me 30 vampire, fanfic, cowboy erotica stories," and I'm not making this up, I've seen this. There's a huge kerfuffle in the world of genre writing about what this means. Is this theft? How do we understand what that is? So, there'll be certain things where it's very easy to generate product.
If you follow a lot of architecture accounts, do you care if the buildings are real buildings? And the answer is it depends. If you are doing a mood board for an apartment in a certain style and you give it 50 images and it makes 50 more that aren’t real apartments, well, that might be good. There are other scenarios where it might be bad. It depends on what you want it for. Product is kind of harder, but again, yes, I could go to Shein and type in, “I would like a T-shirt that looks like this,” and they would go, “Okay.”
Teaching the next generation
Des: Andrew from JourneyMapper has a question. How would you teach a five-year-old child differently here and now in the age of AI? What would you prioritize in their learning?
Benedict: It's funny. I always see people in Silicon Valley talking about how their 9-year-old has made their first iPhone app, et cetera. And I'm like, "Yeah, my son wants to talk about football and girls." There are always the children's books that your parents wanted you to read and the children's books that you wanted to read. They mainly involve farting, I think. And so, there's always a challenge there.
“Just as you want your five-year-old to do theater and poetry and art and science and engineering, you should also want them to learn about computers”
I grew up in the eighties and we were all supposed to learn how to code. And I write code in Excel sometimes, but I don't feel like everybody should learn how to code any more than everyone should learn how to be a car mechanic. But everyone should know what it is. A, you should know what it is so that you understand it as part of the world that you live in and potentially as something that you might be working with. And B, everyone should know what it is as something that they might choose to have as a career. And so, just as you want your five-year-old to do theater and poetry and art and science and engineering, you should also want them to learn about computers. They may end up working in theater, but they should still understand how it works, just as an actor should probably still understand how cars work.
Des: Do you think being able to spot AI or AI-generated stuff will be an important thing or do you think it won’t matter?
Benedict: If you think about what we've already seen with the war in Gaza, we have all these images where the image is real, but it's not from Gaza, or the image is real and it is from Gaza, or it's mislabeled and described as something that it isn't. I don't think there's a problem of shortage of supply of images. There's a problem of critical thinking and understanding. And that's a problem of filter bubbles, it's a problem of seeking out people that you agree with, a problem of a lack of skepticism toward something that fits what you want to believe. And I don't think that you solve that. I don't think that's an AI problem. I think that, if it's a problem, it's a problem of the unbundling of traditional media, which is a much broader conversation. We no longer have five newspapers and three TV channels who decide what we see, therefore the world is different.
Picnics, websites, and the quest for self-service
Des: An online question from Milan, from velo.com, I hope I’m pronouncing that correctly. With AI models becoming increasingly capable of understanding and responding to customer queries, what do you think the future of self-service in customer support could look like?
Benedict: Again, I can go back to the slide; I showed different ways to answer this. One of them is you have a chatbot that can answer questions whose answers might be hard to find on the website, and you can broaden that out to think about knowledge management in general. Another answer might be, and I am sure lots of people have looked at this, Walmart having a thing where you can search for what would be good to take on a picnic, which is not a good SQL query. And so, you have different kinds of questions or different ways that the user might be able to solve this.
I think the other side of this, though, is most companies, British Airways excepted, have spent the last 25 years trying to make their website better, thinking about the right flows, where people get stuck, how to make this not be a phone call. Make it one button that you can press or make it something that’s easy to solve or make it not break in the first place.
Just as a parallel from another industry, I used to work with a guy called Steven Sinofsky, who ran Microsoft Office and then Windows, and he pointed out one of these things that's blindingly obvious once someone tells you, which is that before the internet, no one who made software knew when the software crashed. And then, with the internet, Photoshop crashes, Office Excel crashes, AutoCAD crashes, and AutoCAD tells Autodesk, "Hey, we crashed." Excel tells Microsoft, "Hey, we crashed." And so the reason why software basically doesn't crash anymore, or a large part of it, is that they know what went wrong. Having said that, I'm not a customer service expert, but I think half of this is like, "How do you replace a guy in the Philippines?" But the other half is, "How do you continue making people not have to call the guy in the Philippines?"
Just another API call
Des: Yeah. One from Cheryl, from Lloyd's Banking Group. As a heavily regulated industry, banking, we aren't allowed to use only one software-as-a-service (SaaS) vendor. How can we integrate multiple AI providers in one ecosystem?
Benedict: So, maybe two answers to this. One of them is my slide about how every question is either a platform-shift question or a no-one-knows question. And this is kind of an enterprise software question. How do you integrate different kinds of enterprise software? When I worked at Andreessen Horowitz, we invested in a company called SnapLogic, which is a business systems integration platform. So, you can take your SAP and your Oracle and your Concur and your Workday, and they let you build workflows that sit on top of that. And companies like that, probably dozens of them, it's not my field, will now let you plug in Intercom and Intercom's competitors. And they will now let you plug in ChatGPT and Gemini and Llama 3. So, that will now be a component that you will be offered.
“Do we use the LLM sitting on top as the UX for the user to tell all the other systems what to do? Or does the LLM become just another API call?”
It's almost my point that the incumbents may get a feature and startups try and unbundle it. Every big company will make this a feature, but every startup will also be saying, "Look, you can integrate us into your workflow. You can plug us into the flow between SAP and Workday, and we are your way to do this." And then, you will have the AI companies coming in from the top saying, "No, you should integrate us directly," so you've got three or four or five different ways that people are trying to sell you this. I don't think that will be different. It will be the same as how they would have sold you fraud detection. Do you use the big vendor's fraud detection product, do you unbundle it and plug it in from outside, or do you plug it in from the top or bottom? This is kind of the same thing.
It seems to me like one of the foundational questions after you talk about scaling is: Do we end up using the LLM to control all the other stuff? Do we use the LLM sitting on top as the UX for the user to tell all the other systems what to do? Or does the LLM become just another API call? It’s just another of all the different systems that you have, and you are using all of your existing orchestration layers to get all of them to talk to each other. And so, ChatGPT is just another of your SaaS vendors. Or is ChatGPT the thing that sits on top and every other SaaS is just an API call for ChatGPT? I think that’s sort of the question we’re groping towards. And my default is to say this will be just another API, but the scaling question is another way of saying that if it gets really, really, really good, it can just sit at the top of the stack and run everything else.
Des: For Cheryl, if the question is localized to customer service, a lot of people use Intercom, but they also might use a slightly more old-school tool for the back office, like heavy ticketing features. And oftentimes, the architecture is: if Intercom can handle it, let Intercom handle it. If you specifically want to inject multiple vendors because of some internal ruling, you can put Intercom in front of your customer support and still maintain your existing posture, and you'll be able to claim you're using two. Intercom will do 51% or thereabouts of your work, the other 49% can still go into your existing system, and you'll work it out from there.
Last question is from Lee Burkhill from Money Supermarket. Do you think there will be a shift to consumers interfacing with AI assistants to buy products and services instead of interfacing with archaic things like websites? Will bots become the new websites?
Benedict: This is, again, a scaling question. In principle, if these things get a million times better, then, yes, I could just say to this thing, "Hey, can you optimize my bills for me?" And it will go and log into my bank account, look at all of the accounts, see the gas bill and the credit card bill and the phone bill and, right, done. This is the dream of the omniscient servant who could do anything for you. What's probably more likely is that I go to the price comparison engine and it says, "Hey, take a photo of all of your bills," and those are then a ChatGPT API call. And so it sends each of the bills to ChatGPT, comes back and says, that's a Vodafone bill, that's a Barclaycard bill, that's a Thames Water bill, that's your credit, and this is how you can optimize each of them. It all becomes another API.
A friend of mine is building a company that wants to be a competitor to Money Supermarket, and the way that he thinks about it is you could have done that 20 years ago by getting people to type the bills in. You could do it 10 years ago by building a parser for every credit card company. And now it's just an API call. Does this do the whole fucking thing, or is this just a new component that makes it radically easier to build certain kinds of product or automate certain functions or unbundle the incumbents with a better way of doing the same thing? But you've still got thousands of companies out there doing it.
Des: I would love to keep going, but I know we are out of time. I just want to say once again, Benedict Evans, thank you so much for joining us today.
Benedict: Thank you.