AI Revolution

AI Food Fights in the Enterprise

Ali Ghodsi and Ben Horowitz

This conversation is part of our AI Revolution series, which features some of the most impactful builders in the field of AI discussing and debating where we are, where we’re going, and the big open questions in AI. Find more content from our AI Revolution series on www.a16z.com/AIRevolution.

Ali Ghodsi, CEO and cofounder of Databricks, and Ben Horowitz, cofounder of a16z, explain the data wars happening inside and outside enterprises and how they could impact the evolution of LLMs.

  • [00:38] Why is it so hard for enterprises to adopt AI?
  • [03:08] Enterprise data wars
  • [04:28] Big vs. small LLMs
  • [08:13] Fine-tuning
  • [13:52] Open source AI
  • [17:51] Benchmarks are bullshit
  • [19:30] Why Ali isn’t afraid of today’s AI

Why is it so hard for enterprises to adopt AI?

Ben: Turning to generative AI, one of the things that's been interesting for us as a VC is that we see all kinds of companies. Some have amazing traction, but every company that has traction is in a category like selling to developers or consumers, or maybe selling to small law firms. We haven't seen anybody with real traction in the enterprise. As the AI infrastructure for the enterprise, why is it so hard for enterprises to adopt generative AI?

Ali: Enterprises move slow. The beauty of it is that if you crack the code and you get in, it's harder for them to throw you out. So that's one: they just move slower. Second, they're super freaked out about the privacy and security of their data. Everybody's been talking about data for 10, 15, 20 years, and they've just realized how valuable their data actually is: “I'm sitting on a treasure trove and I'm going to be super successful. But now that I finally realize how valuable this dataset is, I definitely don't want to give it to you, or you, or you. I should be careful about this.” Then there are reports about data leakage, where suddenly the LLM is spitting out your code or your source. They're freaked out about that as well. All of these things are slowing it down, and they're thinking it through.

Another challenge enterprises have is that, for a lot of the use cases, we need the data to be accurate; we need it to be exact.

Ben: Are they right about that? Do they really need it to be accurate?

Ali: I think it depends on the use case. They're just being cautious and they're being slow, as big enterprises are. Then there's the last aspect, which people don't talk about: there's a food fight internally at the large enterprise.

Ben: Who’s fighting?

Ali: It's “I own generative AI, not Ben.” Then you go around and say, “Hey, I own generative AI.” There's this food fight internally over who owns it, and they slow each other down. Then they say, “Hey, don't trust Ben, because he's not handling data the right way.” Meanwhile, I'm building my GenAI, and it's unclear who owns GenAI. Is it IT? Is it the product line? Is it the business line?

There's huge politics going on inside the large enterprise. They want to do it, but there are all these hurdles in the way, and the prize is huge. Whoever can crack the code on that is going to create an amazing company.

Enterprise data wars

Ben: Are the enterprises right about not wanting to give their data to OpenAI, or Anthropic, or Bard, or whoever? Is that a correct fear, or are they being silly when they could get so much value by putting their data in a big model?

Ali: They can. I get to talk to the CEOs of these big companies who previously were not interested in what I’m doing. I would be talking to the CIO, but now suddenly, they want to talk. They’re like, “I want this generative AI, I want to talk about strategy at my company. We have this dataset, it’s super valuable, and we have to do something with it. This generative AI seems interesting.”

One of the things that’s really interesting that’s happened in the brains of the CEOs and the boards is that they realize, “Maybe I can beat my competition. Maybe this is the kryptonite that will help me kill my enemy. I have the data with generative AI, I can actually go ahead and do that.”

Then they're thinking, “But then I have to build it myself, I have to own that. I have to own the IP. I can't just give away that IP to Anthropic or OpenAI. It has to be completely proprietary. We have a whole bunch of people from different departments lined up outside my office saying they actually will do it, and they can do it. We're trying to figure out which of them I should give it to.” This is what's happening internally right now.

Big vs. small LLMs

Ben: Interesting. From a strategy standpoint, let's say you had a big dataset, like a healthcare dataset, some kind of security dataset, or Nielsen's dataset. Can they build a better model themselves for that with their data? Or if they took their data and put it in one of the large models, would that always beat what they're doing?

Ali: You can. This is why we acquired Mosaic. It's hard and requires a lot of GPUs, but the Mosaic guys figured out how to do it at scale for others: “You want to build your own LLM from scratch? Come to me, I know all the landmines. It will just work, trust me.”

They can do it; they’ve done it for large customers. It’s not for the faint of heart. It requires a lot of GPUs, costs a lot of money, and depends on your datasets and your use cases. They’re having a lot of success doing it for really large enterprises. They’ll train it from scratch for them and it just works.

Ben: The good news is it's all my data and nobody can touch it. But is the bigger model such a bigger brain that I could get a better answer if I put that same data in the big model? Or is the Mosaic-tuned, enterprise-specific, dataset-specific model going to perform better? How do you think about that?

Ali: For specific use cases, you don't need the big one. First of all, you can build the big one with Mosaic and Databricks; it's just a question of how much money you have. We're happy to train a 100-billion-parameter model if you want. But even if you have all the money to train it, it'll cost you a lot to use it. Using it, doing inference as it's called, is going to cost you even more.

Ben: How do you think about the diminishing returns on a dataset: how many parameters versus how much data you have? Does a bigger model just start to hit diminishing returns, in terms of latency, expense, everything?

Ali: There's a scaling law. If you're scaling the parameters up, you have to scale the data with it. If you don't, you're not going to get the bang for the buck from scaling. You still get improvement if you increase just the parameters or just the data, along any one of these dimensions. But…

Ben: But you’re going to pay.

Ali: You're going to pay; it becomes inefficient. It's no longer Pareto optimal. Enterprises that come to us have specific use cases, which they all have. They don't say, “Hey, I would love to have an LLM that could answer anything under the sun.” They're saying, “This is what I want to do. I want to classify this particular defect in the manufacturing process from these pictures.” There the accuracy matters, and there you're better off if you have a good dataset to train on. You can train a smaller model: the latency will be lower and it's cheaper to use later. You absolutely can have accuracy that beats the really large model, but the model you built can't also entertain you on the weekend, answer physics questions, and help your kids do their homework.
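The scaling law Ali is referring to is usually written in the Chinchilla form, which makes his point concrete. A sketch in LaTeX, using the published exponent estimates from Hoffmann et al. (2022); treat the exact constants as illustrative.

```latex
% Chinchilla-style scaling law: expected loss L as a function of
% parameter count N and training tokens D.
% Scale N without scaling D and the B/D^beta term stops shrinking and
% dominates the loss -- more parameters, little extra bang for the buck.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad \alpha \approx 0.34, \quad \beta \approx 0.28
```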

Ben: Why do you think it’s important for you, Databricks, to build a very large model?

Ali: The bigger models, if you follow the scaling laws, are more intelligent, provided you're okay with paying the price, you have the GPUs, and you can crack the code on how to fine-tune the bigger model. That last part is the holy grail that everybody's looking at right now, in the research community, in the field, and in companies.

Fine-tuning

Ben: When you say fine-tune, can you get more specific?

Ali: Take an existing, really awesome foundation model and just modify it a little bit so it becomes really good at some other task. There are many different techniques for doing that, but right now nobody has really cracked the code on doing it without modifying the whole model itself, which is pretty costly, especially when you want to use it later.

Ben: Because you have to go through all the nodes.

Ali: Yes. If you made a thousand versions that are good at a thousand different things and you load each of those into the GPUs, serving them becomes very expensive. The holy grail everybody's looking for is a technique where you can make just small modifications and still get really good results.

Ben: Just that part of the brain.

Ali: Exactly. Just add this thing. There are lots of techniques: prefix tuning, LoRA, QLoRA, and so on. The jury's out, and none of them is really a slam dunk yet, but someone will crack it.

Once you have that, in a few years, the ideal would be a really big foundation model that's pretty smart. Then you can stack on these additional tuned sorts of brains that are really good at a specific task, like classifying manufacturing errors, or translation.
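A minimal sketch of the LoRA idea Ali names, assuming a PyTorch setting; the class name and hyperparameters here are illustrative, not any particular library's API.

```python
# LoRA in miniature: freeze the big weight matrix W and learn only a
# low-rank update B @ A (rank r much smaller than the layer width), so the
# "modification" is a tiny fraction of the full model's parameters.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # frozen foundation weights
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero init: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # Base model output plus the small task-specific correction.
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)
```

The frozen base stays loaded on the GPU once; only the small A and B matrices differ per task, which is why a thousand specialized “brains” could become cheap to serve.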

Ben: They would be compute-efficient and energy-efficient for dealing with just that task at that point.

Ali: Exactly. Then you could also load up your GPUs with that one intelligent brain, that one giant model, and then specialize it. But to be clear, no one's really done this yet.

Ben: That’s to do.

Ali: That's what I think a lot of people are hoping to do, and it might not be easy. Meanwhile, we have lots of customers who want specialized models that are cheaper and smaller and that have really high accuracy and performance on their task.

At Databricks, we bought Mosaic. I did not unleash our sales force, a go-to-market organization of 3,000 people, to sell the thing we bought, because we just can't satisfy the demand. There are not enough GPUs.

Ben: So you won’t even let all your guys sell it?

Ali: No, I'm not even letting all the customers buy this thing, because we don't have the GPUs. Every company wants to do this: “I have a thousand things I want to build, can you help me do that?”

Ben: In this context, how much do you think these use cases will fragment? You talked about wanting it to be good at doing my kids' homework, or wanting it to be my girlfriend. Within that, one of the things we're finding is that getting the model to do what you want is where the data advantage from the users comes in. If I want it to draw me a certain kind of picture, there are a lot of conversations involved in doing that, and whoever is drawing those kinds of pictures will get good at that. But then there may be another model that wants to draw memes, and the thing that's drawing the pretty pictures can't draw the memes, because that involves words and other stuff it just hasn't learned to get out of the humans and map into its model.

How much do you think we're going to get tons of specialization, versus: no, once the brain gets big enough and we do these fine-tunings, that's going to be it, and it will be like AWS, GCP, and Azure?

Ali: I think the answer is closer to the former: there's going to be lots of specialization. Having said that, it's not a dichotomy, in the sense that maybe they're all using some common base models underneath. You're not starting from scratch every time as you develop…

Ben: But you’re tuning it up a certain way.

Ali: Yes. I think in some sense the industry and people are looking at the wrong thing. Right now, it's a little bit like 2000: the internet is about to take over everything and everybody's super excited. There's one company called Cisco, and they build these routers. The thinking was that whoever can build the best routers is going to dominate all of the internet forever and determine the future of mankind, and Cisco is the best one, by far. Cisco in 2000 was worth, I think, half a trillion dollars at its peak. People were talking about how it was going to be a trillion-dollar company. It was worth more than Microsoft.

Right now, I think it's a little bit like that. Who has the largest LLM? Supposedly, whoever can build the largest one will own all of AI and the entire future of humanity. But just like with the internet, someone will show up later and think about Uber rides and cab drivers. There ended up being these applications, many of which now seem obvious. Marc talked about it in “Why AI Will Save the World.” It's not just going to be one model that OpenAI or Databricks or Anthropic builds, with that model dominating all of these use cases. No, a lot of things will need to go into building the doctor that you trust to cure you.

Those are the companies we'll build in the future, and I think there's going to be a lot of value in them. There's a place for the Cisco routers and for the LLMs and so on. Cisco is still a pretty valuable company, but I think there's an over-focus on that layer right now.

Open source AI

Ben: Interesting. How do you think about open source? A lot of the large model providers are literally going in and saying, “Stop open source now, you've got to outlaw it.” Why are they saying that? Do they have a legitimate gripe?

Coming from Databricks’ perspective, how are you all thinking about open source with respect to Mosaic and then with the other things, like LLaMA?

Ali: If the original LLaMA had never been released, what would the state of the world and our view of AI be right now? We would be way further behind. It was a big model, and it was open-sourced. Both of those things completely changed everything that's happening in AI right now: the size mattered, and the fact that it was open source also mattered. It doesn't stop there; it's going to continue. It's also really hard to block any of this, because if you check out the source code for LLaMA, it's like a couple of pages.

Ben: Yeah, but you have to have the weights, too.

Ali: Yeah, but the weights leaked. People will leak the weights, they will get out, and people will keep tuning them. There are also distillation techniques, where you can take the continuous outputs of a model and use them to train smaller models, train other models, and so on.
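A minimal sketch of the distillation idea, assuming a PyTorch setting; the function name, temperature, and weighting below are illustrative choices, not a fixed recipe.

```python
# Distillation in miniature: train a small "student" model to match the
# soft output distribution of a large "teacher", blended with the usual
# hard-label loss.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 2.0, alpha: float = 0.5):
    # Soft targets: KL divergence between temperature-smoothed distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # standard T^2 gradient rescaling
    # Hard targets: ordinary cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```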

I think open source will continue to do better and better, with more and more techniques. Because there's scarcity, because they don't have GPUs, they'll come up with techniques for doing things more efficiently, like the fast transformer.

At the same time, I also think that anyone who trains a really gigantic model that's really, really good typically will not have the incentive to release it.

The usual thing we see is that open source lags the proprietary ones. The proprietary thing is way ahead and it's way better. In some rare cases, like Linux, it overtakes them. If that happened here, it would be game-changing.

Ben: Will that happen?

Ali: It’s hard to predict. Right now it just seems that you need a lot of GPUs to do this.

Ben: How about when GPUs become abundant? That’s going to almost certainly happen.

Ali: GPUs become abundant, or there are certain tweaks to the transformer that let you train at a higher learning rate with fewer issues…

Ben: Right, because they’re super inefficient now. They couldn’t be more inefficient.

Ali: Yes. Then they will be released. The universities are just chomping at the bit because what has happened right now is that the universities feel a little bit…

Ben: They’re aced out, they’re not really even in the game anymore because they need to make changes.

Ali: It's “This was my game. I was playing it, I was inventing. Now you threw me out, and I can't even participate, because I don't have GPUs and I don't have the funding.” Universities are having a huge internal crisis with their researchers.

Ben: I see you hired all my guys.

Ali: Their people are leaving because they want to work close to where they can train the models and do this kind of stuff, where the data is. At the universities, there's none of this. So what are universities doing? They're looking at how to crack the code on this: how to make it much easier and cheaper, and how to release it. So there's going to be innovation there.

I think this race between open source and proprietary will continue. Eventually, open source catches up, so there are going to be diminishing returns. I also think we're going to hit walls with the scaling laws. The current belief is that you just go to the right on the x-axis, keep moving the curve to the right, and eventually you get AGI; it's happening, it's guaranteed. I think instead we're going to hit diminishing returns and walls.

Benchmarks are bullshit

Ben: You think we’ll get stuck before we get to AGI and we’ll need an actual breakthrough as opposed to just more size.

Ali: That, and I also think that in almost all the use cases where you seriously try to use this for medicine, or law, and so on, it quickly becomes clear that you need to augment it with a human in the loop. There's no way you can just let this thing loose right now. It's stupid, it makes mistakes. Maybe that can get better.

Ben: But it does better on the medical exams than doctors do.

Ali: This is a funny thing. I kinda think all the benchmarks are bullshit. Here's how all the LLM benchmarks work: imagine all our universities said, “We're going to give you the exam the night before, and you can look at the answers. Then the next day we'll bring you in, you answer the questions, and we'll score how you did.” Suddenly, everybody would be acing their exams.

For instance, MMLU is what a lot of people benchmark these models on. MMLU is just a set of multiple-choice questions that's on the web: ask a question, is the answer A, B, C, D, or E, and it tells you the right answer. Because it's on the web, you can deliberately train on it and create an LLM that crushes that benchmark. Or you can inadvertently, because The Pile or whatever you use to train your model happens to include some of those questions from elsewhere, end up training on it by mistake. So the benchmarks are a little bit BS.
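A sketch of the kind of contamination check Ali's point implies: looking for benchmark questions that appear verbatim in the training corpus. The function names are hypothetical, and the 13-token window is a common heuristic for flagging verbatim leakage, not a standard.

```python
# Flag training documents that share a long word n-gram with a benchmark
# question -- a crude proxy for "the model saw the exam the night before".
def ngrams(text: str, n: int = 13) -> set:
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(benchmark_question: str, training_doc: str) -> bool:
    # Any shared 13-gram counts as a hit; texts shorter than the window
    # simply produce no n-grams and never match.
    return bool(ngrams(benchmark_question) & ngrams(training_doc))
```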

Ben: Well, they’re benchmarks for taking the test, but presumably, the test correlates with being able to make a medical diagnosis decision.

Ali: Yes. But they memorized all these.

Ben: But there's no transfer learning from memorizing the exam to actually diagnosing…

Ali: No one really knows the answer to this. Everybody’s playing the benchmarking game this way right now. I would love it if a whole bunch of researchers that do…

Ben: It's like the old fake database benchmarks: “Look how fast our database is.” But it's only good at the actual benchmark.

Ali: I would love it if a bunch of doctors got together and came up with a benchmark that's kept super secret, so they don't show it to you. You give your model to them, they run their questions on it, and then they come back and tell you how you scored. But that's not how it works right now.

So my point is just that right now you need a human in the loop. And I think we'll need a phase shift in improvement to be able to completely eliminate the human in the loop for most of these important tasks.

Why Ali isn’t afraid of today’s AI

Ben: Let me go to the question that you dodged, which is what are the ethics of the large models versus open source? Or, in general, what is the responsibility? How big is the threat? Is open source an ethical threat?

Ali: I don't have all the answers. There are different categories. There's the “jobs are going to go away” category. We've been doing that for 300 years, and the nations with the highest GDP are the ones that have automated the most.

Ben: And they have the most jobs and the highest employment.

Ali: There are ways to deal with that problem, and the way to deal with it is not to just stop all progress. That's stupid. The nations that win are the ones that do well on automation: not just AI, but efficiency improvements in general. Economics is about efficiency.

Then there are bad things that humans can do deliberately because they're malicious, which is the one I think Marc was most worried about. But ever since the invention of the hammer, we've been misusing technology in bad ways.

Ben: When you have a hammer, your head looks like a nail.

Ali: That’s happening all the time with every technological improvement, especially the internet. The really big question that I think Marc dodged a bit in his essay is: are we going to get this super AGI that decides to destroy us? And I don’t…

Ben: The “decide” part is the part where I get a little lost. Free will is not something we're on a path toward for machines. Machines are doing many, many, many computations. We've never had machines do this many computations in the history of humanity. That is amazing, but it's very different than… no LLM has ever decided to do anything. That's not what they do. So it seems like a leap to go from that to “now that they've got free will, what do we do?”

Ali: I do think those hypotheticals… if you have a thing that has that level of intelligence and can control things and so on, then I do think that's a big risk. I just don't think it's going to happen very soon, and here's why: there are several things that people are kind of not looking at. I don't agree with Marc when he says, “Oh, it's just like a toaster. Your toaster will not decide to kill you.” I don't believe that; that's not true. This thing is pretty smart, it has reasoning capability, and if you connect it to robots, it can start doing things.

Ben: And let it run free with no safety.

Ali: If you let it run free and say “go do it,” then it can do a lot of damage. The reason I'm not too worried about that scenario is the following: it's very costly and hard to get your hands on the GPUs, and the money, to train a new model.

If that cost comes down and it takes 10 minutes to train a new model that's as good as the largest, best models we have, then we're kind of fucked. Because then some asshole will take an AutoGPT, connect it, and say: write a bunch of versions of yourself, try them out in parallel, do a million of these in parallel, and figure out whether you're getting smarter and smarter and smarter.

Then, before you know it, after maybe 12 months, we find a slightly better version of the transformer that's a little more efficient. Now that 10 minutes goes to 2 minutes, and you're in a race where eventually you get into a loop where it can create itself.

Right now it's extremely expensive and really hard to train a new giant model, much harder than just asking it questions. That's unlike the human brain, where I can memorize new things and update my brain quickly, and also read things from my memory and tell you things. Right now there's a huge asymmetry.

Secondly, we really haven't cracked the code on machines reproducing themselves, the way humans do biologically. Once you have reproduction and the automatic building of new ones, once you crack the code on that loop, yes, then I think we're fucked. But we're very far away from that. Nobody's really doing that. Just pushing the scaling laws and getting these things to be better and better at reasoning doesn't solve the problems I mentioned. I think that's what's saving us right now.

Ben: Well, on that happy note, we’ll conclude. I’d like to thank Ali for joining us today. And thank you all. Thank you.