Research in artificial intelligence is increasing at
an exponential rate. It’s difficult for AI experts to keep up with everything
new being published, and even harder for beginners to know where to
start.
So, in this post, we’re sharing a curated list of
resources we’ve relied on to get smarter about modern AI. We call it the “AI
Canon” because these papers, blog posts, courses, and guides have had an
outsized impact on the field over the past several years.
We start with a gentle introduction to transformer and latent diffusion
models, which are
fueling the current AI wave. Next, we go deep on technical learning resources;
practical guides to building with large language models (LLMs); and analysis of
the AI market. Finally, we include a reference list of landmark research
results, starting with “Attention is All You Need”—the 2017 paper by Google that
introduced the world to transformer models and ushered in the age of generative
AI.
A gentle introduction…
These articles require no specialized background and
can help you get up to speed quickly on the most important parts of the modern
AI wave.
- Software 2.0: Andrej Karpathy was one of the first to clearly
explain (in 2017!) why the new AI wave really matters. His argument is that
AI is a new and powerful way to program computers. As LLMs have improved
rapidly, this thesis has proven prescient, and it gives a good mental model
for how the AI market may progress.
- State of GPT:
Also from Karpathy, this is a very approachable explanation of how ChatGPT /
GPT models in general work, how to use them, and what directions R&D may
take.
- What
is ChatGPT doing … and why does it
work?: Computer scientist and entrepreneur Stephen
Wolfram gives a long but highly readable explanation, from first principles,
of how modern AI models work. He follows the timeline from early neural nets
to today’s LLMs and ChatGPT.
- Transformers, explained: This post by Dale Markowitz is a shorter, more
direct answer to the question “what is an LLM, and how does it work?” This
is a great way to ease into the topic and develop intuition for the
technology. It was written about GPT-3 but still applies to newer
models.
- How Stable Diffusion
works: This is
the computer vision analogue to the last post. Chris McCormick gives a
layperson’s explanation of how Stable Diffusion works and develops intuition
around text-to-image models generally. For an even gentler introduction, check out this comic from r/StableDiffusion.
Foundational learning: neural networks, backpropagation, and embeddings
These resources provide a base understanding of
fundamental ideas in machine learning and AI, from the basics of deep learning
to university-level courses from AI experts.
Explainers
Courses
- Stanford
CS229:
Introduction to Machine Learning with Andrew Ng, covering the fundamentals
of machine learning.
- Stanford
CS224N: NLP with
Deep Learning with Chris Manning, covering NLP basics through the first
generation of LLMs.
Tech deep dive: understanding transformers and large models
There are countless resources—some better than
others—attempting to explain how LLMs work. Here are some of our favorites,
targeting a wide range of readers/viewers.
Explainers
Courses
- Stanford
CS25:
Transformers United, an online seminar on Transformers.
- Stanford CS324: Large Language Models with Percy Liang, Tatsu
Hashimoto, and Chris Re, covering a wide range of technical and
non-technical aspects of LLMs.
Reference and commentary
- Predictive learning, NIPS
2016: In this
early talk, Yann LeCun makes a strong case for unsupervised learning as a
critical element of AI model architectures at scale. Skip to 19:20 for the famous cake analogy, which is still one
of the best mental models for modern AI.
- AI for full-self driving at
Tesla: Another classic Karpathy talk, this time
covering the Tesla data collection engine. Starting at 8:35 is one of the great all-time AI rants, explaining
why long-tailed problems (in this case stop sign detection) are so
hard.
- The scaling hypothesis: One of the most surprising aspects of LLMs is
that scaling—adding more data and compute—just keeps increasing accuracy.
GPT-3 was the first model to demonstrate this clearly, and Gwern’s post does
a great job explaining the intuition behind it.
- Chinchilla’s wild
implications: Nominally an explainer of the important
Chinchilla paper (see below), this post gets to the heart of the big
question in LLM scaling: are we running out of data? This builds on the post
above and gives a refreshed view on scaling laws.
- A survey of large language
models:
Comprehensive breakdown of current LLMs, including development timeline,
size, training strategies, training data, hardware, and more.
- Sparks
of artificial general intelligence: Early experiments with
GPT-4: Early analysis from Microsoft Research on the
capabilities of GPT-4, currently the most advanced LLM, relative to human
intelligence.
- The AI revolution: How Auto-GPT
unleashes a new era of automation and
creativity: An introduction to Auto-GPT and AI agents in
general. This technology is very early but important to understand—it uses
internet access and self-generated sub-tasks in order to solve specific,
complex problems or goals.
- The Waluigi
Effect:
Nominally an explanation of the “Waluigi effect” (i.e., why “alter egos”
emerge in LLM behavior), but interesting mostly for its deep dive on the
theory of LLM prompting.
Practical guides to building with LLMs
A new application stack is emerging with LLMs at the
core. While there isn’t a lot of formal education available on this topic yet,
we pulled out some of the most useful resources we’ve found.
Reference
- Build a GitHub support bot with GPT3,
LangChain, and Python: One of the earliest public explanations of the
modern LLM app stack. Some of the advice in here is dated, but in many ways
it kicked off widespread adoption of, and experimentation with, new AI apps.
- Building LLM applications for
production: Chip
Huyen discusses many of the key challenges in building LLM apps, how to
address them, and what types of use cases make the most sense.
- Prompt Engineering Guide: For anyone writing LLM prompts—including app
devs—this is the most comprehensive guide, with specific examples for a
handful of popular models. For a lighter, more conversational treatment, try
Brex’s prompt engineering
guide.
- Prompt injection: What’s the worst that
can happen? Prompt injection is a potentially serious
security vulnerability lurking for LLM apps, with no perfect solution yet.
Simon Willison gives the definitive description of the problem in this post.
Nearly everything Simon writes on AI is outstanding.
- OpenAI
cookbook: For developers, this is the definitive
collection of guides and code examples for working with the OpenAI API. It’s
updated continually with new code examples.
- Pinecone
learning center: Many LLM apps are based around a vector search
paradigm. Pinecone’s learning center—despite being branded vendor
content—offers some of the most useful instruction on how to build in this
pattern.
- LangChain
docs:
As the default orchestration layer for LLM apps, LangChain connects to just
about all other pieces of the stack. So their docs are a real reference for
the full stack and how the pieces fit together.
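To make the vector search pattern mentioned above a little more concrete, here is a minimal sketch in plain Python. It is not how Pinecone or LangChain implement retrieval. The document IDs and the three-dimensional "embeddings" are hypothetical toy values we made up for illustration; a real app would get vectors from an embedding model and store them in a vector database. The idea is simply to rank stored vectors by cosine similarity to a query vector and return the closest matches.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy index: document ID -> made-up embedding vector.
TOY_INDEX = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.0],
    "api-rate-limits": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a user question about refunds.
query_vector = [0.8, 0.2, 0.0]
print(top_k(query_vector, TOY_INDEX, k=2))
# prints ['refund-policy', 'shipping-times']
```

In a typical LLM app, the text behind the returned IDs would then be placed into the prompt as context. The brute-force scan here is only workable for tiny indexes; production systems rely on approximate nearest-neighbor search instead.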
Courses
- LLM Bootcamp: A practical course for building LLM-based
applications with Charles Frye, Sergey Karayev, and Josh Tobin.
- Hugging Face
Transformers:
Guide to using open-source LLMs in the Hugging Face transformers
library.
LLM benchmarks
- Chatbot Arena: An Elo-style ranking system of popular LLMs, led
by a team at UC Berkeley. Users can also participate by comparing models
head to head.
- Open LLM
Leaderboard: A
ranking by Hugging Face, comparing open source LLMs across a collection of
standard benchmarks and tasks.
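For readers unfamiliar with Elo-style rankings, the sketch below shows the textbook Elo update after one head-to-head comparison. We are assuming the conventional chess defaults (a K-factor of 32 and a 400-point scale) and a starting rating of 1000; the model names are hypothetical, and the exact computation any particular leaderboard uses may differ.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won, 0.0 if A lost, and 0.5 for a tie.
    k (assumed default: 32) controls how far one result moves the ratings.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two hypothetical models start at the same rating; a user prefers model_a.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
ratings["model_a"], ratings["model_b"] = update_elo(
    ratings["model_a"], ratings["model_b"], score_a=1.0
)
print(ratings)
# prints {'model_a': 1016.0, 'model_b': 984.0}
```

Because the winner gains exactly what the loser gives up, an upset win against a much higher-rated model moves both ratings more than an expected win does.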
Market analysis
We’ve all marveled at what generative AI can produce,
but there are still a lot of questions about what it all means. Which products and companies will survive and
thrive? What happens to artists? How should companies use it? How will it affect
jobs and society at large? Here are some attempts at answering these
questions.
CFI thinking
- Who owns the generative AI
platform?: Our flagship assessment of where value is
accruing, and might accrue, at the infrastructure, model, and application
layers of generative AI.
- Navigating the high cost of AI
compute: A detailed breakdown of why generative AI models
require so many computing resources, and how to think about acquiring those
resources (i.e., the right GPUs in the right quantity, at the right cost) in
a high-demand market.
- Art isn’t dead, it’s just
machine-generated: A look at how AI models were able to reshape
creative fields—often assumed to be the last holdout against automation—much
faster than fields such as software development.
- The generative AI revolution in
games: An
in-depth analysis from our Games team of how the ability to easily create
highly detailed graphics will change how game designers, studios, and the
entire market function. This
follow-up piece from our
Games team looks specifically at the advent of AI-generated content vis à
vis user-generated content.
- For B2B generative AI apps, is less
more?: A prediction for how LLMs will evolve in the
world of B2B enterprise applications, centered around the idea that
summarizing information will ultimately be more valuable than producing
text.
- Financial services will embrace
generative AI faster than you think: An argument that the financial services industry
is poised to use generative AI for personalized consumer experiences,
cost-efficient operations, better compliance, improved risk management, and
dynamic forecasting and reporting.
- Generative AI: The next consumer
platform: A look at opportunities for generative AI to
impact the consumer market across a range of sectors from therapy to
ecommerce.
- To make a real difference in health
care, AI will need to learn like we
do:
AI is poised to irrevocably change how we look to prevent and treat illness.
However, to truly transform everything from drug discovery to care delivery,
we should invest in creating an ecosystem of “specialist” AIs that learn like
our best physicians and drug developers do today.
- The new industrial revolution: Bio x
AI:
The next industrial revolution in human history will be biology powered by
artificial intelligence.
Other perspectives
- On the opportunities and risks of foundation
models: Stanford
overview paper on Foundation Models. Long and opinionated, but this shaped
the term.
- State of
AI Report: An annual roundup of everything going on in AI,
including technology breakthroughs, industry development,
politics/regulation, economic implications, safety, and predictions for the
future.
- GPTs are
GPTs: An early look at the labor market impact potential of
large language models: This paper from researchers at OpenAI,
OpenResearch, and the University of Pennsylvania predicts that “around
80% of the U.S. workforce could have at least 10% of their work tasks
affected by the introduction of LLMs, while approximately 19% of workers may
see at least 50% of their tasks impacted.”
- Deep medicine: How artificial
intelligence can make healthcare human
again: Dr. Eric Topol reveals how artificial
intelligence has the potential to free physicians from the time-consuming
tasks that interfere with human connection. The doctor-patient relationship
is restored. (CFI podcast)
Landmark research results
Most of the amazing AI products we see today are the
result of no-less-amazing research, carried out by experts inside large
companies and leading universities. Lately, we’ve also seen impressive work from
individuals and the open source community taking popular projects into new
directions, for example by creating automated agents or porting models onto
smaller hardware footprints.
Here’s a collection of many of these papers and
projects, for folks who really want to dive deep into generative AI. (For
research papers and projects, we’ve also included links to the accompanying blog
posts or websites, where available, which tend to explain things at a higher
level. And we’ve included original publication years so you can track
foundational research over time.)
Large language models
New models
- Attention is all you need (2017): The original transformer work and
research paper from Google Brain that started it all. (blog
post)
- BERT: pre-training of deep bidirectional transformers
for language understanding (2018): One of the first publicly available LLMs,
with many variants still in use today. (blog
post)
- Improving language understanding by
generative pre-training (2018): The first paper from OpenAI covering the
GPT architecture, which has become the dominant development path in LLMs.
(blog
post)
- Language
models are few-shot learners (2020): The OpenAI paper that describes GPT-3 and
the decoder-only architecture of modern LLMs.
- Training language models to follow instructions with
human feedback
(2022): OpenAI’s paper explaining InstructGPT, which uses humans in the
loop to train models so that they better follow the instructions in prompts.
This was one of the key unlocks that made LLMs accessible to consumers
(e.g., via ChatGPT). (blog post)
- LaMDA:
language models for dialog
applications (2022): A model from Google specifically designed
for free-flowing dialog between a human and chatbot across a wide variety of
topics. (blog
post)
- PaLM: Scaling language modeling with
pathways (2022):
PaLM, from Google, utilized a new system for training LLMs across thousands
of chips and demonstrated larger-than-expected improvements for certain
tasks as model size scaled up. (blog
post). See also the PaLM-2 technical report.
- OPT: Open Pre-trained Transformer language
models (2022):
OPT is one of the top-performing fully open source LLMs. This
175-billion-parameter model was trained on publicly available datasets, and
its release comes with code. (blog
post)
- Training compute-optimal large language
models (2022):
The Chinchilla paper. It makes the case that most models are data limited,
not compute limited, and changed the consensus on LLM scaling. (blog
post)
- GPT-4
technical report (2023): The latest and greatest paper from
OpenAI, known mostly for how little it reveals! (blog post). The GPT-4 system
card sheds some light on how
OpenAI treats hallucinations, privacy, security, and other issues.
- LLaMA:
Open and efficient foundation language
models (2023): The model from Meta that (almost) started
an open-source LLM revolution. Competitive with many of the best
closed-source models but only opened up to researchers on a restricted
license. (blog
post)
- Alpaca: A strong, replicable
instruction-following model (2023): Out of Stanford, this model demonstrates
the power of instruction tuning, especially in smaller open-source models,
compared to pure scale.
Model improvements (e.g. fine-tuning, retrieval, attention)
Image generation models
Agents
Other data modalities
Code generation
Video generation
Human biology and medical data
Audio generation
Multi-dimensional image generation
Special thanks to Jack Soslow, Jay Rughani, Marco Mascorro, Martin Casado,
Rajko Radovanovic, and Vijay Pande for their contributions to this piece, and
to the entire CFI team for an always informative discussion about the latest
in AI. And thanks to Sonal Chokshi and the crypto team for building a long
series of canons at the firm.