Social Strikes Back is a series exploring the next generation of social networks and how they’re shaping the future of consumer tech. See more at a16z.com/social-strikes-back.
Audio has become omnipresent: we command our lights and music to match our moods, we ask Google or Siri to settle our bets, we wake up with Spotify and fall asleep to podcasts. Everywhere around us, dumb products are getting an AI education, and with every “smart” (questionably useful) upgrade come more speakers and microphones. That’s enabling new audio interactions that didn’t exist a decade ago, as well as deeper, more intimate, and more spontaneous social connections than ever before.
But there’s a lot more to audio than podcasts and smart assistants. In fact, we anticipate that the audio innovation of the next decade will rival what we’ve seen in video apps over the past few years.
The draw of audio apps over other traditional formats
is obvious to any podcast (or music) devotee: the ease. That lean-back,
hands-free experience means that audio apps generally don’t compete with a vast
library of competing startups. Instead, they compete with washing
dishes, working out, driving. This dynamic is akin to the competitive landscape
for mobile apps 10 years ago. Early entrants were competing with waiting in
line, sitting in bed, and staring at the ceiling while riding a bus, and they
achieved hypergrowth as a result. Easy competition! Today, traditional apps are
just one notification or swipe away from losing users to Instagram, iMessage, or
thousands of other engaging apps. In contrast, audio startups face a less
crowded and less competitive landscape.
Unlike much of social media, which just shows the
highlights—the amazing travel adventures, the huge mansions and cars, fitness
influencers, or people with amazing dance skills—audio hits different. Listening
to someone’s voice is personal, and hearing unedited audio is the opposite of
seeing the highlights. It’s about ideas, not the visuals, so it emphasizes a
different kind of content that can often feel deeper and more intellectually
stimulating. When you listen to Elon Musk get interviewed by Joe Rogan for two
hours, you may begin to develop a deeper understanding of how he thinks—beyond
the headlines. When you listen to a comedian like Tina Fey read her
autobiographical audiobook over multiple hours, you start to feel an emotional
bond with the person. When you listen to a live conversation on Clubhouse and
hear people talk over each other, all the “ums,” and sometimes awkward silences,
it reminds you—in a shelter-in-place era—what a lively dinner conversation is
supposed to feel like.
The most obvious killer app for audio is podcasting. In recent years, listening to podcasts has become more mainstream; today, more than half the US population has listened to a podcast, and there are roughly a million shows to choose from. In parallel, we’ve seen the rise of more podcast creators, and much more volume, thanks to podcasting tools and hardware. With tools like Descript (which democratizes the editing of podcasts and videos), Anchor (which makes hosting and distribution easier), and others, you don’t even need a fancy mic or studio setup to participate in the experience. The creators behind shows like The Joe Rogan Experience, Call Her Daddy, and others are already highly sought after, paid millions of dollars to reach millions of fans. But while podcasting is a massive, growing market (and we’re certainly excited to see further innovations there), it’s often solitary and one-sided. We believe there’s even more opportunity for audio-first products that go beyond passive listening.
In particular, we’re excited to see the emergence of
platforms that provide user-generated content, live conversations, and other
social interactions as a counterpoint to the highly produced, one-way nature of
audiobooks and podcasts. The emotion that audio can stoke (the swell of a
concert, a baby’s laugh, the roar of a stadium crowd) is inherently social.
Podcasting scratches the surface, in that it’s a user-generated network of
creators and listeners, but there’s ample opportunity to go
deeper.
To glimpse where social+ audio might go, social+
video provides a good point of comparison. Though YouTube has dominated the video landscape in the 15 years since its founding, it has since been
joined by a small army of powerful challengers and upstarts. Today, “video” spans a broad family of
ideas and business models,
including YouTube’s short video clips, Zoom’s video conferences, Snapchat’s
stories, Twitch’s live streaming, TikTok’s dance videos, and dozens more
variations. This encompasses not only stand-alone video products, but also
embedded features within other products: collaboration tools, messaging apps,
and much more. “Video” refers to far more than just the technical format. Who
creates the video? How is it delivered? When is it seen? Why was it sent? And
what actions are the viewers invited to take? These questions all matter—and
they define products much more deeply than the blanket term of “video apps.”
Remarkably, rather than a single product coming to dominate all forms of video,
TikTok, Twitch, and YouTube have all come to inhabit different corners of the
market, each independently worth many billions of dollars.
Looking at audio through the lens of the user
experience, the interactions are similar to video in some ways and radically
different in others. Like video, audio can provide a lean-back experience that
users can enjoy for hours at a time. Just as it’s compelling to watch
influencers and celebrities, it’s also enjoyable to listen to them—particularly
in comedy, sports, news, politics, and other “talk radio” categories that
saw massive adoption in the radio era. In addition, audio, like video, lends
itself well to fiction, non-fiction, and many other categories. Video is easily
created by everyone, thanks to the camera on our pocket-sized supercomputers;
similarly, it’s familiar and easy to create audio content using our phones.
However, audio also scales to professional settings, where podcasting has
demonstrated the power of high-quality, edited audio content.
In the same way, though audio apps may have begun with
podcasting and audiobooks, I’m convinced we’re on the front end of a decade of
innovation in social audio experiences.
To see how future innovation might progress, it’s
helpful to survey the current social-audio product
landscape in the abstract and break
down some of the defining attributes. There are many existing audio use cases,
and they can be categorized in several different ways.
One way to describe podcasting, for example, is that
it’s a user-generated network. It consists mostly of semi-professional content
creators broadcasting to a wide, public audience on a horizontal set of products
and protocols. Users often discover new podcasts by following their favorite
creators. The interaction is generally one-way, and the business model is
advertising.
Similarly, we can use this type of language to break down a second category of social audio: a group call among three friends. There are many apps that can facilitate this, including FaceTime, which creates small networks of ephemeral conversation. It’s socially motivated and it’s a lean-forward experience, since everyone is expected to talk.
An app like Clubhouse provides yet another example.
The experience is somewhere between a conference call, a podcast, and a live
talk show. Although the content is ephemeral, like a phone call, it’s also a
horizontal and public platform, which is more like live podcasting.
Once we start to enumerate different social audio ideas, attributes, and use cases, an interesting set of patterns emerges. This list of product attributes is not meant to be exhaustive. Rather, it’s an initial framework designed to further our collective thinking about what might be possible in audio, particularly when many of these aspects are combined into a single product. If you were to mix and match the attributes above, there are many thousands of possible configurations that might lead to new, cohesive products.
Of course, I don’t expect each of these attributes to attract equal attention from entrepreneurs. Innovation will likely focus on a few leading decisions, which then help drive related attributes. Given that groundwork, I believe three themes will emerge: innovation in the content format, the evolution of the business model, and the growing ubiquity of audio.
One key product decision is the form factor of the content. On one end of the spectrum, we have seen products succeed in facilitating very short-form, easy-to-create written and video content—think Twitter and Snapchat Stories. On the opposite end of the spectrum, long-form writing has seen success on blogging platforms like WordPress and new platforms like Substack; long-form videos live on YouTube, as well as professional platforms like Netflix and Hulu. In the same way, both long- and short-form content are likely to thrive on audio-first platforms.
Real-time voice communication is easy to create—particularly one short reply at a time—but is messy and full of digressions. If the effect is too unpolished, it might be less interesting for the listener. This is the Twitter analog for social audio. Clubhouse is one implementation of this idea (among others), but there are likely to be other approaches: some might be asynchronous rather than live, or focused on a particular niche of creators (comedians? news analysts? sports announcers?). On the long-form end of the spectrum, I expect there to be a rapid evolution from the podcasts and audiobooks of today. A future product might be long-form, with a twist—maybe it focuses on high-quality educational content or short stories from prominent authors. Perhaps the innovation will happen at the technology level, where a new platform could integrate monetization and tooling. Or perhaps the product will be built with social features so that listeners can interact with the content. There are many promising combinations.
The business model for content creation online has been evolving in recent years, and that evolution is likely to accelerate in a world of audio and social platforms. In the past few years, startups like Substack, Patreon, and Shopify have created alternative ways for creators to build businesses online through direct transactions, rather than advertising. This shift exposed a simple fact: in an ad-driven world, creators were generally under-monetizing their audiences. Fans of creators were willing to pay more, a lot more. Newsletter authors like Jonah Goldberg of The Dispatch, for instance, can generate millions of dollars per year directly from reader subscriptions. The same trend is likely to happen in audio, where the podcast advertising market is small relative to podcasting’s adoption (under $1 billion) and unlikely to keep pace while rigorous targeting, measurement, and ROI tracking are stymied by archaic underlying technology. Instead, it’s likely that creators will figure out how to charge their audiences directly. This might happen in the form of freemium, in which the basic service is free but advanced features or product offerings are paid, or perhaps through ticket sales to live events or unlocking libraries of pre-recorded content. Combine these business models with a novel audio format or method of interaction and you might get something that really flies. After all, it wasn’t until a “native” monetization method for search, sponsored keywords, was combined with the core interaction of the product that Google became what it is. In the near future, we will likely transcend sponsored promos to truly native social audio monetization, unlocking the next generation of social companies.
The final theme is around ubiquity. In examining how messaging and chat have evolved, we see a bifurcation between stand-alone apps and embedded features. Slack and WhatsApp are two examples of destinations for messaging: users build up distinct social networks of friends and colleagues and use these apps to communicate with them. But messaging has also been baked into many, many other apps: into Yelp’s UI to reach out to local businesses, for example, or into Uber’s ability to contact your driver. Snapchat’s Stories feature famously originated within the photo/video messaging app, but the format now exists as part of LinkedIn’s professional networking feature set. The same transition is likely to happen in audio. We are already seeing inklings of this: Discord’s voice chat for talking to other gamers is just one of several communication options within the network; text-driven services like Twitter are already offering the option of “voice tweets.” As these features are integrated into more platforms and the volume of audio content grows, audio will increasingly be incorporated into Alexa, smart appliances, in-car systems, and more. Such audio services may even fade into the background over time. Just as many apps today support messaging, over time they’re likely to support synchronous and asynchronous audio as well.
Behavior is shaped by the constraints of technology, and in turn, technology is pushed forward by the needs and demands of consumers. When the telegraph was popularized in the 1800s, its rapid rate of transmission made it ideal for urgent and important messages. The invention of the printing press made it much cheaper and faster to create books and manuscripts, spurring a new era of mass media and allowing millions of people to interact with the written word. These inventions created new product categories and consumer demands (to be faster, to allow for voice, to be portable), leading to the telephone, the steam-powered printing press, and typewriters, which in turn caused the next waves of innovation. In the modern era, history repeats itself.
Audio, now squarely at the intersection of consumer behavior and technological change, is on the precipice of a new wave of innovation. Fueled by RSS, followed by the adoption of AirPods, smart speakers, and more, the demand for audio in the form of podcasts and audiobooks is at an all-time high. That, in turn, makes entrepreneurs more adventurous in advancing new social and user-generated methods of creating audio content. The next decade of innovation in audio will likely be as productive and valuable as that of messaging, video, and other media to date. Audio will create the next generation of startups in social networking, social content platforms, and publishing, and will be embedded into a wide variety of products and services. Follow the innovation: listen closely.