Social Strikes Back is a series exploring the next generation of social networks and how they’re shaping the future of consumer tech. See more at a16z.com/social-strikes-back.

Audio has become omnipresent: we command our lights and music to match our moods, we ask Google or Siri to settle our bets, we wake up with Spotify and fall asleep to podcasts. Everywhere around us, dumb products are getting an AI education, and with every “smart” (questionably useful) upgrade come more speakers and microphones. That’s enabling new audio interactions that didn’t exist a decade ago, as well as deeper, more intimate, and more spontaneous social connections than ever before.

But there’s a lot more to audio than podcasts and smart assistants. In fact, we anticipate that the audio innovation of the next decade will rival what we’ve seen in video apps over the past few years.

The draw of audio apps over more traditional formats is obvious to any podcast (or music) devotee: ease. That lean-back, hands-free experience means that audio apps generally don’t compete with a vast library of competing apps. Instead, they compete with washing dishes, working out, and driving. This dynamic is akin to the competitive landscape for mobile apps 10 years ago. Early entrants were competing with waiting in line, sitting in bed, and staring at the ceiling while riding a bus, and they achieved hypergrowth as a result. Easy competition! Today, traditional apps are just one notification or swipe away from losing users to Instagram, iMessage, or thousands of other engaging apps. In contrast, audio startups face a less crowded and less competitive landscape.

Unlike much of social media, which just shows the highlights—the amazing travel adventures, the huge mansions and cars, fitness influencers, or people with amazing dance skills—audio hits different. Listening to someone’s voice is personal, and hearing unedited audio is the opposite of seeing the highlights. It’s about ideas, not the visuals, so it emphasizes a different kind of content that can often feel deeper and more intellectually stimulating. When you listen to Elon Musk get interviewed by Joe Rogan for two hours, you may begin to develop a deeper understanding of how he thinks—beyond the headlines. When you listen to a comedian like Tina Fey read her autobiographical audiobook over multiple hours, you start to feel an emotional bond with the person. When you listen to a live conversation on Clubhouse and hear people talk over each other, all the “ums,” and sometimes awkward silences, it reminds you—in a shelter-in-place era—what a lively dinner conversation is supposed to feel like.

Beyond podcasts

The most obvious killer app for audio is podcasting. In recent years, listening to podcasts has become more mainstream; today, more than half the US population has listened to a podcast, and there are roughly a million shows to choose from. In parallel, we’ve seen the rise of more podcast creators, and much more volume, thanks to podcasting tools and hardware. With tools like Descript (which democratizes the editing of podcasts and videos), Anchor (which makes hosting and distribution easier), and others, you don’t even need a fancy mic or studio setup to participate in the experience. The content creators behind shows like The Joe Rogan Experience, Call Her Daddy, and others are already highly sought after, paid millions of dollars to reach millions of fans. But while podcasting is a massive, growing market, and we’re certainly excited to see further innovations there, it’s often solitary and one-sided. We believe there’s even more opportunity for audio-first products that go beyond passive listening.

In particular, we’re excited to see the emergence of platforms that provide user-generated content, live conversations, and other social interactions: a counterpoint to the highly produced, one-way nature of audiobooks and podcasts. The emotion that audio can stoke (the swell of a concert, a baby’s laugh, the roar of a stadium crowd) is inherently social. Podcasting scratches the surface, in that it’s a user-generated network of creators and listeners, but there’s ample opportunity to go deeper.

The precedent for social+ audio

To glimpse where social+ audio might go, social+ video provides a good point of comparison. Though YouTube has dominated the video landscape in the 15 years since its founding, it has been joined by a small army of powerful challengers and upstarts. Today, “video” spans a broad family of ideas and business models, including YouTube’s short video clips, Zoom’s video conferences, Snapchat’s stories, Twitch’s live streaming, TikTok’s dance videos, and dozens more variations. This encompasses not only stand-alone video products, but also embedded features within other products: collaboration tools, messaging apps, and much more. “Video” refers to far more than just the technical format. Who creates the video? How is it delivered? When is it seen? Why was it sent? And what actions are the viewers invited to take? These questions all matter, and they define products much more deeply than the blanket term “video apps.” Remarkably, rather than a single product coming to dominate all forms of video, TikTok, Twitch, and YouTube have all come to inhabit different corners of the market, each independently worth many billions of dollars.

Looking at audio through the lens of the user experience, the interactions are similar to video in some ways and radically different in others. Like video, audio can provide a lean-back experience that users can enjoy for hours at a time. Just as it’s compelling to watch influencers and celebrities, it’s also enjoyable to listen to them—particularly in comedy, sports, news, politics, and other “talk radio” categories that have had massive adoption in the radio era. In addition, audio, like video, lends itself well to fiction, non-fiction, and many other categories. Video is easily created by everyone, thanks to the camera on our pocket-sized supercomputers; similarly, it’s familiar and easy to create audio content using our phones. However, audio also scales to professional settings, where podcasting has demonstrated the power of high-quality, edited audio content. 

In the same way, though audio apps may have begun with podcasting and audiobooks, I’m convinced we’re on the front end of a decade of innovation in social audio experiences.

How social audio might innovate

To see how future innovation might progress, it’s helpful to survey the current social-audio product landscape in the abstract and break down some of its defining attributes. Existing audio use cases can be categorized in many ways.

One way to describe podcasting, for example, is that it’s a user-generated network. It consists mostly of semi-professional content creators broadcasting to a wide, public audience on a horizontal set of products and protocols. Users often discover new podcasts by following their favorite creators. The interaction is generally one-way, and the business model is advertising.

Similarly, we can use this type of language to break down a second category of social audio: a group call between three friends. Many apps can facilitate this, including FaceTime, which creates small networks of ephemeral conversation. It’s socially motivated and it’s a lean-forward experience, since everyone is expected to talk.

An app like Clubhouse provides yet another example. The experience is somewhere between a conference call, a podcast, and a live talk show. Although the content is ephemeral, like a phone call, it’s also a horizontal and public platform, which is more like live podcasting. 

Once we start to enumerate different social audio ideas, attributes, and use cases, an interesting set of patterns emerges. This list of product attributes is not meant to be exhaustive. Rather, it’s an initial framework designed to further our collective thinking about what might be possible in audio, particularly when many of these aspects are combined into a single product. If you were to mix and match the attributes above, there are many thousands of possible configurations that might lead to new, cohesive products.

Of course, I don’t expect each of these attributes to attract equal attention from entrepreneurs. Innovation will likely focus on a few leading decisions, which then help drive related attributes. Given that groundwork, I believe three themes will emerge: innovation in the content format, the evolution of the business model, and the growing ubiquity of audio.

Three themes shaping the future of social audio

One key product decision is the form factor of the content. On one end of the spectrum, we have seen products succeed in facilitating very short-form, easy-to-create written and video content—think Twitter and Snapchat Stories. On the opposite end of the spectrum, long-form writing has seen success on blogging platforms like WordPress and new platforms like Substack; long-form videos live on YouTube, as well as professional platforms like Netflix and Hulu. In the same way, both long- and short-form content are likely to thrive on audio-first platforms. 

Real-time voice communication is easy to create—particularly one short reply at a time—but is messy and full of digressions. If the effect is too unpolished, it might be less interesting for the listener. This is the Twitter analog for social audio. Clubhouse is one implementation of this idea (among others), but there are likely to be other approaches: some might be asynchronous rather than live, or focused on a particular niche of creators (comedians? news analysts? sports announcers?). On the long-form end of the spectrum, I expect there to be a rapid evolution from the podcasts and audiobooks of today. A future product might be long-form, with a twist—maybe it focuses on high-quality educational content or short stories from prominent authors. Perhaps the innovation will happen at the technology level, where a new platform could integrate monetization and tooling. Or perhaps the product will be built with social features so that listeners can interact with the content. There are many promising combinations.

The business model for content creation online has been evolving in recent years, and that evolution is likely to accelerate in a world of audio and social platforms. In the past few years, startups like Substack, Patreon, and Shopify have created alternative ways for creators to build businesses online through direct transactions, rather than advertising. This shift exposed a simple fact: in an ad-driven world, creators were generally under-monetizing their audiences. Fans of creators were willing to pay more, a lot more. Newsletter authors like Jonah Goldberg of The Dispatch, for instance, can generate millions of dollars per year directly from reader subscriptions. The same trend is likely to happen in audio, where the podcast advertising market is small relative to its adoption (under $1 billion) and unlikely to keep pace while rigorous targeting, measurement, and ROI tracking are stymied by archaic underlying technology. Instead, it’s likely that creators will figure out how to charge their audiences directly. This might happen in the form of freemium (in which the basic service is free, but advanced features or product offerings are paid), perhaps through ticket sales to live events or unlocking libraries of pre-recorded content. Combine these unique business models with a novel audio format or method of interaction and you might get something that really flies. After all, it wasn’t until a “native” monetization method for search, sponsored keywords, was combined with the core interaction of the product that Google became what it is today. In the near future, we will likely transcend sponsored promos to truly native social audio monetization, unlocking the next generation of social companies.

The final theme is ubiquity. In examining how messaging and chat have evolved, there has been a bifurcation between stand-alone apps and embedded features. Slack and WhatsApp are two examples of destinations for messaging: users build up distinct social networks of friends and colleagues and use these apps to communicate with them. But the other thing that happened to messaging is that it has gotten baked into many, many other apps: into Yelp’s UI to reach out to local businesses, for example, or into Uber’s ability to contact your driver. Snapchat’s Stories feature famously originated within the photo/video messaging app, but now exists as part of LinkedIn’s professional networking feature set. The same transition is likely to happen in audio. We are already seeing inklings of this: Discord’s voice features for talking to other gamers are just one of several communication options within the network, and text-driven services like Twitter are already offering the option of “voice tweets.” As these features are integrated into more platforms and the volume of audio content grows, audio will increasingly be incorporated into Alexa, smart appliances, in-car systems, and more. Such audio services may even fade into the background over time. Just as many apps today support messaging, over time they’re likely to support synchronous and asynchronous audio as well.

The convergence of technology and human behavior

Behavior is shaped by the constraints of technology, and in turn, technology is pushed forward by the needs and demands of consumers. When the telegraph was popularized in the 1800s, its rapid rate of transmission made it ideal for urgent and important messages. The invention of the printing press made it much cheaper and faster to create books and manuscripts, spurring a new era of mass media and allowing millions of people to interact with the written word. These inventions created new product categories and consumer demands (to be faster, to allow for voice, to be portable), leading to the telephone, the steam-powered printing press, and typewriters, which in turn caused the next waves of innovation. In the modern era, history repeats itself.

Audio, now squarely at the intersection of consumer behavior and technological change, is on the cusp of a new wave of innovation. Fueled first by RSS and then by the adoption of AirPods, smart speakers, and more, the demand for audio in the form of podcasts and audiobooks is at an all-time high. That, in turn, makes entrepreneurs more adventurous in advancing new social and user-generated methods of creating audio content. The next decade of innovation in audio will likely be as productive and valuable as that of messaging, video, and other media to date. Audio will create the next generation of startups in social networking, social content platforms, and publishing, and will be embedded into a wide variety of products and services. Follow the innovation: listen closely.
