General Partner Anjney Midha explores the cutting-edge world of text-to-video AI with AI researchers Andreas Blattmann and Robin Rombach.
Released in November, Stable Video Diffusion is their latest open-source generative video model, overcoming challenges in model size and in representing motion dynamics.
In this episode, Robin and Andreas share why translating text to video is so complex, the key role of datasets, current applications, and the future of video editing.
00:00 – Text to Video: The Next Leap in AI Generation
02:41 – The Stable Diffusion backstory
04:25 – Diffusion vs autoregressive models
06:09 – The benefits of single step sampling
09:15 – Why generative video?
11:19 – Understanding physics through AI video
12:20 – The challenge of creating generative video
15:36 – Dataset selection and training
17:50 – Structural consistency and 3D objects
19:50 – Incorporating LoRAs
21:24 – How should creators think about these tools?
23:46 – Open challenges in video generation
25:42 – Infrastructure challenges and future research
Find Robin on Twitter: https://twitter.com/robrombach
Find Andreas on Twitter: https://twitter.com/andi_blatt
Find Anjney on Twitter: https://twitter.com/anjneymidha
The CFI Podcast discusses the most important ideas within technology with the people building it. Each episode aims to put listeners ahead of the curve, covering topics like AI, energy, genomics, space, and more.