- Google DeepMind has developed a video-to-audio model that generates sound matched to input video footage.
- The model encodes the video into a compressed representation, then iteratively refines random noise into audio relevant to the input footage, a diffusion-style process (see the sketch after this list).
- DeepMind trained the model on a mix of video and audio datasets, augmented with AI-generated annotations, so it learns to associate visual events with specific sounds.
- The model can generate audio with or without a text prompt, but it still faces challenges: audio quality degrades when the input video is low quality.
- DeepMind sees the model as a complement to video generation models, enabling entirely AI-generated videos complete with soundtracks and dialogue.
- Other companies, including OpenAI (Sora) and Kuaishou (Kling), are also developing video generation models.
- Runway unveiled its Gen-3 Alpha model, trained on videos and images paired with descriptive captions, enabling immersive transitions and controlled camera movements.
- Pika secured $80 million in funding to enhance its AI video generation and editing platform, which offers features such as fine-grained editing and sound effects.
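For intuition, here is a minimal, hypothetical sketch of the diffusion-style process described above: compress the video into a conditioning vector, then iteratively refine random noise into an audio waveform. All module names, shapes, and the crude denoising schedule are illustrative assumptions, not DeepMind's actual architecture (which also supports optional text-prompt conditioning, omitted here for brevity).

```python
# Hypothetical sketch of video-conditioned audio diffusion. Not DeepMind's code.
import torch
import torch.nn as nn

class VideoEncoder(nn.Module):
    """Compresses a video clip into a single conditioning vector (assumed design)."""
    def __init__(self, dim=128):
        super().__init__()
        self.proj = nn.Linear(3 * 16 * 16, dim)  # toy per-frame projection

    def forward(self, video):
        # video: (batch, frames, 3, 16, 16) at a toy resolution
        b, f = video.shape[:2]
        frame_feats = self.proj(video.reshape(b, f, -1))  # (batch, frames, dim)
        return frame_feats.mean(dim=1)                    # pool over time

class Denoiser(nn.Module):
    """Predicts the noise in a noisy audio sample, given video conditioning."""
    def __init__(self, audio_len=8000, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_len + dim + 1, 256), nn.ReLU(),
            nn.Linear(256, audio_len),
        )

    def forward(self, noisy_audio, cond, t):
        t_feat = t.expand(noisy_audio.shape[0], 1)  # broadcast timestep
        return self.net(torch.cat([noisy_audio, cond, t_feat], dim=-1))

@torch.no_grad()
def generate_audio(video, steps=50, audio_len=8000):
    """Refine pure Gaussian noise into audio conditioned on the video."""
    encoder, denoiser = VideoEncoder(), Denoiser(audio_len)
    cond = encoder(video)
    audio = torch.randn(video.shape[0], audio_len)  # start from random noise
    for i in reversed(range(steps)):
        t = torch.tensor([[i / steps]])
        predicted_noise = denoiser(audio, cond, t)
        audio = audio - predicted_noise / steps  # crude denoising update
    return audio

# Usage: an 8-frame toy clip, yielding 8000 audio samples (1 s at 8 kHz).
clip = torch.randn(1, 8, 3, 16, 16)
waveform = generate_audio(clip)
print(waveform.shape)  # torch.Size([1, 8000])
```

In a real system the denoiser would be a much larger network operating on a compressed audio representation, and the update rule would follow a proper diffusion sampler; the loop above only illustrates the refine-noise-under-video-conditioning idea.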
Generative AI is expanding into video generation, with DeepMind, OpenAI, Kuaishou, Runway, and Pika all developing advanced models. These models aim to produce AI-generated videos with soundtracks and immersive camera work, with the potential to reshape media production. Progress is rapid, though challenges such as audio quality and visual artifacts persist.
Original article: https://www.theregister.com/2024/06/18/google_deepmind_video/