Tencent AI Lab – Animate-A-Story: Storytelling with Video Generation
A groundbreaking approach to creating captivating storytelling videos from existing video clips
Creating engaging and visually appealing storytelling videos typically involves complex and resource-intensive processes such as live-action filming or computer-generated animation. A recent research paper titled “Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation,” from a team of researchers at Tencent AI Lab, introduces an innovative framework that streamlines this process. The framework leverages the abundance of existing video clips to synthesize coherent and customized storytelling videos, eliminating the need for extensive filming or animation rendering.
The Animate-A-Story framework consists of two key modules: Motion Structure Retrieval and Structure-Guided Text-to-Video Synthesis. These modules work in tandem to generate storytelling videos aligned with desired scenes, motions, and appearances.
Motion Structure Retrieval
The Motion Structure Retrieval module uses an off-the-shelf text-based video retrieval system to surface candidate clips matching query texts that describe the desired scene or motion context. The framework then estimates the motion structure of the retrieved videos (for example, per-frame depth), which serves as structural guidance for the synthesis stage.
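The retrieval step can be illustrated with a minimal sketch: rank a library of precomputed clip embeddings by cosine similarity to a text-query embedding. Everything here is illustrative, not the paper's implementation; the text encoder is assumed (e.g., a CLIP-style model) and the toy embeddings stand in for a real video library.

```python
import numpy as np

def retrieve_candidates(query_emb, clip_embs, top_k=3):
    """Rank video clips by cosine similarity to a text-query embedding.

    query_emb: (d,) embedding of the query text (produced by an assumed
    CLIP-style text encoder, not shown here).
    clip_embs: (n, d) precomputed embeddings of the video library.
    Returns the indices of the top_k most similar clips.
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = clip_embs / np.linalg.norm(clip_embs, axis=1, keepdims=True)
    scores = c @ q                        # cosine similarity per clip
    return np.argsort(scores)[::-1][:top_k]

# Toy library of four "clips" in a 3-d embedding space.
library = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.05, 0.0])
print(retrieve_candidates(query, library, top_k=2))  # clips 0 and 1 rank highest
```

In the actual framework, the top-ranked clips would then be passed to a depth estimator to extract the motion structure used as guidance downstream.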
Structure-Guided Text-to-Video Synthesis
The Structure-Guided Text-to-Video Synthesis module generates plot-aligned videos under the guidance of motion structure and text prompts. It employs a controllable video generation model that offers flexible controls over video structure and character appearances. By following the provided structural guidance and appearance instructions, the framework synthesizes visually consistent videos across multiple clips. Moreover, an effective concept personalization approach allows users to specify desired character identities through text prompts, enhancing visual realism and coherence.
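How the pieces fit together per shot can be sketched as a small pipeline: retrieve a motion structure, inject a personalized identity token into the prompt, and hand both to a structure-guided generator. All names here (`Shot`, `retrieve_structure`, `synthesize_story`, the `<hero>` token) are hypothetical stand-ins, not the paper's API; the real retrieval, depth estimation, and video generation models are replaced with string-returning stubs so the control flow is runnable.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    query: str   # text used to retrieve a reference clip for motion
    prompt: str  # plot text describing what the generated shot shows

def personalize(prompt: str, token: str = "<hero>") -> str:
    """Swap a generic character mention for a learned identity token so
    the same character appears in every shot. (In the framework the
    token's embedding is learned; here it is just a placeholder string.)"""
    return prompt.replace("the protagonist", token)

def retrieve_structure(query: str) -> str:
    """Stand-in for motion structure retrieval: a real system would
    return per-frame structure estimated from the best-matching clip."""
    return f"depth({query})"

def synthesize_story(shots, generate):
    """Per shot: retrieve structure, personalize the prompt, generate.
    `generate(structure, prompt)` stands in for the controllable
    structure-guided text-to-video model."""
    return [generate(retrieve_structure(s.query), personalize(s.prompt))
            for s in shots]

shots = [
    Shot("a man walks on a beach", "the protagonist strolls along the shore"),
    Shot("a man opens a door", "the protagonist enters an old cabin"),
]
mock_generate = lambda structure, prompt: f"video[{structure} | {prompt}]"
for clip in synthesize_story(shots, mock_generate):
    print(clip)
```

Because the same identity token is injected into every shot's prompt, the generated clips stay visually consistent for that character across the whole story.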
Retrieval Augmented Video Generation
This retrieval-augmented video generation approach overcomes the limitations of traditional text-to-video generation techniques. Previous methods often struggle to generate proper motions, layouts, and compositions necessary for storytelling and film production. In contrast, the Animate-A-Story framework takes advantage of existing video assets to provide better control over layout, composition, and character appearance, resulting in superior video generation performance.
The researchers conducted extensive experiments to evaluate the effectiveness of their approach. The retrieval-enhanced text-to-video generation model outperformed existing baselines, demonstrating remarkable video generation performance. Furthermore, the proposed personalization method showcased notable advantages over competitors in terms of concept customization.
The Animate-A-Story framework offers a more efficient and accessible way for content creators to produce high-quality animated videos. By utilizing existing videos, providing layout control, and allowing personalized character appearances, this framework presents a novel and powerful video-making tool with remarkable convenience and potential for practical applications.
Paper: https://doi.org/10.48550/arXiv.2307.06940