In this week's episode of Denoised, hosts Addy Ghani and Joey Daoud dive into three significant developments affecting the film and media production landscape. The Academy of Motion Picture Arts and Sciences has officially clarified its position on AI use in films, a new AI model called Test-Time Training is making waves with its ability to generate longer, consistent videos, and Descript has teased an innovative chat-based approach to video editing. Let's explore how these developments might impact your creative workflow.
The Academy has officially released new rules clarifying its stance on generative AI in filmmaking. Rather than imposing restrictions or bans, the Academy has chosen to view AI as a tool that filmmakers are free to use.
The new rules state: "With regard to generative artificial intelligence and other digital tools used in the making of the film, the tools neither help nor harm the chances of achieving a nomination. The Academy and each branch will judge the achievement taking into account the degree to which a human was at the heart of the creative authorship when choosing which movie to award."
This approach represents a practical solution to what could have been a complex regulatory challenge. Earlier discussions had proposed potentially tracking all AI usage in productions, a process that would have been cumbersome and difficult to implement. Instead, the Academy has shifted responsibility to its voting members, allowing them to weigh the creative merit of each work regardless of the tools used to create it.
The focus on "human at the heart of creative authorship" acknowledges that while AI tools might assist in various aspects of production, the Academy still values human creative vision and direction. This aligns with how the industry has traditionally viewed other technological advancements throughout film history.
As the hosts noted, this decision effectively "unlocks" filmmakers who may have been hesitant to employ AI tools for fear of disqualification. For productions facing budget constraints or resource limitations, this clarification provides welcome assurance that utilizing AI tools won't negatively impact award eligibility.
Key takeaways about the Academy's AI stance:
AI is viewed as a tool, not a disqualifying factor
Human creative direction remains central to award consideration
Voting members will assess the degree of human creative input
No requirement for tracking or disclosing AI usage
The Academy also announced other changes, including:
A new award category for casting directors
A stunt coordinator category to be added in 2028
New requirements for Academy members to actually watch all nominated films in categories they vote on
Researchers from Stanford, UC San Diego, UC Berkeley, and Nvidia have developed a new video generation model called Test-Time Training (TTT) that addresses one of the most significant challenges in AI video creation: generating longer, temporally consistent videos.
Current video generation models struggle with length because they need to maintain consistency from frame to frame. As videos get longer, the computational cost of attention grows steeply (roughly quadratically with sequence length), because the model must reference back to previous frames to maintain continuity; doubling the length roughly quadruples the work. This is why most commercial video generation tools currently limit outputs to around 20 seconds.
The Test-Time Training approach fundamentally changes how attention mechanisms work in video generation models. While traditional models process attention in small frame-by-frame units (roughly 42 milliseconds per frame in a 24fps video), TTT processes attention in three-second segments. This higher-level abstraction allows the model to store more information about each segment, resulting in greater consistency throughout the video.
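The researchers' actual architecture is more involved, but the core idea can be sketched in a few lines: rather than letting every frame attend to every previous frame, the model keeps a compact hidden state that is itself a small learnable module, and "trains" that state on each incoming segment at generation time. The snippet below is a toy illustration under those assumptions; the ttt_layer function, the plain linear inner model, and the segment sizes are simplifications for illustration, not the published code.

```python
# Toy illustration of the test-time-training idea (not the paper's implementation).
# A real model would pair local attention inside each ~3-second segment with
# TTT layers that carry context across segments.
import numpy as np

def ttt_layer(segments, dim, lr=0.1):
    """Process segments with a hidden state that is itself a small model
    (here a linear map W), updated by one gradient step of a
    self-supervised reconstruction loss per segment."""
    W = np.zeros((dim, dim))               # hidden state = weights of an inner model
    outputs = []
    for x in segments:                     # x: (tokens_in_segment, dim)
        pred = x @ W                       # inner model's reconstruction of the segment
        grad = x.T @ (pred - x) / len(x)   # gradient of 0.5*||xW - x||^2 w.r.t. W
        W = W - lr * grad                  # "training" at test time
        outputs.append(x @ W)              # emit segment processed by the updated state
    return outputs

# Usage: three "segments" of 72 tokens (≈3 s at 24 fps) with 16 features each.
segs = [np.random.randn(72, 16) for _ in range(3)]
outs = ttt_layer(segs, dim=16)
print([o.shape for o in outs])
```

Because the hidden state has a fixed size, the cost of adding another segment stays constant, which is what lets this style of model scale to longer videos than full frame-by-frame attention.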
The research team demonstrated TTT's capabilities with an impressive one-minute Tom and Jerry-style video generated from a single text prompt. The video features consistent character designs, environmental elements, and even character behaviors that match the original cartoon style, all generated in one continuous process rather than as separate shots edited together.
What makes this development particularly notable:
The ability to generate a full minute of continuous video from a single prompt
Maintaining consistent characters and environments throughout
Preserving character behaviors and stylistic elements
Achieving this with computational efficiency comparable to standard video generation models
The researchers have published both their paper and code, allowing other developers to implement this approach in their own video generation models. This openness could accelerate adoption across the industry.
For content creators, this breakthrough could eventually eliminate the need for current workarounds like creating character sheets and generating individual frames before feeding them into video generation tools. If commercial platforms like Runway, Pika, or Luma integrate similar approaches, we could soon see significantly longer AI-generated videos with much greater consistency.
Descript, already known for its innovative text-based video and audio editing platform, has teased a new feature currently in private beta called "Vibe" video editing. This approach introduces a chat interface within the Descript environment that allows users to edit videos through natural language instructions.
Unlike other text-based video editing tools that exist as standalone products separate from the main editing environment, Descript's implementation integrates directly with their existing editor. This means users can switch between the chat interface, Descript's text-based editing view, and a traditional timeline as needed.
The chat interface leverages Descript's existing AI tools but applies them through conversational prompts. Based on the launch video, users can request edits like cleaning up interviews, adding B-roll that matches the content being discussed, or changing layout designs, all through natural language instructions.
Descript's approach differs from previous attempts at AI video editing by maintaining the connection to the underlying timeline. Unlike systems where you get a single output that you either accept or reject, Descript's chat interface makes changes to your composition that you can then further refine using either text-based editing or traditional timeline controls.
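Descript has not published details of how the Vibe interface works internally, so the sketch below is purely a hypothetical illustration of the general pattern described above: a chat request is translated into discrete edit operations applied to a composition that remains fully editable afterwards. None of the names (Composition, Clip, handle_chat_request) come from Descript.

```python
# Hypothetical sketch of chat-driven editing over an editable composition.
# Illustrates mapping a natural-language request to reversible timeline edits;
# not Descript's API or implementation.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Clip:
    source: str
    start: float   # seconds into the source
    end: float

@dataclass
class Composition:
    clips: List[Clip] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def apply(self, operation: str, **kwargs):
        """Apply one edit operation; the composition stays editable afterwards."""
        if operation == "remove_filler_words":
            # A real tool would work from the transcript; here we just log the intent.
            self.history.append("removed filler words")
        elif operation == "insert_broll":
            self.clips.append(Clip(kwargs["source"], 0.0, kwargs["duration"]))
            self.history.append(f"inserted B-roll from {kwargs['source']}")
        return self

def handle_chat_request(comp: Composition, request: str) -> Composition:
    """Rough stand-in for the step where a model turns a chat request into edits."""
    if "clean up" in request.lower():
        comp.apply("remove_filler_words")
    if "b-roll" in request.lower():
        comp.apply("insert_broll", source="city_timelapse.mp4", duration=4.0)
    return comp

comp = handle_chat_request(Composition(), "Clean up the interview and add some B-roll")
print(comp.history)   # the edits land in the composition, not a locked-in export
```

The design point this illustrates is the one the hosts highlighted: the chat layer produces edits to the same composition the timeline exposes, so nothing forces you to accept or reject a single finished output.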
The hosts discussed how Descript primarily targets content creators and marketing teams rather than professional video editors. Its strengths lie in podcast editing, simple camera angle switching, layout adjustments, and trimming: the "low-hanging fruit" of video editing. While professional editors working on complex projects will likely continue using dedicated NLEs like Premiere Pro or DaVinci Resolve, Descript's approach could significantly speed up workflows for social media content and marketing videos.
Some limitations of Descript noted in the discussion:
Limited export settings with compression that might not be ideal for high-quality output
One-directional workflow when exporting to professional NLEs (exports as XML but changes made in other applications can't be brought back)
Less powerful for identifying compelling clips compared to specialized tools like Opus
Despite these limitations, Descript continues to innovate in a space that bridges the gap between professional video editing and accessible content creation tools. The "Vibe" interface represents another step toward making video editing more accessible to users without technical editing knowledge.
This episode of Denoised highlights how AI continues to integrate into filmmaking and content creation workflows while industry institutions adapt accordingly. The Academy's practical approach to AI tools reflects a maturation in how the industry views these technologies: not as threats but as tools that still require human creative direction. Meanwhile, technical advancements like Test-Time Training point toward AI's expanding capabilities, particularly in generating longer, more consistent video content. Finally, Descript's new interface shows how AI might make video editing more conversational and accessible.
For media professionals, these developments represent opportunities to enhance workflows while maintaining creative control. As the technology evolves, the distinction between AI-assisted and traditional production continues to blur, with the focus remaining on the final creative output rather than the tools used to achieve it.