
Twelve Labs and FBRC.AI Hackathon Showcases AI's Potential in Media and Entertainment

Twelve Labs and FBRC.AI hosted a 24-hour hackathon in Los Angeles, bringing together multimodal AI talent and the AI community to focus on AI applications in media and entertainment.

The event showcased how multimodal AI is revolutionizing the industry by enabling computers to interpret video through combined analysis of visual, audio, textual, and other data types.

Participants had the option to tackle one of four challenges:

1. Video Editing with Johnny Harris: Developing an AI-powered video editing tool to analyze footage and script, automating the process of finding relevant clips and creating montages.

2. Highlight Reel Generation with Drew Binsky: Creating an AI-powered tool to analyze travel footage and generate engaging highlight reels.

3. Sports Press Conference Summarization and Highlight Generation: Developing an AI-powered tool to generate concise summaries and highlight reels from sports press conference videos.

4. AWS-Powered Video Q&A Chatbot with RAG: Developing an AI-powered chatbot to answer questions about movie and TV show trailers using the RAG (Retrieval-Augmented Generation) approach.

The hackathon aimed to expand the possibilities of multimodal AI, building upon Twelve Labs' existing applications in the media and entertainment industry.

Here’s a recap of the top three placements and their projects: four teams in total, thanks to a tie for third place.

First Place: ThirteenLabs Smart AI Editor

Challenge

Dylan Ler from Team ThirteenLabs developed the Smart AI Editor, an AI-powered video editor that automates the process of creating highlight reels and summaries from video content. The tool leverages Twelve Labs' multimodal AI APIs to generate transcriptions, video descriptions, object classifications, and metadata, which are then processed by a language model to create a coherent narrative based on user prompts.

How it Works

  • The input video is processed using Twelve Labs' APIs, generating a transcript, metadata, and timestamps for each segment.

  • The transcript, metadata, and user prompt are fed into a Google Gemini 1.5 language model, which outputs a new transcript with selected timestamps to create a cohesive story.

  • A Python movie editor function concatenates the selected video segments based on the output transcript and timestamps (see the sketch after this list).

  • Additional features include video dubbing, which translates the video into other languages using the speaker's voice, and lip-syncing, which matches the speaker's lip movements to the dubbed audio.

  • The tool can be used for various applications, such as summarizing press conferences, creating highlight reels from travel footage, and generating engaging video content based on user prompts.
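
The final assembly step is straightforward to picture in code. Below is a minimal sketch of it, assuming the language model has already returned the selected segments as (start, end) timestamps and using moviepy as the Python editing library; the function and variable names are illustrative, not the team's actual code.

```python
# Sketch: stitch the LLM-selected segments into a single highlight video.
# `selected_segments` is assumed to be the model's output: (start_s, end_s) pairs.
from moviepy.editor import VideoFileClip, concatenate_videoclips

def build_highlight(source_path, selected_segments, output_path="highlight.mp4"):
    source = VideoFileClip(source_path)
    # Cut each selected span out of the source footage.
    clips = [source.subclip(start, end) for start, end in selected_segments]
    # Join the cuts in the order the model chose so the story stays cohesive.
    highlight = concatenate_videoclips(clips)
    highlight.write_videofile(output_path, codec="libx264", audio_codec="aac")
    source.close()
    return output_path

# Example: three segments the model picked from a longer piece of footage.
build_highlight("press_conference.mp4", [(12.0, 34.5), (80.0, 95.0), (140.0, 151.5)])
```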

Second Place: AISR (AI Sports Recap)

Prateek Chhikara, Omkar Masur, and Tanmay Rode from Team Top Summarizer developed AISR, an AI-powered tool that creates quick summaries and engaging highlight reels from sports press conference videos.

The tool aims to help fans and media quickly grasp key takeaways and memorable moments without watching entire videos, offering post-game recaps, player/coach interviews, and shareable social media content.

How it Works

  • AISR is a Streamlit app that integrates advanced AI models like GPT-4 and Twelve Labs' Pegasus API, along with Docker for seamless deployment.

  • The user inputs a video and a prompt to capture relevant details. The video is processed using Twelve Labs' Pegasus API to generate a transcript with timestamps.

  • The transcript and user prompt are passed to the OpenAI API with an engineered prompt, producing two outputs: a textual summary of the relevant content and a list of timestamps pointing to snippets of the desired information (a sketch of this step follows the list).

  • The original video is stitched together using the generated timestamps to create a highlight video, combining segments from different parts of the video to provide a concise and informative summary.

  • AISR allows users to provide feedback or ask follow-up questions to refine the generated content, and it includes a content-sharing mechanism to easily share the created videos on popular social media platforms.
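
The prompt-engineered OpenAI call is the heart of this pipeline, so here is a minimal sketch of that step. It assumes the transcript arrives as timestamped segments from the video-understanding stage; the prompt wording and JSON schema are assumptions for illustration, not the team's exact implementation.

```python
# Sketch: ask GPT-4 for a textual summary plus the timestamps worth keeping.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_press_conference(transcript_segments, user_prompt):
    # transcript_segments: list of {"start", "end", "text"} dicts (assumed format).
    prompt = (
        "You are summarizing a sports press conference.\n"
        f"User request: {user_prompt}\n"
        "Transcript segments:\n"
        + "\n".join(f"{s['start']:.1f}-{s['end']:.1f}: {s['text']}" for s in transcript_segments)
        + "\nReturn JSON with keys 'summary' (string) and 'timestamps' "
          "(list of [start, end] pairs to include in the highlight reel)."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    result = json.loads(response.choices[0].message.content)
    return result["summary"], result["timestamps"]
```

The returned timestamp pairs then drive the stitching step, much like the moviepy sketch shown for the first-place project.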

Third Place (Tied): Samur.ai

Challenge

Soham B, Hritik Agrawal, Aditya Vadalkar, Shauryasikt Jena, and Shubham Maheshwari developed Samur.ai, an AI-powered tool that generates summaries and highlight reels from sports press conference videos in real-time. It aims to provide users with contextually relevant information based on their prompts.

How it Works

  • The user inputs a video URL and a prompt, which can be either objective (specific questions) or subjective (highlights or memorable quotes).

  • The tool performs audio transcription, speaker identification, and face recognition using AWS Transcribe and Rekognition services.

  • The refined transcript, along with the user prompt and conversation memory, is processed using a custom pipeline that determines whether the prompt is objective or subjective.

  • For objective prompts, the tool performs a Q&A pairing to provide context-specific answers. For subjective prompts, it extrapolates the meaning of terms like "memorable" to retrieve relevant clips (see the routing sketch after this list).

  • The generated summary and highlight reel are then presented to the user, with the option to fine-tune the results by providing additional feedback or prompts.

  • The conversation history is stored in a simple SQL database, allowing users to refine their searches and maintain context throughout the session.
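
A rough sketch of the objective/subjective routing and the SQL-backed conversation memory described above follows. The classifier call, table layout, and the two retrieval helpers are hypothetical stand-ins for the team's custom pipeline.

```python
# Sketch: route a prompt to Q&A or highlight retrieval and keep session memory.
import sqlite3
from openai import OpenAI

client = OpenAI()
db = sqlite3.connect("conversation.db")
db.execute("CREATE TABLE IF NOT EXISTS history (session_id TEXT, role TEXT, content TEXT)")

def classify_prompt(prompt):
    # One-word classification: is this a specific question or a subjective request?
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Answer with exactly one word, 'objective' or 'subjective': {prompt}",
        }],
    )
    return response.choices[0].message.content.strip().lower()

def answer_question(prompt, transcript):
    # Stand-in for the team's Q&A pairing step.
    return f"(answer to: {prompt})"

def retrieve_highlights(prompt, transcript):
    # Stand-in for the team's subjective clip retrieval.
    return [("00:01:10", "00:01:35")]

def handle_prompt(session_id, prompt, transcript):
    db.execute("INSERT INTO history VALUES (?, ?, ?)", (session_id, "user", prompt))
    if classify_prompt(prompt).startswith("objective"):
        answer = answer_question(prompt, transcript)
    else:
        answer = retrieve_highlights(prompt, transcript)
    db.execute("INSERT INTO history VALUES (?, ?, ?)", (session_id, "assistant", str(answer)))
    db.commit()
    return answer
```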

The team emphasized the importance of distinguishing between objective and subjective prompts to generate more accurate and relevant results. They also highlighted the use of face recognition to identify specific speakers in multi-speaker interviews, enabling users to retrieve information from a particular individual.
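
Speaker identification of that kind is commonly done by matching sampled video frames against a small collection of known faces. Below is a minimal boto3 sketch, assuming a Rekognition face collection has already been created and indexed with the speakers' names stored as ExternalImageId values; the collection name and match threshold are assumptions.

```python
# Sketch: identify which known speaker appears in a sampled video frame.
import boto3

rekognition = boto3.client("rekognition")

def identify_speaker(frame_jpeg_bytes, collection_id="press_conference_speakers"):
    # Search the pre-built face collection for the closest match to this frame.
    response = rekognition.search_faces_by_image(
        CollectionId=collection_id,
        Image={"Bytes": frame_jpeg_bytes},
        FaceMatchThreshold=90,
        MaxFaces=1,
    )
    matches = response.get("FaceMatches", [])
    if not matches:
        return None  # no known speaker in this frame
    # ExternalImageId was set to the speaker's name when the face was indexed.
    return matches[0]["Face"]["ExternalImageId"]
```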

Third Place (Tied): Ai-ssistant Editor


Challenge

Team The Johans, consisting of Andrew Montanez, Harshita Arya, Jorge Ceja, and Jeremy Whitham, secured 3rd place at the Twelve Labs Multimodal AI in Media & Entertainment Hackathon with their Ai-ssistant Editor.

This tool organizes video footage and builds a rough cut based on the user's script and stylistic choices, aiming to save time in the editing process.

How it Works

  • The user inputs a script and uploads a zip file containing the video footage.

  • The tool uses Twelve Labs' video understanding AI to analyze the clips, identify key moments, and organize them according to the script.

  • A custom GPT model, trained on editing techniques and style guides, interprets the script and makes editing decisions, such as determining scene lengths, transitions, and shot selections.

  • The tool generates a Google Sheet with metadata for each clip, including start and end times, and organizes the clips into "buckets" corresponding to the script's scenes (a sketch of this step follows the list).

  • Users can refine the edit by making changes to the Google Sheet or export an XML file to be imported into Adobe Premiere for further editing.
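
The bucket-and-metadata step boils down to grouping analyzed clips by scene and writing them somewhere an editor can inspect. Below is a minimal sketch that writes a CSV as a stand-in for the team's Google Sheet; the field names and data layout are assumptions for illustration.

```python
# Sketch: group analyzed clips into scene "buckets" and export their metadata.
import csv
from collections import defaultdict

def export_rough_cut(clips, script_scenes, output_path="rough_cut.csv"):
    # clips: dicts like {"file", "scene", "start", "end", "description"} from the
    # video-understanding step; script_scenes: scene names in script order.
    buckets = defaultdict(list)
    for clip in clips:
        buckets[clip["scene"]].append(clip)

    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["scene", "file", "start", "end", "description"])
        # Walk scenes in script order so the sheet reads like the rough cut.
        for scene in script_scenes:
            for clip in sorted(buckets.get(scene, []), key=lambda c: c["start"]):
                writer.writerow(
                    [scene, clip["file"], clip["start"], clip["end"], clip["description"]]
                )
    return output_path
```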

The team emphasized the importance of saving time for content creators and editors, allowing them to focus on the creative aspects of the process rather than the time-consuming task of organizing and assembling footage. The Ai-ssistant Editor can be used for various projects, including documentaries, music videos, and even short-form content like Instagram Reels.

The tool's ability to interpret professional editing scripts and make intelligent editing decisions based on the user's input highlights the potential for AI to streamline the post-production workflow while still preserving the human touch in the creative process.

