This blog post summarizes a recent presentation at the Association of Moving Image Archivists (AMIA) 2025 conference in Baltimore.
Audiovisual archives hold incredible cultural and historical value, but a significant challenge has always been discoverability. Traditional methods of creating descriptive metadata for video content are time-consuming, manually intensive, and often result in sparse data. This leaves thousands of hours of rich video content almost invisible to researchers.
Our goal is to automatically generate useful, time-coded, and highly accurate metadata to unlock the immense value hidden within these large video collections. We believe that emerging AI technologies can support a scalable, cost-effective solution for describing archival video collections that currently lack sufficient metadata.

AI: A Practical Tool, Not a Magic Wand
We view AI as a tool to augment professional archival practice, not replace it. The value of archival description remains rooted in professional judgment—contextual interpretation, ethical considerations, and cultural sensitivity.
However, AI can bridge the gap created by limited staff resources by:
- Automating the creation of baseline descriptive metadata.
- Improving search and discovery across large holdings.
- Reducing manual labor on repetitive tasks.
- Supporting accessibility improvements.
It's essential to frame this work within responsible and ethical practice:
- Transparency: Be clear about when and how AI-generated metadata is created.
- Accuracy and Human Review: Archivists and subject-matter experts must evaluate AI outputs and refine them, as even the best tools produce errors.
- Bias: Be intentional about bias, as AI systems inherit the biases of their training data. Archivists should consider when not to use automated tools, especially with sensitive material.
Aviary's Three Approaches to AI Description
The Aviary team has investigated and implemented three complementary approaches to using AI for video collection description:
1. Named Entity Recognition (NER) with spaCy
- Method: This approach runs on an existing transcript, using the spaCy Python library to analyze text, identify entities (Person, Place, Date, Org, Event), and classify them. It then automatically matches these entities to Wikidata entries to extract authoritative entity names, descriptions, images, dates, and URLs.
- Benefit: It transforms unstructured text into structured, linkable, and contextually rich metadata. It's fast, efficient, and great for fine granularity.
- Best Suited For: Video with high-quality transcripts that mention many named people, places, events, and dates.
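To make the workflow concrete, here is a minimal, stdlib-only sketch of the step that follows spaCy's entity extraction: deduplicating entities and preparing Wikidata reconciliation queries. The hand-written `(text, label)` tuples stand in for what `[(ent.text, ent.label_) for ent in nlp(transcript).ents]` would produce, and the function name `structure_entities` is illustrative; only the `wbsearchentities` endpoint is the real Wikidata API.

```python
from urllib.parse import urlencode

# Entities as spaCy's NER would yield them from a transcript.
# In practice: [(ent.text, ent.label_) for ent in nlp(transcript).ents]
entities = [
    ("Baltimore", "GPE"),
    ("Baltimore", "GPE"),            # duplicate mention in the transcript
    ("Library of Congress", "ORG"),
    ("1968", "DATE"),
]

def structure_entities(ents):
    """Dedupe extracted entities and attach a Wikidata search URL
    for later reconciliation to authoritative entries."""
    seen = {}
    for text, label in ents:
        key = (text, label)
        if key in seen:
            continue
        query = urlencode({"action": "wbsearchentities", "search": text,
                           "language": "en", "format": "json"})
        seen[key] = {
            "name": text,
            "type": label,
            "wikidata_search": f"https://www.wikidata.org/w/api.php?{query}",
        }
    return list(seen.values())

structured = structure_entities(entities)
print(len(structured))  # 3 unique entities
```

Resolving each search URL (one HTTP request per entity) then yields the authoritative names, descriptions, images, and URLs that enrich the final record.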
2. Large Language Models (LLMs) Analyzing Audio with AssemblyAI
- Method: This uses AssemblyAI to analyze videos that contain audio, automatically dividing the content into logical segments and providing a summary for each—or summarizing the entire video.
- Benefit: It does not require an existing transcript, produces high-quality output easily consumed by researchers, and enhances searchability and video navigation. It automates the time-consuming manual task of creating a table of contents.
- Best Suited For: Video with varied content and many different segments where an existing transcript may not be available.
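The table-of-contents step this approach automates can be sketched as follows. The segment dictionaries below are illustrative sample values shaped like chapter output (millisecond start/end times plus a generated headline), not actual AssemblyAI API responses:

```python
# Sample segments shaped like an auto-chaptering result:
# start/end in milliseconds, plus a generated headline (illustrative values).
segments = [
    {"start": 0, "end": 95_000, "headline": "Opening remarks and introductions"},
    {"start": 95_000, "end": 310_000, "headline": "Panel discussion on preservation"},
]

def to_toc(chapters):
    """Render segments as a time-coded table of contents (HH:MM:SS lines)."""
    def ts(ms):
        s = ms // 1000
        return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"
    return [f"{ts(c['start'])}  {c['headline']}" for c in chapters]

for line in to_toc(segments):
    print(line)
# 00:00:00  Opening remarks and introductions
# 00:01:35  Panel discussion on preservation
```

Because each entry carries a time code, the same data can drive clickable video navigation in a player, not just a static description.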
3. LLMs Analyzing Video with Cloudglue
- Method: Cloudglue preprocesses the video's visuals, audio, and on-screen text into structured, machine-readable data. An LLM then analyzes this data to generate file-level descriptive metadata, including Summary, Keywords, Table of Contents, Speakers, Languages, and Sensitive Content.
- Benefit: This is the base layer of metadata creation and a crucial starting point. Importantly, it does not require video with speech or sound, making it ideal for silent or visually-focused archival footage.
- Best Suited For: Video with more singular subject matter or fewer segments, where visual analysis is key.
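Since this file-level metadata is a starting point for professional review rather than a finished record, a small ingest check helps. The sketch below (stdlib only; the field names and record shape are our illustrative assumptions, not Cloudglue's actual schema) normalizes an LLM-produced record and flags fields an archivist still needs to supply:

```python
EXPECTED_FIELDS = ["summary", "keywords", "table_of_contents",
                   "speakers", "languages", "sensitive_content"]

def normalize_record(raw):
    """Give every AI-generated record a uniform shape for catalog ingest,
    flagging which file-level fields still need human review."""
    record = {field: raw.get(field) for field in EXPECTED_FIELDS}
    record["needs_review"] = [f for f in EXPECTED_FIELDS if record[f] is None]
    return record

# Illustrative LLM output for a silent archival film -- no speakers detected.
raw = {"summary": "Aerial footage of Baltimore harbor, circa 1950.",
       "keywords": ["Baltimore", "harbor", "aerial"],
       "languages": []}
rec = normalize_record(raw)
print(rec["needs_review"])  # fields an archivist should fill in or confirm
```

Keeping an explicit `needs_review` list per record makes the human-review pass described above a tractable queue instead of an open-ended audit.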
Key Takeaways
The investigation into these AI tools has led to three core conclusions:
- AI tools are practical and scalable solutions for under-described videos. They help eliminate backlogs and surface previously hidden collections.
- Different approaches to AI description address complementary aspects of audiovisual content. Archivists can select the best tool based on the material's characteristics (e.g., presence of speech, visual complexity, need for high-granularity metadata).
- Professional cataloging remains essential. AI is a powerful assistant, but the critical work of contextual interpretation, quality control, and long-term metadata stewardship remains firmly in the hands of archivists.
Ready to transform your under-described video collections and break through your metadata backlog? Explore these AI-powered description services in the Aviary platform and start unlocking the value of your archives. To learn more about implementing NER, AssemblyAI, and Cloudglue workflows, or to schedule a demonstration, visit aviaryplatform.com today.