by Wayne Pender, McGill University, 2012 Joe Ann Clifton Student Award Winner (Published September 1, 2012)
Television news libraries have struggled with cataloguing, organizing and storing visual materials for efficient search and retrieval for years. This task has been complicated by the emergence of digital video and the exponential growth in the holdings of television news libraries. In parallel, the ability for non-professionals to shoot, edit and publish videos of increasing production value and complexity on the web has flooded the Internet with videos that appeal to a wide audience and subject interest. The present survey looks at the metadata protocols and practices in place for an internal audience in professional operations and on display for the general public on the Internet. The study finds that the lack of a common metadata schema can make much of the material inaccessible. Literature pertaining to this area is reviewed and future direction is discussed.
Keywords: metadata, video, XML, RDF, MXF, television, news, YouTube
Searching and retrieving visual content is problematic. The search for specific moving images on film and video has long been a task for television news professionals, but now, with the widespread availability of video resources on the Internet, it has become a task for anyone. One of the problems of searching for video is the complexity of the medium. The description of a single photo may take many lines of metadata. Within a second of video there are thirty individual image frames, so a minute of video contains 1,800 of them, and the volume of metadata needed to describe them grows accordingly. Each second may contain important actions, scenes and distinct characteristics that news producers and information seekers need to retrieve.
The complexity of moving images in terms of description has always made finding video content difficult for television news librarians or producers. In the days of film and analogue video on tape, the basic metadata generation, storage and retrieval of material depended on the time and skills of a media librarian. At first, as in libraries, the librarian needed to take the time to provide series and item codes, write detailed descriptions of visual and audio content, identify key names, places and contexts, and finally place that information in catalogues of some sort so that the material could be retrieved quickly.
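The scale involved can be made concrete with a little arithmetic. The sketch below assumes the thirty frames per second quoted above and plain, non-drop timecode; the 30-minute newscast is an illustrative figure, not one taken from any broadcaster.

```python
# Assumption: 30 frames per second, the rate quoted in the text.
FRAMES_PER_SECOND = 30


def frame_count(duration_seconds: int, fps: int = FRAMES_PER_SECOND) -> int:
    """Number of still frames in a clip of the given length."""
    return duration_seconds * fps


def timecode(frame_index: int, fps: int = FRAMES_PER_SECOND) -> str:
    """Convert a zero-based frame index to non-drop HH:MM:SS:FF timecode."""
    seconds, frames = divmod(frame_index, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


print(frame_count(60))       # frames in one minute of video
print(frame_count(30 * 60))  # frames in a hypothetical 30-minute newscast
print(timecode(1799))        # address of the last frame of the first minute
```

Even if only a fraction of those frames is ever described, each description has to be tied to a time address of this kind before a specific shot can be retrieved.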
In the digital era much of the basic spatial and temporal metadata can be generated by the recording device and embedded in the video file. However, generation of accurate and detailed description of rich and complex content, and then retrieving the media is still a problem. News operations demand rapid retrieval of news items and this requires a robust metadata system for search and retrieval. This paper will explore the requirements and solutions for the provision of metadata for video in the television news context and its wider implications for search and retrieval on the web. Literature surrounding the evolution of metadata schema for video, questions of interoperability, effective retrieval and the future of video metadata will inform the discussion.
The problems of preservation and access
It is now possible for anyone to shoot, edit and publish video productions on the web. In the past, only trained television and film professionals working for media organizations had access to expensive equipment and specialized training. The use of video for education, communication and individual expression has exploded because of the increased usage of mobile phones with video recording capability. Today, the availability of these tools has resulted in vast numbers of videos being made available on such websites as YouTube and on individual blogs, social networking sites and personal websites. The importance of metadata in searching and retrieving materials for professionals and the public will only intensify as the medium becomes more popular as an information sharing tool.
The consumer side of video production in terms of metadata is an important consideration for two reasons. First, the enormity of online content magnifies the search and retrieval problem that television news operations have to deal with. Second, the Internet is increasingly replacing traditional means of transmission and storage of video content for users. The Internet has caused considerable overlap in access between professional and amateur video. Television news regularly solicits video from the public, and people share links to professional video content through social networks. Metadata is the key to finding video content of any kind on the Internet. Television news operations now must be prepared to live stream coverage over the Internet, while producing and packaging specialized versions for the traditional newscast and providing video highlights for on-demand viewing online.
If the case has been made that metadata is crucial for search and retrieval of video materials, it then must be decided what metadata schema and structure to use. The factors that make it difficult to determine the type of metadata to use for video are in some ways the same considerations that make choosing metadata schemes for textual content problematic. One of the key considerations is interoperability. Interoperability is the ability for one metadata system to be compatible with others and operate with a common, well-known interface. If the goal of interoperability is realized, it will ultimately mean that availability and access will be maximized. In the current context of online video repositories and archives, interoperable technical and semantic metadata would mean that search and retrieval of material placed on the Internet would not be restricted by the lack of accessible metadata.
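One way to picture interoperability is as a crosswalk: two catalogues with different field names are mapped onto one shared vocabulary so that a single query can search both. The following is a minimal sketch only. The in-house field names and sample records are hypothetical, and Dublin Core element names are used as the shared vocabulary purely for illustration.

```python
# Hypothetical field names for two separate catalogues. The target names
# (title, description, date, creator) are Dublin Core elements.
NEWSROOM_TO_DC = {"slug": "title", "shot_list": "description",
                  "air_date": "date", "camera_operator": "creator"}
WEBSITE_TO_DC = {"headline": "title", "caption": "description",
                 "uploaded": "date", "uploader": "creator"}


def crosswalk(record: dict, mapping: dict) -> dict:
    """Rename the fields of a record to the shared vocabulary.

    Fields with no entry in the mapping are dropped, which is the usual
    cost of a crosswalk.
    """
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def search(records: list, field: str, term: str) -> list:
    """Case-insensitive substring search on one shared field."""
    return [r for r in records if term.lower() in r.get(field, "").lower()]


# Illustrative records, invented for this example.
newsroom_item = {"slug": "Harbour fire rescue",
                 "shot_list": "Firefighters carry a worker from the warehouse",
                 "air_date": "2012-03-14", "tape_no": "B-2231"}
web_item = {"headline": "Warehouse blaze seen from the ferry",
            "caption": "Phone video of firefighters at the docks",
            "uploaded": "2012-03-14", "uploader": "ferryrider"}

catalogue = [crosswalk(newsroom_item, NEWSROOM_TO_DC),
             crosswalk(web_item, WEBSITE_TO_DC)]
hits = search(catalogue, "description", "firefighters")
print(len(hits))
```

The hypothetical `tape_no` field has no counterpart in the shared vocabulary and is lost in the mapping, which illustrates why a common schema adopted at the source is preferable to conversion after the fact.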
Another key consideration is the provision of metadata elements that allow the encoder to adequately describe moving images. Often, visual information is independent of any kind of textual reference. For example, video recording tracks are designed with at least two accompanying audio tracks; however, it may be the case that there has been no discernable audio recorded that can be transcribed and used as a textual reference. In this case, it is necessary that encoders have the latitude to create detailed descriptions of the visual content on its own without any reference to words. The semantics must be able to convey the meaning of the images in the most accessible way.
Identifying and implementing an interoperable, extensible and effective metadata schema is problematic given the number of metadata schemes available and the offsetting positive and negative aspects of each of them. The advantages and disadvantages of various potential metadata schemes for video archiving will be discussed in this paper following a brief review of the literature that was consulted.
Public broadcasting in the United States has undertaken an effort to create a repository to preserve digital media assets and to take the lead in promoting common technical and content metadata across the broadcast industry. A variation of Dublin Core (DC) was used to create “PB Core”, which was contained within a Metadata Encoding and Transmission Standard (METS) rights wrapper and ultimately a Material Exchange Format (MXF) wrapper (that works well with XML) to promote interoperability. The Public Broadcasting working group made efforts to reach out for collaboration with commercial television networks and conceded that without their support it is unlikely that a metadata standard can be achieved (Rubin, 2009).
The Internet has increased the pressure on commercial broadcasters to make their productions as accessible as those of non-professional entities. According to Kallinikos and Mariátegui (2011), interoperability and access are issues that media organizations have been forced to deal with. “Findability and accessibility are key prerequisites for operating and competing in the digital marketplace” (p. 282). In an empirical study of the British Broadcasting Corporation (BBC) and the corporate effort to streamline operations and accessibility to its productions, Kallinikos and Mariátegui found that metadata is like a “passport” that allows performance of different “tasks and operations” by how it is “produced, delivered and ultimately consumed” (p. 289).
The nature of video content sometimes makes it difficult to describe syntactically. If a video clip is accompanied by an audio track that helps describe the visual elements, the task of encoding metadata may be made easier by utilizing the transcript of the material. However, the strength of visual material is the emotion or deep meaning that can be conveyed. The saying ‘a picture is worth a thousand words’ is thought of as a cliché; however, in the context of metadata, where semantics are important signifiers for searchers, relevant semantics are needed to convey the degree of meaning of the visual element.
The encoding of semantics requires good metadata tools and a skilled professional to interpret video content. The quality of the human input and the technical capability of the metadata schema are important factors in consideration of problems with metadata for video (Kallinikos & Mariátegui, 2011, pp. 290-291). Efficiencies can be achieved when the metadata schema used internally to search and retrieve video matches the metadata that is used to search and retrieve on the Internet. In this way the professional and the public spheres align and highlight the need for an integrated approach for preserving and accessing video material universally.
Trade journals raise the issue of the lack of a universal metadata schema and the negative consequences it can have for search and retrieval of video. Miller writes that searching for video is handicapped by the lack of text-based content for keyword searching and it means that users are relying on the keywords used in metadata, if there is any, to find visual items. Many organizations have realized the need for Digital Asset Management (DAM), but often there is not the time, budget or expertise available within an organization to effectively create and maintain metadata (Miller, 2006).
A comparison of metadata schemas for video by Hunter and Armstrong looks at various schemas including Dublin Core (DC), Resource Description Framework (RDF) and variations of Extensible Markup Language (XML) that all satisfy Moving Picture Experts Group (MPEG-7) Description Definition Language (DDL) requirements. First, a proposed schema that utilizes the basic fifteen DC elements as an upper level and qualifiers at a lower level to provide granularity is considered. However, the loss of “semantic interoperability” by using element qualifiers is pointed out as a disadvantage to the schema (Hunter & Armstrong, 1999, p. 354). Next, metadata schema requirements are listed and compared (p. 356). After the comparison, the authors find that none of the schemas are well suited to metadata for video and posit that a “hybrid” solution consisting of XML, supported by RDF and Schema for Object-Oriented XML (SOX), is necessary. This solution would provide “object-oriented, semantical concepts of RDF but express them within an easily understood, human-readable XML schema” (p. 372).
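The trade-off between qualifiers and interoperability can be illustrated with a small sketch. The qualified names below (written as "element.qualifier") and the sample values are hypothetical and are not the scheme Hunter and Armstrong evaluated. Only the Dublin Core element namespace is real. The `dumb_down` helper shows what a system that understands only the fifteen base elements would see.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

# Hypothetical qualified statements about one news clip. The qualifier
# names after the dot are illustrative only.
qualified = [
    ("title", "Harbour warehouse fire"),
    ("description.shot", "Wide shot of smoke over the docks"),
    ("description.transcript", "Crews were called just after dawn"),
    ("date.recorded", "2012-03-14"),
    ("coverage.spatial", "Halifax, Nova Scotia"),
]


def dumb_down(name: str) -> str:
    """Strip the qualifier, leaving only the base element name."""
    return name.split(".", 1)[0]


def to_simple_dc(statements) -> ET.Element:
    """Build an unqualified Dublin Core record from qualified statements."""
    record = ET.Element("record")
    for name, value in statements:
        child = ET.SubElement(record, f"{{{DC_NS}}}{dumb_down(name)}")
        child.text = value
    return record


record = to_simple_dc(qualified)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Both descriptions survive, but nothing now says which one is the shot
# description and which one is the transcript.
descriptions = [e.text for e in record.findall(f"{{{DC_NS}}}description")]
print(descriptions)
```

The granularity that made the qualified record useful for shot-level retrieval is exactly what a simpler consumer discards, which is one way of reading the interoperability concern raised above.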
Although the Hunter & Armstrong article provides a foundational survey of the issues surrounding metadata for video, it is dated. A current study and presentation of metadata solutions is offered by Sikos (2011), in which he proposes that the impending implementation of (X)HTML-5 will provide better conditions for search and retrieval by using a form of metadata called “microdata” in combination with RDF, DC and other specific metadata schemes. Sikos posits that (X)HTML-5 support of virtually all metadata schemes, its own embedded data and the fact that Google, Yahoo! and YouTube all index RDF means that this Internet encoding will go a long way to making video more accessible online. However, he then goes on to call for standardization of metadata for video and suggests that social media and video sharing portals will have to drop the embedded metadata they upload so that it does not conflict with (X)HTML and the full potential of metadata through (X)HTML can be realized.
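To show what microdata looks like in practice, the sketch below embeds a few properties in an HTML fragment and reads them back with Python's standard HTML parser. The page fragment and its values are invented, and the schema.org `VideoObject` vocabulary is an assumption made for illustration; Sikos is not quoted here as prescribing it. The extractor is deliberately flat and does not handle nested items.

```python
from html.parser import HTMLParser

# Hypothetical page fragment. The schema.org VideoObject vocabulary is an
# assumption; the source does not name a specific microdata vocabulary.
PAGE = """
<div itemscope itemtype="http://schema.org/VideoObject">
  <h2 itemprop="name">Harbour warehouse fire</h2>
  <p itemprop="description">Firefighters rescue a worker from the roof.</p>
  <meta itemprop="duration" content="PT1M35S">
  <meta itemprop="uploadDate" content="2012-03-14">
  <video src="fire.mp4" controls></video>
</div>
"""


class MicrodataExtractor(HTMLParser):
    """Collect itemprop name/value pairs from a single, non-nested item."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._open_prop = None  # property whose text content is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "content" in attrs:
            # Value carried in the content attribute, as on <meta>.
            self.properties[prop] = attrs["content"]
        else:
            # Value is the element's text, gathered in handle_data.
            self._open_prop = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._open_prop is not None:
            self.properties[self._open_prop] += data

    def handle_endtag(self, tag):
        self._open_prop = None


extractor = MicrodataExtractor()
extractor.feed(PAGE)
print(extractor.properties)
```

Because the properties sit in the same markup that displays the video, a crawler can index them without a separate metadata file, which is the accessibility gain the paragraph above describes.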
The use of an XML-based MPEG-7 schema is touted by Carreras, Tous, Rodriguez, Delgado, Cordara, Francini & Gibellino (2010) for its “interoperability and platform independence” and its “flexible language” that helps “narrow the semantic gap” by extracting spatial and temporal elements through use of “video analysis algorithms” (pp. 48-52). The MPEG Group outlines the objectives of MPEG-7 as a “Multimedia Content Description Interface” that “supports some degree of interpretation of the information meaning”. Lower levels of description render basic descriptors such as colour, size and shape, whereas higher levels of description allow for semantic meaning to be recorded for greater granularity, thereby enhancing search and retrieval functionality (MPEG Group, 1.2 MPEG-7 Objectives, 2004). Acting as a framework, MPEG-21 “[defines] the technology needed to support Users to exchange, access, consume, trade and otherwise manipulate Digital Items in an efficient, transparent and interoperable way” (Bormans & Hill, 2002).
The Material Exchange Format (MXF) metadata standard and its accompanying Descriptive Metadata Scheme (DMS-1) are the subject of a Canadian Broadcasting Corporation (CBC) – Radio-Canada publication. It explains how these tools can be used to create metadata that aids the search and retrieval of news video in an operational context, reducing the need to expend resources on re-shooting material that is already available in the library. Integration with a news writing software package can help librarians include important contextual information in the MXF DMS-1 system to give full, rich descriptions of visual material for search, retrieval and reuse (Kozyra, 2007).
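The distinction between lower and higher levels of description can be sketched as a simple data structure. This is not MPEG-7 syntax: the class, field names and sample segments are hypothetical, and a single colour name stands in for the much richer low-level descriptors the standard defines.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One temporal segment of a clip, described at two levels."""
    start_s: float
    end_s: float
    dominant_colour: str                                  # low-level descriptor
    semantic_labels: list = field(default_factory=list)   # high-level meaning


def find_by_colour(segments, colour):
    """Low-level query: segments whose dominant colour matches."""
    return [s for s in segments if s.dominant_colour == colour]


def find_by_label(segments, label):
    """High-level query: segments annotated with the given concept."""
    wanted = label.lower()
    return [s for s in segments
            if wanted in (name.lower() for name in s.semantic_labels)]


# Illustrative annotations for a hypothetical 20-second news item.
clip = [
    Segment(0.0, 4.5, "grey", ["anchor", "studio"]),
    Segment(4.5, 12.0, "orange", ["fire", "warehouse", "night"]),
    Segment(12.0, 20.0, "orange", ["sunset", "harbour"]),
]

print(len(find_by_colour(clip, "orange")))                     # colour alone is ambiguous
print([(s.start_s, s.end_s) for s in find_by_label(clip, "Fire")])
```

The colour query returns both the fire and the sunset, while the semantic query isolates the fire footage; this gap between what can be measured automatically and what a searcher means is the "semantic gap" referred to above.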
Other sources regarding the technical and functional capabilities of MXF as a metadata solution help to explicate the specifics of this format. “The ultimate goal of MXF is to interchange Essence (picture, sound and data)”. The use of “essence containers” to encode basic information is supplemented by other packages of metadata that combine to render a complete metadata solution (Ferreira, 2010).
The convergence of professional and amateur video metadata is indicated by the common reference to the use of XML as an interface for MXF, as has already been noted with regard to MPEG-7. The document “XML Schema for MXF Metadata” points to the use of XML as a universal tool for developers to make the metadata more accessible (MOG Solutions, 2005).
The problem of search and retrieval is complicated by the inability of many of these heterogeneous schemas to talk to each other. Not all of the possible schemas and standards have been mentioned here; notable omissions include Categories for the Description of Works of Art (CDWA), Cataloguing Cultural Objects (CCO), Visual Resources Association Core Categories (VRA Core), Metadata Encoding and Transmission Standard (METS) and Metadata Object Description Schema (MODS). Also, it should be mentioned that the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is important in the XML context as a convention for harvesting metadata in support of search and retrieval.
A key tool that can be used to achieve interoperability and hence successful searches for video material is Z39.50, which can help deal with the heterogeneity of metadata schemas. Although Z39.50 holds promise to be the binding and common standard to improve searches for video material, there are some misgivings about it. “Current digital library query representations and protocols, such as Z39.50, deal entirely with textual metadata. This is not sufficient for multimedia digital libraries …” (Bakker, Lew, Huang, Sebe & Zhou, 2003, p. 94).
Yet, there is opinion that Z39.50 can work with other schemas and standards to be an effective search and retrieval tool for finding moving image materials. According to Poul Henrik Jorgensen of the Danish Bibliographic Centre, Z39.50 can support XML syntax and namespaces and navigate RDF structures (Jorgensen, 2000, PowerPoint presentation). Despite the promise of Z39.50, it does not seem to be a current concern. The author of the present paper could not find any current information regarding the development of new profiles. The most recent profile listed on the Z39.50 website dates from 2004. This suggests that the protocol is not currently being maintained and developed, which leads to an assumption that the standard is being overtaken by others, perhaps XML schemas.
The dichotomy between the professional and non-professional is still a reality in the world of dealing with video libraries and archives, and it does not appear that these two will come together anytime soon in terms of a universal metadata scheme. The professional side is working with the MXF format, in large part because of its compatibility with newsroom and production content management and video editing systems, which are widely used in the television and film industries. On the non-professional side there is an overwhelming drive to follow the lead of Google/YouTube in the use of XML schemas with PHP (Hypertext Preprocessor) language. Having said that, there are clearly a number of metadata schemes being used to post videos on personal and professional websites and there are no clear standards or regulation.
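As an indication of how lightweight OAI-PMH is, a harvest request is simply an HTTP query string. The sketch below builds a `ListRecords` request for unqualified Dublin Core (`oai_dc`) without contacting any server. The repository address and set name are hypothetical; the verb and argument names are those defined by the protocol.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only to illustrate the request shape.
BASE_URL = "https://archive.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, from_date=None, until_date=None):
    """Build an OAI-PMH ListRecords request URL.

    Optional arguments narrow the harvest to one set or one date range.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return f"{base_url}?{urlencode(params)}"


# Harvest Dublin Core records added to a hypothetical news video set in 2012.
url = list_records_url(BASE_URL, set_spec="news:video", from_date="2012-01-01")
print(url)
```

The response to such a request is an XML document of metadata records; a harvester collects these from many repositories so that searching can then take place over the combined set.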
While it is unlikely that television news or production operations will open up their servers for searching by the public any time soon, it is conceivable that, if granting access to these resources can become a revenue stream, the public may eventually be able to download video for a fee. YouTube has increased ad revenues by offering to become partners with people who post videos (Bloomberg, 2011). Vendors of products referred to in videos pay the creator and YouTube to flash ads under the video, thereby creating a revenue stream for the service provider and the producer. This, in turn, encourages the production of more material.
Television operations are doing this to a certain extent, but they are using services like Apple’s iTunes to deliver the service. If the broadcasters streamlined their delivery system with the help of a commonly used metadata strategy they could offer not only finished productions but also portions of exclusive non-broadcast news and entertainment material aimed at a specific target audience. Examples include longer portions of breaking news events that were not significant enough to be broadcast live, such as a hostage taking or a rescue at a fire scene.
The exponential growth of video content on the web will not abate anytime soon. Search and retrieval of this material will become more complex and problematic. Increasingly metadata will play a larger role in the maintenance and management of these resources, and universal standards and schemas will be necessary.
Bakker, E. M., Lew, M. S., Huang, T. S., Sebe, N. & Zhou, X. S. (2003). Image and Video Retrieval: Second international conference, CIVR 2003, Urbana-Champaign, IL, USA, July 2003, Proceedings. Berlin Heidelberg: Springer-Verlag. Retrieved from Google Books.
Bormans, J. & Hill, K. (2002). MPEG-21 Overview v.5. Coding of Moving Pictures and Audio. Retrieved from http://mpeg.chiariglione.org/standards/mpeg-21/mpeg-21.htm#_Toc23297968
Carreras, A., Tous, R., Rodriguez, E., Delgado, J., Cordara, G., Francini, G. & Gibellino, D. (2010). A MPEG based architecture for generic distributed multimedia scenarios. Journal of Digital Information Management. 8(1), 47-53.
Ferreira, P. (2010). MXF – a technical overview. EBU Technical Review, Q3(2010). Retrieved from http://tech.ebu.ch/publications/trev_2010-Q3_MXF-1
Hunter, J. & Armstrong, A. (1999). A comparison of schemas for video metadata presentation. Computer Networks. 31(11-16), 1431-1451.
Jorgensen, P. H. (2000). ZIG Tutorial January 2000. PowerPoint presentation. Retrieved from www.loc.gov/z3950/agency/zig/meetings/texas/tutorials/xml-rdf.ppt
Kallinikos, J. & Mariátegui, J.-C. (2011). Video as Digital Object: Production and Distribution of Video Content in the Internet Media Ecosystem. The Information Society, 27(5), 281-294. doi: http://dx.doi.org/10.1080/01972243.2011.607025
Kozyra, A. (2007). Descriptive metadata migration in the TV news production chain. CBC Technology Review, 4(July 2007). Retrieved from http://www.cbc.radio-canada.ca/technologyreview/pdf/issue4-metadata.pdf
Miller, R. (2006). Finding your own assets: It takes more than just having a DAM. EContent. 29(4), 24-29.
MOG Solutions. (2005). XML Schema for MXF Metadata. Retrieved from http://www.mog-solutions.com/img_upload/PDF/XML%20Schema%20for%20MXF%20Metadata.pdf
Rubin, N. (2009). Preserving digital public television: Not just an archive, but a new attitude to preserving public broadcasting. Library Trends. 57(3), 393-412.
Sikos, L. F. (2011). Advanced (X)HTML5 metadata and semantics for Web 3.0 videos. Journal of Library & Information Technology. 31(4), 247-252.