
Introduction to Metadata

Metadata and Metadata Format and Standards

1. Definition of Metadata:

The 1990s saw a proliferation of metadata element sets for resource description. Metadata is “data about data”. It is data used for purposes such as cataloging, searching, archiving, electronic discovery, and display. A key indication of the direction of the WWW on metadata comes from its inventor, Tim Berners-Lee: “Metadata is machine understandable information about web resources or other things”, but “metadata is data”.

The familiar library catalogue record could be described as metadata in that the catalogue record is ‘data about data’. Similarly, database records from abstracting and indexing services are metadata (with a different variation on location data). However, the term metadata is increasingly used in the information world to specify records that refer to digital resources available across a network. By this definition, a metadata record refers to another piece of information capable of existing in a separate physical form from the metadata record itself. Metadata also differs from traditional catalogue data in that the location information is held within the record in such a way as to allow direct document delivery from appropriate application software; in other words, the record may well contain detailed access information and the network address(es).

In “Understanding Metadata”, the National Information Standards Organization (NISO), a non-profit association accredited by the American National Standards Institute, defines metadata as “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.”

A User Guide for Simple Dublin Core provides the following definition:

Metadata describes an information resource. The term “Meta” comes from a Greek word that denotes something of a higher or more fundamental nature. Metadata, then, is data about data. It is the Internet-age term for information that librarians traditionally have put into catalogs, and it most commonly refers to descriptive information about Web resources.

Caplan points out the benefits of using a ‘new’ term to describe internet resource records. There is no residual meaning attached to the term ‘metadata’ as opposed to the traditional connotations of ‘catalogue record’. Coining a new term emphasizes the differences inherent in records describing network resources and indicates that these records will be used outside the library cataloguing tradition (Caplan, 1995).

2. Types and Function of Metadata:

In this section various types of metadata and their functions are discussed.

2.1 Types of Metadata:

There are three main types of metadata:

  • Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords.
  • Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.
  • Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. There are several subsets of administrative data; two that sometimes are listed as separate metadata types are:

– Rights management metadata, which deals with intellectual property rights, and

– Preservation metadata, which contains information needed to archive and preserve a resource.
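
The three types can be pictured as separate groups of fields attached to a single resource. The Python sketch below is illustrative only: the class names, field names, and sample values are assumptions made for this example and do not come from any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptiveMetadata:
    """Supports discovery and identification."""
    title: str
    author: str
    abstract: str = ""
    keywords: list[str] = field(default_factory=list)


@dataclass
class StructuralMetadata:
    """Records how a compound object is put together."""
    # Ordered page files grouped by chapter name (hypothetical layout).
    chapters: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class AdministrativeMetadata:
    """Helps manage the resource; rights and preservation are subsets."""
    created: str                 # creation date as text
    file_type: str
    rights: str = ""             # rights management metadata
    preservation_note: str = ""  # preservation metadata


@dataclass
class ResourceRecord:
    descriptive: DescriptiveMetadata
    structural: StructuralMetadata
    administrative: AdministrativeMetadata


# All values below are invented sample data.
record = ResourceRecord(
    descriptive=DescriptiveMetadata(
        title="Sample Digitized Book",
        author="A. Author",
        keywords=["metadata", "example"],
    ),
    structural=StructuralMetadata(
        chapters={"Chapter 1": ["page001.tif", "page002.tif"]},
    ),
    administrative=AdministrativeMetadata(
        created="2010-05-01",
        file_type="image/tiff",
        rights="Access limited to registered users",
    ),
)

# Page order within a chapter comes from the structural metadata.
first_page = record.structural.chapters["Chapter 1"][0]
```

Keeping the three groups apart mirrors the distinction drawn above; a real system might just as well store everything in one flat record.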

3. Function of Metadata:

The various functions of the metadata are as follows:

  • Resource Discovery: Metadata support resource discovery by allowing resources to be found by relevant criteria, identifying resources, bringing similar resources together, distinguishing dissimilar resources, and giving location information.
  • Organizing e-resources: Metadata help organize e-resources, for example by organizing links to resources based on audience or topic and by building these pages dynamically from metadata stored in databases.
  • Facilitating Interoperability: Metadata facilitate interoperability through defined metadata schemes, shared transfer protocols, and crosswalks between schemes, so that resources across the network can be searched more seamlessly.
  • Digital Identification: Metadata help in digital identification through elements for standard numbers, e.g., ISBN. The location of a digital object may also be given using a file name, a URL, and/or a persistent identifier, e.g., a Persistent URL or a Digital Object Identifier. Combined, these metadata act as a set of identifying data, differentiating one object from another for validation purposes.
  • Archiving and Preservation: Metadata are used for archiving and preservation. Digital information is fragile: it can be corrupted or altered, and it may become unusable as storage technologies change. Format migration and perhaps emulation of current hardware and software platforms are strategies for overcoming these challenges. Metadata is key to ensuring that resources will survive and continue to be accessible into the future. Archiving and preservation require special elements to track the lineage of a digital object, to detail its physical characteristics, and to document its behavior in order to emulate it in future technologies [19].
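
Two of these functions, interoperability through crosswalks and resource discovery, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the local field names, the mapping table, and the sample records are all hypothetical, and only the target element names (title, creator, subject, identifier) are taken from the Dublin Core element set.

```python
# Hypothetical local field names mapped to Dublin Core element names.
LOCAL_TO_DC = {
    "main_title": "title",
    "author_name": "creator",
    "topic": "subject",
    "isbn": "identifier",
    "url": "identifier",
}


def crosswalk(local_record: dict[str, str]) -> dict[str, list[str]]:
    """Translate a local record into Dublin Core style elements.

    Elements are repeatable, so every value is stored in a list.
    Fields without a mapping are dropped, which makes the
    translation lossy.
    """
    dc_record: dict[str, list[str]] = {}
    for local_field, value in local_record.items():
        element = LOCAL_TO_DC.get(local_field)
        if element is not None:
            dc_record.setdefault(element, []).append(value)
    return dc_record


def discover(records, element, term):
    """Return records whose given element contains the search term."""
    term = term.lower()
    return [
        r for r in records
        if any(term in v.lower() for v in r.get(element, []))
    ]


# Invented sample records; the identifiers are placeholders.
local_records = [
    {"main_title": "Glazes and Kilns", "author_name": "A. Author",
     "topic": "Ceramics", "isbn": "ISBN 000-0-00-000000-0",
     "shelf": "B12"},
    {"main_title": "Cataloging Basics", "author_name": "B. Writer",
     "topic": "Library science", "url": "https://example.org/catalog"},
]

dc_records = [crosswalk(r) for r in local_records]
hits = discover(dc_records, "subject", "ceramics")
```

Once both records share the same element names, a single search over `subject` covers them regardless of how each was described locally. The unmapped `shelf` field shows the cost: anything the target scheme cannot express is lost in translation.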

4. Development of Metadata:

The term metadata was originally applied to those bibliographic description activities that were aimed at classifying electronic resources; general understanding of the term has since been broadened to include standardized descriptive information about all kinds of resources, digital and non-digital alike. According to Priscilla Caplan, “Metadata really is nothing more than data about data; a catalog record is metadata; so is a TEI header, or any other form of description. We could call it cataloging, but for some people that term carries excess baggage, like Anglo-American Cataloging Rules and USMARC. So to some extent this is a ‘you call it corn, we call it maize’ situation, but metadata is a good neutral term that covers all the bases” (Caplan, 1995).

In his “A gentle introduction to metadata”, Good (2002) begins with the notion of the humble origins of metadata. He points out that even a simple citation contains basic metadata elements, but argues in favor of a more inclusive approach: an annotated bibliography, for example, also constitutes metadata. It is very much like a list of references, except that it includes an extra level of description in addition to the basic metadata for each document.

Metadata relating to a print resource may consist of information such as author, title, year of publication, publisher, and so forth. This information may be organized on a catalog card or in its electronic iteration. Both types of records are held in the library catalog, electronic or otherwise, which then becomes a repository of metadata about materials that are held by that particular library.

In discussing “traditional cataloging standards”, reference is being made to those tools that have been developed over time for the purpose of cataloging. These include AACR2 with its editions, MARC21 formats and standards, Library of Congress Subject Headings, Dewey Decimal Classification, Sears Subject Headings, and other cataloging tools that libraries use to describe and organize knowledge.

AACR2, for example, is an internationally accepted standard for descriptive cataloging. It contains rules for describing and providing access to all types of library materials, including books, serials, computer files, maps, music, motion pictures, etc., through library catalogs. AACR2 is also a standard for structuring catalogs with headings and references to provide links between items with similar or related characteristics.

How are metadata different from the traditional cataloging standards? Looking at this issue from the point of view of the purpose or intent of metadata, one arrives at the conclusion that the differences are not substantial. Both approaches attempt to provide bibliographic description. This can be extended further to include the fundamental mechanisms governing the creation and the structure of metadata. Like traditional cataloging standards, metadata are governed by the same principles, even when those are applied to a diversity of materials.

J. Milstead and S. Feldman argue that the term as applied to electronic resources “refers to ‘data’ in the broadest sense: datasets, textual information, Web pages, graphics, music, and anything else that is likely to appear electronically” (Milstead and Feldman, 1999).

What, then, is the reason for the evolution of metadata, and what distinguishes them from what came to be known as traditional cataloging standards? The word “Internet” provides a short, albeit incomplete, answer. The world of information has moved beyond paper and microform as the primary carriers of information. Digital resources have become abundant, and with them came the need for classification. With their proliferation came the perception that the available cataloging standards could not be satisfactorily adapted to the demands of these new formats.

This development coincided with a new trend in publishing and bibliographic description. Publishers began to provide libraries that acquired their books with skeletal pre-publication descriptions of their books. As these descriptions became more accurate and complete, the libraries saw an opportunity to use them in their cataloging process. The door was opened to the idea that the library was not the only place where information about materials could be built into a record describing the materials.

As the amount of digital, Internet-accessible information grew, librarians came to realize that some sort of scheme was needed to describe these resources and that they themselves could not deal with the “workload”. Not everything that came to be considered information could receive the full cataloging treatment used to describe print materials. The prohibitive cost of full cataloging and the perceived inflexibility of existing cataloging standards were two of the key factors leading to revolutionary changes in information processing: the development of a simplified, flexible standard or standards of cataloging that could accommodate the diversity of electronic formats, and the move of cataloging out of the library.

Old and new metadata are based on common practices. Cataloging standards were created and developed as a way to organize information in order to facilitate retrieval of and access to this information. Standards are the foundation on which all cataloging and metadata rules are developed. Without these cataloging standards, a single item would be cataloged many times over, and each cataloging record would be likely to contain different information. Without the existence of cataloging standards, it would be difficult to imagine how scholars could access information, how libraries could share resources, and how patrons could benefit from the library collection.

Cataloging standards help to organize knowledge and have served scholars and research very well in accessing information relatively quickly and efficiently. Cataloging and metadata standards provide consistency and also exhibit tremendous flexibility. As new publishing formats appear (microforms, sound recordings, and computer files could serve as examples), new standards are developed for their description. The most recent developments in the 2002 revision of AACR2 were made to reflect the need to catalog electronic resources.

In her introduction to a special topic issue of the Journal of the American Society for Information Science, “Integrating Multiple Overlapping Metadata Standards”, Zorana Ercegovac provided an overview of metadata development, in which she led readers through the pre-Internet era and the Internet era (Ercegovac, 1999):

Machine-Readable Cataloging (MARC) is the well-known metadata of the pre-Internet era. It was developed at the Library of Congress in the 1960s and, in terms of specificity, structure, and maturity, it is a highly structured and semantically rich metadata format. Its purposes were to represent rich bibliographic descriptions and relationships between and among data of heterogeneous library objects and to facilitate sharing of these bibliographic data across local library boundaries. The emphasis is on the entire document; the surrogates are MARC records; the records are produced by human catalogers. MARC does not fare well with regard to management needs (e.g., intellectual property, preservation) or evaluative needs (e.g., authenticity, user profiles, and grade levels).

In the Internet arena, there is a tradition of evolving metadata. Since the early 1990s, distributed repositories on the Internet have grown exponentially, repositories are contributed by different communities, and there is a need to describe, authenticate, and manage these resources. Therefore, new guidelines and architectures are developed among different communities. Priscilla Caplan described the metadata movement as “a blooming garden, traversed by crosswalks, atop a steep and rocky road” (Caplan, 2000). This metadata “blooming garden” can be viewed from different perspectives, i.e., there is no limit to the type or amount of resources that can be described by metadata. For any area that shows a demand for electronic resource discovery and sharing, a metadata standard can be developed or proposed. Today, the resources described by metadata consist of bibliographical objects (e.g., represented by MARC metadata), archival inventories and registers (e.g., EAD metadata), geospatial objects (e.g., FGDC metadata), museum and visual resources (e.g., CDWA, VRA Core, CIMI metadata), educational materials (e.g., LOM), software implementation (e.g., CORBA), and many others. The use of these metadata standards is not limited by language or country boundaries. There is no limit to the number of overlapping metadata standards for any type of resource or any subject domain.

5. Metadata for Digital Resources Management:

Metadata is a set of attributes used to describe an object. The library and information science literature of the past few years offers no shortage of views on the significant role of metadata in meeting the most pressing needs and challenges of digital resource management. Metadata enables users to find the resources they require; therefore, it is an important component of any digital repository. Authors, librarians, and information scientists use metadata to classify content for organization and retrieval.

Metadata creation and management have become a very complex mix of manual and automatic processes and layers created by many different functions and individuals at different points in the life of an information object. Figure 1 illustrates the different phases through which information objects typically move during their life in a digital environment. As they move through each phase, the objects acquire layers of metadata that can be associated with the objects in several ways.

The Life Cycle of Objects Contained in a Digital Information System

a. Creation and multi-versioning: Objects enter a digital information system by being created digitally or by being converted into digital format. Multiple versions of the same object may be created for preservation, research, dissemination, or even product development purposes. Some administrative and descriptive metadata may be included by the creator.

b. Organization: Objects are automatically or manually organized into the structure of the digital information system and additional metadata for those objects may be created through registration, cataloging, and indexing processes.

c. Searching and retrieval: Stored and distributed objects are subject to search and retrieval by users. The computer system creates metadata that tracks retrieval algorithms, user transactions, and system effectiveness in storage and retrieval.

d. Utilization: Retrieved objects are utilized, reproduced, and modified. Metadata related to user annotations, rights tracking, and version control may be created.

e. Preservation and disposition: Information objects undergo processes such as refreshing, migration, and integrity checking to ensure their continued availability. Information objects that are inactive or no longer necessary may be discarded. Metadata may document both preservation and disposition activities.

Other little-known facts about metadata are as follows:

1. Metadata does not have to be digital. Cultural heritage and information professionals have been creating metadata for as long as they have been managing collections. Increasingly, such metadata are being incorporated into digital information systems.

2. Metadata relates to more than the description of an object. While museum, archives, and library professionals may be most familiar with the term in association with description or cataloging, metadata can also indicate the context, management, processing, preservation and use of the resources being described.

3. Metadata can come from a variety of sources. It can be supplied by a human (a creator, information professional, or user), created automatically by a computer, or inferred through a relationship to another resource such as a hyperlink.

4. Metadata continue to accrue during the life of an information object or system. Metadata is created, modified, and sometimes even disposed of at many points during the life of a resource.

5. One information object’s metadata can simultaneously be another information object’s data (Gilliland-Swetland, 2000).
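
The life-cycle phases and the observation that metadata accrue from several sources can be sketched as a record that gains a new layer at each phase. The Python below is a minimal illustration under stated assumptions: the class, its methods, and the sample values are hypothetical and are not taken from any system described in the literature cited here.

```python
from dataclasses import dataclass, field

# Phase names follow the life-cycle list above.
PHASES = (
    "creation",
    "organization",
    "searching and retrieval",
    "utilization",
    "preservation and disposition",
)


@dataclass
class InformationObject:
    name: str
    # Each layer is (phase, source, key, value). In this sketch the list
    # only grows; real systems may also modify or dispose of metadata.
    layers: list[tuple[str, str, str, str]] = field(default_factory=list)

    def add_metadata(self, phase, source, key, value):
        """Attach one metadata layer, rejecting unknown phase names."""
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self.layers.append((phase, source, key, value))

    def by_phase(self, phase):
        """Return the metadata added during one phase as a dict."""
        return {k: v for p, _, k, v in self.layers if p == phase}


# Invented sample values for illustration.
obj = InformationObject("scanned-map-001")
obj.add_metadata("creation", "creator", "format", "image/tiff")
obj.add_metadata("organization", "cataloger", "subject", "Maps")
obj.add_metadata("searching and retrieval", "system", "retrieval_count", "3")
obj.add_metadata("utilization", "user", "annotation", "Useful legend")
obj.add_metadata("preservation and disposition", "system",
                 "last_integrity_check", "2010-06-01")

# Metadata came from humans and from the system itself.
sources = {source for _, source, _, _ in obj.layers}
```

Recording the phase and the source alongside each value keeps the provenance of every layer visible, which is one simple way to associate accumulated metadata with an object.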

6. Metadata Format and Standards:

There is a great diversity of perspectives on various aspects of metadata issues. For instance, librarians have used machine-readable cataloguing since the 1960s to identify, describe, and provide access to their collections. However, what worked well for libraries may not work in other environments. Similarly, the basic metadata required for describing an image, work of art, or other non-text object will bear a strong resemblance to the metadata that describes traditional print documents. However, some significantly different extra elements will be required for a complete description of non-text images and multimedia resources. In light of this, some formats of metadata have been developed specifically for use in certain fields of study or for certain types of information source.

Metadata standards come from various professional community efforts to support many needs in the digital environment. The literature reveals that different communities view metadata in significantly different contexts. No single metadata standard can be expected to accommodate the needs of all communities. Although some projects, such as Dublin Core, have tried to develop a coherent set of metadata schemes that can work for a wide range of communities, they have not yet provided a complete description or solution for all types of digital information resources.

A metadata element set has two basic components:

1. Semantics – definitions of the meanings of the elements and their refinements.

2. Content – declarations or instructions of what and how values should be assigned to the elements.

For each element defined, a metadata standard usually provides content rules for how content should be included (for example, how to identify the main title), representation rules for content (for example, capitalization rules or standards for representing time), and allowable content values (for example, whether values must be taken from a specified controlled vocabulary or can be author-supplied, derived from the text, or added by metadata creators working without a controlled term list).
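
The three kinds of rule described above can be expressed as small checks applied to a record. The Python sketch below is one possible illustration, not a rule set from any published standard: the element names, the short controlled vocabulary, and the sample records are assumptions chosen for the example.

```python
from datetime import date

# Small illustrative controlled vocabulary for the "type" element.
TYPE_VOCABULARY = {"Text", "Image", "Sound", "Dataset"}


def check_title(value: str) -> list[str]:
    """Content rule: a title is required and must not be blank."""
    return [] if value.strip() else ["title: value is required"]


def check_date(value: str) -> list[str]:
    """Representation rule: the date must parse as an ISO 8601
    calendar date such as 2010-05-01."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return [f"date: {value!r} is not an ISO 8601 date"]
    return []


def check_type(value: str) -> list[str]:
    """Allowable values: type must come from the controlled vocabulary."""
    if value in TYPE_VOCABULARY:
        return []
    return [f"type: {value!r} is not in the controlled vocabulary"]


RULES = {"title": check_title, "date": check_date, "type": check_type}


def validate(record: dict[str, str]) -> list[str]:
    """Apply every rule in order and collect the resulting problems."""
    problems = []
    for element, rule in RULES.items():
        # A missing element is checked as an empty string.
        problems.extend(rule(record.get(element, "")))
    return problems


# Invented sample records.
good = {"title": "Glazes and Kilns", "date": "2010-05-01", "type": "Text"}
bad = {"title": " ", "date": "May 2010", "type": "Pamphlet"}

good_problems = validate(good)
bad_problems = validate(bad)
```

Here `good_problems` is empty, while `bad_problems` holds one message per violated rule. Separating the rules from the element definitions reflects the split between semantics and content described above.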

Many metadata standards provided an element set without considering the encoding format in their preliminary versions. For example, Dublin Core, the Visual Resources Association Core Categories, the Categories for the Description of Works of Art, and the Learning Object Metadata were all published and accepted in terms of their semantics and content long before the specific encoding methods for their data models were published. On the other hand, a few other metadata standards, like the Encoded Archival Description Document Type Definition (EAD DTD), provided an encoded element set from the beginning. The EAD DTD, a standard for encoding archival finding aids that currently uses XML, was published a decade ago with an SGML DTD.

Original Reference Article:

  • Patra, C. (2010). Digital repository in ceramics A Metadata study.

Md. Ashikuzzaman

Work at North South University Library, Bangladesh.
