RLG Meta Access Summit
Mountain View, Calif., 1 July 1997

A Meta Access Summit Meeting was called by the Research Libraries Group (RLG) at its headquarters in Mountain View, Calif. on 1 July 1997. Attending were about two dozen librarians, archivists, and others, representing such groups as the Consortium for the Computer Interchange of Museum Information (CIMI), the Canadian Heritage Information Network (CHIN), the Getty Information Institute, and the Visual Resources Association (VRA). Also attending were upwards of ten RLG staff members. Discussion centered on the "intersecting information landscape" and the need for a lingua franca for description and access, with the goal of establishing action items for RLG to pursue. Further information on metadata, etc. can be found at the Dublin Core Metadata Element Set Home Page at http://www.oclc.org:5046/research/dublin_core/, which includes links to D-Lib Magazine and other sources.

Stu Weibel of OCLC started with a description of the development of the Dublin Core (DC). The name comes from the first workshop of librarians, networkers, and content experts, held in Dublin, Ohio and sponsored by OCLC and NCSA. From the March 1995 gathering came the first delineation of 13 elements, all of which are repeatable, optional, and extensible. The group also coined the phrase "document-like objects" or DLOs as a catchall for what is to be described. The consensus from the first workshop was that the 13 elements were sufficiently useful for future progress on metadata.

The second workshop, held in Warwick, England a year later, discussed a syntax to deploy DC and resulted in the Warwick Framework. The framework is still conceptual, but research continues on how metadata will be made interoperable and linked into meaningful packages. The conference provided a good start on the syntax. The web and HTML are considered less than ideal, but they are necessary as practical and strategic applications for dealing with metadata.

The third workshop, again held in Dublin at OCLC with cosponsorship by the Coalition for Networked Information (CNI), dealt with image metadata. Two elements were added to the core, specifically aimed at images.

DC4 was held in Canberra, Australia in March 1997. Discussion centered on the need for qualifiers and subcategories. Twelve countries were represented by librarians and by technical and content experts. The spectrum of opinion on metadata stretches from the minimalists (those who would leave the categories most flexible) to the structuralists (those who feel the need for clear guidelines on qualifiers and subcategories). Additions to the significant elements for metadata include scheme, language, and subelements. "Language" is the language of the metadata, not of the resource; this is especially important in multilingual settings and is to be added to the next version of HTML. Qualifiers are intended to refine, not extend, the element. The WWW Consortium (W3C) sent three representatives to Canberra. The PICS Next Generation Working Group is working to expand PICS (Platform for Internet Content Selection) from merely a ratings mechanism into a generalized architecture for metadata. HTML can embed metadata, and the new version of HTML ("Cougar") will be more friendly to metadata, but some categories (e.g., scheme) may not make it into the specs. Subelements, however, can easily be embedded in HTML.
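As a rough illustration of what embedding DC in HTML might look like, the sketch below renders elements as <meta> tags, writes a subelement as a dotted suffix on the element name, and carries scheme and language as attributes. The "DC." prefix, the dotted subelement convention, the attribute names "scheme" and "lang", the helper name dc_meta_tag, and all sample values are assumptions made for illustration; they are not taken from the Canberra deliverables or from the HTML specifications under discussion.

```python
from html import escape


def dc_meta_tag(element, content, subelement=None, scheme=None, lang=None):
    """Render one DC element as an HTML <meta> tag (illustrative sketch only)."""
    # Assumed naming convention: "DC.<element>" or "DC.<element>.<subelement>".
    name = "DC." + element
    if subelement:
        name += "." + subelement

    attrs = [("name", name)]
    if scheme:
        # Assumed attribute for the scheme qualifier.
        attrs.append(("scheme", scheme))
    if lang:
        # Language of the metadata value, not of the resource.
        attrs.append(("lang", lang))
    attrs.append(("content", content))

    # Escape attribute values so quotes and ampersands stay well-formed.
    rendered = " ".join(
        '{}="{}"'.format(key, escape(value, quote=True)) for key, value in attrs
    )
    return "<meta " + rendered + ">"


# Hypothetical record: element names follow the DC set, values are invented.
sample_tags = [
    dc_meta_tag("title", "RLG Meta Access Summit", lang="en"),
    dc_meta_tag("subject", "Metadata", scheme="LCSH"),
    dc_meta_tag("date", "1997-07-01", subelement="created"),
]
```

Keeping the qualifiers as optional arguments reflects the minimalist end of the spectrum: a record with only element names and values still renders, and the qualifiers refine it without changing the element.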

The Internet Engineering Task Force (IETF) has issued four RFCs (requests for comment) on the deliverables from Canberra. They stretch from minimal to structural, and are available for comment on the web (of course).

DC5 is in the planning stages and will be held in October 1997 in Helsinki. It will be oriented toward doing DC. Weibel is confident that PICS (the name may change) will be effectively expanded; he stated that it will be implemented in web browsers and will be able to handle a wide variety of metadata. He described "label bureaus" as being like the bibliographic utilities in providing the infrastructure to parse, distribute, and use PICS as an underlying architecture for metadata. XML (Extensible Markup Language) will be the underlying language. Another bit of language: Resource Attribute File (RAT File), which is something like an SGML DTD.

Ricky Erway of RLG then discussed strategic web applications of metadata. Semantic drift leads to a breakdown of commonality. The basic question: is metadata merely for resource discovery (i.e., finding a document-like object) or must it make sense by itself? The DC categories are currently defined for the web existence of the DLO, with the most problematic elements being date and source. That is, for preexisting documents, the date refers to the document's "webness" and all of the data about the original are included in source. For many resources, the original information is more relevant to users. Commingling elements for the web and original versions may result in confusion. Should the metadata refer mainly to webness, so that one must consult the resource to find details about its origin? Crosswalks between metalanguages are seen more as a means for cross indexing than for data transfer.

Following Weibel's and Erway's comments, there was a general discussion of Dublin Core and metadata in general, followed by lunch discussion groups. From the discussion:
* DC is not trying to provide complexity.
* Metadata is how we get the user to the resource.
* DC is the surface layer of metadata, a sort of lowest common denominator, some of which will translate to the common ground, some of which will not.
* DC probably works best on electronic resources and focusing there may be a good starting point.
* Traditions within scholarly communities vary and must be respected in data and framework.
* DC has "Grail-like characteristics."
* Granularity is important.
* General search agents will need to be responsive to metadata but, just like libraries, selected universes of resources will continue to be important.
* Ricky Erway was asked to describe her digital tourist metaphor: DC is like a European phrase book (e.g., it finds the W.C. or cathedral, menu items) but does not provide detail or complexity.
* The University of Michigan's registry project is using an SGML DTD which is much like DC.
* Library catalogs generally have not mixed data at different levels, i.e., description of book and other material, table of contents data, indexes, and full text. Web resources include all levels; sorting for relevance and granularity is difficult. Metadata should provide added value over web browsers.
* DC is Z39.50 tag set G.
* Resource discovery may mean location, navigation, browsing, consolidation, and iteration, and may reflect points of view.

Each of the breakout groups was asked to look at specific issues, identify impediments to solutions, outline desirable outcomes, and identify action items (both in progress and those needing to be implemented).

Group 1 centered on standards. Metadata will be a rough consensus rather than resembling the traditional standards of libraries and similar communities. There will be no Joint Steering Committee. DC will interact with frameworks and RAT files. Metadata will evolve and crosswalks will be developed (also evolving) as necessary.

Group 2 looked at possible DC applications. DC will provide a common denominator if not a lingua franca. Date and object/source are significant problems with the categories at this point; they will be discussed at DC5 in Helsinki. An application protocol (or compendium of practice) might prove an important tool for the cultural heritage community. Actions: build some records; resolve some questions (principally date, source/object); pass information on to other groups (e.g., CC:DA Metadata Task Force).

Group 3 discussed DC development. Stable semantics are a necessity before application can spread widely. Complexity and interoperability must be clarified. A profile model (tag set model) would be helpful.

The fourth group looked at crosswalks, which can be used either for indexing and retrieval or for data transfer. A common metadata set is less threatening to communities of scholars and data providers; DC is relatively neutral. An effective crosswalk can provide a desired level of transparency for the user. That is, the user can present a request in his or her home language, and crosswalks can enable searching a variety of places and provide a response in the home language. Action items: registry of authentic crosswalks (may not be possible); feedback and maintenance.
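A crosswalk used for searching can be pictured as a field-name mapping applied in both directions: the request is translated out of the home language, and the results are translated back into it. The following is a minimal sketch under stated assumptions. The local field names (maker, object_name, when_made), the choice of DC elements they map to, and the helper names are all invented for illustration; the sketch does not describe any crosswalk presented at the summit.

```python
# Hypothetical mapping from one community's field names to DC element names.
LOCAL_TO_DC = {
    "maker": "author",
    "object_name": "title",
    "when_made": "date",
}

# Reverse mapping, used to present results in the home language.
DC_TO_LOCAL = {dc: local for local, dc in LOCAL_TO_DC.items()}


def to_dc(local_fields):
    """Translate local field names to DC; fields with no DC equivalent are dropped."""
    return {
        LOCAL_TO_DC[field]: value
        for field, value in local_fields.items()
        if field in LOCAL_TO_DC
    }


def to_local(dc_fields):
    """Translate DC element names back to the local field names."""
    return {
        DC_TO_LOCAL[field]: value
        for field, value in dc_fields.items()
        if field in DC_TO_LOCAL
    }


def search(local_query, dc_records):
    """Search DC records with a home-language query; answer in the home language."""
    dc_query = to_dc(local_query)
    hits = [
        record
        for record in dc_records
        if all(record.get(field) == value for field, value in dc_query.items())
    ]
    return [to_local(record) for record in hits]
```

Dropping unmapped fields in to_dc is one way to handle mappings that are not equivalent; it also shows why such a crosswalk suits indexing and retrieval better than lossless data transfer.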

Closing discussion:
* DC will evolve. Some categories will improve with tightening definitions.
* What should be in a user guide to DC?
* The "meta2" list is currently high-traffic but may split into a list for practitioners and one for developers.
* DC deployment must move from the "do it" gospel to practical applications. So far, most DC records have come from data transfer rather than from creation in DC.
* Evaluation is necessary. VADS is a valuable early step (Visual Arts, Museums and Cultural Heritage Metadata recommendations (part 4) at: http://vads.ahds.ac.uk/Metadata1.html)
* A DC tag set will be derived from APIS (papyrological project at Michigan, Duke, etc.).
* Weibel described a German preprint medical database as one of the best DC applications to date.
* A task group was established to write a position paper for Helsinki regarding categories like date, object, source, coverage, and relation which are of special importance to the cultural heritage community.
* When mapping is not equivalent, is DC enough to get started?
* DC is not a substitute for a community's database structure.

Notes compiled by:
Sherman Clarke
NYU Libraries
sherman.clarke@nyu.edu

N.B. the official report of the RLG Meta Access Summit is on the RLG home page at http://www.rlg.org/meta9707.html.
