Towards Identity Conditions for Digital Documents

Allen Renear, David Dubin

Abstract


By "identity conditions" we mean a method for determining whether an object x and an object y are the same object. Identity conditions are arguably an essential feature of any rigorously developed conceptual framework for information modeling. Surprisingly, the concept of "same document", which is fundamental to many aspects of library and information science, and to digital libraries in particular, has received little systematic analysis. As a result, not only is the concept of a document itself under-theorized, but progress on a number of important practical problems has been hindered. We review the importance of document identity conditions, demonstrate problems with current approaches, and discuss the general form a solution must take. We then describe our own strategy, based on the BECHAMEL XML Semantics Project: we propose to reduce the relatively elusive and ill-defined general problem of determining document identity to the comparatively well-understood problem of proving logical equivalence in predicate logic. This approach should also enable the determination of semantic relationships other than identity, including similarities and partial identities of various kinds, and should support new strategies in various areas of digital information management, such as preservation, conversion, integrity assurance, retrieval, federation, metadata, and identifiers. Our results complement and extend discussions of the IFLA/FRBR entities (particularly "expression" and "manifestation") taking place in the cataloguing community, discussions of "resource" taking place in the W3C and Dublin Core communities, and the analysis of similar notions in ontology development for digital libraries, museums, and archives. Although our project is still in a preliminary phase, a working inferencing environment, written in object-oriented Prolog, has been completed, and initial results confirm our logic-based strategy.
