Sunday, December 4, 2011

Encapsulation: A Digital Preservation Technique

Another MIS paper, this time on a digital preservation technique known as encapsulation.

Introduction 
With the rapid development of digital software and hardware comes the increasing problem of obsolescence: preserved digital data that future digital systems (and therefore future users) will no longer be able to read. To resolve this problem, a number of digital preservation strategies have been proposed, developed, and implemented, the two main ones being migration and emulation. As Boudrez notes, “much ink has flown about the advantages and disadvantages of both strategies, but in essence, migration and emulation do not exclude each other” (2005, p.2). Practice has indeed shown that the two strategies complement each other.

However, there is a third element that can intertwine with migration and emulation, and that is often a core feature of both: encapsulation. Although not a digital preservation strategy in its own right, encapsulation works in conjunction with other strategies and therefore plays a significant part in digital preservation. This paper will define and describe encapsulation, discuss the context in which it operates, point to digital curation initiatives that implement it, and identify its advantages and disadvantages.


Encapsulation 
Encapsulation is a technique that “requires metadata to be bundled with, or embedded into, the digital object. The metadata allows the record to be intellectually understood and technologically accessed in the future” (National Archives of Australia, 2004, p.59). This technique aims to counter obsolete file formats by encapsulating or grouping “details of how to interpret the digital bits in the object” through the use of “physical or logical structures called ‘containers’ or ‘wrappers’ to provide a relationship between all information components, such as the digital object and other supporting information” (National Library of Australia, 2001).
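To make the idea of a ‘wrapper’ more concrete, here is a minimal Python sketch of bundling a digital object with its metadata in a single XML container, and of recovering both from that container alone. The element names (`container`, `metadata`, `field`, `object`) and the functions `encapsulate` and `unwrap` are hypothetical illustrations, not taken from any of the standards discussed in this paper. The object's bits are Base64-encoded, an assumption made so that binary content can sit inside a text-based XML document.

```python
import base64
import xml.etree.ElementTree as ET


def encapsulate(content: bytes, metadata: dict[str, str]) -> str:
    """Bundle an object's bits and its metadata in one XML wrapper (hypothetical layout)."""
    wrapper = ET.Element("container")
    meta = ET.SubElement(wrapper, "metadata")
    for name, value in metadata.items():
        field = ET.SubElement(meta, "field", name=name)
        field.text = value
    # Base64 lets arbitrary binary content live inside a text-based XML document.
    obj = ET.SubElement(wrapper, "object", encoding="base64")
    obj.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(wrapper, encoding="unicode")


def unwrap(xml_text: str) -> tuple[bytes, dict[str, str]]:
    """Recover the object and its metadata from the wrapper alone, with no external lookup."""
    wrapper = ET.fromstring(xml_text)
    metadata = {f.get("name"): f.text or "" for f in wrapper.find("metadata")}
    # An empty object serialises with no text, so fall back to "" before decoding.
    content = base64.b64decode(wrapper.find("object").text or "")
    return content, metadata


packed = encapsulate(b"%PDF-1.4 ...", {"title": "Council minutes", "format": "application/pdf"})
content, meta = unwrap(packed)
```

The point of the design is that `unwrap` needs nothing but the package itself: the information required to interpret the object travels with it.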

On its own, encapsulation cannot preserve digital records: “encapsulation is not a method that prescribes how digital documents will be reconstructed on the screen in future or how accessibility is preserved” (Boudrez, p.4). What it does is ensure that metadata about the object’s original relationships is packaged with it, to aid both preservation strategies such as migration and emulation and future users’ interpretation of the record (its provenance, context, etc.). Such metadata is important because “the various components of an electronic record do not form a physical entity, but are stored at separate locations (in a database, a file system or a combination of both) and as different digital objects” (Boudrez, p.4). Encapsulation is one way to track such relationships, convey important information, packaged as Archival Information Packages (AIPs) in Open Archival Information System (OAIS) terminology (Lavoie, 2004), and retain authenticity.

Digital signatures and pointers to information held in external storage locations are examples of information that can be embedded in, or ‘bundled’ with, the digital object via a ‘wrapper’. Analog instructions physically attached to the storage medium are also common. Yet there is no universal encapsulation methodology in use, so repositories have developed their own approaches according to their needs and ingest strategies. The jury is also still out on which electronic metadata standards should inform the encapsulation process.

There have been attempts to resolve this, notably projects such as the Universal Preservation Format (UPF) and the Digital Rosetta Stone (DRS). Encapsulation in practice offers further examples, including its implementation at the National Archives of Australia, the Public Records Office of Victoria, and the City Archives of Antwerp. What these implementations have in common is the use of OAIS standards to determine what information needs to be embedded, and the use of eXtensible Markup Language (XML) schemas to create the required metadata.


Encapsulation in theory 
In the late 1990s, a number of encapsulation models were formulated. The Universal Preservation Format (UPF) was developed in 1997 as a “data file mechanism that utilizes a container or wrapper structure. Its framework incorporates metadata that identifies its contents within a registry of standard data types and serves as the source code for mapping or translating binary composition into accessible or useable forms” (Shepard & MacCarn, 1997, p.2). Designed to be “independent of the computer applications used to create content, and independent of the operating system from which these applications originated and independent of the physical media upon which that content is stored”, the UPF model was an early recommended practice, arguing that “the Wrapper would be capable of describing and defining the content and its structure” (Shepard & MacCarn, p.2).

Another model put forward was the Digital Rosetta Stone (DRS) project, which took inspiration from the Egyptian Rosetta Stone—a tablet that enabled ancient hieroglyphics to be interpreted in modern times. DRS describes “three processes that are necessary for maintaining long-term access to digital documents in their native formats—knowledge preservation, data recovery, and document reconstruction” (Heminger & Robertson, 1998, p.1). This includes capturing metadata and other information to ensure that “we don’t lose our ability to read our own history” (Heminger & Robertson, p.9).


Encapsulation in practice 
The National Archives of Australia is one institution that uses encapsulation in conjunction with migration and emulation strategies. This process is described in Digital Recordkeeping: Guidelines For Creating, Managing and Preserving Digital Records (2004). Upon receiving data from the producer,

digital records are converted or ‘normalised’ using archival data formats. The archival data formats use XML standard schemas. XML provides a standard syntax to identify parts of a document (known as elements), and a standard way (known as a schema) to describe the rules for how those elements can be linked together in a document. Metadata is encapsulated within the preserved data object, and the whole package is stored in a digital repository. A special viewing tool makes the packages accessible using a form of emulation (p. 63).

Forms of migration, encapsulation via XML schemas, and emulation combine to ensure that digital records are preserved, that they meet accountability and legislative requirements, and that they meet the needs of the community (p. 14).

An early practitioner of encapsulation was the Public Records Office of Victoria (PROV), whose Victorian Electronic Records Strategy (VERS) developed the VERS Long Term Format. This “consists of an object (known as a VERS Encapsulated Object or VEO)” represented in XML and “signed using digital signature technology to ensure authenticity” (PROV, 2000). The XML encoding means the contents can be inspected in the future with simple text-editing software. The encapsulated metadata, which follows the Recordkeeping Metadata Standard for Commonwealth Agencies Version 2.0 and is specified in the VERS Metadata Scheme:

  • structures the information contained within the VEO.
  • documents the standards and specifications used in producing the VEO.
  • contains a digital signature and sufficient information to verify the signature.
  • describes the record or folder and its relationship with other records or folders in the recordkeeping system.
  • contains information used to document the history of the record or folder.
  • supports the management of the record or folder (PROV, 2003).
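A VEO combines the record, its metadata and a signature in one XML object. The Python sketch below imitates that pattern under clearly stated assumptions: the element names and the functions `build_signed_object` and `verify_signed_object` are hypothetical rather than the VERS schema, and because Python's standard library has no public-key signing, an HMAC-SHA256 keyed hash stands in for the digital signature. An HMAC only shows that someone holding the shared key produced the object; a real VEO signature would be checked against the signer's public key. The signed part is canonicalised (C14N 2.0 via `ET.canonicalize`) so that the signature input does not depend on incidental serialisation differences.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET


def _c14n(elem: ET.Element) -> bytes:
    """Canonical (C14N 2.0) bytes of an element, used as the signature input."""
    return ET.canonicalize(ET.tostring(elem, encoding="unicode")).encode("utf-8")


def build_signed_object(content: bytes, metadata: dict[str, str], key: bytes) -> str:
    """Wrap content and metadata, then attach a keyed hash over both (hypothetical layout)."""
    veo = ET.Element("signedObject")
    signed = ET.SubElement(veo, "signedContent")
    meta = ET.SubElement(signed, "metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, "field", name=name).text = value
    obj = ET.SubElement(signed, "object", encoding="base64")
    obj.text = base64.b64encode(content).decode("ascii")
    # Stand-in for a public-key digital signature: HMAC-SHA256 with a shared secret key.
    tag = hmac.new(key, _c14n(signed), hashlib.sha256).hexdigest()
    # Record the algorithm so a future reader knows how to check the value.
    ET.SubElement(veo, "signature", algorithm="HMAC-SHA256").text = tag
    return ET.tostring(veo, encoding="unicode")


def verify_signed_object(xml_text: str, key: bytes) -> bool:
    """Recompute the keyed hash over the signed part and compare in constant time."""
    veo = ET.fromstring(xml_text)
    expected = hmac.new(key, _c14n(veo.find("signedContent")), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, veo.find("signature").text or "")
```

Any change to the wrapped record or its metadata after signing makes `verify_signed_object` return `False`, which is the authenticity property the VERS format relies on.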

Similarly, the City Archives of Antwerp uses metadata and XML to encapsulate digital objects, drawing on the OAIS framework to capture AIPs and encapsulate all information in a single container. This is carried out before ingest by the creator and/or archivist, and involves:

  • migration of the original formats to suitable archiving formats 
  • encapsulation of the original and migrated bitstreams in XML 
  • registration and encapsulation of the essential technical and archival descriptive metadata 
  • generation of a checksum to check the bit integrity 
  • checking the quality of the XML-AIPs (Boudrez, p.13).
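The Antwerp steps can be sketched in a similar way. The hypothetical `make_aip` below wraps an original and a migrated bitstream in one XML package and records a checksum for each; Boudrez does not name a checksum algorithm here, so SHA-256 is an assumption. The equally hypothetical `check_aip` performs a simple quality check. The migration itself is out of scope (the migrated bytes are simply passed in), and the list of required metadata fields is invented for illustration.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def make_aip(original: bytes, migrated: bytes, metadata: dict[str, str]) -> str:
    """Encapsulate both bitstreams plus metadata in one XML package (hypothetical layout)."""
    aip = ET.Element("aip")
    meta = ET.SubElement(aip, "metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, "field", name=name).text = value
    for role, data in (("original", original), ("migrated", migrated)):
        bs = ET.SubElement(aip, "bitstream", role=role)
        # Checksum stored next to each bitstream for later bit-integrity checks.
        bs.set("sha256", hashlib.sha256(data).hexdigest())
        bs.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(aip, encoding="unicode")


# Hypothetical minimum set of descriptive and technical fields.
REQUIRED_FIELDS = {"title", "creator", "original_format", "archival_format"}


def check_aip(xml_text: str) -> list[str]:
    """Return a list of problems found; an empty list means the package passed."""
    problems = []
    aip = ET.fromstring(xml_text)
    present = {f.get("name") for f in aip.iterfind("metadata/field")}
    for name in sorted(REQUIRED_FIELDS - present):
        problems.append(f"missing metadata: {name}")
    roles = set()
    for bs in aip.iterfind("bitstream"):
        role = bs.get("role")
        roles.add(role)
        # Recompute the checksum and compare it with the stored value.
        data = base64.b64decode(bs.text or "")
        if hashlib.sha256(data).hexdigest() != bs.get("sha256"):
            problems.append(f"checksum mismatch: {role}")
    for role in sorted({"original", "migrated"} - roles):
        problems.append(f"missing bitstream: {role}")
    return problems
```

Keeping the original bitstream alongside the migrated one means a later repository can return to the source bits if the chosen archival format itself becomes obsolete.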


Encapsulation: advantages and disadvantages 
Besides its obvious role in countering obsolescence, encapsulation ensures that content and contextual information are stored together, minimising the risk of losing valuable information. Because the metadata is stored in the object itself rather than in an external location, it can easily be transferred and migrated along with the object, so information integrity, provenance, and authenticity are more likely to be preserved. It also means that digital objects are “self-descriptive and autonomous: they identify and document themselves” (Boudrez, p.5). Encapsulation can also aid emulation (as the software that must be emulated becomes more complex over time) and makes the migration of digital objects easier.

One disadvantage of encapsulation is that it relies heavily on standards to maintain readability, an approach that, as Dave Bearman points out, “naively imagines standards lasting forever. No computer technical standards have yet shown any likelihood of lasting forever—indeed most have become completely obsolete within a couple of software generations” (1999). It is also poorly suited to binary file formats because “there is usually too little space and an expansion of the fields could cause interchangeability and readability problems. The addition of metadata to binary files also requires a separate module or software tool for each format, because usually such a functionality is not supported by current computer programs” (Boudrez, p.5). In the case of VERS, this means the producer is restricted to providing specified formats: Text, PDF-A, PDF, TIFF, JPEG, JPEG-2000, and MPEG-4 (PROV, 2003). This adds another possible barrier to digital preservation. The VERS model was investigated and ultimately rejected as a possible strategy for Archives New Zealand.


Conclusion 
Encapsulation is a common but not universal digital preservation technique that, although not a strategy in its own right, informs and complements other preservation strategies. Metadata plays an important role: indeed, encapsulation relies on embedded metadata, to varying degrees, in order to succeed. This has the advantage of bringing all the relevant information about the digital object with it into the future, but because standards are not always ‘set in stone’, this very reliance on standards could also be to its long-term detriment. Nonetheless, the core element of encapsulation (preserving important contextual and functional information for future use) is an important one that should inform all other digital preservation strategies.



References
Bearman, D. (1999). Reality and Chimeras in the Preservation of Electronic Records. Accessed 22 November 2011 from http://www.dlib.org/dlib/april99/bearman/04bearman.html

Boudrez, F. (2005). Digital containers for shipment into the future. Accessed 23 November 2011 from http://www.expertisecentrumdavid.be/docs/digital_containers.pdf

Heminger, A. R., & Robertson, S. B. (1998). Digital Rosetta Stone: A Conceptual Model for Maintaining Long-term Access to Digital Documents. Accessed 22 November 2011 from http://www.ercim.org/publication/ws-proceedings/DELOS6/rosetta.pdf

Lavoie, B. F. (2004). The Open Archival Information System Reference Model: Introductory Guide. Accessed 20 November 2011 from https://blackboard.vuw.ac.nz/bbcswebdav/xid-1082506_1

National Archives of Australia. (2004). Digital Recordkeeping: Guidelines For Creating, Managing and Preserving Digital Records. Accessed 23 November 2011 from www.naa.gov.au/.../Digital-recordkeeping-guidelines_tcm16-47275.pdf

National Library of Australia. (2001). Encapsulation. Accessed 22 November 2011 from http://www.nla.gov.au/padi/topics/20.html

Public Records Office of Victoria. (2000). Standard for the Management of Electronic Records PROS 99/007 (Version 1). Accessed 21 November 2011 from http://210.8.122.120/vers/standard/ver1/99-7-3s2.htm

Public Records Office of Victoria. (2003). Management of Electronic Records PROS 99/007 (Version 2). Accessed 23 November 2011 from http://210.8.122.120/vers/standard/spec_02/

Shepard, T., & MacCarn, D. (1997). The Universal Preservation Format: A Recommended Practice for Archiving Media and Electronic Records. Accessed 23 November 2011 from http://info.wgbh.org/upf/
