Embedded Metadata, Part Deux: Authentication

One of the comments about my last post, by Ms. Amaral, concerned the issue of authenticity of documents which have been altered by the addition of metadata in the way I described. I think it is an important matter, and warrants some discussion.

First off, what is actually meant by “authenticity” ought to be addressed. In a strict sense, anything that is copied from another source is, of course, no longer the authentic item. But that is not how the term is used in digital libraries. An authentic item is something that, in the end, complies with an idea of accuracy in reproducing an original. That idea of accuracy, moreover, is dependent on the information intended to be conveyed.

That may sound marvellously vague, especially in the context of the words “accurate” and “authentic”, but I think it fits our actual use of the word. Take, for example, a court decision, where the essential information consists of the text and whatever formatting is needed to preserve the context of the text (paragraph structure, block quotes, footnotes, etc.). In this case, a conversion from a word processor format to HTML or XML could easily retain all the necessary characteristics for any reasonable person to allow that it is “authentic”. In fact, in the case of Lexis, Westlaw, and most commercial news databases, this is standard.

In contrast, a scan of an ancient text might be considered differently. In a case where the original appearance of the object is what is of interest, an “authentic” digital copy would need to retain much more data, and could only be achieved with an image of reasonably high resolution and color depth, so that where the object of debate is whether the “e” in a unique original is really an “o” with a mold stain, you can zoom in and closely compare that “e” with other “e’s”. And so on. Where that is the purpose of a scan but the resolution is insufficient, the copy is not sufficiently “authentic” for the purpose.

In non-digital library contexts, this notion of authenticity is even more apparent. The example of translations makes this clear. What is an “authentic” translation? In a strict sense, the question is nonsense. In normal parlance, however, we use the term “authentic translation” in a utilitarian sense. There can be many translations of a piece of literature, all “authentic”, and all different. One for the general reader, taking more liberties with literal meaning to make the work as readable in the new language as the old. One for the scholar, sacrificing the original’s readability for a more literal meaning. One for the artist, attempting to preserve the literary depth of the original with some sacrifice to both readability and literal accuracy. All could be different, but all competent, and each an “authentic” rendition of the original.

Of course, with documents in a digital library, the editorial decisions of the language translator don’t happen. We do, however, make other types of decisions, like converting between text and image formats, etc. Each decision, including the decision to do nothing, has its advantages and disadvantages. But in the end, the practical accuracy of the information is what counts. I suggest that a document is still perfectly authentic even if converted from Microsoft Word to HTML, even if the footnotes look a little different, as long as you don’t change the text itself.

Having said all that, there is still a substantive issue concerning authenticity, but it is not the authenticity itself we are concerned with, but proof of authenticity. This is an issue because the popular methods of establishing authenticity of digital objects involve digital signatures, which do not verify the accuracy of the human-interpretable information, but whether the digital content of the object is identical to the original. Now, this does indeed verify authenticity, but only in a very strict sense which does not comport with practical usage. It is not practical because it does not allow for the inclusion of essential preservation metadata, even metadata which alters absolutely nothing in the appearance of the object. It excludes the possibility of converting from an original format which may be highly unsuitable for preservation to another format which would both preserve the information and be suitable for preserving the object. Finally, it excludes the possibility of making useful, non-editorial enhancements, such as hyperlinking of citations.
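To make the problem concrete, here is a minimal Python sketch. It rests on assumptions of my own: the two HTML snippets are invented, the `dc.source` metadata element is purely illustrative, and a plain SHA-256 digest stands in for a full digital signature (a signature is normally computed over a digest like this, so a change in one implies a mismatch in the other). It shows only that adding invisible metadata changes the bytes while the readable text stays the same.

```python
import hashlib
from html.parser import HTMLParser


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the exact bytes given."""
    return hashlib.sha256(data).hexdigest()


class _TextCollector(HTMLParser):
    """Collects only the character data a reader would see."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(html: bytes) -> str:
    """Return the human-readable text of an HTML document, ignoring tags."""
    collector = _TextCollector()
    collector.feed(html.decode("utf-8"))
    collector.close()
    return "".join(collector.parts).strip()


# Hypothetical original: a tiny HTML rendering of a court decision.
original = (
    b"<html><head></head>"
    b"<body><p>The judgment is affirmed.</p></body></html>"
)

# The same document with one (illustrative) metadata element added.
with_metadata = (
    b'<html><head><meta name="dc.source" content="example court"></head>'
    b"<body><p>The judgment is affirmed.</p></body></html>"
)

same_text = visible_text(original) == visible_text(with_metadata)
same_bytes = digest(original) == digest(with_metadata)
print("readable text unchanged:", same_text)
print("digest unchanged:", same_bytes)
```

The reader sees an identical document in both cases, yet any check based on the bytes would report the second copy as altered.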

There really ought to be a better way. Or at least a better way to use digital signatures.

Along the lines of a better way to use digital signatures, I have suggested that a good approach to authentication would involve preserving original digital signatures as part of maintaining an object’s provenance. Any necessary change made to the digital object ought naturally to be documented as part of the document’s metadata. Part of that documentation can and ought to include the digital signatures of previous versions. The result is a non-original object, but with the possibility of having rich metadata records, value-added features and durable formats with a verifiable history documenting exactly what has changed. This would have the effect of re-authenticating at the point of any acceptable alteration, as well as providing a record for investigating the source of any discrepancies that may ever be discovered.
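One way such a provenance record might look is sketched below in Python. This is only an illustration under my own assumptions, not a description of any existing system: the field names (`change`, `digest`, `previous_digest`) and the helper functions are hypothetical, and SHA-256 digests stand in for the signatures a real repository would store. Each entry records what was changed and carries forward the digest of the version before it.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the exact bytes given."""
    return hashlib.sha256(data).hexdigest()


def add_version(history: list, content: bytes, change: str) -> list:
    """Return a new history with a provenance entry for `content` appended."""
    entry = {
        "change": change,  # human-readable note on what was altered
        "digest": digest(content),  # stand-in for this version's signature
        "previous_digest": history[-1]["digest"] if history else None,
    }
    return history + [entry]


def verify(history: list, versions: list) -> bool:
    """Check every stored version against its entry and its link to the prior one."""
    if len(history) != len(versions):
        return False
    expected_previous = None
    for entry, content in zip(history, versions):
        if entry["digest"] != digest(content):
            return False  # the stored bytes no longer match the record
        if entry["previous_digest"] != expected_previous:
            return False  # the chain back to the original is broken
        expected_previous = entry["digest"]
    return True


# Hypothetical versions of one document.
v1 = b"original word processor bytes"
v2 = b"<p>The judgment is affirmed.</p>"

history = add_version([], v1, "original deposit")
history = add_version(history, v2, "converted to HTML for preservation")

print(verify(history, [v1, v2]))
print(verify(history, [v1, b"<p>The judgment is reversed.</p>"]))
```

The first check passes because every version matches its recorded digest; the second fails because the altered text matches nothing in the record, which is exactly the kind of discrepancy the history is meant to expose.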

I could go on at length about this here, but I have already done that in an entry at the Cornell Legal Information Institute’s VoxPop blog: http://blog.law.cornell.edu/voxpop/2009/05/14/authentication-of-digital-repositories/ (as of 1/14/2010, this article was off-line at Cornell. I have been assured, however, that it will be back up in the very near future.)