Archival Media Preservation

Fixing Metadata (or Let’s Do it Right the First Time)

March 10th, 2011


In years of teaching visual indexing and being called in to create metadata schemas, I have seen some crazy attempts at description.

Sometimes we have been involved from the beginning, developing thesauri of specialized terms for a collection; more often we are called in to fix existing records.

As I roll up my sleeves to tackle either project, I often wonder why organizations do not know more about what they want.

I come down to the same answer that permeates our profession as a whole. The majority of people do not understand the work that goes into providing quality. In our current era of fast and cheap, people have lost sight of quality almost completely. When they cannot run an accurate search in their database, they call us to fix it. I am absolutely happy to do so, but make no mistake: I wish that collection had done it right the first time rather than calling us after hundreds of hours of wasted work. Quality often becomes important only after a failure rather than as a preventative measure.

As I tell my classes, let’s talk about why doing it right is rarely done:

1) Illusion – “Everyone is digitizing” is akin to what your mother taught you as a child: “If everyone jumped off a bridge, would you do it too?” Many asset management companies downplay the highest cost of digitization, which is (dun, dun, dun) linking the metadata to the record. The metadata needs to mean something.

I once saw a vendor selling his “automatic indexing” system. I stopped to chat with him. His product, he told me, would eliminate the need for a human indexer. As this is one of our services, I thought that I had better pay attention. He proudly told me that the video clips he was showing me were indexed from closed captioning. I was glad to know that I was not out of business. If you have ever viewed closed captioning, you know it is a fantastic service to the hearing impaired, but it is far from error free. Aside from the many spelling errors that come from the pressure of typing the words as a live show airs, there is no intellectual analysis of what is being said and how it relates to the visual.

Suppose an actor said about a child, “Her temperature is 105 degrees!” Assuming the spelling was correct, those words are all that the search tool would have to work with. A professional indexer could add “Fevers, Childhood Illnesses, Sickness, etc.” This extra analysis is what allows a successful search. Most users would not find that video clip, because they would not think to search for “temperature”, and they might not know it involved a child if that is what they wanted. They would have to pull up the clips and view them. If your collection is going to stay very small, maybe this kind of quality will not matter to you.
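To make the fever example concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the record layout, the second weather clip, and the function names `search_captions` and `search_index_terms` are hypothetical and do not describe any real product. It only contrasts matching on caption text with matching on terms an indexer assigned.

```python
# Hypothetical records: one caption line plus the terms a human indexer assigned.
CLIPS = [
    {
        "id": "clip-001",
        "caption": "Her temperature is 105 degrees!",
        "index_terms": ["Fevers", "Childhood Illnesses", "Sickness"],
    },
    {
        # Invented second clip, included only for contrast.
        "id": "clip-002",
        "caption": "The temperature outside dropped below freezing overnight.",
        "index_terms": ["Weather", "Winter"],
    },
]


def search_captions(query, clips):
    """Return ids of clips whose caption text contains the query (case-insensitive)."""
    q = query.lower()
    return [c["id"] for c in clips if q in c["caption"].lower()]


def search_index_terms(query, clips):
    """Return ids of clips carrying an indexer-assigned term equal to the query."""
    q = query.lower()
    return [
        c["id"]
        for c in clips
        if any(q == term.lower() for term in c["index_terms"])
    ]


if __name__ == "__main__":
    print(search_captions("fever", CLIPS))        # caption text never says "fever"
    print(search_index_terms("fevers", CLIPS))    # the indexer's term finds the clip
```

In this toy data, a caption search for "fever" comes back empty because the word was never spoken, while the indexer's term "Fevers" leads straight to the clip. A real system would also need the synonym and cross-reference work that a thesaurus provides, which this sketch leaves out.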

For some, I worry that when management, taxpayers or a municipality see bulky systems that return results of little relevance, they will shudder at writing more checks for the system or archive.

2) Internal Pressure – “Everything needs to be digitized”
We see this pressure to digitize everything without a clear plan for prioritization. A serious needs assessment is required to understand what needs to be digitized and why, and what needs to be researched and described.

I have often told my students that I would rather misfile a photo negative in a physical drawer than have misinformation on a digital record. I am more likely to find it again in the drawer than in a large database.

3) Money – “Scanners are cheap, how much could it cost?”
Money is tight and people are even more apt to cut corners now. It is always cheaper and more accurate to plan something out and do it right than to try to fix it afterwards.

Building a business case that covers the step-by-step process of tracking assets, designing metadata, the costs of hardware/software/maintenance, training, etc. is often seen as daunting or impossible. It is not. You have to think like a cost accountant to spell out the savings and efficiency gained. There is also often a publicity component to having an organized and highly accessible collection. This needs to be built into the value.

4) Ignorance of Computational Linguistics/Human Computer Interfaces/ Usability Studies/Search Strategies/Term Linking/(More) – “Just throw some keywords on it.”

Many times I have seen upper management wave their hands in the air as if with a magic wand and say “Just get it done.” Unfortunately, I am too old to believe in the magic wand and hard work is the only way to create a successful search tool. When I say “hard work”, I actually mean really, really hard work. Research, focus groups, linguistic analysis, understanding search tool limitations, etc. all play a part in quality design.

Along these lines, there is an interesting project that was all over the news. A team from IBM’s labs has designed a computer to compete on Jeopardy. PBS – Nova had a documentary on it, and many of the things that the lead researcher, Dr. Ferrucci, mentioned in that documentary are relevant to our field. The comment that most caught my interest was his point that a computer competing on Jeopardy can be fed thousands of background documents, but the team has to work very hard to get it to understand the actual question being asked. So they have the answer; they just do not know the question.

The human brain “gets” the context of place and language. Computers have not yet mastered this. Those brilliant connections of slang, historical context, cultural cues, body language, etc. are a tremendous gift that humans have.

I have stated for years that I wished our culture valued the human brain as much as technology. The Watson project is interesting, and what it proved on Jeopardy is just a portion of what it will prove going forward.

Until then, we who aim to direct searchers to exactly the video clip, manuscript or image that they desire need to value our brains and find better ways to sell our skills.

My indexers know that “Picket Fences” have a certain lifestyle context. Automation, or even off-shore indexers, do not know that, and we can do so much better than cutting corners on core concepts.

Let’s use the gifts that technology gives to us. The abilities to link, create synonyms, cross-reference records, stream clips, etc. are all exciting tools, and they work best in conjunction with a well-thought-out plan designed by a human brain.

Good luck, Dr. Ferrucci, but I am not sure that it is Watson that is on trial so much as your brain.


Category: Archiving Challenges · Developing A Digital Collection