Lessons in Digitization Project Management

Page history last edited by big 14 years, 3 months ago




  • Define your project’s scope in detail

  • Set suitable standards for participants

  • And set standards for all activities.


In the 2005–06 grant year, California decided to take a different approach to its Local History Digital Resources Project (LHDRP) than it had used during the prior five years. The LHDRP, which is funded by the Library Services and Technology Act (LSTA), helps California libraries provide enriched access to local and state history visual materials. This year the State Librarian, Susan Hildreth, proposed a “Solution in a box.” Her concept was that the local libraries could concentrate on gathering the information and material, while the technical components, scanning and data merging (creating a digital object), would be outsourced. The “Solution in a box” had two goals: more efficient use of library staff time and a more consistent product.


The responsibility for building Ms. Hildreth’s “In-a-box” solution was divided between Califa, a membership network of California libraries, and the California Digital Library (CDL), which supports the Online Archive of California (OAC) [http://www.oac.cdlib.org/].  Califa was responsible for the scanning contract, while CDL managed the creation of the digital object. The goal was to develop a system that would fit the largest library or the smallest, the expert or the neophyte.


In the past, Califa had sought specific digital products from its members, but as quickly became clear when we reviewed the applications from the 19 libraries chosen to participate in this cycle, there was no commonly defined product. There had been no need for detailed specifications of the material to be scanned, as the libraries had been responsible for delivering the entire image and file. Thus, in their “In-a-Box” applications, libraries described the history or story that their collection would tell, not the specifications of the material that would tell it. We did not know what size or medium the images would be, let alone the same information for the letters, brochures, or other “supporting material” that were referenced. There were also the compound objects. A compound object is one that has several parts; for example, a 4-page letter is one compound object. The grants were for 200 “objects,” and several libraries assumed an object was an object, no matter how many parts it might contain. A survey was clearly in order, not only to specify the objects but also to ascertain the actual number of scans that would be required. Before we could write a Request for Proposal (RFP) for digital imaging services, we needed to carefully define the project materials.
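The gap between "objects" and scans can be made concrete with a small calculation. The following is only a sketch: the `SurveyEntry` fields and all of the numbers are invented for illustration and do not come from the actual survey.

```python
from dataclasses import dataclass


@dataclass
class SurveyEntry:
    """One line of a hypothetical survey response: a group of similar objects."""
    description: str
    object_count: int      # how many objects of this kind the library plans to submit
    parts_per_object: int  # 1 for a simple object, more for a compound object


def total_objects(entries):
    """Number of objects, counted the way the grant counts them."""
    return sum(e.object_count for e in entries)


def total_scans(entries):
    """Number of scans the vendor would actually have to make."""
    return sum(e.object_count * e.parts_per_object for e in entries)


# Invented figures, for illustration only.
survey = [
    SurveyEntry("photographic prints", 150, 1),
    SurveyEntry("4-page letters", 40, 4),
    SurveyEntry("two-sided brochures", 10, 2),
]

print(total_objects(survey))  # 200 objects under the grant
print(total_scans(survey))    # 330 scans for the vendor
```

Even in this made-up case, a library that stays within its 200-object grant would need well over 200 scans, which is the kind of difference the survey was meant to surface before the RFP was written.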


As it turned out, the survey was extremely useful for all concerned. We made it broad enough that the libraries did not have to select exact objects, but they did have to specify the type of material that would best support the story that they wanted to bring to the Net for all to see. This process allowed all of us to begin to red-flag some issues, such as copyright, fragility and best content.


To structure the survey we used the California Digital Library image specifications. Besides being the specifications that the vendor would have to meet, they also differentiate between tonal (photographs) and graphic (line art/text) images, between reflective and transparent originals (prints or negatives), and between different sizes. This worked well to help us organize material by size, image type and original material, but it left us one gap: the maximum size. Our largest object was 137x185 cm (54"x73"), which was significantly larger than our selected vendor could scan. We also learned other useful questions to ask, such as: Are photos mounted on a board? How many are warped? Are documents written on both sides? Are they original photos? Do you have negatives for these images?


Most RFP and survey samples that we found were for documents, not photos, but we were fortunate to discover RLG’s RFP template [www.rlg.org/preserv/RFPGuidelines.pdf] that Cornell University had developed for RLG.   It helped us greatly and I highly recommend it.


After defining the scope of materials and reviewing it with all involved, we then began to define the standards.  As we were setting up a procedure that 19 different libraries would follow, we found ourselves developing and documenting procedures, in addition to establishing quality control standards.  This early effort paid off considerably during the year as questions arose.  In fact, I am still amazed at the overall smoothness of the operation.


The California Digital Library, as host, had already developed the standards for digital objects that it would place in its digital repository and the Online Archive of California (OAC). The digital objects from this LHDRP session were to be part of a trial for its Digital Repository. Part of this trial was to embed technical metadata in the TIFF header tags. This metadata included color profile, owner, vendor, image title, pixel size, ppi and hardware.


Library-specific information (file naming, the library as owner, and the object title) was incorporated into the Project Activity Worksheet (PAW), which tracks all activity and information about an object in one Excel spreadsheet. The spreadsheet also has a place for comments from the institution, the vendor and Califa as the project manager.


We welcomed the vendor’s modifications to the form, which provided the information in the format that the vendor needed. Many libraries continue to use the spreadsheet for all their data collection about the object. The object was later uploaded to the software that CDL uses for the creation of the digital object: CONTENTdm.


With hindsight, we are simplifying the PAW for next year, especially in the quality control area. We had tried to make things easy by giving people a simple check-box approach, such as:


  • No image file for item

  • Wrong item scanned

  • Image file not readable

  • Incorrect unique root identifier

  • Incorrect component identifier (compound objects only)



While these acted as good guides of what to look for, most people simply used the comment field.


Quality control is often nerve-racking for those less familiar with the process, so Califa took responsibility for reviewing the technical quality of the files. While comments from the libraries about the technical quality were welcome, their primary responsibility was verifying the accuracy of the image files.


Since the libraries had to review and accept at least 50 image files in a week, and sometimes 300 if they were submitting compound objects, they found it more manageable to limit their review to accuracy. One of our best standard procedures was adhering closely to the workflow schedule, which kept everyone on target. The libraries were divided into five groups of about four libraries each, and every group followed the same schedule: submit a set of 25 objects to the vendor; two weeks later, receive the digital files; one week later, complete the review; and a week after that, submit another batch of objects.
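The cycle described above (two weeks to receive files, one week to review, one week until the next submission) can be laid out as a simple calendar. This is a sketch only: the function name, the start date, and the figure of 8 batches (200 objects in sets of 25) are assumptions for illustration, not the project's actual calendar.

```python
from datetime import date, timedelta


def batch_schedule(first_submission, batches):
    """Return a list of (submit, receive, review_done) dates, one tuple per batch.

    Follows the cycle in the text: files come back two weeks after submission,
    review is completed one week later, and the next batch goes out a week after that.
    """
    schedule = []
    submit = first_submission
    for _ in range(batches):
        receive = submit + timedelta(weeks=2)
        review_done = receive + timedelta(weeks=1)
        schedule.append((submit, receive, review_done))
        submit = review_done + timedelta(weeks=1)
    return schedule


# Hypothetical start date; 8 batches assumes 200 objects in sets of 25.
plan = batch_schedule(date(2005, 10, 3), batches=8)
for number, (submit, receive, review_done) in enumerate(plan, start=1):
    print(f"Batch {number}: submit {submit}, receive {receive}, review done {review_done}")
```

Under these assumptions each batch occupies a four-week cycle, so a group can see its whole year of deadlines up front.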


In the technical review, we also used measurable standards: did the file meet specifications? Of course, some digital specifications are a moving target. We had settled on a quality-control tool called JHOVE (JSTOR/Harvard Object Validation Environment), under development at Harvard, which verifies the TIFF format. The entire first series did not pass this verification. By working with the JHOVE user group and the vendor, and through trial and error, we were able to create TIFFs that validated. We never got a definite answer as to why JHOVE would not validate the originals, but some of us felt that Adobe software was inserting data into the TIFF header in a manner that JHOVE found objectionable. That is the problem with trying to adopt universal standards that are not yet universal. JHOVE’s focus is on preservation and universal standards; Adobe’s focus is neither. Eventually it might be, but for now we simply had to remove the extra data to have JHOVE accept the files. Is this a better file? We cannot say yet. It is certainly not a worse file, and it is formed according to the current preservation standard.
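To give a sense of what "verifying TIFF format" means at the lowest level, here is a minimal sketch that checks only the 8-byte header of a classic TIFF file. It is not JHOVE and does not reproduce JHOVE's checks, which go much further (tags, offsets, profiles); the function name `check_tiff_header` is made up for this illustration.

```python
import struct


def check_tiff_header(data):
    """Return None if the first 8 bytes look like a classic TIFF header,
    otherwise a short description of the problem."""
    if len(data) < 8:
        return "too short to contain a TIFF header"
    byte_order = data[:2]
    if byte_order == b"II":    # little-endian ("Intel") byte order
        prefix = "<"
    elif byte_order == b"MM":  # big-endian ("Motorola") byte order
        prefix = ">"
    else:
        return "unknown byte-order mark"
    # Bytes 2-3 hold the number 42; bytes 4-7 hold the offset of the first
    # image file directory (IFD), both in the declared byte order.
    magic, ifd_offset = struct.unpack(prefix + "HI", data[2:8])
    if magic != 42:
        return "magic number is not 42"
    if ifd_offset < 8:
        return "first IFD offset points inside the header"
    return None


print(check_tiff_header(b"II*\x00\x08\x00\x00\x00"))  # None: header looks fine
print(check_tiff_header(b"GIF89a\x00\x00"))            # unknown byte-order mark
```

A file can pass a check like this and still fail a full validator, which fits the experience described above: the images were viewable, yet the validator rejected them.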


The libraries were not involved in this technical process, as the lack of JHOVE validation did not restrict their viewing and verification of their images. It did, however, lead us to reissue a master set of images at the end of the project that contained all the corrected images in the correct order.


So the lesson learned was that even with a project involving 19 very different entities, if you accurately scope out your project using measurable criteria and establish clear, measurable standards for all parts of the process, from submission to review and acceptance, your project can flow smoothly and successfully.


In the coming months, we will have guest contributors who have first-hand knowledge of the experience give their version of the process.



