Califa Library Group

 

Imaging Standards


 

When you begin a scanning project, you will either be acquiring hardware to scan the material yourself or outsourcing the work to a professional scanner. Your funding agency may direct you to use specific scanning specifications, or you may have to create your own. Either way, you should understand the standards that you are using in order to review your scans for quality control. In this article, I will try to explain the criteria used to create the standards, what their purpose is, and how to evaluate your images using them.

 

 

The basic criteria for scanning standards are image resolution, color, and file format. The first thing to know is that one size does not necessarily fit all. For example, a photograph is composed of a series of toned areas, which can be seen well at a lower resolution than handwriting, which is sometimes composed of very fine lines. You can scan in black and white, grayscale, or color, depending on your original material. Those of you using the California Digital Library Imaging Standards will note that they include a variety of specifications, depending on size and type of original, in order to set the best standard for each kind of material.

 

 

Resolution

Resolution is often discussed in terms of two different measurements, PPI and pixel array. This is a bit like using tablespoons and pounds to measure butter. In the baking situation, one measures size and the other measures weight. You can calculate how much one tablespoon weighs, but you are still measuring two different things, weight and size. Why do recipes measure butter in two different ways? I don't know, but I assume it is because in some situations the weight is more important and in others the size is. Anyway, that is the reason for the two different methods of measuring resolution.

 

PPI measures the pixels per inch, or what I call the density or weight of the image information. It is related to DPI (dots per inch), a measurement developed for printers that counts the number of ink dots per inch. The more dots or pixels per inch, the smaller each dot and the greater the resolution of the image.

 

Pixel array is the number of pixels captured, usually given as width by height, which measures the quantity or size of the image information. It was developed to describe electronic capture and display, such as in specifying a monitor display.

 

At this time, most standard-setting bodies use pixel array, but some institutions still specify their scanning by PPI. In some ways, PPI is a very clean solution, in that you capture the same quantity of information for each object relative to its size. I think of this as an archival approach, making a true digital copy relative to the size of the original object, but it does not take into account that some original material is best scanned at different rates of capture. As mentioned before, handwriting requires the capture of more information than some graphic objects in order to be clear. In fact, some material, such as newspapers or poorly printed postcards, shows more defects when scanned at a high resolution than are apparent in the original. Also, because the file size is relative to the size of the original material, you might get file sizes larger than you can process on your own machines, even without undertaking a scan like the Metropolitan Museum's digitization of its Unicorn Tapestries. In that case, the museum had scanned at such a high resolution that the movement of the warp as it dried and shrank was captured, and the images could not be stitched together. Two mathematicians had to develop specialized mathematics to carry out the image stitching. So PPI gives a consistent rate of capture, and a certain level of PPI may be the minimum for some material, but it gives inconsistent file sizes.
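The relationship between PPI, pixel array, and file size described above can be sketched in a few lines of Python. This is only an illustration: the function names, the two sample originals, and the 600 PPI and 24-bit figures are assumptions chosen for the example, not values from any standard, and the size estimate ignores file headers and compression.

```python
def pixel_array(width_in, height_in, ppi):
    """Return (width_px, height_px) for an original scanned at a given PPI."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_px, height_px, bits_per_pixel=24):
    """Approximate size of the raw image data, ignoring headers and compression."""
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical originals, sizes in inches, both scanned at the same 600 PPI.
originals = {"4 x 6 postcard": (4, 6), "24 x 36 map": (24, 36)}

for name, (w, h) in originals.items():
    w_px, h_px = pixel_array(w, h, ppi=600)
    megabytes = uncompressed_bytes(w_px, h_px) / 1_000_000
    print(f"{name}: {w_px} x {h_px} pixels, about {megabytes:.0f} MB")
```

Under these assumptions the same PPI gives the postcard a file of roughly 26 MB and the map a file of roughly 933 MB, which is the inconsistency in file size that a fixed PPI specification produces.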

 

So why do most standard-setting bodies use pixel array? It is because pixel array has the benefit of uniformity in file size, which makes it easier to plan and budget. However, if your collection varies greatly in size or types of images, you end up with a specification that is very good for most, but too little for some and too much for others. So, most institutions are now using a sliding scale specification, as the California Digital Library has done for the Online Archive of California. It has divided the material into basic categories: textual documents, illustrations and maps, photographs (reflective), photographs (transmissive), and objects and artifacts. It has also divided these groups by size. In this way, the "good enough" standard spans a smaller spectrum.

 

 

Resolution during Quality Control

 

When you are reviewing images for quality control, it is important to remember that on a monitor the density of information is fixed, typically somewhere between 72 and 100 pixels per inch, but different display devices show different pixel arrays, such as the common 1024 x 768 pixels. If your master digital file has a pixel array of 4000 pixels on the long dimension, your computer will display a reduced image, perhaps at about 25%. The computer is not shrinking the pixels when it does this, but rather displaying fewer pixels, so the master file often looks softer than a file that is smaller in size. It is important when checking a scan to look at it at 100% or "actual pixels", even if that is larger than your screen, because only then are you seeing and checking all the pixels.
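To make the reduction concrete, here is a small sketch of the arithmetic a viewer performs when it fits a large image to the screen. The function name and the 4000 x 3000 master size are assumptions for illustration; actual viewers also reserve space for menus and toolbars, so real percentages will be somewhat lower.

```python
def fit_to_screen_scale(image_w, image_h, screen_w, screen_h):
    """Largest scale (at most 1.0) at which the whole image fits on the screen."""
    return min(screen_w / image_w, screen_h / image_h, 1.0)


# Hypothetical 4000 x 3000 master viewed on a 1024 x 768 display.
scale = fit_to_screen_scale(4000, 3000, 1024, 768)
pixels_shown = scale ** 2  # fraction of the captured pixels actually displayed

print(f"Displayed at {scale:.1%} of full size")
print(f"Only {pixels_shown:.1%} of the captured pixels are visible")
```

With these numbers the image is shown at about a quarter of its linear size, which means that well under a tenth of the captured pixels reach the screen. That is why a review at 100% is the only view that checks everything.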

 

 

Color

Color is usually specified in bits, using the term bit depth. A bit is the computer's basic unit of information. Bit depth therefore simply means how many bits of color information are being captured per pixel. The more color information captured, the subtler the gradations between colors. Color is displayed electronically as red, green and blue, or RGB. The effect of different quantities of color information (bit depth) can easily be seen if you adjust your monitor to display fewer colors, such as only 256 colors.

 

 

Scanners and cameras continue to evolve, increasing the number of bits of information that they can capture for each color (RGB). As of this writing, 48 bit (16 for each color) is available. While our eyes are only capable of differentiating about 24 bits of color information, and many software programs can only work with 24 bits of color, it is still better to try to capture more than this if possible. There is an adage that it is better to throw out information than not to have captured it in the first place.
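The bit depths mentioned above translate into numbers of distinct values by simple powers of two. The following sketch, with function names of my own choosing, shows the arithmetic for the 256-color, 24-bit, and 48-bit cases.

```python
def levels_per_channel(bits_per_channel):
    """Number of distinct tones one color channel can record."""
    return 2 ** bits_per_channel


def total_colors(bits_per_pixel):
    """Number of distinct colors a pixel of the given bit depth can record."""
    return 2 ** bits_per_pixel


for bits_per_pixel, channels in [(8, 1), (24, 3), (48, 3)]:
    per_channel = bits_per_pixel // channels
    print(
        f"{bits_per_pixel}-bit: {levels_per_channel(per_channel):,} levels "
        f"per channel, {total_colors(bits_per_pixel):,} possible values"
    )
```

A 24-bit RGB image records 256 levels for each of red, green, and blue, while a 48-bit image records 65,536 levels per channel. The extra levels are the headroom that makes later adjustments less likely to produce visible banding.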

 

Another specification that defines color is dynamic range. It is not used as a standard for scanning work, but it is a specification for scanning equipment. It defines the range of tones from dark to light. The higher the dynamic range, the more detail you will capture in shadowed or highlighted areas.

 

Gamma is another hardware specification that describes how a monitor uses light to display. The most important thing about it is that the Windows operating system, on which most of your images will be seen, uses a standard gamma of 2.2. If you are doing quality control on something other than a Windows platform, then you should be sure that the gamma of your monitor display is set at 2.2.

 

Color within Quality Control

 

As most of you are aware, one color will appear differently on different machines. This is due mainly to the fact that the monitor's hardware "reads" the digital file differently or needs to be calibrated to accurately reflect the colors it is "reading." You of course cannot calibrate all your users' monitors, but you should calibrate the one you are using for quality control. This calibration includes brightness and the color of the light. Many a "dark" image was corrected by calibrating the monitor. There are several desktop color management packages for less than $200, which also include the measuring devices that sit on the display and accurately calibrate it. Another great way to manage color, especially in terms of dynamic range, is to include a grayscale target when scanning the object. You can then use the grayscale to further calibrate your system.

 

 

File Format

It is now standard practice to create a master file from which the accessible files are derived. These are then called derivatives.

 

 

Master File

 

It is important here to reiterate the importance of a master file that represents the most information captured, in other words one that has not been edited, and that is in a file format that has stood the test of time. At this time, TIFF, the Tagged Image File Format, is the accepted master file format. From this master file, you can then derive other files for regular use. These might be small thumbnails, which are usually saved as GIFs, as this format tends to create the smallest files. They might also be larger screen images, which are saved as JPGs with different levels of compression. Some people also modify the master and then save a second master file from which they make their derivatives. The first master file, which has not been modified, is then renamed the archive file.

 

One of the main reasons that you want a master or archive file that has not been edited is that any software program works within certain color limitations, which are called color spaces. A ubiquitous one is sRGB, which Adobe often uses. It is good for the web because its range of colors can be displayed on most monitors, but for this very reason, when you work in this color space, you are automatically dropping the more esoteric colors from your scan. If you must save your file within a software product, be sure you are saving it in a large color space, such as Adobe RGB (1998).

 

There are three file formats that are gaining popularity: RAW, DNG and JPEG2000. Many people would like to use them as their master files. In many ways RAW and DNG (digital negative) are the perfect master file. As their names imply, they are the raw scan without any editing or information loss, including retaining the 48 bits of color. Unfortunately, they have not yet stood the test of time in terms of obsolescence. In addition, these are proprietary formats, which require the continued support of their developers to survive. Ten years ago Kodak hit town with the PhotoCD. It was a great file format. It is already not readable by many machines and programs and is no longer supported by Kodak.

 

JPEG2000 is in a less tenuous position, as it is supported by a standards organization, but it too has not yet been widely implemented. The PNG file format was also developed and supported by a large organization, and it too has not really caught on. JPEG2000's major strength is that, while it may be a very large file with a great deal of visual information, with the right viewer you can view it easily on the web.

 

Another aspect of JPEG2000, which the older TIFF shares, is that you can embed metadata within the image file itself. You should be sure that the scanner embeds information, such as its model number, as it creates the TIFF. This way you can gather technical information automatically. You may also ask your scanning vendor to embed other information, such as your institution's name, the object name or a brief description. More and more programs are able to read this data, but as with any data it then needs to be checked and verified.

 

 

File Format during Quality Control

One of the best verifying tools, for both quality control and obsolescence prevention, is called a checksum. You or your scanning vendor should create it when the files are made. Afterwards, you can run it again at any time. A checksum is a value calculated from every bit in the file; recalculating it later and comparing it with the recorded value tells you whether the file is still good or has been corrupted.
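As a minimal sketch of how such a check works, the following Python uses the standard hashlib module to compute a checksum and compare it with a previously recorded value. The function names are hypothetical, and the choice of SHA-256 is an assumption; your vendor or repository may specify a different algorithm such as MD5.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute a checksum of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read until f.read() returns empty bytes, which marks end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum, algorithm="sha256"):
    """Return True if the file still matches the checksum recorded earlier."""
    return file_checksum(path, algorithm) == recorded_checksum
```

In practice you would call file_checksum once when a master file is delivered, store the result alongside the file's other records, and call is_unchanged during periodic audits or after copying files to new storage.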

 

Another tool, written in the Java language, is called JHOVE, which stands for JSTOR/Harvard Object Validation Environment. Among other file formats, it checks the validity of TIFF files, especially in terms of how the TIFF header information has been embedded.

 

Derivative Files

The method of access, in terms of speed of transmission, and your rights to publish the image will determine the size of these files (pixel array). The file format in which they are saved may depend on content. At this time, the current standard is to save continuous tone images as .jpg files and line drawings and text as .gif files. These three-letter abbreviations are file extensions and appear at the end of the file name.

 

Note to older Apple computer users: Apple computers do not always automatically add these file extensions (.jpg etc.) when you save files. You must enter them as part of the file name. There is also the option to save the image as a .pdf file. PDF is a file type developed by Adobe and most commonly created with its Acrobat program, but it can also be created by other software. It is still a new file type, but it is being widely adopted. It is excellent for long documents with a great deal of text, as it displays a very crisp, clear image in a small file size.

 

The key here will be establishing the pixel array (size) that you are able to display, both technically and legally. As mentioned above, JPEG2000 is a great derivative file format if you have a viewer, because it allows you to show a very large image over the internet, if you have the rights.
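Once you have decided on a pixel array for a derivative, the dimensions follow from the master by keeping the aspect ratio. The sketch below assumes the specification is stated as a maximum long edge; the function name and the 800 and 150 pixel targets are illustrative assumptions, not recommended values.

```python
def derivative_size(master_w, master_h, long_edge):
    """Scale a master's pixel array so its longest side equals long_edge.

    The aspect ratio is preserved, and the image is never enlarged beyond
    the master. Integer arithmetic keeps the rounding predictable.
    """
    longest = max(master_w, master_h)
    if long_edge >= longest:
        return master_w, master_h
    new_w = max(1, (master_w * long_edge + longest // 2) // longest)
    new_h = max(1, (master_h * long_edge + longest // 2) // longest)
    return new_w, new_h


# Hypothetical 4000 x 3000 master with two assumed derivative sizes.
print("screen image:", derivative_size(4000, 3000, 800))
print("thumbnail:", derivative_size(4000, 3000, 150))
```

Refusing to enlarge is a deliberate choice here: a derivative larger than its master adds file size without adding any captured information.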

 

To sum up your imaging standards:

 

·         Establish the quantity and method of measuring the information you wish to capture.

·         By rate of capture - PPI

·         By quantity of capture - pixel array

·         Color in bits

·         Be flexible based on size and original material.

·         Review at full size

·         Calibrate your monitor

·         Keep a Master file archived from which you create accessible derivatives.

 

 

 
