Califa Library Group

 

File Storage

Page history last edited by malaniz@... 1 yr ago


Your first step in managing your digital files comes when you store them. Storing files is much like cataloging books, except that cross-referencing is much easier. You give each file a unique identifier (you might even think of file names as call numbers), and you place it in a unique location (it is sometimes helpful to think of your directory structure as shelf locations). In addition, just as with books, you can organize to the nth degree with both identifiers and locations. You might separate your fiction into reader-interest categories, such as mystery and fantasy, or you might put all fiction in one area and arrange it by author name. For the most part, this is done to make it easier for your users to find the book. With digital files, much of that effort can be handled by good metadata tags. In fact, in this case your real user is your computer: you want to make it easy for it to find the right file as quickly as possible.

 

File Names

If you have not heard the mantra, learn it now: file names must be unique. However, if someone has told you the best way to ensure this is to use an automatically generated, meaningless alphanumeric string, forget it. As librarians in particular know, unique identifiers can also be meaningful. In fact, much as Mr. Dewey discovered all those years ago, meaningful unique identifiers are extremely useful.

 

Defining the information you wish to convey is the first step in creating a file-naming standard. I feel that this information should include a reference to the original object, your institution (or the institution that owns the original), and the location of the original. Some people add a suffix indicating whether a file is a master or a derivative. If each of these versions is in a different file format (for example, a TIFF master with JPEG and GIF derivatives), there is no real need to say so in the file name, because the file extension already tells you. The important thing is to create a standard that is consistent, alphanumeric, and as automatically generated as possible. Such standards often take a form like InstitutionID_ObjectID_scan_001.
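A minimal Python sketch of such a naming standard follows: institution ID, object ID, the word "scan", and a zero-padded sequence number. The institution code `CLG`, the helper name `build_file_name`, and the three-digit default width are illustrative assumptions, not part of any established standard; adapt them to your own identifiers. Because the format travels in the extension, the same base name serves the TIFF master and its JPEG and GIF derivatives.

```python
def build_file_name(institution_id: str, object_id: str, scan_number: int,
                    extension: str, width: int = 3) -> str:
    """Build a name like 'CLG_PhoAr_scan_001.tif' (hypothetical helper).

    No master/derivative suffix is added: the extension (tif, jpg, gif)
    already distinguishes the versions of the same scan.
    """
    if not 0 < scan_number < 10 ** width:
        raise ValueError(f"scan number {scan_number} does not fit in {width} digits")
    return f"{institution_id}_{object_id}_scan_{scan_number:0{width}d}.{extension.lower()}"


if __name__ == "__main__":
    # 'CLG' is a made-up institution code used only for illustration.
    for ext in ("tif", "jpg", "gif"):
        print(build_file_name("CLG", "PhoAr", 1, ext))
```

Run as a script, this prints CLG_PhoAr_scan_001.tif, CLG_PhoAr_scan_001.jpg and CLG_PhoAr_scan_001.gif. Raising an error when the number outgrows its padding is a deliberate choice: it forces you to widen the standard rather than silently produce names that sort out of order.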

 

My usual recommendation is to use your current method of identifying the object as the base for your file's name, be it an accession number, an ISBN, or whatever system you have evolved for your collection. The beauty of this approach is that the original source information is then built into each file's name. Be sure your homegrown identifiers are scalable and computable.

 

Scalable means that the system allows the number of parts to grow, and computable here means that you use "0" placeholders, so that 1 is written as 001. Zero-padding keeps files in order when a computer sorts names character by character; without it, item_10 sorts before item_2. For example, if you had an identifier such as photoarchive, you might name the individual pieces PhoAr_001, PhoAr_002, and so on (assuming you plan to have no more than 999 pieces).
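The effect of zero-padding on sorting is easy to demonstrate. This Python sketch compares plain text sorting of padded and unpadded names; the helper `padded_name` is hypothetical, and its refusal to accept numbers wider than the padding is one way (an assumption, not a rule from any standard) to enforce the "scalable" requirement.

```python
def padded_name(prefix: str, number: int, width: int = 3) -> str:
    """Return e.g. 'PhoAr_001'; refuse numbers that do not fit in `width` digits."""
    if not 0 < number < 10 ** width:
        raise ValueError(f"{number} does not fit in {width} digits; widen the padding")
    return f"{prefix}_{number:0{width}d}"


numbers = [1, 2, 10, 100]
unpadded = [f"PhoAr_{n}" for n in numbers]
padded = [padded_name("PhoAr", n) for n in numbers]

# Plain text sorting compares character by character, so unpadded
# numbers fall out of sequence while padded ones stay in order.
print(sorted(unpadded))  # ['PhoAr_1', 'PhoAr_10', 'PhoAr_100', 'PhoAr_2']
print(sorted(padded))    # ['PhoAr_001', 'PhoAr_002', 'PhoAr_010', 'PhoAr_100']
```

Pick the width from the largest number of pieces you ever expect; three digits covers 999 pieces, four digits 9,999.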

 

Computable also means that every computer platform should be able to read the file name. To be safe, keep file names short, never include spaces, and avoid using a period or a slash (/) as a separator.
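These rules can be checked automatically before files enter your collection. The Python sketch below assumes a 31-character limit and a whitelist of letters, digits, underscores and hyphens, with a single period reserved for the extension; both are conservative, illustrative choices rather than requirements of any particular platform, and `filename_problems` is a hypothetical helper name.

```python
import re

# Illustrative assumptions, not platform requirements: a conservative length
# limit and a whitelist of letters, digits, underscores and hyphens, with one
# period reserved for the extension.
MAX_LENGTH = 31
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]{1,4}")


def filename_problems(name: str) -> list[str]:
    """Return a list of portability problems; an empty list means the name passed."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if " " in name:
        problems.append("contains a space")
    if "/" in name or "\\" in name:
        problems.append("contains a slash")
    if name.count(".") > 1:
        problems.append("uses a period as a separator")
    if not SAFE_NAME.fullmatch(name):
        problems.append("does not match the safe base-name-plus-extension pattern")
    return problems


if __name__ == "__main__":
    for candidate in ("CLG_PhoAr_001.tif", "PhoAr 001.tif", "PhoAr.001.tif"):
        print(candidate, filename_problems(candidate) or "OK")
```

Returning a list of reasons, rather than a simple yes or no, makes it easier to tell a scanning vendor exactly what to fix.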

 

Directory Structure

Your directory structure further defines the digital file, because in computer terms its true identifier is its path: the directory structure plus the file name. Since the path helps the computer find the file, it should be designed for the machine, not for a human browsing for a file. What you want is an organization based on how your software program or management system retrieves images. This reduces the workload on your computer.

 

Thus, many structures start with file type: all the master images in one folder, screen images (usually JPEGs) in another, and all thumbnails in their own folder. In fact, that may be the only organization they use. When collections have compound objects, which require a series of images to be viewed sequentially, the structure might start with the object identifier and then branch by image format. The point is to try to think like your program or, better yet, to talk to those who know how the program searches.
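To make the two layouts concrete, here is a small Python sketch that builds a file's path under each. The folder names (`masters`, `screen`, `thumbnails`), the root `images`, and the helper names are hypothetical; the layout that actually suits you depends on how your management system looks files up. `PurePosixPath` is used only so that the example prints forward-slash paths on any operating system.

```python
from pathlib import PurePosixPath

# Hypothetical folder names for the three kinds of image files.
TYPE_FOLDERS = {"master": "masters", "screen": "screen", "thumbnail": "thumbnails"}


def path_by_type(root: str, file_type: str, file_name: str) -> PurePosixPath:
    """Type-first layout, e.g. images/masters/CLG_PhoAr_001.tif."""
    return PurePosixPath(root, TYPE_FOLDERS[file_type], file_name)


def path_by_object(root: str, object_id: str, file_type: str,
                   file_name: str) -> PurePosixPath:
    """Object-first layout for compound objects, e.g. images/PhoAr/masters/CLG_PhoAr_001.tif."""
    return PurePosixPath(root, object_id, TYPE_FOLDERS[file_type], file_name)


if __name__ == "__main__":
    print(path_by_type("images", "thumbnail", "CLG_PhoAr_001.jpg"))
    print(path_by_object("images", "PhoAr", "master", "CLG_PhoAr_001.tif"))
```

With the type-first layout, a viewer that only ever needs thumbnails reads a single folder; with the object-first layout, all the pages of one compound object sit together, which suits software that displays them in sequence.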

 

So far I have been talking about directory structure in terms of the file's permanent home, but while files are being created, you may find that temporary or in-process directories are very useful for managing your workflow. Such a structure can track each scan's progress. Let's say you are using two scanning techniques or pieces of equipment, so you first organize your image files according to how they were scanned. Then, within that organization, you might set up workflow folders such as Scans for Approval and Scans Final. You might also set up folders within these folders that reflect the scanner or the date of the scan.
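An in-process structure like this can be created and maintained with a few lines of script. In this Python sketch the technique names (`flatbed`, `overhead`), the stage folder names, and the helpers `make_workflow_tree` and `approve_scan` are all hypothetical; the stage names use underscores instead of spaces, following the naming advice earlier on this page.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical stage folders mirroring "Scans for Approval" and "Scans Final".
APPROVAL = "scans_for_approval"
FINAL = "scans_final"


def make_workflow_tree(root: Path, techniques: list[str]) -> None:
    """Create root/<technique>/<stage>/ for each scanning technique."""
    for technique in techniques:
        for stage in (APPROVAL, FINAL):
            (root / technique / stage).mkdir(parents=True, exist_ok=True)


def approve_scan(root: Path, technique: str, file_name: str) -> Path:
    """Move an approved scan from the approval folder to the final folder."""
    source = root / technique / APPROVAL / file_name
    target = root / technique / FINAL / file_name
    if target.exists():
        # Refuse to overwrite a scan that has already been approved.
        raise FileExistsError(target)
    shutil.move(str(source), str(target))
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_workflow_tree(root, ["flatbed", "overhead"])  # hypothetical technique names
        (root / "flatbed" / APPROVAL / "CLG_PhoAr_001.tif").write_bytes(b"scan")
        print(approve_scan(root, "flatbed", "CLG_PhoAr_001.tif").relative_to(root))
```

Date or scanner subfolders could be added as one more path level inside each stage; they are left out here to keep the sketch short.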

Obsolescence

Digital files and their storage devices can go bad. Yes, I know it is hard to accept, but you cannot expect digital files to be permanent. In fact, they are extremely temporary, whether they are superseded by new technology or damaged by a dust particle in the wrong place.

 

One of the best verification tools, for both quality control and obsolescence prevention, is the checksum. You or your scanning vendor should create one when the file is made. You can then run the check at any time: it recalculates a value from the file's bits, compares it with the value recorded earlier, and reports whether the file is good or corrupted. This makes validating the data much easier than opening each file. Also, of course, you must have back-ups. Some people now insist on remote storage for back-ups, and on duplicate back-ups. A back-up is any copy of the digital file and its metadata. The original storage media is one copy and the server from which images are accessed is another, but this covers only the digital files. Do not forget to make sure that your metadata is also backed up.
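Here is a minimal Python sketch of creating and later re-checking a checksum. It assumes SHA-256, computed with the standard `hashlib` module, as the checksum algorithm; MD5 and other algorithms are also widely used, so the choice is yours or your vendor's. The helper names and the sample file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum, reading in chunks so large TIFFs need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_checksum: str) -> bool:
    """True if the file still produces the checksum recorded when it was made."""
    return sha256_of(path) == recorded_checksum


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scan = Path(tmp, "CLG_PhoAr_001.tif")    # hypothetical file name
        scan.write_bytes(b"pretend image data")
        recorded = sha256_of(scan)               # stored when the file is made
        print(is_unchanged(scan, recorded))      # True: the file is intact
        scan.write_bytes(b"pretend image dat4")  # simulate one damaged byte
        print(is_unchanged(scan, recorded))      # False: corruption detected
```

The recorded checksum is only useful if it is stored somewhere other than the file itself, for example in your metadata records.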

 

When storing your digital files, it is important to have a plan to verify their integrity and their readability with the technology available to you. You should have safe and secure back-ups of your files and their associated metadata.
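One way to put such a verification plan into practice is a checksum manifest: record a checksum for every file in the primary copy, then compare it with the same manifest computed over a back-up. This Python sketch assumes the back-up is a directory tree mirroring the original, with metadata files (the name `metadata.xml` is hypothetical) stored alongside the images so they are checked too; SHA-256 is again an assumed choice of algorithm. It reports files missing from the back-up and files whose contents differ. It does not test whether files can still be opened by current software, which is a separate readability check.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 checksum of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (image or metadata) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_copies(primary: Path, backup: Path) -> dict[str, list[str]]:
    """List files missing from the backup and files whose contents differ."""
    original = build_manifest(primary)
    copied = build_manifest(backup)
    return {
        "missing_from_backup": sorted(set(original) - set(copied)),
        "mismatched": sorted(
            name for name in original.keys() & copied.keys()
            if original[name] != copied[name]
        ),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        primary, backup = Path(tmp, "primary"), Path(tmp, "backup")
        (primary / "masters").mkdir(parents=True)
        (primary / "masters" / "CLG_PhoAr_001.tif").write_bytes(b"image")
        (primary / "metadata.xml").write_text("<record/>")  # hypothetical metadata file
        shutil.copytree(primary, backup)
        (backup / "metadata.xml").write_text("<record>changed</record>")  # simulate damage
        print(compare_copies(primary, backup))
```

Files that exist only in the back-up are not reported; running the comparison in both directions, or on a schedule against each back-up copy, would extend the same idea.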

 

To sum up your File Storage procedure:

  • Unique identifiers for files should indicate ownership and object identity
  • Directory structure should be designed for your management program
  • Protect your files against obsolescence and corruption

 

 
