Monday, March 12, 2007

Sheet Feed Scanning

From Mike Blomberg, Senior Imaging Technician:

I have devoted the past two months and then some to scanning most volumes of the Annals of the Missouri Botanical Garden (excluding issues from the past two years). I thought it would be interesting to offer some insight into production work with the Kodak i280 sheet feed scanner, as opposed to the Indus 5002 scanners that we use for most of our scanning work.

All in all, 92 volumes of the Annals will be captured. Although this is certainly not the longest run we've worked on (take Flora, for example), it's good to finally be wrapping it up. Of those 92 volumes, only 22 were bound copies scanned with the Indus scanners. (This was only because we didn't have extra copies of those particular volumes that we could sacrifice to the guillotine.)

On average, it takes about an hour to scan a typical book using the Kodak sheet feed scanner. Assuming that about five volumes could be completed in a normal work day, I estimated that scanning the 70 volumes of the Annals designated for sheet feed scanning would take only about a month. Unfortunately, the Annals took much longer than originally anticipated.
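The original estimate is simple arithmetic: 70 volumes at about five per day comes to 14 work days, or roughly three work weeks, which is in line with "about a month." The short Python sketch below makes that arithmetic explicit and shows how the figure doubles if every volume is twice as long. The function names are hypothetical, and the assumption that scan time grows linearly with page count is ours; it ignores cleaning stops, multifeeds and equipment downtime.

```python
# Back-of-the-envelope throughput estimate for the sheet feed run.
# Inputs from the post: 70 volumes, about 5 volumes per work day.
# Assumption (ours, for illustration): scan time scales linearly with page count.

import math


def estimate_workdays(volumes: int, volumes_per_day: float) -> int:
    """Whole work days needed to scan `volumes` at a steady daily rate."""
    return math.ceil(volumes / volumes_per_day)


def scaled_rate(volumes_per_day: float, page_ratio: float) -> float:
    """Daily rate if each volume is `page_ratio` times longer than typical.

    Ignores cleaning stops, multifeeds and other interruptions.
    """
    return volumes_per_day / page_ratio


typical_days = estimate_workdays(70, 5)
print(f"Typical volumes: {typical_days} work days (~{typical_days / 5:.1f} weeks)")

# If volumes were about twice as long, the daily rate halves and the days double.
doubled_days = estimate_workdays(70, scaled_rate(5, 2.0))
print(f"All volumes twice as long: {doubled_days} work days")
```

Even before counting the weeks the station was out of service, longer volumes alone are enough to push the run well past the original estimate.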

Besides the sheet feed station being down for two separate periods while it was used to evaluate two different microfiche scanners, other factors pushed the completion of scanning back much further than expected. For one, the volumes grew longer as the run progressed. By the 1960s, it wasn't uncommon for volumes to be twice the size of the early Annals volumes: some had page counts upwards of 1200-1400 pages, compared with an average of around 700-800 pages.

This run has been our most sustained use of the i280. Normally it would be used for only a handful of volumes in a given run that were deemed suitable for sheet feed scanning (books that were already falling apart and such). One major issue that needs attention with the i280 is dirt accumulation. With this much use of the machine, cleaning had to be done much more frequently (usually once or twice per volume). Given the nature of paper, dirt easily gathers on the machine's various components. The two biggest culprits in slowing down production are dirt on the scanning heads and dirt on the feed rollers. Dirt on the scanning heads obviously creates problems with image quality, while dirt on the rollers makes it difficult for the machine to feed pages.

One of the most interesting tests this run presented was seeing how the Kodak i280 handles different types of paper. Up until this point, everything we had run through the machine was very dull matte paper. Sometime in the 1960s, however, the Annals began to be printed on glossy paper instead. I didn't expect much of a difference in the performance of the machine, but that proved not to be the case. The glossy pages tended to stick together more often, resulting in what the Kodak software calls "multifeeds" (where more than one sheet is accidentally fed through the scanner). The scanner can be set to stop when it detects a multifeed; however, with the glossy pages, a number of multifeeds went through undetected. The glossy paper also tended to leave a strange film on the feed rollers that had to be cleaned off regularly (usually a few times in each volume). It was different from the typical dirt and dust that gather on the rollers, but the effects were the same, if not worse! I doubt we'll be scanning many books with glossy pages, since the focus of our scanning is pre-1925 literature, but at the very least it was interesting to see how differently the i280 reacted to glossy paper.

Sheet feed scanning of the Annals of the Missouri Botanical Garden is now complete. Right now, just a few gaps are being filled in with bound volumes scanned on the Indus 5002 machines. Next up for sheet feed scanning: Rhodora. It comprises 70 volumes, and every single volume is to be scanned using the i280. Luckily, these volumes are smaller in both page count and physical size, so scanning should proceed much more quickly than with the Annals.

I hope this entry sheds some light on the trials and tribulations of sheet feed scanning!

Mike Blomberg, Senior Imaging Technician
Missouri Botanical Garden

Better citation resolving and other web site updates

We've made the following updates and changes to the Botanicus web site, including better integration with Tropicos and more helpful screens for resolving protologue citations. You can view these changes at http://www.botanicus.org.

• Tropicos names integration.
• Tropicos and uBio names displayed in the same panel.
• Search text box moved from left hand side to the header above the tabs.
• Increased height of title browsing tabs in the header.
• Updated logic for checking whether PDF files exist.
• Updated logic for display of OCR text.
• Updated the display of title and item drop-down lists in IE 6 and below.
• Added better exception handling to display generic error and "page not found" pages.
• New page-resolving logic for links from Tropicos: a citation is displayed only if an exact match is found; otherwise, similar results are displayed. To view this in action, follow the "View in Botanicus" link at http://mobot.mobot.org/cgi-bin/search_vast?onda=N24201228
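The "exact match, otherwise similar results" behavior for resolving citations can be illustrated with a small Python sketch. This is a hypothetical illustration, not the code Botanicus actually runs: the lowercase/whitespace normalization, the use of the standard library's `difflib.get_close_matches` as the similarity measure, and the cutoff of 0.6 are all our own assumptions, and the citation strings in the usage note are made up.

```python
# Hypothetical sketch of "exact match, else similar results" citation resolving.
# Not Botanicus's actual implementation: normalization, the difflib similarity
# measure and the cutoff value are illustrative assumptions.

import difflib


def resolve_citation(query: str, known_citations: list[str],
                     max_similar: int = 5, cutoff: float = 0.6) -> dict:
    """Return an exact match if one exists, otherwise a list of near matches."""
    # Map a normalized form of each citation back to its original spelling.
    normalized = {c.strip().lower(): c for c in known_citations}
    key = query.strip().lower()

    if key in normalized:
        # Exact match: show this citation only.
        return {"exact": normalized[key], "similar": []}

    # No exact match: fall back to the closest candidates by string similarity.
    close = difflib.get_close_matches(key, list(normalized),
                                      n=max_similar, cutoff=cutoff)
    return {"exact": None, "similar": [normalized[c] for c in close]}
```

For example, with the made-up list `["Ann. Missouri Bot. Gard. 12: 45", "Rhodora 3: 101"]`, querying `"ann. missouri bot. gard. 12: 45"` returns the exact entry, while `"Ann. Missouri Bot. Gard. 12: 46"` returns no exact match but lists `"Ann. Missouri Bot. Gard. 12: 45"` among the similar results. The point of the design is that a reader is never silently sent to the wrong page: a citation is shown directly only when the match is exact.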

Unfortunately, we're still chasing down a bug related to how some browsers cache the Botanicus scripts. If areas of the web site appear to overlap or are not evenly aligned, hit your browser's Refresh button.