September 4, 1998

Product Reviews: Paper!

This issue is an attempt to review some of the products I've been using. I shouldn't add the usual caveat but will: one of the purposes of these "columns" is to give me a chance to do writing without the full coherence I'd normally devote to an essay. I do hope to review and organize these into more coherent essays in the future.

This issue will focus on products I try to use to deal with the paper world. These are personal document management tools rather than corporate ones. But there are some similarities as we move from using a single computer to more "household" and group activities. Since this is more a review, I won't delve into my wishlist except to note that these are accommodations. I'm looking forward to getting more of my information electronically as data rather than pictures.

But even where I can get documents like bank and credit card statements online, the paper versions often have additional information such as the images of checks and payment details (for American Express). More important, they represent reconciliation units rather than the long lists of undifferentiated transactions found online. Some providers such as Webcard and Amex do provide statement-by-statement organization but they are still the exception.

The good news is that online availability is growing, but the design point is still the eyeball rather than the program. I've been using the browser interface to write programs that extract the data anyway, on the assumption that providers are not going to change the presentation structure very often and that there are clues in the presentation markup that allow me to identify the data. Still, it is far from perfect.
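As a rough illustration of what such an extraction program might look like, here is a minimal sketch that pulls rows out of an HTML table using only Python's standard library. The sample markup, the column names, and the helper names (`TableRows`, `extract_rows`) are all assumptions made up for the example; real statement pages will differ, and pages that omit closing tags would need more forgiving handling.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in html as lists of cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical statement fragment; the presentation markup is the only clue.
SAMPLE = """
<table>
<tr><th>Date</th><th>Description</th><th>Amount</th></tr>
<tr><td>08/03/98</td><td>Office supplies</td><td>-42.17</td></tr>
<tr><td>08/07/98</td><td>Payment <b>received</b></td><td>100.00</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(row)
```

The weakness is exactly the one noted above: the code depends on the table layout staying put, so any redesign of the page silently breaks it.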

Visioneer's Paperport

I've long been a user of Visioneer's products. So far, I still see them as the best of the breed but that is not the same as perfect. The latest version, Visioneer Explorer, comes closer but also represents a step backwards in some aspects.

One of the precursors to this category was Wang's Freestyle product from the late 1980's that made it easy to scan documents into the computer and view them as thumbnails. One fatal flaw was the lack of much integration with the existing platform. The machines were still weak, slow 386's and 286's and Windows was still not quite there. Overall, a nice concept but not over the threshold.

Visioneer's key innovation was to use an inexpensive scanner and have a desktop program that automatically responded when paper was inserted. The documents are represented as thumbnails on the desktop and are sufficiently large to allow for identification and some organization.

One problem was that their storage system used its own naming convention, perhaps due to the historic limitations of Windows 3.1. But it took till Visioneer's Explorer (VE) to rectify this and use standard file names.

On the positive side, it was simple and quick to manage and edit documents. It also had links to programs and utilities that made it simple to organize and edit the documents as well as OCR, print (and thus Fax) them. Acquisition is enhanced by the ability to print to the desktop.

VE not only uses the native file system but allows one to view documents in arbitrary directories. But along with the improvements came a big step backwards. Instead of a fast and integrated viewer, the viewer is an external program that is slow to start. More seriously, one can't rename a document while viewing it. This seems minor but it is important to be able to quickly give documents effective names.

VE still prefers ".max" files. It now allows more use of native file formats such as TIF, but support for non-Max files is not as smooth as for the native format.

Unfortunately Visioneer seems to have saved money on support. The support staff, if they respond, tend to treat customers as idiots and any question is answered with a tutorial for idiots. This has made it difficult for me to work around problems in their software and to make suggestions. Still, the use of real file names in VE goes a long way toward allowing me to resolve problems and also to do effective backup.

Some quick suggestions, should anyone at Visioneer read this:

  1. Make the viewer work more closely with the desktop so as to allow renaming.
  2. Allow a bigger thumbnail view and improve viewing of TIF files. This is important for quickly rearranging and reorganizing documents. Or, at least, allow me to quickly magnify the page number. This is especially important for handling double-sided documents.
  3. Allow a multipage file to be viewed as a subfolder rather than one page at a time in the viewer.
  4. Allow multiple desktops to make it easier to manage directories.
  5. Support directories on the network as well as locally.
  6. Debug TWAIN support so as to accept a wider variety of scanners, and don't handle events for scanners you don't support.

On the positive side, it has some nice features:

  • As noted, the thumbnails are generally large enough to identify the document.
  • It supports "printing" to the desktop, which makes it easy to prepare a document for faxing and manipulate it first.
  • The scanner does automatically launch the program which makes it very quick to work with it.
  • The stack and unstack capabilities make it possible to quickly organize documents and assemble them out of scanned pieces.
  • I use the Black & White scanner so don't need to deal with too many choices when I want to scan. The 200 DPI mode makes it simple to just scan and, if I want, copy or fax a document without having to make too many decisions. It also makes scanning fairly quick.

I did have a little experience with Visioneer for the Mac but it was too clunky to use. The Mac certainly seems to be showing its age in many ways and, as a backwater, doesn't get the necessary attention from product developers.

Pagis

This is a program from Xerox and an outgrowth of the acquisition of Ray Kurzweil's reading machine company.

The current version (2.0) uses the standard file system folders. One can designate a folder hierarchy as being handled by Pagis. It is similar to working with a standard shell folder extended with Pagis tools and their thumbnail viewer.

This was a major advantage over Visioneer prior to VE.

The problem is that Pagis is still clunky and their emulation of shell folders is problematic and often buggy.

Pagis uses its own XIF files but can import/export TIF and other formats. Since I don't want to be locked in, I try to stick with TIF but lose some of the more useful capabilities.

The Pagis viewer is more powerful than the Visioneer one but that comes at a price of making simple things awkward. It is also very slow to start up, often resulting in a pile of viewers appearing all at once.

I expect many of these problems to be addressed in the future but with VE, Pagis' major advantage of working with the native file system is not as strong as it used to be. Whereas VE maintains its own viewer, the Pagis shell emulation becomes a liability: any bugs and design flaws mean that all shell access to the documents is compromised.

Others

I've only glanced at others. The general problem is that they are like earlier versions of Visioneer in that they manage their own desktops without allowing access to the rest of the file system and show less finesse in document management.

Still, if others have suggestions, I'm interested in looking at them.

Faxing

A legacy part of document management is the Fax. While Fax modems use a different modulation scheme from normal data modems, the bits themselves are in a standard format, usually Group 3 or Group 4 compression. The transition to transmitting them via the Internet was obvious. (BTW, I integrated Fax with Lotus Notes in 1989 so got into the technical details fairly deeply.)

I use a number of services:

  • JFax. This seems to be the premier service. It offers inbound phone numbers in a number of cities around the world. In addition to Fax, it supports voice mail which comes in as a GSM attachment. One can use the web to manage one's account. There is also the JFaxsend for outbound faxing but since it lacked NT support, I haven't used it as my primary outbound service. One nice feature of their outbound support is that the printer driver also allows me to print to a TIF file, even under NT. A very useful capability.
  • Faxmission (AKA Faxnet). This is a Boston-based service that I learned about from Mediaone. It supports inbound and outbound faxing. I use it as my normal outbound service since it has drivers for Win95 and (now) NT and connects directly to their server without using my mail system for a transport. It is reliable and simple so I've gotten used to it as my default.
  • Faxaway. Faxaway was the first outbound service and is still useful because it is simple. Just send an email message to the fax number with optional attached files. It can convert a number of PC file formats and also offers a printer driver that produces TIF files. One can set up an account online and pay for only those faxes sent. It's still a good entry level service. The main reason I switched away was Faxnet's ability to print and send in a single step.

The actual service is fairly simple so it's a commodity business. There are many advantages to faxing online, especially in conjunction with document management software.

  • Since I use document management software, the ability to treat a fax as just another source or output device makes life much easier.
  • Directly "printing" to the fax produces much higher quality than printing and scanning.
  • PC scanners are much higher quality than the scanners built into fax machines.
  • Forwarded and annotated faxes don't lose any quality.
  • No need for a phone line and, as with JFax, one can maintain phone numbers in a choice of cities.
  • Price? Cheaper except for local faxes but the ease of use makes up for the cost of local faxing.

There are other benefits for heavy Fax users, but my focus is on personal use.

Scanners

I use a number of scanners for various purposes. I still have a preference for the Visioneer Vx (sheet feeder) since it is so simple and efficient to use. But it does have its limits so I need to look further. Since the focus here is on document management, I won't discuss the HP Photosmart and the Minolta Dimage (perhaps at another time).

One of the problems with scanners is that TWAIN is a kludge. That's just a given since the designers recognize that also. Ideally, we'll have protocols like HP's Jetsend which will allow over-the-wire network protocols to replace it. But for now, we're still stuck with TWAIN and its issues.

Win98 introduced input event handling which is a nice idea but requires another manual step in handling scanning. One must choose the program. Then make choices about color, resolution, destination and whatever. The UI for these interfaces ranges from painful to excruciating. One must wander down menus, wait for modal operations, and there are all those settings and subsettings.

And, while I'm flaming, I'll complain about wizards and worm's-eye-view interfaces that require one to navigate through dialog boxes that make voice mail menus seem almost usable by comparison. Somehow, few so-called designers really understand the key idea of the spreadsheet: the ability to see everything at once. Again, the topic for an essay in its own right.

Handling multiple pages and two-sided documents is a nightmare. Perhaps if everything were perfectly set up, then maybe their assumptions would work. Provided that I could figure out the UI. Pagis is one of the offenders here since I still can't figure out all the pieces.

And the number of pieces and elements is part of the problem since there are all these tools vying for control. Some emulate desktops, some try to be copiers and fax machines and some just place files on my desktop.

I had high hopes for the USB scanners such as the Storm (née Logitech) scanner. It is nice to be able to just plug it in but it seems to forget its settings. Also, when scanning a page I must not only insert the paper but press a button and then select which device should read the page.

I do have a UMax sheetfeed which isn't that bad but just took a long time to start scanning each page. The design point for so many of these is more artist than document manager.

So, in the triumph of hope over experience, I bought the HP 6250.

The HP 6250 Scanner

I'm singling this out because I had high hopes but ultimately became more aware of its limitations than its virtues.

I want to be able to feed multiple pages at a time so I don't have to get RSI feeding paper and waiting. Perhaps I should have learned my lesson with the HP-5S which had a very very bad sheet feeder and involved the usual UI steps when it asked me questions every time I tried to fax a document.

But the 6250 looked great since it was a flatbed copier that came with a sheet feeder.

The good news is that the sheet feeder actually (generally) works. As long, that is, as one has 8.5 x 14 single-sided pages. The other cases are problematic.

But before I get to that, I wasted a lot of time trying to get it to work. It gets totally befuddled by the presence of any other scanners or scanner software on the machine and just does nothing until one removes the 6250 from the USB bus; then it appears and complains about the loss of the scanner. Through painful rituals involving plugging and unplugging the scanner and running their program again, one can get it to recognize the scanner. For simplicity, I connected it to my pristine (or at least scanner-virgin) laptop.

Of course, HP tech support seriously suggests that I buy a separate PC for each scanner. Typical of the impotent mindset of tech support.

But even then, well, there are so many annoying things I'll just bullet-point them:

  • The scanner does a poor job with narrow paper. The pages twist and often jam.
  • If the paper is short, such as only 11 inches long, it produces large areas of gray around the page. On disk, these add a few hundred thousand bytes per page!
  • More generally, there seems to be absolutely no awareness on HP's part that one can automatically crop a page. The background of the device is fully under HP's control and must be designed to support this! Instead, they choose a gray color that the scanner thinks is part of a document. This is idiotic.
  • If there is paper on the glass below the sheet feeder, the scanner will simply keep scanning the one line that appears below the sheet feeder's scanning slot.
  • When one does scan a pile of paper, each page becomes a separate document! The file name of the first file is used for the first page and then subsequent pages are numbered with the number as a suffix. But since there are no leading zeros, sorting by file name doesn't work. If one continues the scan, you must use a different base name since the nondesigners are totally clueless and don't have the concept of appending to the series!
  • The scanning and sheetfeeding options must be set anew each time. And HP still thinks it is always scanning color images so I must manually find the field (no shortcut keys?) to set it to BW and then change the DPI to 200 from 300 since 200 is sufficient. And then I must choose the file name and the format. If I make the mistake of setting TIF, it is uncompressed. I need to remember to scroll down to compressed each time. Not that it does a great job of compression. And all that gray defeats it. There is also no thresholding so that all the background in a bank document results in fuzz.
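The page-numbering complaint in the list above is easy to make concrete. The following is a minimal sketch, not anything HP ships: the helper names (`page_name`, `natural_key`, `next_page_number`), the `-` separator, and the `.tif` extension are assumptions chosen for illustration. It shows zero-padded names that sort correctly, a sort key that rescues names written without leading zeros, and a way to append to an existing series instead of demanding a new base name.

```python
import re


def page_name(base, page, width=4, ext=".tif"):
    """Build a zero-padded page file name, e.g. stmt-0002.tif."""
    return f"{base}-{page:0{width}d}{ext}"


def natural_key(name):
    """Sort key that compares runs of digits as numbers, not as text."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def next_page_number(base, existing, ext=".tif"):
    """Return the next free page number for base, so a scan can be continued."""
    pattern = re.compile(
        re.escape(base) + r"-(\d+)" + re.escape(ext) + "$", re.IGNORECASE)
    used = [int(m.group(1)) for m in map(pattern.match, existing) if m]
    return max(used, default=0) + 1


if __name__ == "__main__":
    unpadded = ["stmt10.tif", "stmt2.tif", "stmt1.tif"]
    print(sorted(unpadded))                   # plain text order puts 10 before 2
    print(sorted(unpadded, key=natural_key))  # numeric order
    print(page_name("stmt", next_page_number("stmt", ["stmt-0001.tif"])))
```

Either fix alone would do: padding makes every file browser sort the pages properly, while the numeric sort key helps when the files already exist with unpadded names.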

The good news is that it does feed sheets but, in the end, the overhead for dealing with these far exceeds the benefit.

This is an example of a potentially great product severely damaged by the lack of common sense.

Updated comments on the HP 6250

Just had a relatively good experience with the 6250. I asked it to read documents as "text" and it did feed a pile through and did a pretty good job of OCRing them. I did a very similar document using the Visioneer scanner and their default OCR. HP uses Omnipage (Caere). The text on both was fairly similar except that HP had the very annoying habit of leaving out spaces between words. It should've been able to make more effective use of a dictionary for identifying word boundaries.
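To show what using a dictionary to find word boundaries could mean in practice, here is a minimal sketch of dictionary-based segmentation by dynamic programming. It is not a description of how Omnipage or any other OCR engine works; the function name `segment` and the tiny word list are made up for the example, and a real system would need a full dictionary plus some way to handle unknown words.

```python
def segment(text, words):
    """Split run-together text into dictionary words.

    Returns the split with the fewest words, or None if no split exists.
    """
    max_len = max(map(len, words), default=0)
    # best[i] is a list of words covering text[:i], or None if unreachable.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if best[start] is not None and piece.lower() in words:
                candidate = best[start] + [piece]
                # Prefer fewer, longer words ("here" over "he" + "re").
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


# Hypothetical miniature dictionary, for illustration only.
WORDS = {"the", "bank", "statement", "state", "is", "here", "he", "re"}

if __name__ == "__main__":
    print(segment("thebankstatementishere", WORDS))
```

Preferring the split with the fewest words is a crude heuristic; word frequencies would do better, but even this much would recover most of the missing spaces.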

But it is far easier to put a pile in the 6250 and let it go than to hand feed each page to the Visioneer, especially since I use the slower, higher resolution, mode. But HP does require more fixup. It also loses formatting which is a mixed blessing since Visioneer's OCR does format but then one needs to undo the special markup if one wants to use proper word styles.

There is also a setting called "whole page" which also OCR's the text and, in addition, tries to preserve the formatting. Unfortunately, it does this by making extensive use of frames which do indeed keep the position of elements in the original document but frustrate attempts to clean up the document since they preserve the physical but not the logical structure of the document. And there were no obvious ways to override the default settings. Idiot-proofing can result in a capability only suitable for idiots.

Quick Notes

These are comments which didn't fit nicely above.

  • TIF. The idea of a Tagged Image File Format is nice. One can include rich information within the file as to the details of the contents and include multiple parts. Alas, it suffers from the fatal flaw that so much of OOP has: it relies on tools that must look inside the file to know what to do. As bad as three-letter file suffixes are for indicating the file type, a distinct suffix at least allows one to know, for example, that a file contains multipage compressed faxes, so that a drawing program such as MGI's Photosuite doesn't seize the document suffix and crash whenever you want to read a fax.
  • While USB devices are a start, I really need all these devices on the network as resources. There are high end network faxes and other devices but they are even more important for household computing where one shouldn't have to dedicate machines to own peripherals.
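Returning to the TIF note above: looking inside the file is at least not hard. The sketch below reads the header and the page directories of a TIFF and reports how many pages it holds and how each is compressed, which is enough to tell a multipage fax from a photograph. The function name `describe_tiff` is made up for the example; the tag number (259, Compression) and the compression codes come from the TIFF 6.0 specification. It is a sketch only and does little to guard against truncated or malformed files.

```python
import struct

# Compression codes defined by TIFF 6.0 (tag 259).
COMPRESSION = {
    1: "uncompressed",
    2: "CCITT RLE",
    3: "CCITT Group 3 fax",
    4: "CCITT Group 4 fax",
    5: "LZW",
    32773: "PackBits",
}


def describe_tiff(data):
    """Return one compression name per page, or None if data is not a TIFF."""
    if len(data) < 8:
        return None
    if data[:2] == b"II":       # little-endian ("Intel") byte order
        order = "<"
    elif data[:2] == b"MM":     # big-endian ("Motorola") byte order
        order = ">"
    else:
        return None
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        return None
    pages = []
    seen = set()                # guard against directory loops
    while offset and offset not in seen:
        seen.add(offset)
        (count,) = struct.unpack_from(order + "H", data, offset)
        compression = 1         # the default when the tag is absent
        for i in range(count):
            entry = offset + 2 + 12 * i
            tag, kind, n, value = struct.unpack_from(order + "HHI4s", data, entry)
            if tag == 259:
                (compression,) = struct.unpack(order + "H", value[:2])
        pages.append(COMPRESSION.get(compression, f"code {compression}"))
        # Each directory ends with the offset of the next one (0 = last page).
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * count)
    return pages
```

Given the bytes of a file, for instance `describe_tiff(open(path, "rb").read())` with `path` supplied by the caller, a result with several "CCITT Group 3 fax" entries is a strong hint that the file should go to a fax viewer rather than a drawing program.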