This issue is an attempt to review some of the products I've been using. I shouldn't
need to add the usual caveat, but I will: one of the purposes of these "columns" is to
give me a chance to write without the full coherence I'd normally devote to an essay.
I do hope to review and organize these into more coherent essays in the future.
This issue will focus on products I try to use to deal with the paper world. These are
personal document management tools rather than corporate ones. But there are some
similarities as we move from using a single computer to more "household" and
group activities. Since this is more a review, I won't delve into my wishlist except to
note that these are accommodations. I'm looking forward to getting more of my information
electronically as data rather than pictures.
But even where I can get documents like bank and credit card statements online, the
paper versions often have additional information such as the images of checks and payment
details (for American Express). More important, they represent reconciliation
units rather than long lists of undifferentiated transactions online. Some providers, such
as Webcard and Amex, do provide statement-by-statement organization, but they are still the
exception.
The good news is that online availability is growing. But the design point is still the
eyeball rather than the program. Even so, I've been using the browser interface to write
programs that extract the data anyway, on the assumption that the providers won't change
the presentation structure very often and that there are clues in the presentation markup
that allow me to identify the data. Still, it is far from perfect.
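A minimal sketch of that kind of extraction program, using only Python's standard `html.parser`. The markup it looks for (a `<tr class="txn">` per transaction with cells classed `date`, `desc` and `amount`) is a hypothetical stand-in; real statement pages differ between providers and can change without notice, which is exactly what makes the approach fragile.

```python
from html.parser import HTMLParser


class StatementParser(HTMLParser):
    """Collect transaction rows from a statement page.

    Assumes (hypothetically) that each transaction is a <tr class="txn">
    whose cells carry class="date", "desc" or "amount".
    """

    FIELDS = {"date", "desc", "amount"}

    def __init__(self):
        super().__init__()
        self.rows = []      # finished transactions, one dict per row
        self._row = None    # dict being filled for the current transaction row
        self._field = None  # name of the cell currently being read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "tr" and cls == "txn":
            self._row = {}
        elif tag == "td" and self._row is not None and cls in self.FIELDS:
            self._field = cls
            self._row[cls] = ""

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a recognized cell is kept; layout text is ignored.
        if self._field is not None:
            self._row[self._field] += data.strip()
```

Feeding it a page (`p = StatementParser(); p.feed(html_text)`) leaves the transactions in `p.rows`; rows without the marker class, such as subtotals, are skipped.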
I've long been a user of Visioneer's products. So far, I still see them as the best of
the breed, but that is not the same as perfect. The latest version, Visioneer Explorer,
comes closer but also represents a step backwards in some respects.
One of the precursors to this category was Wang's Freestyle product from the late
1980's that made it easy to scan documents into the computer and view them as thumbnails.
One fatal flaw was its limited integration with the existing platform. The machines
were still weak, slow 286s and 386s, and Windows was still not quite there. Overall, a
nice concept but not over the threshold.
Visioneer's key innovation was to use an inexpensive scanner and have a desktop program
that automatically responded when paper was inserted. The documents are represented as
thumbnails on the desktop and are sufficiently large to allow for identification and some
organization.
One problem was that their storage system used its own naming convention, perhaps due
to the historic limitations of Windows 3.1. It took until Visioneer Explorer (VE) to
rectify this and use standard file names.
On the positive side, it was simple and quick to manage and edit documents. It also had
links to programs and utilities that made it simple to OCR and print (and thus Fax) the
documents. Acquisition is enhanced by the ability to print to the desktop.
VE not only uses the native file system but allows one to view documents in arbitrary
directories. But along with the improvements came a big step backwards. Instead of a fast
and integrated viewer, the viewer is an external program that is slow to start. More
seriously, one can't rename a document while viewing it. This seems minor, but it is
important to be able to quickly give documents effective names.
VE still prefers its own ".max" files. It now allows more use of standard formats
such as TIF, but support for non-MAX files is not as smooth as for the native format.
Unfortunately, Visioneer seems to have saved money on support. The support staff, when
they respond, tend to treat customers as idiots and answer any question with a
tutorial for idiots. This has made it difficult for me to work around problems in their
software and to make suggestions. Still, the use of real file names in VE goes a
long way to allowing me to resolve problems and also to do effective backup.
Some quick suggestions, should anyone at Visioneer read this:
- Make the viewer work more closely with the desktop so as to allow renaming.
- Allow a bigger thumbnail view and improve viewing of TIF files. This is important for
quickly rearranging and reorganizing documents. Or, at least, allow me to quickly magnify
the page number. This is especially important for handling double-sided documents.
- Allow a multipage file to be viewed as a subfolder rather than one page at a time in the
viewer.
- Allow multiple desktops to make it easier to manage directories.
- Support directories on the network as well as locally.
- Debug TWAIN support so as to accept a wider variety of scanners, and don't handle events
for scanners you don't support.
On the positive side, it has some nice features:
- As noted, the thumbnails are generally large enough to identify the document.
- It supports "printing" to the desktop, which makes it easy to prepare a document
for faxing while still being able to manipulate it first.
- Inserting paper into the scanner automatically launches the program, which makes it very
quick to work with.
- The stack and unstack capabilities make it possible to quickly organize documents and
assemble them out of scanned pieces.
- I use the black-and-white scanner, so I don't need to deal with too many choices when I
want to scan. The 200 DPI mode makes it simple to just scan and, if I want, copy or fax a
document without having to make too many decisions. It also makes scanning fairly quick.
I did have a little experience with Visioneer for the Mac but it was too clunky to use.
The Mac certainly seems to be showing its age in many ways and, as a backwater, doesn't
get the necessary attention from product developers.
Pagis is a program from Xerox and an outgrowth of its acquisition of Ray Kurzweil's
reading machine company.
The current version (2.0) uses the standard file system folders. One can designate a
folder hierarchy as being handled by Pagis. It is similar to working with a standard shell
folder extended with Pagis tools and their thumbnail viewer.
This was a major advantage over Visioneer prior to VE.
The problem is that Pagis is still clunky and their emulation of shell folders is
problematic and often buggy.
Pagis uses its own XIF files but can import/export TIF and other formats. Since I
don't want to be locked in, I try to stick with TIF, but then I lose some of the more useful
capabilities.
The Pagis viewer is more powerful than the Visioneer one, but that comes at the price of
making simple things awkward. It is also very slow to start up, often resulting in a
pile of viewers appearing all at once.
I expect many of these problems to be addressed in the future, but with VE, Pagis's major
advantage of working with the native file system is not as strong as it used to be.
Whereas VE maintains its own viewer, Pagis's shell emulation becomes a liability: any
bugs and design flaws mean that all shell access to the documents is compromised.
I've only glanced at others. The general problem is that they are like earlier versions
of Visioneer in that they manage their own desktops without allowing access to the rest of
the file system and show less finesse in document management.
Still, if others have suggestions, I'm interested in looking at them.
A legacy part of document management is the Fax. While Fax modems use a different
modulation scheme from normal data modems, the bits themselves are in a standard format,
usually Group 3 or Group 4 compression. The transition to transmitting them via the
Internet was obvious. (BTW, I integrated Fax with Lotus Notes in 1989, so I got into the
technical details fairly deeply.)
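To make the "standard format" point concrete: Group 3's one-dimensional (Modified Huffman) coding first turns each scan line into alternating white and black run lengths, always starting with a white run (which may have length zero), and then replaces each run with a codeword from fixed tables. The sketch below shows only the run-length step; the codeword tables are omitted, and `scan_line_runs` is an illustrative name rather than any library's API.

```python
def scan_line_runs(pixels):
    """Split one scan line (0 = white, 1 = black) into alternating run lengths.

    The first run is always white, as in Group 3 one-dimensional coding,
    so a line that begins with black yields a leading zero-length white run.
    """
    runs = []
    current = 0  # colour of the run being counted; lines start with white
    length = 0
    for p in pixels:
        if p == current:
            length += 1
        else:
            runs.append(length)  # close the run and switch colour
            current = p
            length = 1
    runs.append(length)  # close the final run
    return runs
```

For example, `scan_line_runs([1, 1, 0, 0, 0, 1])` gives `[0, 2, 3, 1]`, and an all-white standard-width line of 1728 pixels collapses to the single run `[1728]`, which is why mostly blank pages fax so compactly.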
I use a number of services:
- JFax. This seems to be the premier service. It offers
inbound phone numbers in a number of cities around the world. In addition to Fax, it
supports voice mail, which comes in as a GSM attachment. One can use the web to manage
one's account. There is also JFaxsend for outbound faxing, but since it lacked NT
support, I haven't used it as my primary outbound service. One nice feature of their
outbound support is that the printer driver also allows me to print to a TIF file, even
under NT. A very useful capability.
- Faxmission (AKA Faxnet). This is a Boston-based service that I learned about from
Mediaone. It supports inbound and outbound faxing. I use it as my normal outbound service
since it has drivers for Win95 and (now) NT and connects directly to their server without
using my mail system as a transport. It is reliable and simple, so I've gotten used to it as
my default.
- Faxaway. Faxaway was the first outbound service and is
still useful because it is simple. Just send an email message to the fax number with
optional attached files. It can convert a number of PC file formats and also offers a
printer driver that produces TIF files. One can set up an account online and pay only for
the faxes sent. It's still a good entry-level service. The main reason I switched away
was Faxnet's ability to print and send in a single step.
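Email-to-fax submission of the Faxaway kind amounts to composing an ordinary MIME message. A minimal sketch with Python's standard `email` package; the gateway domain `fax.example.com` and the `number@gateway` addressing are placeholders, since each service defines its own address format, and actually sending the message (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_fax_email(fax_number, sender, body, attachment=None,
                    gateway="fax.example.com"):
    """Compose an email addressed to a (hypothetical) email-to-fax gateway.

    `attachment` is an optional (filename, bytes) pair, sent as image/tiff.
    """
    digits = "".join(ch for ch in fax_number if ch.isdigit())  # drop punctuation
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{digits}@{gateway}"  # placeholder addressing scheme
    msg["Subject"] = f"Fax to {digits}"
    msg.set_content(body)  # plain-text cover note
    if attachment is not None:
        filename, data = attachment
        # Adding an attachment turns the message into multipart/mixed.
        msg.add_attachment(data, maintype="image", subtype="tiff",
                           filename=filename)
    return msg
```

The point of the sketch is how little is involved: the fax number becomes part of the address and the pages ride along as an attachment, which is why such services quickly became a commodity.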
The actual service is fairly simple, so it's a commodity business. There are many
advantages to faxing online, especially in conjunction with document management software:
- Since I use document management software, the ability to treat a fax as just another source or output device makes life much easier.
- Directly "printing" to the fax produces much higher quality than printing and scanning.
- PC scanners are much higher quality than the scanners built into fax machines.
- Forwarded and annotated faxes don't lose any quality.
- No need for a phone line and, as with JFax, one can maintain phone numbers in a choice of cities.
- Price? Cheaper except for local faxes, but the ease of use makes up for the cost of local faxing.
There are other benefits for heavy Fax users, but my focus is on personal use.
I use a number of scanners for various purposes. I still have a preference for the
Visioneer Vx (sheet feeder) since it is so simple and efficient to use. But it does have
its limits, so I need to look further. Since the focus here is on document management, I
won't discuss the HP Photosmart and the Minolta Dimage (perhaps at another time).
One of the problems with scanners is that TWAIN is a kludge. That's just a given; even
its designers recognize it. Ideally, we'll have protocols like HP's JetSend that
will allow an over-the-wire network protocol to replace it. But for now, we're still
stuck with TWAIN and its issues.
Win98 introduced input event handling, which is a nice idea but requires another manual
step in handling scanning. One must choose the program, then make choices about color,
resolution, destination and whatever. The UI for these interfaces ranges from painful
to excruciating. One must wander down menus and wait for modal operations, and there are all
those settings and subsettings.
And, while I'm flaming, I'll complain about wizards and worm's-eye-view interfaces that
require one to navigate through dialog boxes that make voice mail menus seem almost usable by
comparison. Somehow, few so-called designers really understand the key idea of the
spreadsheet: the ability to see everything at once. Again, that's a topic for an essay in
its own right.
Handling multiple pages and two-sided documents is a nightmare. Perhaps if everything were
set up perfectly, their assumptions would work, provided that I could figure out
the UI. Pagis is one of the offenders here, since I still can't figure out all the pieces.
And the number of pieces and elements is part of the problem, since there are all these
tools vying for control. Some emulate desktops, some try to be copiers and fax machines,
and some just place files on my desktop.
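One common way to cope with two-sided originals on a single-sided sheet feeder is to scan the fronts, flip the stack, scan the backs, and interleave the two passes. The sketch below covers only that collation step. It assumes the back pass comes out in reverse page order; whether it does depends on how the stack is flipped, hence the `backs_reversed` flag. `collate_duplex` is a hypothetical helper, not a feature of any product reviewed here.

```python
def collate_duplex(fronts, backs, backs_reversed=True):
    """Interleave separately scanned front and back pages into reading order.

    `fronts` holds pages 1, 3, 5, ... in order; `backs` holds the even pages,
    assumed (by default) to arrive last-page-first after the stack is flipped.
    """
    if len(fronts) != len(backs):
        raise ValueError("front and back passes must have the same page count")
    if backs_reversed:
        backs = list(reversed(backs))
    pages = []
    for front, back in zip(fronts, backs):
        pages.extend([front, back])  # front of a sheet, then its back
    return pages
```

With fronts `["p1", "p3", "p5"]` and a back pass of `["p6", "p4", "p2"]`, the result is `["p1", "p2", "p3", "p4", "p5", "p6"]`. This is the kind of bookkeeping the software could do once instead of leaving it to the user.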
I had high hopes for USB scanners such as the Storm (née Logitech) scanner. It is
nice to be able to just plug it in, but it seems to forget its settings. Also, when
scanning a page I must not only insert the paper but also press a button and then select which
device should read the page.
I do have a UMax sheetfeeder, which isn't that bad but takes a long time to start
scanning each page. The design point for so many of these is the artist rather than the
document manager.
So, in the triumph of hope over experience, I bought the HP 6250.
I'm singling this out because I had high hopes but ultimately became more aware of its
limitations than its virtues.
I want to be able to feed multiple pages at a time so I don't get RSI from feeding
paper and waiting. Perhaps I should have learned my lesson with the HP-5S, which had a very,
very bad sheet feeder and put me through the usual UI steps, asking me questions every
time I tried to fax a document.
But the 6250 looked great since it was a flatbed copier that came with a sheet feeder.
The good news is that the sheet feeder actually (generally) works. As long, that is, as
one has 8.5 x 14 single-sided pages. The other cases are problematic.
But before I get to that: I wasted a lot of time trying to get it to work at all. It gets
totally befuddled by the presence of any other scanners or scanner software on the machine
and just does nothing until one removes the 6250 from the USB bus, at which point it appears
and complains about the loss of the scanner. Through painful rituals involving plugging and
unplugging the scanner and running their program again, one can get it to recognize the
scanner. For simplicity, I connected it to my pristine (or at least scanner-virgin)
laptop.
Of course, HP tech support seriously suggests that I buy a separate PC for each
scanner. Typical of the impotent mindset of tech support.
But even then, well, there are so many annoying things I'll just bullet-point them:
- The scanner does a poor job with narrow paper. The pages twist and often jam.
- If the paper is short, such as only 11 inches long, it produces large areas of gray
around the page. On disk, these add a few hundred thousand bytes per page!
- More generally, there seems to be absolutely no awareness on HP's part that one can
automatically crop a page. The background of the device is fully under HP's control and
must be designed to support this! Instead, they chose a gray color that the scanner
thinks is part of the document. This is idiotic.
- If there is paper on the glass below the sheet feeder, the scanner will simply keep
scanning the one line that appears below the sheet feeder's scanning slot.
- When one does scan a pile of paper, each page becomes a separate document! The file name
of the first file is used for the first page and then subsequent pages are numbered with
the number as a suffix. But since there are no leading zeros, sorting by file name doesn't
work. If one continues the scan, one must use a different base name, since the nondesigners
are totally clueless and don't have the concept of appending to the series!
- The scanning and sheet-feeding options must be set anew each time. And HP still thinks
it is always scanning color images, so I must manually find the field (no shortcut keys?)
to set it to BW and then change the DPI from 300 to 200, since 200 is sufficient. And then
I must choose the file name and the format. If I make the mistake of choosing TIF, it is
uncompressed; I need to remember to scroll down to the compressed option each time. Not that
it does a great job of compression, and all that gray defeats it. There is also no
thresholding, so all the background on a bank document turns into fuzz.
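The file-naming complaint has a simple fix: pad the page number to a fixed width, and continue numbering from whatever pages already exist. A minimal sketch; the four-digit width, the `.tif` suffix and the `next_page_names` helper are illustrative assumptions, and the caller would pass something like `os.listdir(folder)` as the existing names.

```python
import re


def next_page_names(existing_names, base, count, width=4, ext=".tif"):
    """Return `count` new names continuing the series base0001.tif, base0002.tif, ...

    Existing members of the series are detected, so a second batch appends
    to the series instead of needing a new base name.
    """
    pattern = re.compile(rf"{re.escape(base)}(\d+){re.escape(ext)}")
    taken = [int(m.group(1)) for name in existing_names
             if (m := pattern.fullmatch(name))]
    start = max(taken, default=0) + 1
    # Zero padding to a fixed width keeps name order identical to page order.
    return [f"{base}{n:0{width}d}{ext}" for n in range(start, start + count)]
```

Without the padding, a plain name sort puts `page10.tif` between `page1.tif` and `page2.tif`. With it, continuing a scan into a folder that already holds `stmt0001.tif` and `stmt0002.tif` yields `stmt0003.tif`, `stmt0004.tif`, and so on.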
The good news is that it does feed sheets but, in the end, the overhead for dealing
with these far exceeds the benefit.
This is an example of a potentially great product severely damaged by the lack of
common sense.
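The auto-cropping and thresholding complaints above describe two simple operations. Below is a minimal pure-Python sketch on a grayscale image held as a list of rows (0 = black, 255 = white). It assumes the scanner's backing reads as a known, uniform gray level; `backing`, `tolerance` and the threshold `level` are illustrative parameters, not HP settings. A real tool would use an imaging library and also cope with skew and noise.

```python
def autocrop(image, backing, tolerance=10):
    """Trim outer rows and columns whose pixels all match the backing level."""
    def is_backing(v):
        return abs(v - backing) <= tolerance

    rows = [i for i, row in enumerate(image)
            if not all(is_backing(v) for v in row)]
    if not rows:
        return []  # nothing but backing: no page detected
    cols = [j for j in range(len(image[0]))
            if not all(is_backing(image[i][j]) for i in range(len(image)))]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    # Keep the bounding box of everything that isn't backing.
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def threshold(image, level=128):
    """Map grayscale to bilevel: 1 (black) below `level`, otherwise 0 (white)."""
    return [[1 if v < level else 0 for v in row] for row in image]
```

Because the device vendor controls the backing, it could choose a level that paper never produces, which makes the crop step trivial; thresholding afterwards turns light background patterns into clean white, which compresses well.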
Just had a relatively good experience with the 6250. I asked it to read documents
as "text", and it did feed a pile through and did a pretty good job of OCRing
them. I did a very similar document using the Visioneer scanner and their default OCR. HP
uses Omnipage (Caere). The text on both was fairly similar except that HP had the very
annoying habit of leaving out spaces between words. It should've been able to make more
effective use of a dictionary for identifying word boundaries.
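Restoring missing spaces is the classic word-segmentation problem. Here is a minimal dynamic-programming sketch, assuming a plain set of lower-case dictionary words; a real OCR engine would also weigh recognition confidence and word frequencies. It finds the split of a run-together string into the fewest dictionary words, or returns `None` if no split exists.

```python
def segment(text, words):
    """Split run-together text into dictionary words, preferring the fewest words."""
    n = len(text)
    best = [None] * (n + 1)  # best[i]: shortest word list covering text[:i]
    best[0] = []
    longest = max(map(len, words), default=0)
    for i in range(1, n + 1):
        # Only look back as far as the longest dictionary word.
        for j in range(max(0, i - longest), i):
            piece = text[j:i]
            if best[j] is not None and piece in words:
                candidate = best[j] + [piece]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n]
```

For example, `segment("thequickfox", {"the", "quick", "fox", "qui", "ck"})` prefers `["the", "quick", "fox"]` over the four-word split through `"qui"` and `"ck"`.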
But it is far easier to put a pile in the 6250 and let it go than to hand feed each
page to the Visioneer, especially since I use the slower, higher resolution, mode. But HP
does require more fixup. It also loses formatting which is a mixed blessing since
Visioneer's OCR does format, but then one needs to undo the special markup if one wants to
use proper Word styles.
There is also a setting called "whole page", which also OCRs the text and, in
addition, tries to preserve the formatting. Unfortunately, it does this by making
extensive use of frames which do indeed keep the position of elements in the original
document but frustrate attempts to clean up the document since they preserve the physical
but not the logical structure of the document. And there were no obvious ways to override
the default settings. Idiot-proofing can result in a capability only suitable for idiots.
These are comments that didn't fit nicely above.
- TIF. The idea of a Tagged Image File Format is nice. One can include rich information
within the file about the details of the contents and include multiple parts. Alas, it
suffers from the fatal flaw that so much of OOP has: it relies on tools that must look
inside the file to know what to do. As bad as three-letter file suffixes are for
identifying the file type, a more specific suffix would at least let one know, for example,
that a file contains multipage compressed faxes, so that a drawing program such as MGI's
Photosuite doesn't seize the suffix and crash whenever you want to read a Fax.
- While USB devices are a start, I really need all these devices on the network as
resources. There are high-end network faxes and other devices, but network access is even
more important for household computing, where one shouldn't have to dedicate machines to
hosting peripherals.
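For TIF, "looking inside the file" is modest but not trivial. A tool has to check the byte-order signature, then walk the chain of image file directories (one per page) and read the Compression tag (259), where 3 and 4 denote CCITT Group 3 and Group 4. A minimal standard-library sketch, assuming a well-formed file already read into memory; it ignores every tag except Compression.

```python
import struct

# Values of the TIFF Compression tag (259) as listed in the TIFF 6.0 spec.
COMPRESSION_NAMES = {
    1: "none",
    2: "CCITT modified Huffman RLE",
    3: "CCITT Group 3",
    4: "CCITT Group 4",
    5: "LZW",
    32773: "PackBits",
}


def describe_tiff(data):
    """Return (page_count, [compression per page]) for TIFF bytes,
    or None if the data doesn't start with a TIFF signature."""
    if data[:4] == b"II*\x00":
        order = "<"  # little-endian ("Intel") byte order
    elif data[:4] == b"MM\x00*":
        order = ">"  # big-endian ("Motorola") byte order
    else:
        return None
    (offset,) = struct.unpack_from(order + "I", data, 4)  # first directory
    seen, pages = set(), []
    while offset and offset not in seen:  # each directory describes one page
        seen.add(offset)
        (count,) = struct.unpack_from(order + "H", data, offset)
        compression = 1  # spec default when the tag is absent
        for k in range(count):
            entry = offset + 2 + 12 * k  # directory entries are 12 bytes each
            (tag,) = struct.unpack_from(order + "H", data, entry)
            if tag == 259:
                # A single SHORT value sits at the start of the entry's value field.
                (compression,) = struct.unpack_from(order + "H", data, entry + 8)
        pages.append(COMPRESSION_NAMES.get(compression, f"other ({compression})"))
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * count)
    return len(pages), pages
```

A program that only wants to show faxes must still parse this much before it knows what it has, which is the cost the suffix-only convention avoids.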