Friday, December 7, 2007

Document Imaging and Processing Typically Go Together

Document image processing can serve different purposes.

For example, the processing might be nothing more than cleaning up the document. Paper documents often contain punch holes, black borders, unwanted lines, and similar marks. Document-cleaning tools can remove these from the images after scanning, and such software can also let users specify how each kind of element should be handled.

Other cleanup tasks include:
  • Straightening (deskewing) askew images
  • Removing borders that exceed given noise-tolerance specifications
  • Smoothing nicks and bumps that distort scanned text characters
  • Converting white text on black to black text on white


These and other cleaning tools can be automated by specifying minimum and/or maximum sizes for the elements to be removed; for example, marks smaller than a set minimum can be discarded as specks of noise.
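
To make the size-threshold idea concrete, here is a minimal sketch that works on a binarized page stored as a list of rows (1 = ink, 0 = paper). It is not how any particular cleanup product works. `remove_by_size` erases every connected ink region whose pixel count falls outside an allowed range, and `invert_if_reversed` flips white-on-black to black-on-white using an assumed rule: an image or region counts as reversed when more than half its pixels are black. All names and thresholds are hypothetical.

```python
from collections import deque

# A binarized page: 1 = black (ink), 0 = white (paper).
Image = list[list[int]]


def invert_if_reversed(img: Image) -> Image:
    """Turn white-on-black into black-on-white.

    Assumption: an image (or region) is 'reversed' when more than
    half of its pixels are black.
    """
    total = sum(len(row) for row in img)
    black = sum(sum(row) for row in img)
    if black * 2 > total:
        return [[1 - p for p in row] for row in img]
    return [row[:] for row in img]


def remove_by_size(img: Image, min_size: int, max_size: int) -> Image:
    """Erase 4-connected ink regions with fewer than min_size or more than max_size pixels."""
    height = len(img)
    width = len(img[0]) if img else 0
    out = [row[:] for row in img]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected region of ink.
            region = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and img[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Outside the allowed size range: treat as noise (e.g. a speck or a border).
            if not min_size <= len(region) <= max_size:
                for cy, cx in region:
                    out[cy][cx] = 0
    return out


page = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
# The isolated pixel is removed as a speck; the 2x2 block survives.
print(remove_by_size(page, min_size=2, max_size=10))
# [[0, 0, 0, 0, 0], [0, 0, 1, 1, 0], [0, 0, 1, 1, 0], [0, 0, 0, 0, 0]]
```

Text characters are connected regions too, so in practice `max_size` has to sit above the largest character you expect, or real text would be erased along with the noise.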

Major Image Processing Tasks

For text documents, imaging produces pictures of pages that humans can read but machines cannot. To make these documents searchable by typed words, the characters in the images must be converted into machine-readable text.

This conversion is done with technologies such as OCR (Optical Character Recognition) and ICR (Intelligent Character Recognition). These technologies can even recognize hand-printed characters to some extent.
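
As a toy illustration of the underlying idea, the sketch below uses matrix (template) matching, a classic OCR technique: each already-segmented character image is compared pixel by pixel with stored reference glyphs, and the best-agreeing template wins. Production OCR and ICR engines use far more robust feature-based or trained classifiers, so this is only a sketch. The 3x5 glyph shapes and all function names are hypothetical.

```python
# Each glyph is a tuple of equal-length rows: '#' = ink, '.' = paper.
Glyph = tuple[str, ...]

# Hypothetical 3x5 reference glyphs for three capital letters.
TEMPLATES: dict[str, Glyph] = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def agreement(a: Glyph, b: Glyph) -> int:
    """Count pixel positions where two same-sized glyphs match."""
    return sum(pa == pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(glyph: Glyph) -> str:
    """Return the label of the template that agrees with the glyph on the most pixels."""
    return max(TEMPLATES, key=lambda label: agreement(glyph, TEMPLATES[label]))


def recognize_line(glyphs: list[Glyph]) -> str:
    """Turn a sequence of already-segmented glyph images into a text string."""
    return "".join(recognize(g) for g in glyphs)


# An "I" with one noisy pixel in the fourth row is still read correctly.
noisy_i = ("###", ".#.", ".#.", "##.", "###")
print(recognize_line([TEMPLATES["T"], noisy_i, TEMPLATES["L"]]))  # TIL
```

Once characters come out as a string like this, the text can be searched and edited like any other typed text.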

The same conversion is also needed to make the documents editable.

Once text documents have been made machine-readable, the next typical step is to index them. Indexing makes the documents searchable; full-text indexing makes them searchable by any word they contain.
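
Full-text indexing is commonly built as an inverted index, which maps every word to the documents that contain it. The sketch below is a minimal in-memory version, assuming the recognized text of each document is already available as a string; the document IDs and function names are hypothetical, and a real search engine would add stemming, ranking, and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents that contain every word of the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    "inv-001": "Invoice for scanning services",
    "memo-007": "Memo about the scanning backlog",
}
index = build_index(docs)
print(sorted(search(index, "scanning")))          # ['inv-001', 'memo-007']
print(sorted(search(index, "scanning invoice")))  # ['inv-001']
```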

Full-text indexing takes up a lot of storage space, so an alternative is to index by tags and meta descriptions. Tags are keywords that characterize the document's content; descriptions give short summaries of it.
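
The storage trade-off can be sketched by counting index entries, meaning (term, document) pairs, as a rough proxy for index size; actual size also depends on encoding and compression. In this sketch a tag index stores only a few hand-chosen tags per document, and a separate table keeps the short descriptions returned with search results. The sample documents, tags, and function names are hypothetical.

```python
from collections import defaultdict


def full_text_postings(docs: dict[str, str]) -> int:
    """Count the (word, document) pairs a full-text index would have to store."""
    return sum(len(set(text.lower().split())) for text in docs.values())


def build_tag_index(tags: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each lowercased tag to the IDs of the documents labeled with it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, doc_tags in tags.items():
        for tag in doc_tags:
            index[tag.lower()].add(doc_id)
    return dict(index)


def tag_postings(index: dict[str, set[str]]) -> int:
    """Count the (tag, document) pairs stored in a tag index."""
    return sum(len(ids) for ids in index.values())


def search_by_tag(index: dict[str, set[str]], descriptions: dict[str, str], tag: str) -> list[tuple[str, str]]:
    """Return (doc_id, description) pairs for every document carrying the tag."""
    return [(doc_id, descriptions.get(doc_id, "")) for doc_id in sorted(index.get(tag.lower(), set()))]


docs = {
    "inv-001": "Invoice 1042 for document scanning services rendered to Acme in November",
    "memo-007": "Memo to all staff about the scanning backlog and new overtime rules",
}
tags = {"inv-001": ["invoice", "Acme"], "memo-007": ["memo", "backlog"]}
descriptions = {
    "inv-001": "Scanning invoice for Acme",
    "memo-007": "Staff memo on scanning backlog",
}

index = build_tag_index(tags)
print(full_text_postings(docs), tag_postings(index))  # 23 4
print(search_by_tag(index, descriptions, "acme"))  # [('inv-001', 'Scanning invoice for Acme')]
```

The price of the smaller index is recall: a user searching for "overtime" finds nothing here, because that word was never chosen as a tag.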

Processing can go even further: based on programmed specifications, documents can be categorized and stored in the appropriate repositories.
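
One simple way to express such "programmed specifications" is an ordered list of keyword rules applied to each document's recognized text. The sketch below assumes first-match-wins rules and a fallback repository; the repository names, keywords, and functions are all hypothetical, and real systems may instead use templates, form layouts, or trained classifiers.

```python
import re
from collections import defaultdict

# Hypothetical rules: a document goes to the first repository
# whose required keywords all appear in its recognized text.
RULES: list[tuple[str, set[str]]] = [
    ("accounts-payable", {"invoice", "amount"}),
    ("human-resources", {"employee"}),
    ("legal", {"contract"}),
]
DEFAULT_REPOSITORY = "unsorted"


def categorize(text: str) -> str:
    """Return the repository chosen by the first rule that matches the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    for repository, required in RULES:
        if required <= words:  # every required keyword is present
            return repository
    return DEFAULT_REPOSITORY


def route(docs: dict[str, str]) -> dict[str, list[str]]:
    """Group document IDs by the repository each one is assigned to."""
    repositories: dict[str, list[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        repositories[categorize(text)].append(doc_id)
    return dict(repositories)


print(route({
    "d1": "Invoice 17: amount due $250",
    "d2": "Signed service contract",
    "d3": "Cafeteria menu for Friday",
}))
# {'accounts-payable': ['d1'], 'legal': ['d2'], 'unsorted': ['d3']}
```

Keeping the rules as data rather than code means an administrator could add a repository by appending one tuple, without touching the routing logic.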

In short, document image processing can facilitate content management by converting paper documents into categorized content ready to be queried by users, all in a matter of minutes with minimal human intervention.

Some mailroom processors can extract documents from envelopes and then process them as described above. With such a system, a single operator can handle mail volumes that once required many clerks, and can carry the content-management process much further than the clerks could.

Conclusion

Document imaging and processing typically go together. The processing can range from simple tasks, such as removing unwanted elements like distortions and black borders, to complex ones, such as converting text images into machine-readable characters and indexing the content.