We have implemented many document imaging projects (and larger ECM solutions), some of which manage several million documents. Several factors contribute to the success of a document scanning project, and they can also form the basis of your decision on whether to do the work in-house or contract it to an external provider. Briefly, I would recommend you consider the following issues when making your decision:
Capture: The process involved in digitizing paper documents. This is not as simple as putting a paper on a scanner and pressing a button. Capture begins by taking a very close look at the existing folders and documents that you wish to scan. Are the documents of different sizes? Are they all clearly legible? Do you have both color and B&W documents? Are there photographs that must also be scanned? How much document preparation is required (e.g. removing staples, sorting, etc.)? Do you want to scan all documents within a folder as one file, or do you need to break them into different sections? To automate the scanning process, it can be useful to insert separator pages or cover sheets with barcodes, which make the task easier for the person assigned to run the pages through the scanner.
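To make the separator-page idea concrete, here is a minimal sketch of how a scanned batch could be split into documents. It assumes the scanner software has already read any barcode on each page, and the "SEP:" prefix is a hypothetical convention for separator sheets, not something a particular product defines.

```python
from typing import Iterable, List, Optional

SEPARATOR_PREFIX = "SEP:"  # hypothetical barcode convention for separator sheets


def split_batch(barcodes: Iterable[Optional[str]]) -> List[dict]:
    """Group page numbers into documents, starting a new one at each separator.

    `barcodes` holds, for each scanned page in order, the barcode value read
    from that page, or None if the page carries no barcode.
    """
    documents: List[dict] = []
    current = None
    for page_number, value in enumerate(barcodes, start=1):
        if value is not None and value.startswith(SEPARATOR_PREFIX):
            # Separator sheet: open a new document and leave the sheet itself out.
            current = {"label": value[len(SEPARATOR_PREFIX):], "pages": []}
            documents.append(current)
        else:
            if current is None:
                # Pages before the first separator go into an unlabeled document.
                current = {"label": None, "pages": []}
                documents.append(current)
            current["pages"].append(page_number)
    return documents


batch = ["SEP:INVOICES", None, None, "SEP:CONTRACTS", None]
print(split_batch(batch))
```

The operator can then feed a whole stack through the scanner in one go, and the separator sheets decide where one file ends and the next begins.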
Indexing: OK, we have thousands of pages scanned, now what? Indexing refers to the assignment of keywords/values (also known as metadata) that make it possible to quickly locate the desired document. The actual keywords to use depend on the types of files in your filing cabinets, but typically you may have things like Project Name, Invoice Number, Department, Date, etc. An experienced system integrator may find ways to automate the capture of the metadata; otherwise, someone will need to key it in by hand for each scanned document. Methods of automation include the use of barcodes and OCR. If OCR is to be used, then I would advise scanning at a resolution of 300 dpi.
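As a rough illustration of automated metadata capture, the sketch below pulls an invoice number and a date out of text that an OCR engine has already produced. The patterns and field names are assumptions made for the example; real documents need patterns tuned to their own layout, and anything that comes back empty would still have to be keyed in by hand.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for illustration only.
INVOICE_RE = re.compile(
    r"Invoice\s*(?:Number|No\.?|#)\s*:?\s*([A-Z0-9-]*\d[A-Z0-9-]*)",
    re.IGNORECASE,
)
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")  # assumes ISO-style dates


def extract_index_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Return index fields found in the OCR text; None marks a field to key in manually."""
    invoice = INVOICE_RE.search(ocr_text)
    date = DATE_RE.search(ocr_text)
    return {
        "invoice_number": invoice.group(1) if invoice else None,
        "date": date.group(1) if date else None,
    }


print(extract_index_fields("Invoice No: INV-1042\nDate: 2023-05-17"))
```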
Another indexing issue is whether you will need full-text searching later on. Full-text search allows you to search by any word that appears anywhere on any page. To implement this, a good OCR solution will recognize the text on each scanned page and store the captured content in a suitable database.
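To show what "search by any word on any page" involves, here is a minimal sketch of an inverted index built from OCR output. It is an in-memory toy that stands in for the database a real product would use, and the page IDs and function names are invented for the example.

```python
import re
from collections import defaultdict
from typing import Dict, Set

WORD_RE = re.compile(r"[a-z0-9]+")


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the set of page IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in WORD_RE.findall(text.lower()):
            index[word].add(page_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the page IDs that contain every word of the query."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


pages = {"doc1-p1": "Pump maintenance report", "doc2-p1": "Pump invoice"}
index = build_index(pages)
print(search(index, "pump invoice"))
```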
Search: All the above effort is useless if the solution’s search capabilities are poor or if it takes a long time to find and retrieve a document. Look for a system whose search capabilities match your company’s business needs for the scanning project. Generally, the solution should allow you to search by any of the metadata you captured, by any word in the contents, or by a combination of both.
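A combined search of that kind can be sketched as follows. The record layout ("id", "metadata", "content") and the function name are assumptions for the example, and the linear scan is only for clarity; a real system would use indexes rather than checking every document.

```python
from typing import Dict, List, Optional


def find_documents(
    documents: List[dict],
    metadata: Optional[Dict[str, str]] = None,
    text: Optional[str] = None,
) -> List[str]:
    """Return IDs of documents matching all given metadata values and all query terms."""
    wanted = metadata or {}
    terms = (text or "").lower().split()
    matches = []
    for doc in documents:
        # Every requested metadata field must match exactly.
        meta_ok = all(doc["metadata"].get(key) == value for key, value in wanted.items())
        # Every query term must appear somewhere in the content (substring match).
        content = doc["content"].lower()
        text_ok = all(term in content for term in terms)
        if meta_ok and text_ok:
            matches.append(doc["id"])
    return matches


docs = [
    {"id": "A1", "metadata": {"Department": "Finance"}, "content": "Invoice for pump repair"},
    {"id": "B2", "metadata": {"Department": "Operations"}, "content": "Pump maintenance log"},
]
print(find_documents(docs, metadata={"Department": "Finance"}, text="pump"))
```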
If this sounds more complex than you think is needed for your project, then perhaps a simple scanning tool that automatically generates searchable PDF files is all you need. You can then save the PDF files somewhere with a couple of keywords, and that might be good enough for basic requirements.
The above refers to “back file conversion”, i.e. converting existing paper documents to digital form. What about going forward? Such conversions are typically part of a bigger process that aims to digitize and automate paper processes, and perhaps to extend that to application files, email, multimedia and other digital assets.