Project Gutenberg was founded by Michael S. Hart in 1971 to convert printed works in the public domain into electronic book (e-book) form and make them available to readers at little or no cost. The project is named after the 15th-century German inventor Johannes Gutenberg, who famously invented the movable-type printing press. He produced the first printed version of the Bible, making it available to a much larger number of people than ever before.
The project depends on hundreds of volunteers who contribute their time to converting printed text into digital books. More than 60,000 books have already been converted and are available for free download at https://www.gutenberg.org/ebooks.
The conversion itself is carried out by Distributed Proofreaders, an organization closely linked to Project Gutenberg. Only books that are in the public domain, and therefore no longer protected by copyright, are converted. The process starts with scanning each printed page of a given book. Optical character recognition (OCR) software then interprets the page scans and produces digital text. Although the OCR software is quite sophisticated at recognizing printed text, it does not always interpret it correctly, for example because of faint print or smudges on the page. This is where the human proofreaders come in. They compare the scanned page images with the OCR output and correct the output accordingly. In preparation for e-book formatting, the proofreaders also remove page numbers, headers, and footers, and rejoin words that were split at the end of a line in the original text.
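The cleanup steps described above (dropping page numbers and rejoining words split across lines) can be illustrated with a short Python sketch. This is not the tooling Distributed Proofreaders uses; the function name clean_page and the two regular expressions are assumptions made purely for illustration. The sketch also shows why human judgment remains necessary: a simple rule cannot tell a word broken at a line end from a genuinely hyphenated compound such as "well-known".

```python
import re


def clean_page(ocr_text: str) -> str:
    """Hypothetical helper: tidy one page of OCR output.

    Assumption: a line consisting only of digits is a page number.
    """
    lines = ocr_text.splitlines()

    # Drop lines that contain nothing but a number (assumed page numbers).
    lines = [ln for ln in lines if not re.fullmatch(r"\s*\d+\s*", ln)]
    text = "\n".join(lines)

    # Rejoin words split by a hyphen at the end of a line.
    # Caveat: this also merges true compounds that happen to break
    # at a line end, which a human proofreader would have to restore.
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


sample = "12\nThe print-\ning press was a remark-\nable invention.\n"
print(clean_page(sample))
```

For the sample page, the number line is dropped and "print-/ing" and "remark-/able" are joined back into whole words.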
Each e-book goes through three rounds of proofreading by proofreaders with different experience levels, with the first round requiring the least experience. These are followed by two rounds of formatting and one round of post-processing. In a final step, the e-book is committed to the Project Gutenberg library.
For each round of processing there are specific written instructions for the proofreaders, as well as tutorials and exercises. Any change that a proofreader makes to the scanned text is automatically documented and can be verified and, if necessary, corrected in the next round of processing. The primary rule for the conversion process is to “not change anything that the author wrote”, even if the text contains oddly spelled words, strange punctuation, or even outrageous statements. The stated goal is to preserve the integrity of the book's content and keep it as the author intended.
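How changes between rounds might be recorded can be sketched with Python's standard difflib module. The source does not say how Distributed Proofreaders documents changes, so this is only an assumed illustration; the function name describe_changes and the sample text are hypothetical.

```python
import difflib


def describe_changes(before: str, after: str) -> list[str]:
    """Hypothetical helper: list the lines that differ between two rounds.

    Lines prefixed with "- " come from the earlier text,
    lines prefixed with "+ " from the corrected text.
    """
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # Keep only removed and added lines; skip unchanged lines and hint lines.
    return [ln for ln in diff if ln.startswith(("- ", "+ "))]


ocr_output = "It was the hest of times,\nit was the worst of times,"
round_one = "It was the best of times,\nit was the worst of times,"

for line in describe_changes(ocr_output, round_one):
    print(line)
```

A record of this kind lets a later round see exactly which lines were touched and check each correction against the page image.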