Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mandrill!gatech!ncar!ames!pioneer!eugene
From: eugene@pioneer.arpa (Eugene N. Miya)
Newsgroups: comp.graphics
Subject: Re: Optical Character Recognition software?
Keywords: OCR Sun rasterfile
Message-ID: <10027@ames.arc.nasa.gov>
Date: 8 Jun 88 18:36:18 GMT
References: <367@msn006.misemi>
Sender: usenet@ames.arc.nasa.gov
Reply-To: eugene@pioneer.UUCP (Eugene N. Miya)
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 25

I've been looking at several OCR systems: DEST, Kurzweil, and others.
I've developed a small "benchmark" [dirty word] to test these systems.
Unfortunately, I can't post it, as it varies point sizes (6-24 point,
sort of like an eye chart), font types (Times and Courier, for
instance), and character spacing (variable and constant).

This problem is tough: you have to distinguish 0 from O, and l from 1
(ell from one). You can easily put together a test of your own: just
take a bunch of text, numbers, and special characters, and see how well
the thing reads them in. Oh, commas and periods are also trouble.

The ell-and-one problem is particularly annoying because we have older
secretaries who learned to type the ell as a 1. If you see some
sci.space forwardings, there's a secretary at HQ who does this.

Believe me, OCR has a long way to go. I've not developed quantitative
estimates of how much these systems will save, but their utility is
currently marginal. It depends on your application (what you are
reading in).

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,ihnp4,decwrl,allegra,tektronix}!ames!aurora!eugene
  "Send mail, avoid follow-ups. If enough, I'll summarize."