Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mandrill!gatech!ncar!ames!pioneer!eugene
From: eugene@pioneer.arpa (Eugene N. Miya)
Newsgroups: comp.graphics
Subject: Re: Optical Character Recognition software?
Keywords: OCR Sun rasterfile
Message-ID: <10027@ames.arc.nasa.gov>
Date: 8 Jun 88 18:36:18 GMT
References: <367@msn006.misemi>
Sender: usenet@ames.arc.nasa.gov
Reply-To: eugene@pioneer.UUCP (Eugene N. Miya)
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 25

I've been looking at several OCR systems: DEST, Kurzweil and others.
I've developed a small "benchmark" [dirty word] to test these systems.
Unfortunately, I can't post it, as it varies point sizes (6-24 point,
sort of like an eye chart), font types (Times and Courier, for instance),
and character spacing (variable and constant).  This problem is tough: you
have to distinguish between 0 and O <Oh and zero, or did I type zero and oh?>
and between l and 1 (ell and one).
You can easily think up a benchmark of your own: just take a bunch of text,
numbers, and special characters, and see how well the thing reads them in.
Oh, commas and periods are also trouble.  The ell and one problem is
particularly annoying because we have older secretaries who got into the
habit of typing ell for 1.  If you see some sci.space forwardings, there's
a secretary at HQ who does this.  Believe me, OCR has a long way to go.
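
For anyone rolling their own test, here is a minimal sketch of how the
reading could be scored.  It is not the benchmark described above, just an
illustration: the function name score_ocr and the sample strings are made
up, and it assumes you already have the known text and the OCR output as
plain strings.  It aligns the two with Python's difflib and tallies which
characters were mistaken for which.

```python
from collections import Counter
from difflib import SequenceMatcher


def score_ocr(truth, ocr_output):
    """Return (accuracy, confusions) for one OCR reading of a known text.

    accuracy   -- fraction of the known text's characters read correctly
    confusions -- Counter of (expected, got) character pairs
    """
    # autojunk=False keeps difflib from ignoring frequent characters
    # (spaces, common letters) when the texts are long.
    matcher = SequenceMatcher(None, truth, ocr_output, autojunk=False)
    matched = 0
    confusions = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched += i2 - i1
        elif tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same-length substitution: pair the characters one for one.
            for expected, got in zip(truth[i1:i2], ocr_output[j1:j2]):
                confusions[(expected, got)] += 1
        # Insertions, deletions, and uneven replacements count as misses
        # but are not broken down per character in this sketch.
    accuracy = matched / len(truth) if truth else 1.0
    return accuracy, confusions


# Hypothetical sample: the reader confuses one with ell and zero with oh.
truth = "Order 1001: 10 bolts, 0.5 in."
ocr_output = "Order l00l: 1O bolts, 0.5 in."
accuracy, confusions = score_ocr(truth, ocr_output)
for (expected, got), count in confusions.most_common():
    print(f"{expected!r} read as {got!r}: {count}")
print(f"accuracy: {accuracy:.3f}")
```

Run the same known text through each point size, font, and spacing, and
the confusion counts should show where a given system falls down.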

I've not developed quantitative estimates of how much these systems will
save, but their utility is currently marginal.
It depends on your application (what you are reading in).

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,ihnp4,decwrl,allegra,tektronix}!ames!aurora!eugene
  "Send mail, avoid follow-ups.  If enough, I'll summarize."