Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!usc!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!emory!gatech!bloom-beacon!eru!hagbard!sunic!mcsun!hp4nl!dutrun!duteca!dutirt2!alex
From: alex@dutirt2.tudelft.nl (Alexander Vonk)
Newsgroups: comp.ai
Subject: Re: Optical Character Recognition, how? (Curious)
Message-ID:
Date: 10 Dec 90 12:06:23 GMT
References: <50761@bsu-ucs.uucp> <985@gtenmc.UUCP>
Sender: news@duteca
Lines: 42

It has been three or four years since I studied the field of OCR, but
I guess what I know of it is still partly applicable.

In <985@gtenmc.UUCP> maf@gtenmc.UUCP (Mary Ann Finnerty) writes:

>Since the scanner just recognizes type-written or computer
>print-out for input, the characters it must recognize are easily
>(I think) to catagorize.

Well, it depends. It's much more difficult with a printer or
typewriter that does not align its characters very well. Skewed
characters are much harder to recognize if you expect `straight'
characters.

One time, I attended a demo at Canon Nederland (in the Netherlands,
that is). I was very curious (too) about the scanner and OCR software
that was demonstrated, and asked the sales rep to scan a letter that
was rotated somewhat, so it would be harder to recognize. I don't
remember the exact figures, but the error rate went up by a factor of
five to ten: from 2 or 3 errors in the well-aligned scan to 20 or so
errors in the rotated one. Of course, this letter was printed on their
Canon laser printer with one standard font.

>My question is this, are there scanners out there that improve
>as they go along? For example, that get input somehow confirming
>correct character recognition so that they add to their
>repertoire (sp?) of fonts that they can recognize?

According to the Canon sales rep, their OCR software could improve its
recognition abilities by recognizing text written with the same fonts
over and over again (much like you can learn to read even the most
horrible handwriting).

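
As a rough illustration of that kind of font adaptation (not Canon's
actual method, which was not described), here is a minimal sketch of a
nearest-template recognizer whose templates drift toward the font it
keeps seeing. It assumes glyphs arrive as small, equal-sized bitmaps
flattened to lists of 0/1; the names TemplateOCR, learn and recognize
are made up for this example.

```python
# Hypothetical sketch: nearest-template recognition that adapts to a font.
# Glyphs are tiny bitmaps flattened to lists of 0/1 pixel values.

def distance(a, b):
    """Sum of squared pixel differences between two equal-length glyphs."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TemplateOCR:
    def __init__(self):
        # char -> (mean pixel values, number of samples folded in)
        self.templates = {}

    def learn(self, char, glyph):
        """Fold a confirmed sample into the running mean for `char`."""
        if char not in self.templates:
            self.templates[char] = ([float(p) for p in glyph], 1)
            return
        mean, n = self.templates[char]
        new_mean = [(m * n + p) / (n + 1) for m, p in zip(mean, glyph)]
        self.templates[char] = (new_mean, n + 1)

    def recognize(self, glyph):
        """Return (best char, distance); a smaller distance is a better match.

        Raises ValueError if no templates have been learned yet.
        """
        return min(
            ((c, distance(mean, glyph)) for c, (mean, _) in self.templates.items()),
            key=lambda pair: pair[1],
        )


# 3x3 example glyphs, row by row.
GLYPH_I = [0, 1, 0,
           0, 1, 0,
           0, 1, 0]
GLYPH_L = [1, 0, 0,
           1, 0, 0,
           1, 1, 1]
NOISY_I = [0, 1, 0,
           0, 1, 0,
           0, 1, 1]   # one stray pixel, as a particular printer might add

ocr = TemplateOCR()
ocr.learn("I", GLYPH_I)
ocr.learn("L", GLYPH_L)
char_before, dist_before = ocr.recognize(NOISY_I)
ocr.learn(char_before, NOISY_I)          # assume the guess was confirmed
char_after, dist_after = ocr.recognize(NOISY_I)
```

Each confirmed sample pulls the stored template toward the quirks of
the font at hand, so the same glyph matches more closely the next time
(dist_after is smaller than dist_before). Where the confirmation comes
from is left open here.
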
>I don't know if I expressed this very well, but it seems like a
>good application for a self-learning program, but I'm not sure
>how the responses could be confirmed or rejected...Curious.

Personally, I think this would be a good opportunity to couple a
spelling checker with the OCR software. Of course, you would have to
be very careful about spelling errors in the letter itself, but an OCR
program will calculate an estimate of how well it recognized each
character. Maybe such a program could `learn' by just scanning the
complete printed character set once.

Alexander Vonk.

+++ Alexander Vonk - Technical Univ. Delft, Netherlands +++
+++Phone: (NL) 015 - 78 64 12 (world) 31 15 78 64 12 +++
+++Mail: alex@dutirt2.tudelft.nl or alex@dutirt2.UUCP +++
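
A minimal sketch of the spelling-checker idea above, under stated
assumptions: the OCR stage is assumed to return, for each character
position, a short list of (character, confidence) candidates with the
best one first, and the dictionary is just a set of lowercase words.
The function name correct_word and the parameter max_alternatives are
hypothetical.

```python
from itertools import product


def correct_word(candidates, dictionary, max_alternatives=3):
    """Pick a dictionary word consistent with the OCR candidates.

    candidates: one list per character position of (char, confidence)
    pairs, best candidate first.
    Returns (word, score, in_dictionary), where score is the product of
    the chosen per-character confidences.
    """
    # Start with the top candidate at every position.
    best_guess = "".join(pos[0][0] for pos in candidates)
    best_score = 1.0
    for pos in candidates:
        best_score *= pos[0][1]
    if best_guess.lower() in dictionary:
        return best_guess, best_score, True

    # Otherwise try combinations of the alternatives and keep the
    # dictionary word with the highest combined confidence. This is
    # exponential in word length, which is acceptable for a sketch only.
    best = None
    for combo in product(*(pos[:max_alternatives] for pos in candidates)):
        word = "".join(ch for ch, _ in combo)
        if word.lower() not in dictionary:
            continue
        score = 1.0
        for _, conf in combo:
            score *= conf
        if best is None or score > best[1]:
            best = (word, score)

    if best is None:
        # No dictionary word fits: keep the raw reading and flag it. It
        # may be a name, or a spelling error in the letter itself.
        return best_guess, best_score, False
    return best[0], best[1], True


dictionary = {"letter", "better", "scanner"}

# The scanner slightly prefers the digit '1' over the letter 'l'.
reading = [
    [("1", 0.55), ("l", 0.40)],
    [("e", 0.90)],
    [("t", 0.90)],
    [("t", 0.90)],
    [("e", 0.90)],
    [("r", 0.90)],
]
word, score, known = correct_word(reading, dictionary)

# A word the dictionary does not know is returned unchanged and flagged.
unknown = [[("x", 0.90)], [("q", 0.90)]]
raw_word, raw_score, raw_known = correct_word(unknown, dictionary)
```

Here the raw reading "1etter" is replaced by "letter" because that is
the only candidate combination the dictionary accepts. A word that
matches nothing is passed through and flagged rather than forced into
the dictionary, which is one way to stay careful about spelling errors
in the letter itself.
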