Xref: utzoo comp.sys.next:542 rec.arts.books:4367
Path: utzoo!utgpu!attcan!uunet!husc6!ukma!tut.cis.ohio-state.edu!uccba!uceng!dmocsny
From: dmocsny@uceng.UC.EDU (daniel mocsny)
Newsgroups: comp.sys.next,rec.arts.books
Subject: Re: Hundreds of books on an optical disk
Summary: If we can't build machines that work in the real world...
Message-ID: <382@uceng.UC.EDU>
Date: 3 Nov 88 20:05:07 GMT
References: <0XMtqn087E-0A14EYk@andrew.cmu.edu> <344@uceng.UC.EDU> <5821@hoptoad.uucp>
Organization: Univ. of Cincinnati, College of Engg.
Lines: 60

...then let's build a world our machines can work in.

100 years ago people were trying to replace the horse with internal
combustion engine-driven vehicles. Now the obvious approach would have
been to build some sort of mechanical analog of the horse, strap an
engine on it, and keep everything transparent to the users. Since that
was not possible, the next easiest thing was to change the world to
accommodate the strength/weakness mix of the best way to run engines:
on wheeled chassis. So we put $ billions into paving over some of the
best real estate in the country. Now we have a world that accommodates
motor vehicles, to some extent.

In article <5821@hoptoad.uucp>, tim@hoptoad.uucp (Tim Maroney) writes:
> So $50*671 = $33,500. Not a trivial investment. This is the cost to the
> publisher of making the book, though it would be spread out among the
> individual copies. And that's still not factoring in the OCR running and
> proofreading, not to mention pre-mastering and mastering and duplication.
> And promotion and....
>
> It'll always take proofreading, and for 671 books that's quite
> a lot of skilled labor to pay for.

Let's not forget that virtually every book that makes it into print
these days passes through a computer at some stage in its production.
Most authors use word processors (either directly or through
secretaries), most publishers use electronic typesetting, and some of
us authors dabble in both.
So most of the work the CD-ROM publishers have to do has already been
done somewhere. Printing books degrades the utility that was present
when that information was originally in electronic form.

From the standpoint of the CD-ROM vendors and potential users,
publishers and authors who release information in printed form
exclusively are destroying wealth. By refusing to establish and adhere
to electronic document standards, we are reducing the amount of
information we can exploit and pass on to our progeny. In other words,
we are shooting ourselves in the foot.

A world optimized for horses was no good for automobiles. The latter
were useless until a new world was built. Similarly, a world optimized
for paper is no good for computers. To get the most benefit out of our
new technology, we need to change the way we do things.

Obviously the existing stock of printed information will not benefit
from re-designing our world to match the strengths and weaknesses of
computers. But I would hesitate to say that OCR will _always_ require
proofreading. OCR is a hard problem, but certainly not an impossible
problem. It is only a mapping from the (very large) vector space of
possible letter bitmaps to the smaller space of letter codes and font
descriptions. The structure of that mapping is complex, but not
infinitely so, else we could not read. Connectionist approaches to OCR
are already showing great promise. In ten years it might be essentially
a solved problem.

A harder problem will be to have a computer make sense of arbitrary
figures and diagrams. But that won't be necessary; the OCR machine can
simply vectorize or bitmap anything it can't otherwise interpret.

Given a smart OCR device, we could ``mine'' libraries for their
information content. Just load the hopper with books, press the button,
and take the information out of those mouldering tombs and put it in
the hands of people who can go out and create wealth with it.

Dan Mocsny