Path: utzoo!attcan!uunet!wuarchive!cs.utexas.edu!tut.cis.ohio-state.edu!ucbvax!CITI.UMICH.EDU!dave
From: dave@CITI.UMICH.EDU (Dave Bachmann)
Newsgroups: comp.protocols.tcp-ip
Subject: Decrypting RFC 1125
Message-ID: <8911240620.AA06208@ucbvax.Berkeley.EDU>
Date: 24 Nov 89 05:27:00 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 112

If you're like me, you don't have a PostScript printer at home, but you also
don't want to wait until Monday to read the latest RFC, RFC 1125.  Well, you're
in luck.  I finally decided "The heck with having an RFC I can't read!" and
hacked up an awk script to decompile the PostScript in RFC 1125 into relatively
readable text.  Of course, there were a few little details, like the
back-to-front printing of the document, which meant I got page 18 before page
17, and this weird business of representing "ff" by \013, "fi" by \014 and so
on.  So it's not the elegant script I had hoped for, but it works.
  The script is available for ftp on citi.umich.edu as pub/unps.awk.  You'll
also want the file cleanup.sed, which takes care of the \013 business and
strips the backslashes from quoted parentheses.  Both are short, so I've also
included them at the end of this message.  Warning: this is all very
empirical, and full of magic numbers.
  To produce a usable file from rfc1125.ps, first do
        awk -f unps.awk rfc1125.ps
This will produce the files "page18" through "page9" and then die complaining
about only being able to write to 10 files.  So now do
        awk -f unps.awk limit=8 rfc1125.ps
which tells unps.awk to skip any pages > 8, and so produces "page8" down to
"page1".  Finally, do
        cat page? page?? | sed -f cleanup.sed > rfc1125.txt
and you're done.  ("page?" matches pages 1-9 and "page??" matches pages 10-18,
so the pages come out in order.)  I've also put the result in pub/rfc1125.txt
for those who are impatient.
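  If your awk has the same limit on open files, one way around the second
pass might be to append to each page file and close it straight away, so that
only one output file is ever open.  The following is only a sketch of that
idea, not something unps.awk does: the input file fake.ps and the directory
demo_dir are made up for the demonstration, and it assumes a POSIX shell, an
awk that supports close(), and mktemp.

```shell
#!/bin/sh
# Sketch only: fake.ps and demo_dir are made up for this demonstration.
demo_dir=$(mktemp -d)

# Build a fake input: a "<n> @bop1" marker per page, last page first
# (as in the back-to-front document), each followed by one line of text.
n=18
while [ "$n" -ge 1 ]; do
    printf '%s\n' "$n @bop1" "text of page $n"
    n=$((n - 1))
done > "$demo_dir/fake.ps"

awk -v dir="$demo_dir" '
$2 == "@bop1" { pagenum = $1; next }
{ out = dir "/page" pagenum
  print $0 >> out     # append, so reopening the file does not truncate it
  close(out)          # close at once, so only one output file is ever open
}' "$demo_dir/fake.ps"

# Count the page files written (18 here), then show them in page order.
ls "$demo_dir"/page? "$demo_dir"/page?? | wc -l
cat "$demo_dir"/page? "$demo_dir"/page??
```

Because this appends, any page files left over from an earlier run would have
to be removed first.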
  After I had gotten this working I excitedly looked to see if it would work
for the other PostScript RFCs.  No such luck.  EVERY AUTHOR OF A POSTSCRIPT
RFC HAS USED A DIFFERENT PACKAGE.  In fact, the only RFCs that share a common
format are the NTP family.  Oh well.
  Here they are:
---------
unps.awk
---------
#	This script tries to decompile a TeX-produced PostScript document
#	and produce a file for each page.  This is necessary to handle
#	documents that print back-to-front.  Each page goes into a file
#	named "page<n>" where n is the page number.
#	There are a lot of magic numbers here.  Trial and error.
#
#	Track current page number
#	Specified as "<n> @bop1" where n is the new page number
#
$2 == "@bop1" { oline = 0
                pagenum = $1
                line = "" }
#
#	Since awk can only write out to 10 files, we need a way to
#	skip the first n pages before starting to write to files.
#	To process only pages numbered x or lower, invoke with "limit=x"
#
{ if (limit+0 > 0 && limit+0 < pagenum+0) next }
#
#	Lines of the form "<n> r (<string>) s" move n points right and
#	write string.  I'm emitting one space per 25 points of movement,
#	for any movement of more than 5 points.
#	Lines of the form "<n> r <m> c" move n points right and write
#	the ascii character with code m.
#
$2 == "r" { dots = $1
            while (dots > 5) { dots = dots - 25
                              line = line " " }
	    if ($4 == "s") { token = $3
			     wordl = length(token) - 2
			     word = substr(token,2,wordl)
			     line = line word }
	    else line = line sprintf("%c", $3) }
#
#	Lines of the form "<x> <y> p <stuff>" position to coordinates
#	x,y on the page and do something.  If stuff ends in "ru" it's
#	drawing something, so ignore it.  Otherwise find out how much the
#	y coordinate has changed and map that to newlines.  I'm emitting
#	one newline per 48 points of change, for any change of 30 points
#	or more.  This is where we print out the previous line that we've
#	been building.
#
$3 == "p" { if ($6 == "ru") next
            ldiff = $2 - oline
            oline = $2
            while (ldiff > 29) { ldiff = ldiff - 48
                                 print line > ("page" pagenum)
                                 line = "" }
            if ($5 == "s") { token = $4
	                     wordl = length(token) - 2
	                     word = substr(token,2,wordl) 
	                     line = line word }
            if ($5 == "c") line = line sprintf("%c", $4) }
#
#	Sometimes it just writes a string without positioning.
#
$2 == "s" { token = $1
            wordl = length(token) - 2
            word = substr(token,2,wordl)
            line = line word }
#
#	Sometimes it just writes a character without positioning.
#
$2 == "c" { line = line sprintf("%c", $1) }
#
#	End of the page.  Print the previously built line, if any.
#
$1 == "@eop" { print line > ("page" pagenum) }
#
#	That's all.
---------
cleanup.sed
---------
s/\\013/ff/g
s/\\014/fi/g
s/\\015/fl/g
s/\\016/ffi/g
s/\\017/ffl/g
s/\\(/(/g
s/\\)/)/g
---------

Dave Bachmann                                   |  dave@citi.umich.edu
Center for Information Technology Integration   |  {mailrus,rutgers}!citi!dave
University of Michigan                          |  (313)998-7693 or 8-7479

P.S.  Happy Thanksgiving