Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!decwrl!hplabs!hp-sdd!ucsdhub!jack!nusdhub!usenet
From: usenet@nusdhub.UUCP (USENET News Admin)
Newsgroups: news.admin
Subject: Re: Map expiration dates
Message-ID: <579@nusdhub.UUCP>
Date: 14 Jan 88 07:02:44 GMT
References: <2323@cxsea.UUCP> <7815@rutgers.rutgers.edu>
Organization: National University, San Diego
Lines: 280
Summary: Ugly, but interesting mapping possibility
	 for map generation.

In article <7815@rutgers.rutgers.edu>, pleasant@rutgers.rutgers.edu (Mel Pleasant) writes:
> How can it be done such that we
> never have to make a full one-shot posting of all of the files ever again?
> Well, let's make it slightly simpler ...  A change in the procedure *may*
> require another full posting just to get it started.  Ok, given that we do
> one more full posting, now answer the same question e.g. no more full
> postings beyond what's needed to start a new procedure....

Hi,
	I am a relative novice at net-mail who has basicly no experience
with pathalias [etc.]  So I am going to give an "how I would do it from
scratch" off the top of my head.  My spelling sucks, so 'where the
mess ;->  THIS TEXT HAS NOT BEEN PROOFED, AND MAY BE HAZER-DOS TO YOUR
MENIAL HELTH %^) [but the idea is serious, and, I think, quite good]

These are my "given"s:
	1)  Backbone sites re-route almost everything.
	2)  Backbone sites _always_ re-route something in the form:
		backbone!machine!user where machine is non-adjcent.
	3)  The maps at backbone sites are generally up to date.
	4)  A user will "never" send any mail to a site of which that
		user has never heard.  [no mail, no news, no phone
		conv. etc.]
	5)  If the conversation is by mail only, the user either knows
		the complete path, or at least one that works.
	6)  As news is expensive to pass around, each site admin tries
		to pass that news allong the cheapest thread possible.
	7)  No site is really concerned with how it got the mail, what
		each site is interested in is getting rid of the
		mail.
	8)  It should not be difficult to have a "dummy batch listing"
		created which could be shipped off to another program.
		In fact all that would really be necessary is for a
		copy of every "Path:" header to be piped to this
		mapper.

Here is the mold for a news-based self-mapping scheme.:

	Each site will maintain two somple lists.  These lists will be
saved in a way similar to the way "history" info is saved under news. 
[i.e.  the cheapest/most look-up-able way convenient]  the first list
will be a relativly short list of those sites which would like to be
considered backbones for mail, followed  by the "easiest way to get
there" spelled out as basic text  [i.e. site!site!site!site]  This
field will be built from the main map.

	The construction of the next list sounds very complex, though it
should, in fact be rather easy, though a little messy.  For the
purposes of the following I will assume infinite memory availability
as I have no idea what the actual memory requirements would be.  These
are the structures [in C] required:

struct la_list {			/* Linked list of array segments */

struct la_list	*next;		/* Link Hook */
struct site_d	*uss[10]; /* Array of pointers to up stream sites */
unsigned long int lnnum[10]; /* "Line Numbers" */
int weght[10];  		/* weighting index*/
	}

struct treent {			/* Tree node entry		*/

struct site_d	*less, *great;	/* Lt and Gt branch pointers	*/
				/* NOTE: Each step must be double
				   indirected [through the site_d
				   entry itself]	 */
	}

struct site_d {			/* Site Discription		*/

struct treent name, lnum;	/* Structures for 2 trees	*/
struct la_list *adjsys;		/* pointer to head of linked list
				   describing adjcent system	*/
char	*sysname;		/* Name of system		*/
int	weight;			/* Systems "frequency of apearence"
					portrayed as a "weight" */
struct la_list *bkrout;		/* Backwards routing info	*/
				/* for convience in jumping around */
				/* [possibly optional ?????] */
	}

Each line of the file looks like:

sysname	weight [line#	weight]...

i.e. One System name, followed by that systems "weight", optionally
followed by zero or more line-number/weight pairs.  [This assumes
ASCII text representation] followed by a newline character.

	Primary Loading:

	At the start of processing the old file is loaded into this
"infinite memory" by the following sequence:
	The file is read sequentally, filling up one site_d structure
per line.
	A number of la_list structures sufficient to hold all the line
number/weight pairs in the respective arrays are malloc(ed) and linked
into a list.  These are filled with values.  All the pointers are
set to NULL and the first free array cross-section [i.e. the same
index is always used on each aray so that element 2 of each of the
three arays refers to one conceptual lump. etc.] has it's weight entry
set to -1 [an ilegal weight] to signify the end of valid entries.
	When the entire file has been loaded, the tree structures are
used to resolve the line number entries in the la_list as pointers to
the approprate site_d structures.  Durring this stage, the la_list
pointed to by *bkroute is filled with the weight from the refrencing
*adjsys and a pointer to the refrencing site_d structure. [This builds
a "back track" of sorts, and may not be necessary.]
	[Logically each system entry now possesses a list of the
systems adjcent to it.  Because of the way this list is created, it is
a set of weights as to the "next" system "from here".  See the
routing method below]

	Processing updates:

	In general this system is self updating.  The "Path:" headders
from all the news articles are pumped through the update system.  The
easiest way to generate this information is to setup rnews/inews to
"rippoff" a copy of the "Path:" line and drop it in a file durring
processing.
	As the system scans the "Path:" line It will always pre-fetch
two strings beyond the one it is currently working on.  Each line is
assumed to start with a "dummy" entry representing the current system.
For all intents and purposes the program will only deal with two cases
1) str1!str2!str3
2) str1!str2

	In both cases the site_d->weight of "str1" is de-incremented
while in case 1, "str2" is operated on relative to "str1", that is
to say a) if necessary "str2" is issued a site_d, b) if necessary a
la_list entry is created in "str1"'s la_list with a weight of 800, c)
"str2"'s weight in "str1"'a la_list is de-incremented, d) if "str2"'s
entry in "str1"'s la_list is less than 100 it is re-set to 100.  In
Case 2 the processing ends because "str2" is most likely a user, or
something equally spesific and un-useful.
	Any time a new site_d is created it is murged into the linked
list of _names_ and given a weight of 800.
	As a special ditinction to the above NO site with a weight
greater than 1000 will EVER be de-incremented.  period.
	After all the "Path:"s have been processed, execution 
continues with "Closing Processing"


	Special Case Processing:
	At times the wounderful-world-of-mapping people will want to
inform the net of bogus and dis-continued sites.  To do this they will
issue a  special message, the body of which will be formatted as a
group of lines with the following format:

Sitename	[+|-]weight

where the [+|-] is infrequent.
	After a standard loading of the data file, the normal path
processing will comence.  On completion of the normal path processing
for each site "Sitename" the weight "weight" will be substituted in
the site_d->weight entry.  If there is a "+" or "-" the signed quantity
will be applied as a modifier.
	When increments are used, no weight will be incremented above
1000 nor de-incremented below 100.
	Certan numbers have special meaning.  See "Closing Processing"
and "Ageing"

	Closing Processing:
	To end the updating of the map and write the Map out to the
disk, you follow this procedure:
	1) A Dummy entry representing the local system is generated by
using the local-system-name, a site_d->weight of 0, the contents of the
/usr/lib/news/sys file, and a local routing file.  This entry
supercedes any entry already in place for the local system.
	2) The tree of site_d structures as refrenced by Sitename is
walked least-to-greatest. Each site with a non-negitave number is
given a new line-number [in order from 1 to whatever]
	3) The tree is walked again, by Sitename, and all the la_list
entries are sorted from least-to-greatest current weight.  Step 4 may
also take place durring this walk, but the sort MUST precede the write
for greatest efficency.
	4) The tree is walked again, by Sitename [as the line-number
tree is long since ruined] and the Sitename and weight, followed by
all the line-numbers [fetched by pointer, not from the local aray, as
the local aray may no longer be valid] and weights are output to the
new "Map" file.  IMPORTANT:!!! Nodes possessing a negitive weight are
NOT output.  They have no line number, and they are simpley skipped.
This effectively delets them.
	5) The tree and lists are walked, and freed.
	6) The update program exits.

	Ageing:
	Once a day, the map file is "aged" by adding one (1) to every
site-weight [column 2] which is greater than 100 and less than 1000.

	Weight Interpretation:

0		Local Site.
1-99		Official Bakcbones.
100		Backbone-by-courtisy [un-official, busy sites]
101-999		Normal Sites.
1000		Seveerly aged [generally dis-used] sites.
1001->		Restricted Sites.  [No mail/unreliable/etc]


	Figuring Routes:

	I have laid out a good mapping scheme, I do not presently
possess the math or algorythm for reducing the possible choices to
check, but the method of comparing one possible entry to another is as
follows:
	1) preform the standard starting load.
For each path, or as you calculate you must
	2) make a sum of:
	2a) The weight associated with each site multiplyed by an
		urgency factor [3 to 7 with 7 applied to a most
		important message, as this will increase "detail"
		in otherwise identical results by increasing the
		importance of the connection information]
	2b) The weight of the app link [in column 4, 6, 8, etc].
	3) The _LOWEST_ total is the best route.
	4) Unknown destinations should be routed to the closest
		backbone.

	Rationale:

	This system is based, not so much on the "cost" of the link,
but more on the "willingness" of each site to handle traffic.
Generally, news travels fastest by the cheapest link, and later
apearences of the same article at any site are ignored.  Adjcent
sites will tend to get _ALL_ the news from every site that even
sort-of feeds them, which is good, considering that direct mail
between adjcent sites should be encouraged.
	Because of this fastest-cheapest implication in the news
system, each site will not have to be informed about the "cost" of the
link explicitly.
	Because the mapping-gods can reset the weight of any site,
system wide, without "updating a map section" the network traffic is
reduced in all cases.
	Because the mapping-gods can delete a site by giving a single
negitive number, again we loose the "new map section"
	Because the map is based solely on electronic topology, and is
generally refrenced by line numbers, individuals can write relitively
simple programs to manipulate the information, using shell scripts and
"cut", "awk", and "sed"  Though these may not be as effecient,
sometimes they can be quite convient.  [for instance, extract a list
of all the leaf nodes becomes get all the entries with only 2 columns]
	Because there is a large range of ranking possible, but the
expression of same is quite visual, the "master maps" [As in the
current archives] can be compared to the system-generated in an easily
automated way [i.e. awk $2=1000 {print $1} | check_for_invalid_master |
send "where is your map entry" message]  Without costing the system in
general much in the way of accuracy and efficency.
	Because the system is self-mapping in general, people end up
with only the parts of the maps they are likely to need.  [i.e. if we
only ger the comp newsgroups, we probably arn't going to need to post
to a machine which only ever posted to rec newsgroups.]
	Because every group [or close enough to it] passes through a
backbone, and because the "master maps" should still be kept, when a
system needs to post to a machine which has never sent an article that
reached that machine, that machine can route to a backbone, and make
use of the master map [like systems can now].

Rob.
nusdhub!usenet
nusdhub!rwhite

Disclaimer:  This entire posting is off the top of my head.  I have
	gotten into the guts of any other mail/mapping system to date,
	nor have I been on the net long [5 months maby] so this can't
	be stolen.

Other Disclaimer:  I am not enough of a programmer [now] to implement
	this [now].  Programming, for me, [now] still takes me much to
	long to do the simple things, for me to get anything back out
	in a reasonable time.  I only do algorythems [now].

More Disclaimers:  1)  I know my spelling sucks, but I don't have
	enough time left today to correct it.
		   2)  If you didn't want to know you should not have
	asked.
		   3)  The Idea is free, the attitude is gon'a cost
	you.
		   4)  My constants are mine, and may not apply
outside my own little world.