Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!decwrl!hplabs!hp-sdd!ucsdhub!jack!nusdhub!usenet From: usenet@nusdhub.UUCP (USENET News Admin) Newsgroups: news.admin Subject: Re: Map expiration dates Message-ID: <579@nusdhub.UUCP> Date: 14 Jan 88 07:02:44 GMT References: <2323@cxsea.UUCP> <7815@rutgers.rutgers.edu> Organization: National University, San Diego Lines: 280 Summary: Ugly, but interesting mapping possibility for map generation. In article <7815@rutgers.rutgers.edu>, pleasant@rutgers.rutgers.edu (Mel Pleasant) writes: > How can it be done such that we > never have to make a full one-shot posting of all of the files ever again? > Well, let's make it slightly simpler ... A change in the procedure *may* > require another full posting just to get it started. Ok, given that we do > one more full posting, now answer the same question e.g. no more full > postings beyond what's needed to start a new procedure.... Hi, I am a relative novice at net-mail who has basicly no experience with pathalias [etc.] So I am going to give an "how I would do it from scratch" off the top of my head. My spelling sucks, so 'where the mess ;-> THIS TEXT HAS NOT BEEN PROOFED, AND MAY BE HAZER-DOS TO YOUR MENIAL HELTH %^) [but the idea is serious, and, I think, quite good] These are my "given"s: 1) Backbone sites re-route almost everything. 2) Backbone sites _always_ re-route something in the form: backbone!machine!user where machine is non-adjcent. 3) The maps at backbone sites are generally up to date. 4) A user will "never" send any mail to a site of which that user has never heard. [no mail, no news, no phone conv. etc.] 5) If the conversation is by mail only, the user either knows the complete path, or at least one that works. 6) As news is expensive to pass around, each site admin tries to pass that news allong the cheapest thread possible. 7) No site is really concerned with how it got the mail, what each site is interested in is getting rid of the mail. 8) It should not be difficult to have a "dummy batch listing" created which could be shipped off to another program. In fact all that would really be necessary is for a copy of every "Path:" header to be piped to this mapper. Here is the mold for a news-based self-mapping scheme.: Each site will maintain two somple lists. These lists will be saved in a way similar to the way "history" info is saved under news. [i.e. the cheapest/most look-up-able way convenient] the first list will be a relativly short list of those sites which would like to be considered backbones for mail, followed by the "easiest way to get there" spelled out as basic text [i.e. site!site!site!site] This field will be built from the main map. The construction of the next list sounds very complex, though it should, in fact be rather easy, though a little messy. For the purposes of the following I will assume infinite memory availability as I have no idea what the actual memory requirements would be. These are the structures [in C] required: struct la_list { /* Linked list of array segments */ struct la_list *next; /* Link Hook */ struct site_d *uss[10]; /* Array of pointers to up stream sites */ unsigned long int lnnum[10]; /* "Line Numbers" */ int weght[10]; /* weighting index*/ } struct treent { /* Tree node entry */ struct site_d *less, *great; /* Lt and Gt branch pointers */ /* NOTE: Each step must be double indirected [through the site_d entry itself] */ } struct site_d { /* Site Discription */ struct treent name, lnum; /* Structures for 2 trees */ struct la_list *adjsys; /* pointer to head of linked list describing adjcent system */ char *sysname; /* Name of system */ int weight; /* Systems "frequency of apearence" portrayed as a "weight" */ struct la_list *bkrout; /* Backwards routing info */ /* for convience in jumping around */ /* [possibly optional ?????] */ } Each line of the file looks like: sysname weight [line# weight]... i.e. One System name, followed by that systems "weight", optionally followed by zero or more line-number/weight pairs. [This assumes ASCII text representation] followed by a newline character. Primary Loading: At the start of processing the old file is loaded into this "infinite memory" by the following sequence: The file is read sequentally, filling up one site_d structure per line. A number of la_list structures sufficient to hold all the line number/weight pairs in the respective arrays are malloc(ed) and linked into a list. These are filled with values. All the pointers are set to NULL and the first free array cross-section [i.e. the same index is always used on each aray so that element 2 of each of the three arays refers to one conceptual lump. etc.] has it's weight entry set to -1 [an ilegal weight] to signify the end of valid entries. When the entire file has been loaded, the tree structures are used to resolve the line number entries in the la_list as pointers to the approprate site_d structures. Durring this stage, the la_list pointed to by *bkroute is filled with the weight from the refrencing *adjsys and a pointer to the refrencing site_d structure. [This builds a "back track" of sorts, and may not be necessary.] [Logically each system entry now possesses a list of the systems adjcent to it. Because of the way this list is created, it is a set of weights as to the "next" system "from here". See the routing method below] Processing updates: In general this system is self updating. The "Path:" headders from all the news articles are pumped through the update system. The easiest way to generate this information is to setup rnews/inews to "rippoff" a copy of the "Path:" line and drop it in a file durring processing. As the system scans the "Path:" line It will always pre-fetch two strings beyond the one it is currently working on. Each line is assumed to start with a "dummy" entry representing the current system. For all intents and purposes the program will only deal with two cases 1) str1!str2!str3 2) str1!str2 In both cases the site_d->weight of "str1" is de-incremented while in case 1, "str2" is operated on relative to "str1", that is to say a) if necessary "str2" is issued a site_d, b) if necessary a la_list entry is created in "str1"'s la_list with a weight of 800, c) "str2"'s weight in "str1"'a la_list is de-incremented, d) if "str2"'s entry in "str1"'s la_list is less than 100 it is re-set to 100. In Case 2 the processing ends because "str2" is most likely a user, or something equally spesific and un-useful. Any time a new site_d is created it is murged into the linked list of _names_ and given a weight of 800. As a special ditinction to the above NO site with a weight greater than 1000 will EVER be de-incremented. period. After all the "Path:"s have been processed, execution continues with "Closing Processing" Special Case Processing: At times the wounderful-world-of-mapping people will want to inform the net of bogus and dis-continued sites. To do this they will issue a special message, the body of which will be formatted as a group of lines with the following format: Sitename [+|-]weight where the [+|-] is infrequent. After a standard loading of the data file, the normal path processing will comence. On completion of the normal path processing for each site "Sitename" the weight "weight" will be substituted in the site_d->weight entry. If there is a "+" or "-" the signed quantity will be applied as a modifier. When increments are used, no weight will be incremented above 1000 nor de-incremented below 100. Certan numbers have special meaning. See "Closing Processing" and "Ageing" Closing Processing: To end the updating of the map and write the Map out to the disk, you follow this procedure: 1) A Dummy entry representing the local system is generated by using the local-system-name, a site_d->weight of 0, the contents of the /usr/lib/news/sys file, and a local routing file. This entry supercedes any entry already in place for the local system. 2) The tree of site_d structures as refrenced by Sitename is walked least-to-greatest. Each site with a non-negitave number is given a new line-number [in order from 1 to whatever] 3) The tree is walked again, by Sitename, and all the la_list entries are sorted from least-to-greatest current weight. Step 4 may also take place durring this walk, but the sort MUST precede the write for greatest efficency. 4) The tree is walked again, by Sitename [as the line-number tree is long since ruined] and the Sitename and weight, followed by all the line-numbers [fetched by pointer, not from the local aray, as the local aray may no longer be valid] and weights are output to the new "Map" file. IMPORTANT:!!! Nodes possessing a negitive weight are NOT output. They have no line number, and they are simpley skipped. This effectively delets them. 5) The tree and lists are walked, and freed. 6) The update program exits. Ageing: Once a day, the map file is "aged" by adding one (1) to every site-weight [column 2] which is greater than 100 and less than 1000. Weight Interpretation: 0 Local Site. 1-99 Official Bakcbones. 100 Backbone-by-courtisy [un-official, busy sites] 101-999 Normal Sites. 1000 Seveerly aged [generally dis-used] sites. 1001-> Restricted Sites. [No mail/unreliable/etc] Figuring Routes: I have laid out a good mapping scheme, I do not presently possess the math or algorythm for reducing the possible choices to check, but the method of comparing one possible entry to another is as follows: 1) preform the standard starting load. For each path, or as you calculate you must 2) make a sum of: 2a) The weight associated with each site multiplyed by an urgency factor [3 to 7 with 7 applied to a most important message, as this will increase "detail" in otherwise identical results by increasing the importance of the connection information] 2b) The weight of the app link [in column 4, 6, 8, etc]. 3) The _LOWEST_ total is the best route. 4) Unknown destinations should be routed to the closest backbone. Rationale: This system is based, not so much on the "cost" of the link, but more on the "willingness" of each site to handle traffic. Generally, news travels fastest by the cheapest link, and later apearences of the same article at any site are ignored. Adjcent sites will tend to get _ALL_ the news from every site that even sort-of feeds them, which is good, considering that direct mail between adjcent sites should be encouraged. Because of this fastest-cheapest implication in the news system, each site will not have to be informed about the "cost" of the link explicitly. Because the mapping-gods can reset the weight of any site, system wide, without "updating a map section" the network traffic is reduced in all cases. Because the mapping-gods can delete a site by giving a single negitive number, again we loose the "new map section" Because the map is based solely on electronic topology, and is generally refrenced by line numbers, individuals can write relitively simple programs to manipulate the information, using shell scripts and "cut", "awk", and "sed" Though these may not be as effecient, sometimes they can be quite convient. [for instance, extract a list of all the leaf nodes becomes get all the entries with only 2 columns] Because there is a large range of ranking possible, but the expression of same is quite visual, the "master maps" [As in the current archives] can be compared to the system-generated in an easily automated way [i.e. awk $2=1000 {print $1} | check_for_invalid_master | send "where is your map entry" message] Without costing the system in general much in the way of accuracy and efficency. Because the system is self-mapping in general, people end up with only the parts of the maps they are likely to need. [i.e. if we only ger the comp newsgroups, we probably arn't going to need to post to a machine which only ever posted to rec newsgroups.] Because every group [or close enough to it] passes through a backbone, and because the "master maps" should still be kept, when a system needs to post to a machine which has never sent an article that reached that machine, that machine can route to a backbone, and make use of the master map [like systems can now]. Rob. nusdhub!usenet nusdhub!rwhite Disclaimer: This entire posting is off the top of my head. I have gotten into the guts of any other mail/mapping system to date, nor have I been on the net long [5 months maby] so this can't be stolen. Other Disclaimer: I am not enough of a programmer [now] to implement this [now]. Programming, for me, [now] still takes me much to long to do the simple things, for me to get anything back out in a reasonable time. I only do algorythems [now]. More Disclaimers: 1) I know my spelling sucks, but I don't have enough time left today to correct it. 2) If you didn't want to know you should not have asked. 3) The Idea is free, the attitude is gon'a cost you. 4) My constants are mine, and may not apply outside my own little world.