Path: utzoo!mnetor!uunet!mcvax!ukc!dcl-cs!nott-cs!jpo From: jpo@cs.nott.ac.uk (Julian Onions) Newsgroups: comp.unix.wizards Subject: A new encoding scheme Message-ID: <948@robin.cs.nott.ac.uk> Date: 1 Apr 88 08:55:11 GMT Reply-To: jpo@cs.nott.ac.uk (Julian Onions) Followup-To: /dev/null Organization: Computer Science, Nottingham Univ., UK. Lines: 212 Keywords: binary, unary, compression The below is the text of what we consider a revolutionary new concept in computer hardware and software - the authors would appreciate any comments on this proposal before attempting to `clean up' and buy out IBM, DEC and Sun. Please make sure you understand all the concepts before making any value judgements though. Thanks, Julian. --------------------------------------------------------------------- A New Data Encoding Scheme Andrew B. Cheese (abc@cs.nott.ac.uk) Julian P. Onions (jpo@cs.nott.ac.uk) Computer Science Department Nottingham University Nottingham. ABSTRACT -------- The computer industry from its very inception has used the binary system to represent all types of information. This system has much to recommend it's use. It is simple to construct hardware based upon this system and all the normal arithmetic and logical operations can be performed upon it. However, it is the authors opinions that this system is not as optimal as it could be and that some rather simple changes to the method of encod- ing data can bring about large increases in storage, communications and reliability. 1. The Idea - --- ---- The basic ideas stem from the seemingly simple idea of replacing the binary method of encoding by a different scheme. In essence, all that is required is to take the value normally represented by a 1 or true state, and replace this with two 0's or false states. e.g. decimal binary new scheme 9 1 0 0 1 00 0 0 00 13 1 1 0 1 00 00 0 00 At first sight there doesn't seem to be much of an advantage in this scheme of encoding but as the rest of this paper will attempt to prove, the benefits are enormous when applied in the proper way. This method, whilst not binary is also not strictly unary. It has therefore been christened sesquinary (from sesqui - one and a half). April 1, 1988 - 2 - 2. Applications - ------------ 2.1. File Storage. - - ---- ------- An intelligent operating system can make great use of the encoding. As an example, a file need not be stored as the complete set of bits. All that is required is for the operating system to keep count of the ``number'' of zero's in the file. In the case of the this would mean that the entire disc would consist of inodes. Each inode, instead of referencing blocks would keep a count of the number of zeros. For large files, double and triple indirection could be applied - see the section below on compression. Obvi- ously, for small files, the single indirection is more cost-effective but with larger files it would pay to move towards more indirection as a saving of space. A flag in the inode could keep count of the number of indirections currently performed. This scheme does have some overhead in the updating of random access files, in that the operating system must first ``unpack'' the file, perform the update, and then repack the file. This could probably be done in virtual memory for most operations though. 2.2. Networking. - - ---------- In networking, this method really comes into it's own. To begin with, there are practically no bandwidth limita- tions. The problems inherent in normal communication over serial and phone lines stems from the ability to detect the transitions between two states. Once this transition is removed, and the data is in effect transitionless, the bandwidth of the circuit is only reliant on the speed with which zeros can be pumped down the line by the hardware (and the rate at which they can be received of course). Another advantage comes in the standard ethernet environment. Normally an ethernet transceiver must wait for a clear slot to arrive, transmit the packet and detect if a collision occurred, if so it must retry. With the all zero encoding method transmission can take place at any point, there is in effect nothing on the ethernet that can scramble the signal as all hosts are transmitting zeros. 2.3. Compression - - ----------- As hinted at above the possibilities for compression are fantastic. You can forget Huffman encoding and Lempel- Ziv can take a walk! The compression techniques can reduce any amount of data to 1 number, although that number may be larger than the convenient word size of a given architec- ture. The basic algorithm is outlined below. April 1, 1988 - 3 - while (length(data) > 1) { data = count zeros(data); - iteration ++; } return iteration; This can be also be changed to do essentially the above but in N steps for large files. 2.4. Hardware - - -------- It is expected that there may be some implementation problems associated with the hardware of this device. How- ever, the benefits appear to outweigh the drawbacks in many ways. To begin with, the memory using this technique should be simple. There is no need to invert bits or to even sense the bits - they should all be zero anyway. Memory failure can be detected very easily, no need for complex CRC checks - any 1 bits are obviously due to failing memory. Another advantage is that all memory is effectively permanent, as there is no state to be saved. This means com- puters built using this model should be unaffected by power-outs and be impervious to crashes. 2.5. Encryption - - ---------- This scheme also seems to lend itself to data encryp- tion. The details have not been fully worked out and may appear in a second paper once the decryption algorithms have been straightened out. 2.6. Parallelism and Data-flow - - ----------- --- ---- ---- Again, this method has more advantages for parallel hardware. Shared memory is particularly easy to implement for the same reasons that the ethernet is easy - effectively there are no changes in memory state so collisions can't happen (unless defective memory is present). 3. Implementation - -------------- There have been some doubts raised about the hardware realisation of this technique, but in general this can prob- ably be attributed either to the resistance to change gen- erally found, or by manufacturers protecting their own interests. The vast benefits that this method seems to have though should mean that once it is taken up it will clean up in the computer industry. April 1, 1988 -- Julian Onions