Path: utzoo!attcan!telly!lethe!torsqnt!nixtdc!dt
From: dt@nixtdc.uucp (David Tilbrook)
Newsgroups: comp.software-eng
Subject: style guides and testing tools comments and requests
Message-ID: <1990Dec15.071555.14971@nixtdc.uucp>
Date: 15 Dec 90 07:15:55 GMT
Organization: Siemens Nixdorf Information Systems
Lines: 189

Embarrassing as it may be for a Unix-old-timer to admit, I am not
sure that this will get out, so if someone who knows me reads
this, please acknowledge that you did so.

Due to the wide range of issues covered in this posting (am I
violating news etiquette?), the following is a summary of the
discussions and requests that follow:

Brief comments on:

- Does testing belong in this newsgroup?

- Are programming style standards relevant?

Requests for information on experience with, opinions of,
availability of, or solutions for:

- Barton Miller's fuzz package.

- test data generators that reuse randomly generated data

- ways to convert rand() into desired distribution and express
  conversion

- references to schemes that embed documentation and testing specs
  in source files.

- using write-once data storage systems for a version system

- request for statistics on versioning system use

- is it a platform, machine, configuration, or environment, and
  how does one name it?

- strategies for coping with interface discrepancies

First, w.r.t. discussions (D) and my comments (C):

D) Does testing belong in this newsgroup?

C) God I hope so ...  if a software-engineering grope [sic] isn't
   concerned with testing, who is?  Testing is an integral part of
   the software process, whatever model one uses.

D) Are programming style standards relevant?

C) Only if you can test them and/or they help in the testing.  I
   welcome more discussion on this topic, particularly w.r.t.
   what belongs in a style guide, and, most importantly, why.

Request: Does anybody have, or know how to get, a copy of fuzz and
     ptyjig (the testing tools described in: ``An Empirical Study
     of Reliability of Operating System Utilities'', Barton P.
     Miller et al, Usenix Software Management Workshop, 1989)?  Any
     experience with this package would be appreciated.  Better
     yet would be how I could acquire a copy.  (Bart - are you
     there?)

Request: I have a reverse grammar generator (tdg) that we use to
     generate test data.  However, it does not have a way of
     saving generated information for later incorporation.  If
     anyone has experience with a random test-data generator that
     can save and reuse previously randomly generated data, I
     would appreciate discussion of such experience and perhaps
     some evaluation of how to control it.  For example, I
     recently used tdg to generate random requests of a
     network-wide database server by generating shell commands
     that requested appends, replacements, and deletes.  This
     worked a treat, but had the problem that the range of
     database keys had to be limited to ensure that the deletes
     would occasionally try to delete an existent key.  Any ideas?
     Any P.D.  tools?  (tdg is available as part of the EUUG
     (whoops I mean EurOpen) Fall 90 distribution.)
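
     One possible shape for such reuse, offered as a sketch rather
     than a description of tdg: the generator remembers each key
     it has appended in a small pool, and a delete request draws
     from that pool some fraction of the time.  The names
     key_pool, pool_add, pool_take, and pick_delete_key, and the
     fixed pool size, are all made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool of keys the generator has already emitted. */
#define POOL_MAX 1024

struct key_pool {
    long keys[POOL_MAX];
    size_t count;
};

/* Remember a key that an append request has just used.
 * Returns 0 on success, -1 if the pool is full. */
int pool_add(struct key_pool *p, long key)
{
    assert(p != NULL);
    if (p->count >= POOL_MAX)
        return -1;
    p->keys[p->count++] = key;
    return 0;
}

/* Remove a randomly chosen remembered key and store it in *out.
 * Returns 0 on success, -1 if the pool is empty. */
int pool_take(struct key_pool *p, long *out)
{
    size_t i;

    assert(p != NULL && out != NULL);
    if (p->count == 0)
        return -1;
    i = (size_t)rand() % p->count;
    *out = p->keys[i];
    p->keys[i] = p->keys[--p->count];   /* fill the hole with the last key */
    return 0;
}

/* Choose the key for a delete request: about hit_percent times in 100,
 * reuse a key known to exist; otherwise make up a fresh random key. */
long pick_delete_key(struct key_pool *p, int hit_percent)
{
    long key;

    if (rand() % 100 < hit_percent && pool_take(p, &key) == 0)
        return key;
    return (long)rand();
}
```

     With hit_percent at, say, 30, roughly three deletes in ten
     would target a key created by an earlier append (while the
     pool is non-empty), without narrowing the overall key range.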

Request: Does anyone have a nice way of turning rand() results
     into various string-specified distributions for incorporation
     into a test data generator?  I would be particularly
     interested in a mechanism to express a normal or student-t
     distribution over a limited range ...  nothing too
     sophisticated ...  think of generating words, or sentences of
     normal length to test an editor, or a duration in minutes for
     a phone call (student-t).  The distribution needs to be
     expressed as a simple string so that it could be given as an
     argument to the generator.
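
     A minimal sketch of one way this could look, assuming a
     made-up spec syntax (``uniform:LO,HI'' and
     ``normal:MEAN,SD,LO,HI'').  The normal case uses the old
     sum-of-twelve-uniforms approximation and simply rejects
     values outside the range; a student-t case is not attempted.
     dist_parse, dist_sample, and struct dist are hypothetical
     names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum dist_kind { DIST_UNIFORM, DIST_NORMAL };

struct dist {
    enum dist_kind kind;
    double mean, sd;    /* used by DIST_NORMAL only */
    double lo, hi;      /* every sample falls within [lo, hi] */
};

/* Parse "uniform:LO,HI" or "normal:MEAN,SD,LO,HI" (assumed syntax).
 * Trailing text after the numbers is ignored.
 * Returns 0 on success, -1 if the string is not understood. */
int dist_parse(const char *spec, struct dist *d)
{
    assert(spec != NULL && d != NULL);
    if (sscanf(spec, "uniform:%lf,%lf", &d->lo, &d->hi) == 2) {
        d->kind = DIST_UNIFORM;
        d->mean = d->sd = 0.0;
    } else if (sscanf(spec, "normal:%lf,%lf,%lf,%lf",
                      &d->mean, &d->sd, &d->lo, &d->hi) == 4) {
        d->kind = DIST_NORMAL;
        if (d->sd <= 0.0)
            return -1;
    } else {
        return -1;
    }
    return d->lo < d->hi ? 0 : -1;
}

/* Uniform double in [0, 1), built from rand(). */
static double unit_rand(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Draw one value from d, always within [d->lo, d->hi]. */
double dist_sample(const struct dist *d)
{
    int tries, i;

    assert(d != NULL);
    if (d->kind == DIST_UNIFORM)
        return d->lo + unit_rand() * (d->hi - d->lo);

    for (tries = 0; tries < 1000; tries++) {
        /* Sum of 12 uniforms minus 6: mean 0, variance 1, roughly normal. */
        double z = -6.0, x;
        for (i = 0; i < 12; i++)
            z += unit_rand();
        x = d->mean + d->sd * z;
        if (x >= d->lo && x <= d->hi)
            return x;               /* accept only values inside the range */
    }
    return (d->lo + d->hi) / 2.0;   /* give up: range lies far out in a tail */
}
```

     A generator could then take something like
     ``normal:5,2,1,12'' as an argument for word lengths and round
     each sample to the nearest integer.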

Request: I am writing a paper on the way we embed data-base
     entries into source code to contain the documentation,
     interface specifications, and regression testing information
     for the related module.  Any references to other such schemes
     (e.g., Mangle, Sob, and Wheeze) would be appreciated.  If
     there is interest, I will post a brief overview of my work
     and our experience in applying this work to our product, most
     of which is a large set (750 modules) of subroutines.

Request: I am currently doing a requirements analysis of a
     versioning system.  One aspect of this research is the
     problems inherent in using a write-once storage system.  For
     obvious reasons, using the change recording mechanisms used
     by either SCCS or RCS would not be satisfactory to anyone
     other than a WORM salesperson (they would consume disks at an
     unaffordable rate).  So the system should probably use some
     sort of mechanism that does not rewrite existing information ...
     however, the system must retrieve any version of the source
     in equal time (like SCCS but unlike RCS).  Any comments?  Any
     interest?

Request: As part of the aforementioned research, I need statistics
     on the use of versioning systems for long running systems.
     For example, I was recently informed that for one package
     running for some five years, the average number of versions
     per module was 80 and the average number of branches was 17
     -- that's right seventeen!  Within our group, there have been
     44,000 deltas made to some 6,000 modules over the last 18
     months, although many of the modules have died or been
     renamed, and thousands of the deltas are minor cosmetics
     (e.g., changing the name of our company from Nixdorf to
     Siemens Nixdorf ...).  My numbers are somewhat inflated as an
     initial file creation and a file renaming each count as 2 deltas.
     I would really appreciate it if people would send me such raw
     statistics from their own experience.

Request: Our product is built from a single source system on nine
     platforms simultaneously and has been or needs to be ported
     to many more.  There is a single parameterization file for
     each build tree and one setting within this file - namely the
     name/type of the platform - is used to map to the
     capabilities or select appropriate facilities throughout the
     source. The first request is for a clarification of
     terminology: What noun does one use to refer to the class of
     names that state the machine type and the operating system
     type with its release, version, and/or flavour?  Is this a
     ``platform'', a ``configuration'', a ``system'', an
     ``environment'', or a ``whats_it''?  Is there any sort of
     standard for the assignment of a name within this class?  Is
     there a need for a universal registry of such names?  My own
     scheme uses a simple concatenation of a cpu brand name, the
     vendor's specific OS version number and/or name, an underbar,
     and the closest flavour of a ``standard'' operating system
     (e.g., ``v?'', ``4.?bsd'', ``unix5.?'').  The software tools
     that I use provide mechanisms to do shell-like pattern
     matches against this name to do selection or suppression.
     Conforming to a widely accepted naming standard is obviously
     desirable.
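
     The selection mechanism described above can be approximated
     with the POSIX fnmatch(3) routine.  This is only a sketch:
     the rule structure, the last-match-wins policy, and any
     platform names fed to it are assumptions, not a description
     of the tools mentioned in this posting.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical rule: when `pattern` matches the platform name, the
 * tagged code is selected (select = 1) or suppressed (select = 0). */
struct platform_rule {
    const char *pattern;    /* shell-style pattern, e.g. "*_4.?bsd" */
    int select;
};

/* Return 1 if the shell-style pattern matches the platform name. */
int platform_matches(const char *pattern, const char *platform)
{
    assert(pattern != NULL && platform != NULL);
    return fnmatch(pattern, platform, 0) == 0;
}

/* Apply the rules in order; the last matching rule wins.
 * `dflt` is returned when no rule matches. */
int platform_enabled(const struct platform_rule *rules, size_t n,
                     const char *platform, int dflt)
{
    int enabled = dflt;
    size_t i;

    for (i = 0; i < n; i++)
        if (platform_matches(rules[i].pattern, platform))
            enabled = rules[i].select;
    return enabled;
}
```

     For an invented name such as ``sparc4.1_4.3bsd'', the pattern
     ``*_4.?bsd'' matches and ``*_unix5.?'' does not, so a rule
     list can select every BSD flavour and then suppress one
     particular cpu brand with a later ``sparc*'' rule.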

Request: Inherent in the facilities/capabilities configuration
     when porting to a large number of platforms is identifying
     the subtle differences that arise between different releases
     of the same base system.  Fortunately, my experience is that
     these subtle differences are usually few in number or
     detectable by the compilers.  Unfortunately, most suppliers
     are very poor at identifying and/or documenting the
     differences that are not benign or detectable (e.g., the
     differing semantics of fopen(file, "a+")).  If readers have
     opinions on how these should be identified and resolved, I
     would be interested in hearing their views.  What is required
     is a strategy that can be applied to deal with such
     discrepancies, as they constantly arise and one wants a
     mechanism that can be applied quickly, while ensuring that
     any previous port is not going to be broken.  For example,
     the ``#ifdef <header_file_manifest>'' tactic is popular, but
     is frequently inadequate and can get excessively cumbersome
     without an #elif construct, or in situations where there are
     a large number (i.e., >= 3) of variations.  Furthermore there
     are many examples of situations where it just does not work.
     For example, my D-Tree used to use:

	  #include <envir/sys_stat.h> /* map to appropriate stat.h */
	  ...
	  #ifdef S_IFLNK
		  /* assume lstat(2), symlink() provided */
	  #else
		  /* assume lstat(2), symlink() NOT provided */
	  #endif

     This worked on some forty to fifty ports, until I encountered
     a system for which S_IFLNK was defined but lstat(2) was not
     provided.
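
     One common way around this failure, sketched here under
     assumptions: let the per-platform parameterization state the
     capability outright and use the header manifest only as a
     default.  HAS_LSTAT and stat_nofollow are hypothetical names,
     not part of D-Tree.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* HAS_LSTAT is a hypothetical capability macro.  The idea is that the
 * per-platform configuration sets it to 0 or 1 explicitly; only when
 * it is left unset do we fall back on the S_IFLNK guess. */
#ifndef HAS_LSTAT
# ifdef S_IFLNK
#  define HAS_LSTAT 1
# else
#  define HAS_LSTAT 0
# endif
#endif

/* Stat a path without following a final symbolic link where the
 * platform can do so; otherwise fall back on plain stat(2). */
int stat_nofollow(const char *path, struct stat *sb)
{
    assert(path != NULL && sb != NULL);
#if HAS_LSTAT
    return lstat(path, sb);
#else
    return stat(path, sb);
#endif
}
```

     A port to a system like the one above would then add
     -DHAS_LSTAT=0 (or the equivalent line in its parameterization
     file) without touching the source or disturbing any earlier
     port.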

Well, that's enough for now - apologies for the wide range of
topics raised, but I chose to do one posting rather than eight.

If you want to raise a discussion on any of the above issues,
please do so.  For those issues for which I receive data, I will
endeavour to post summaries.
-- 

-----------------------------------
David Tilbrook
Siemens Nixdorf Information Systems Ltd.