Xref: utzoo comp.lang.functional:131 comp.lang.prolog:2730
Path: utzoo!attcan!uunet!samsung!munnari.oz.au!bruce!goanna!ok
From: ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe)
Newsgroups: comp.lang.functional,comp.lang.prolog
Subject: Re: Pattern matching considered harmful
Message-ID: <3077@goanna.cs.rmit.oz.au>
Date: 26 May 90 09:00:22 GMT
References: <2584@skye.ed.ac.uk>
Followup-To: comp.lang.prolog
Organization: Comp Sci, RMIT, Melbourne, Australia
Lines: 112

In article <2584@skye.ed.ac.uk>, jeff@aiai.ed.ac.uk (Jeff Dalton) writes:
> More generally, pattern matching would be ok if, for a given structure
> type (or Prolog term or whatever), it were used only in a small set of
> basic operations.  However, programmers all too often use pattern
> matching throughout their programs and then, when they change the
> number or order of fields in a structure, have to go around and change
> all the patterns.  Moreover, to read a pattern one often has to count
> commas, which is error-prone to say the least.

I am very much in agreement with this.
As an example, suppose you have some data type which represents
a finite function (a dictionary).  For argument's sake, suppose you
start out with the idea that a dictionary is
	empty_dictionary
    or	non_empty_dictionary(Key,Val,Rest)
Suppose you find that part of your program wants to find
dictionary entries satisfying a test p(Key,Val).  You might be
tempted to write

dictionary_p_entry(non_empty_dictionary(Key,Val,_), Key, Val) :-
	p(Key, Val).
dictionary_p_entry(non_empty_dictionary(_,_,Rest), Key, Val) :-
	dictionary_p_entry(Rest, Key, Val).

If you later notice that most of the program is looking up known keys,
and decide to use AVL trees instead, you're stuck with a big change.
But if you had written a "dictionary.pl" file defining the operations
which know about the structure of a dictionary, you might have had a

	dictionary_member(Dictionary, Key, Val)

predicate (which you would have had _anyway_ to look things up!) and
you would just have written

dictionary_p_entry(Dictionary, Key, Val) :-
	dictionary_member(Dictionary, Key, Val),
	p(Key, Val).

Note that specifying the interface predicate dictionary_member/3
_logically_ pays off:  we can use the same predicate to look for known
keys or to enumerate keys, and have just _one_ predicate to remember.
(How that works inside the package is solely the business of the
"dictionary.pl" file.)

It really is a good idea to isolate knowledge of the constructors of
a data type in a single file.  It needn't make your program inefficient;
on the contrary, it is easier to get the design of a predicate like
dictionary_member/3 _right_ and the implementation fast when you have
to do it just once than to do it over and over again when you have
basically the same code scattered all over the place.

> This sort of thing happens all the time in Prolog and probably in
> functional languages as well.

Yes to both in general, no to both in the work of good programmers.

> Some Prolog programmers have told me
> that, of course, one could use a notation that involved named fields
> and then expand (unfold) this into patterns.  Few programmers actually
> do this, however.

Code to do it is in the book...  If you have a Prolog which supports
term_expansion/2 or a close analogue (C Prolog, Quintus Prolog, NU Prolog,
ZYX Prolog, SICStus Prolog, possibly others.  The mechanism in ALS Prolog
is unfortunately placed.  Prolog-2 hasn't got it at the moment, but if I
am reading the manual right the basic machinery is in there _somewhere_
so they could add it) it is not too hard to write a drastically simplified
analogue of Lisp's defstruct.  No new notation is needed.  One just
follows the existing convention for selecting from a data structure:
	predicate(Selector argument(s), Whole structure, Part selected)
For example, if you wanted a 'date' data type, you might use
	date(year,  Date, Year)		% Year = date$year(Date), &c
	date(month, Date, Month)
	date(day,   Date, Day)
In fact, if you are keeping the major operations on your data types in
their own files (where the implementation of the type is legitimately
visible) leaving field accesses as calls like this is unlikely to be a
major factor in the cost of your program.

> Given these problems, I wonder why pattern-matching seems to be
> regarded, rather uncritically, as a Good Thing.

Regard it critically and it is still a WONDERFUL thing.  The problems
are not specific to pattern matching.  *ANY* "leakage" of information
about the implementation of a type is going to cause *exactly* this
kind of problem.  To back this up, suppose we are implementing dates.
Suppose that I decide to represent dates as the number of microseconds
since the midnight before 1-Jan-1970, stored as a floating-point number.
(IEEE, VAX, and /370 double precision floats have enough bits to do
this for a useful range of years, and additions and subtractions will
be exact.)  Then a programmer might exploit his knowledge of this fact
by using
	Date1 < Date2		% Date1 occurs before Date2
If I then try to port the program to a Prolog with 28-bit floats (such
as C Prolog or SB Prolog) I'm going to have to change the representation
to some kind of compound term.  Boom!  < doesn't like it any more.  If I
represent dates as
	timestamp(Year,Month,Day,Hour,Minute,Second,Millisecond,MicroSecond)
I'll still be able to use @< to compare dates, but not < .

> Abstraction is a wonderful thing, when it's used.

Indeed and it is, but pattern matching isn't the only way of breaching it.
I would point out that Hope and ML and some other functional languages have
module systems in which pattern matching can be used quite safely; you can
export a type from a module without exporting the constructors.  It was
always my intention to extend the Mycroft/O'Keefe type checker to enforce
abstract data typing in just that way.  Maybe I'll get around to it...

-- 
"A 7th class of programs, correct in every way, is believed to exist by a
few computer scientists.  However, no example could be found to include here."