book cover

Web References for
Programming Pearls

Peter Gordon, my editor at Addison-Wesley, once boasted that he had never published a book that contained a correct web address. He's a modest guy, but I did take his wise advice and kept web references in the book to a minimum. Here are some interesting and useful sites.

For The Entire Book

My book frequently cites Steve McConnell's Code Complete and Brian Kernighan and Rob Pike's The Practice of Programming.

A theme throughout this edition of the book is code tuning in the presence of caches. The index entry for ``cache-sensitive code'' points to ten such examples. Anthony LaMarca has been studying this general topic for a number of years, starting with his University of Washington Ph.D. Thesis on Caches and Algorithms. Jeff Vitter has been studying external memory algorithms, which employ similar techniques. Since I finished the book, I have been investigating ``Cache-Conscious Algorithms and Data Structures"; a talk I have given on that topic is available as a Powerpoint Show.

Column 1

For more on the main theme of this column, see the Computerworld article on Elegance in software.

Problem 1.12 recounts the urban legend of how NASA spent a million dollars to develop a special pen to write in space. For the truth, check out the amazing story of The Fisher Space Pen. The company was founded in 1948, and the first Fisher pen went into space on board the Apollo 7 in October, 1968 (according to the web page, the earlier Mercury and Gemini astronauts in fact took their notes with pencils).

Column 2

Solution 2.1 mentions that David Musser and Atul Saini's STL Tutorial and Reference Guide implements several anagram programs in Chapter 12 through 15. Their book's web site contains the source code for those programs.

Column 3

Solution 3.4 refers to the book Calendrical Calculations by Nachum Dershowitz and Ed Reingold. Their fascinating web site contains a great deal of material on the topic.

Column 4

Jeff Lagarias's papers on the 3x+1 problem and related problems (see especially the 1985 postscript annotated bibliography) will help you appreciate Solution 4.5.

Column 5

Kernighan and Pike's The Practice of Programming site has an excerpt from their Chapter 5 on Debugging. Steve McConnell's Code Complete site has an excerpt from his Section 26.2 on debugging. And when you do find a bug, Tom Van Vleck has Three Questions About Each Bug You Find.

Column 6

Butler Lampson's 1983 paper Hints for Computer System Design shows how to attack performance problems at a variety of design levels, which is the theme of Column 6.

Section 6.1 describes how Andrew Appel attacked the physics problem of many-body simulations at several design levels. Guy Blelloch devotes a section of his algorithms course to N-body simulations. Amara Graps has a very thorough Web page describing N-Body / Particle Simulation Methods. Both sites have many pointers to other relevant sites.

Column 7

Column 7 is about The Back of the Envelope. Here are some relevant sites:
    
A View From the Back of the Envelope.
     Old Dominion University Fermi Problems Site.
     Fermi Problems (at the University of Texas).

For more on the Rule of 72, try a web search like this.

Problem 7.12 is about the average life span of circulating coins; see what the U.S. Mint has to say on the topic. (While you're there, be sure to admire the 1999 New Jersey State Quarter.)

Todd Proebsting uses a simple experiment and the Rule of 72 to defend Proebsting's Law: ``Advances in compiler optimizations double computing power every 18 years.''

Mike Lesk uses a variety of estimation techniques as he calculates How Much Information Is There In the World? Before you peek at his answers, try the problem yourself: How much information in the world? How much storage? What does that mean?

Column 8

Column 8 describes a sequence of algorithms for one small problem. If you enjoy that approach, then you may enjoy my article ``Faster And Faster And Faster Yet'' in the June 1997 issue of UNIX Review. The article starts with a simple algorithm for solving an n-city Traveling Salesman Problem (TSP) that takes exponential time, and tunes it like crazy to get speedups of, literally, ``billions and billions''.

The Further Reading in Column 8 refers to two texts on algorithms and data structures. Those two and seven others were collected by Dr. Dobb's Journal as Dr. Dobb's Essential Books on Algorithms and Data Structures Release 2 CD-ROM. The complete set of nine electronic books can be ordered online for about the price of one physical book.

Column 9

Steve McConnell's Code Complete site has an excerpt from his Section 28.2 on code tuning. I described some system-dependent code tuning techniques in ``Working With The System'' in the October 1997 issue of UNIX Review; the run time of one function is decreased by a factor of 50.

Appendix 4 of Programming Pearls summarizes a set of rules from my 1982 book Writing Efficient Programs (now out of print). Those rules are summarized and ``adapted to suit the special needs and opportunities of the Origin 2000 architecture, SGI version 7.x compilers, and the MIPS instruction set'' by the Survey of High Performance Computing Systems. If you enjoy the topic, you should also investigate ORIGIN 2000 Perfomance Tuning and Optimization.

Column 10

The column talks about some simple approaches to squeezing space. For a high-tech compression scheme, let Craig Nevill-Manning and Ian Witten demonstrate the sequitur system for inferring hierarchies from sequences.

And speaking of coding theory, Tim Bell, Ian Witten and Mike Fellows have built the Computer Science Unplugged project of off-line demonstrations of computing ideas for elementary school children. Their activity about Error Detection and Correction will entertain programmers of all ages.

Column 11

Anthony LaMarca and Richard Ladner describe ``The Influence of Caches on the Performance of Sorting''. In the October 1997 issue of UNIX Review, I tuned an insertion sort in an article on ``Working With The System''. Combining code tuning and compiler optimizations led to a speedup factor of between 15 and 50 across three 200MHz machines.

Column 13

Andrew Binstock published ``Hashing Rehashed'' in the April 1996 issue of Dr. Dobb's Journal. This article describes how cache memories can have a profound effect on hash functions.

Most of Column 13 concentrates on one small searching problem, and Section 13.8 describes a medium-sized problem. Searching problems also come in the Jumbo size. How does AltaVista search millions of web sites in a fraction of a second? Richard L. Sites gives you a tour behind the scenes.

Column 14

Anthony LaMarca and Richard Ladner describe ``The Influence of Caches on the Performance of Heaps''.

Column 15

For one of the largest string problems that I know, look at a 1996 presentation by Richard L. Sites on how AltaVista searches thirteen billion words in half a second.

I first described the algorithm in Section 15.2 for finding ``Long Common Strings'' in the December 1997 issue of UNIX Review. (The title on that web page is wrong, but the content is right.) The article contains several exercises and extensions that didn't make it into the book.

Section 3 of Claude Shannon's 1948 paper on A Mathematical Theory of Communication is the earliest reference I know to generating the Markov text described in Section 15.3. That paper has been described as ``The Magna Carta of the Information Age''.

Kernighan and Pike describe an algorithm for Markov-text generation in Chapter 3 of their Practice of Programming, and implement it in several languages. Their site has source code for the problem in C, Java, C++, Awk and Perl.

In experiments starting in 1954, Fred Brooks, A. L. Hopkins, Peter Neumann, and Bill Wright used Markov chains to generate random music. Their work is described in ``An Experiment in Musical Composition'' in IRE Transactions on Electronic Computers EC-6, pp. 175-182, September 1957. The paper was reprinted on pages 23-42 of Machine Models of Music edited by S.M. Schwanauer and D.A. Levitt and published by MIT Press in 1993. Peter Neumann describes those experiments in (the third paragraph from the bottom of) his home page. They trained on a set of 37 common-meter hymn tunes, and then generated random tunes based on 0-th to 7-th order Markov chains (where the fundamental unit was the eighth note). Neumann observes that the 0-th order tunes sounded random indeed, while the 7-th order tunes were very similar to (but different than) the training set.

Copyright © 1999 Lucent Technologies. All rights reserved. Thu 6 Jan 2000