The programmer's question was simple: ``How do I sort a disk file?'' Before I tell you how I made my first mistake, let me give you a chance to do better than I did. What would you have said?
My mistake was to answer his question. I gave him a thumbnail sketch of how to implement a Merge Sort on disk. My suggestion that he dig into an algorithms text met with less than enthusiasm -- he was more concerned about solving the problem than furthering his education. I then told him about a disk sorting program in a popular programming book. The program consisted of about two hundred lines of code in a dozen functions; I estimated that implementing and testing the code would have taken the programmer at most a week.
I thought that I had solved his problem, but his hesitation led me back to the right track. The conversation then went something like this, with my questions in italics.
The context makes the problem clearer. In the United States, telephone numbers consist of a three-digit ``area code'' followed by seven additional digits. Telephone calls to numbers with the ``toll-free'' area code of 800 (the only such code at the time) were not charged. A real database of toll-free telephone numbers includes a great deal of information: the toll-free telephone number, the real number to which calls are routed (sometimes several numbers, with rules on which calls go where when), the name and address of the subscriber, and so on.
The programmer was building a small corner of a system for processing such a database, and the integers to be sorted were toll-free telephone numbers. The input file was a list of numbers (with all other information removed), and it was an error to include the same number twice. The desired output was a file of the numbers, sorted in increasing numeric order. The context also defines the performance requirements. During a long session with the system, the user requested a sorted file roughly once an hour and could do nothing until the sort was completed. The sort therefore couldn't take more than a few minutes, while ten seconds was a more desirable run time.
These are the remaining sections in the column.
1.2 Precise Problem Statement
1.3 Program Design
1.4 Implementation Sketch
1.5 Principles
1.6 Problems
1.7 Further Reading
The Solutions to Column 1 give answers for some of the Problems.
Several of the programs in the column and the solutions can be found with the other source code for this book.
The web references describe web sites related to the topic.
An excerpt from this column appears in the November 1999 issue of Dr. Dobb's Journal.
Copyright © 1999 Lucent Technologies. All rights reserved. Thu 22 Sep 1999