We are surrounded by strings. Strings of bits make integers and floating-point numbers. Strings of digits make telephone numbers, and strings of characters make words. Long strings of characters make web pages, and longer strings yet make books. Extremely long strings represented by the letters A, C, G and T are in geneticists' databases and deep inside the cells of many readers of this book.
Programs perform a dazzling variety of operations on such strings. They sort them, count them, search them, and analyze them to discern patterns. This column introduces those topics by examining a few classic problems on strings.
These are the remaining sections in the column.
15.1 Words
15.2 Phrases
15.3 Generating Text
15.4 Principles
15.5 Problems
15.6 Further Reading
The teaching material contains overhead transparencies based on Sections 15.2 and 15.3; the slides are available in both PostScript and Acrobat.
The code for Column 15 contains implementations of the algorithms.
The Solutions to Column 15 give answers for some of the Problems.
This column concentrates on programming techniques, and uses those techniques to build several programs for processing large text files. A few examples of the programs' output in the text illustrate the structure of English documents. This web site contains some additional fun examples, which may give further insight into the structure of the English language. Section 15.1 counts the words in one document; here are more examples of word frequency counts. Section 15.2 searches for large portions of duplicated text; here are more examples of long repeated strings. Section 15.3 describes randomly generated Markov text; these pages contain additional examples of Markov text, generated at the letter level and word level.
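For readers who want a feel for the three tasks before turning to the sections, here is a minimal sketch in Python. It is not the code for Column 15. The function names, the `order`, `length` and `seed` parameters, and the choice to split words on whitespace are all assumptions made for illustration. The repeated-string search sorts every suffix of the input and compares neighbors, which is simple but copies the text many times over, so it suits only small inputs.

```python
import random
from collections import Counter, defaultdict


def word_counts(text):
    """Count whitespace-separated words (hypothetical helper)."""
    return Counter(text.split())


def longest_repeated_substring(text):
    """Return the longest substring that occurs at least twice.

    Sorting all suffixes puts suffixes that share a long prefix next to
    each other, so only adjacent pairs need to be compared.
    """
    suffixes = sorted(text[i:] for i in range(len(text)))
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        if n > len(best):
            best = a[:n]
    return best


def markov_text(text, order=2, length=20, seed=0):
    """Generate word-level Markov text from `text`.

    Each new word is drawn at random from the words that followed the
    previous `order` words somewhere in the input.
    """
    words = text.split()
    if len(words) <= order:
        return " ".join(words)
    followers = defaultdict(list)
    for i in range(len(words) - order):
        followers[tuple(words[i:i + order])].append(words[i + order])
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable output
    out = list(words[:order])
    while len(out) < length:
        choices = followers.get(tuple(out[-order:]))
        if not choices:  # this phrase appears only at the end of the input
            break
        out.append(rng.choice(choices))
    return " ".join(out)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat sat on the hat"
    print(word_counts(sample).most_common(3))
    print(repr(longest_repeated_substring(sample)))
    print(markov_text(sample, order=2, length=12))
```

Storing every follower in a list, duplicates included, makes a uniform random choice reproduce the frequencies seen in the input, which is the property that gives Markov text its flavor.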
The web references describe several web sites devoted to related topics.
Copyright © 1999 Lucent Technologies. All rights reserved. Wed 18 Oct 2000