Corporate and Professional Publishing Group


Chapter One of

Mainstream Videoconferencing:
A Developer's Guide to Distance Multimedia

by Joe Duran and Charlie Sauer
ISBN 0-201-84747-7 * Paperback * 304 pages * 1997

CHAPTER 1. WHY WE'RE HERE (Introduction: Metaphors, Benefits, Road bumps, Systems, Applications, Futures)

1.1 Visions, Metaphors, Expectations, Realities

There are many visions of videoconferencing. The essence of most of them is that videoconferencing should transcend the geographical and physical boundaries between the participants through the use of shared audio, video and other media. Ideally, a conferencing system provides the illusion that all of the participants are in the same room, sharing one space. Current products support this illusion to the extent that they provide appropriate audio, video, and multimedia communication and controls for the user.

To start thinking about how technology can be applied toward achieving this goal, consider two widely used metaphors for better communications devices: the “picture phone” and “desktop videoconferencing.” The “picture phone,” a telephone with video capability, has been a popular (though sometimes maligned) concept since the AT&T demonstrations in the 1960s. By themselves, a pair of picture phones is limited to two participants, and only partially transcends distance and physical boundaries. The need for a handset for sound, and the very small picture relative to the people and surroundings, make the participants very conscious that they are using a special device. (Use of the picture phone might be compared to use of a telescope: there is noticeable benefit, but little illusion.) In the desktop videoconferencing approach, a personal computer or workstation is augmented with audio and video. As with the picture phone, there is little attempt to mask the obvious boundaries between sites and to present an illusion of shared space. However, the augmented computer provides tremendous communication capabilities, and will be the most cost-effective approach for many users.

We don't mean to imply that picture phones and desktop videoconferencing systems are inherently limited to a pair of participants. Just as it is possible to have conference calls with multiple telephones, various approaches may be used to extend the number of participants in picture phone and desktop conferences. Depending on the application, such “multipoint” conferences may be fundamental to effective communication.

To transcend the physical boundaries, first imagine an environment without the boundaries and then attempt to extend the environment beyond the normal limits. For example, try to think of “stretching” a conference room across multiple sites. First, we want audio provided in such a way as to allow hands-free group discussion. Typically, this means using multiple microphones and speakers with appropriate acoustic controls. Second, we need video cameras and large monitors to make the participants easily visible to each other. And, in many cases, we must provide for shared presentation materials, documents, marker boards, and so forth, so that most of the routine meeting facilities are shared across the multiple sites.

Similarly, imagine a classroom, or a medical practice, or a brokerage house, stretched across multiple sites. We want to provide seemingly single-site facilities across multiple sites, in such a way that physical boundaries are not barriers to the participants in the activities.

These metaphors, and the notion that an illusion of shared space can be achieved, set high expectations for system performance. We are also used to the audio quality levels established by telephones and radio and the video quality levels provided by commercial television. If the conferencing equipment does not have comparable audio and video quality, the illusion will be diminished.

Achieving this performance requires real-time transfer of large amounts of audio, video, and data, orders of magnitude more than the quantities associated with telephones. The transfer must be directed and often must be secure, so broadcast technology associated with television is not appropriate. Thus the biggest obstacle to pervasive use of videoconferencing is the gap between the communication requirements and the limitations of the available communication infrastructure. Much of our discussion in this book is about narrowing and eliminating that gap.

Despite the difficult gap between communication requirements and capability, videoconferencing is practical and rapidly growing in popularity. Business meetings are effectively conducted by joining desks and conference rooms with videoconferencing equipment. “Distance learning” across multiple classrooms and campuses is now a routine practice. “Telemedicine” enables specialists and general practitioners to collaborate, and provides medical care in rural areas that would otherwise do without. Employers and job candidates meet without either having to travel for a face-to-face meeting. Arraignments and other legal proceedings are conducted by videoconference. Few of the participants in any of these situations have the illusion that they are located at the same site, but many of them find they can ignore the fact that they are at multiple sites and proceed as if they were all together.

1.2 Benefits, Limits, Growth

Why use videoconferencing? It is easy for some of us to take the benefits of videoconferencing for granted, but we should not do so. Usually, the first benefit cited is economic. A typical business meeting of people from different locations, even a short one, can easily cost thousands of dollars for travel and lodging. Other costs of such a meeting, for instance, the time the participants spend traveling, may be much more important than the direct travel costs. Depending on the circumstances, a videoconference may be a much more effective alternative, saving direct costs, avoiding travel time, and possibly enabling discussions that might not otherwise take place. When travel is not possible, the only other choices are a teleconference or no meeting at all. It is relatively straightforward to add a person to a videoconference while it is in progress if the person is available at any of the locations participating in the conference. In a meeting requiring travel, asking another person to join the meeting will usually not be practical. Videoconferencing enables individuals and organizations to manage time and opportunities that would otherwise be lost.

Let's take a closer look at direct costs. It is feasible to equip a conference room with a reasonable video system for roughly twenty thousand dollars. Depreciating that amount over three years, the monthly equipment cost per room can be kept to well under a thousand dollars. Costs of intracontinental long-distance communication for the video systems can easily be kept well under $100 per hour. Even if a room videoconferencing system is used only once a month, it is likely to cost less than direct travel costs for a meeting. These are fairly conservative figures; some users will see better cost benefits of videoconferencing. With more frequent use, the direct cost benefit clearly favors videoconferencing over travel. Similar arguments can be used to justify the costs of desktop videoconferencing systems. In this case, the cost benefit may be realized sooner, since desktop systems are much less expensive.
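The direct-cost argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, not a costing model: the $20,000 room system, the three-year depreciation and the $100-per-hour line charge come from the text, while `TRAVEL_COST_PER_MEETING` and `HOURS_PER_MEETING` are illustrative assumptions standing in for “thousands of dollars” of travel and a typical meeting length.

```python
# Direct-cost comparison sketch. ROOM_SYSTEM_COST, the three-year depreciation
# and LINE_COST_PER_HOUR come from the text; the travel cost and meeting length
# are assumptions chosen only for illustration.
ROOM_SYSTEM_COST = 20_000        # dollars, from the text
DEPRECIATION_MONTHS = 3 * 12     # three-year straight-line depreciation
LINE_COST_PER_HOUR = 100         # dollars; the text says "well under" this
TRAVEL_COST_PER_MEETING = 2_000  # assumption: "thousands of dollars" per trip
HOURS_PER_MEETING = 2            # assumption


def monthly_equipment_cost() -> float:
    """Straight-line depreciation of the room system, per month."""
    return ROOM_SYSTEM_COST / DEPRECIATION_MONTHS


def monthly_video_cost(meetings_per_month: int) -> float:
    """Equipment depreciation plus long-distance line charges."""
    line_cost = meetings_per_month * HOURS_PER_MEETING * LINE_COST_PER_HOUR
    return monthly_equipment_cost() + line_cost


def monthly_travel_cost(meetings_per_month: int) -> float:
    """Direct travel cost only; travel time is deliberately ignored."""
    return meetings_per_month * TRAVEL_COST_PER_MEETING


for n in (1, 2, 4):
    print(f"{n} meeting(s)/month: video ${monthly_video_cost(n):,.0f} "
          f"vs travel ${monthly_travel_cost(n):,.0f}")
```

With these assumptions, a single meeting per month costs about $756 by videoconference against $2,000 for travel, and the gap widens with every additional meeting because the depreciation is a fixed cost. The sketch leaves out travel time, which the text notes may matter even more.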

Of course, you could say it would be more cost-effective to use telephones. But in many circumstances, telephones are insufficient. Visual contact between people may or may not be the qualitative difference that makes an activity effective. Those who participate in multiway telephone conferences know that communication is seriously impaired without visual contact between people and shared access to documents, visual aids, diagnostic equipment, stock tickers, and so on.1 To help reduce these barriers, audiographics systems have been developed as a means to augment telephones with graphics such as shared documents. For some activities, audiographics may be sufficient. We believe that audiographics are a major aspect of videoconferencing, and that motion video is becoming affordable enough that most applications will include video. Much of the discussion in this book is not about motion video per se, but about the aspects of videoconferencing encompassed by audiographics.

Cost of travel versus cost of videoconferencing is often not the correct comparison. Videoconferencing is more than just travel replacement; it is an enabler of communication that otherwise would not take place. Physical meetings are necessary from time to time, but videoconferencing users can make more electronic “trips” in a day (or week) than they can physical ones, and with much less wear and tear. The telephone is still useful, but when it is insufficient, and a physical meeting is not possible, videoconferencing technologies allow meetings that would otherwise fail, or perhaps not even be attempted.

Some limits might not be overcome. Some individuals have a reluctance 2 to being “on camera” and resist the new technology, just as some avoid telephones and airplanes. The boundaries between sites of a conference are visible and inhibit some activities, e.g., side conversations during a meeting, and preclude others, e.g., physical contact.

As with other new technologies, estimating the extent and pace of usage growth is necessarily guesswork. Analogies to the computer industry have significant defects, but are still useful. Some have said that videoconferencing is of limited value and that few systems will be deployed. When IBM began making computers, there were serious questions of whether more than a few tens of computers would ever be sold and used! Rapidly increasing sales of videoconferencing contradict the minimal usage predictions. At the other extreme, some suggest that videoconferencing is the next “killer application” (in the sense that computer spreadsheets were the “killer application” that spawned the personal computer market), and that it will drive demand for computers and communication lines. For the next few years, at least, there are sufficient obstacles to deny the “killer application” scenarios. But it is reasonable to expect growth sufficient to strain the delivery capacity of equipment suppliers and communication lines. In the personal computer industry, local-area networks (LANs) seemed ready for widespread usage every year from 1984 forward. Each new year was declared “The Year of the LAN.” LANs became pervasive by 1989. For several years now, analysts have forecast widespread availability of videoconferencing. Some year soon, the forecasts will have become reality, without a recognizable “Year of Videoconferencing.”

1.3 Technology Bottlenecks & Bumps in the Road

Let us now consider the technology issues that are being resolved in anticipation of the “Year of Videoconferencing.” As we said earlier, the biggest technical hurdle in videoconferencing is sending large amounts of data across existing networks that were designed for much smaller amounts. The existing networks are those intended for telephony, both local and long distance, and those intended for computer communication, primarily internetworked LANs. Computer communication across wide areas typically uses the networks originally developed for telephony. The first thing we want to talk about is sending sound and video across local and long-distance telephone networks.

Sounds audible to humans have a frequency range up to roughly 20,000 cycles per second, or Hertz, abbreviated Hz. The sounds required for speech use a much smaller frequency range, up to roughly 3500 Hz. The telephone network is designed to transmit sounds in this smaller frequency range. Originally, this was done in terms of analog signals, where the strength of the signal on the telephone wires is directly analogous to the loudness of the sound, and the voltage of the signal alternates (between positive and negative) at the frequency of the sound. Most telephone service for residences and small organizations today uses the same analog conventions that were established early in the twentieth century.

Analog telephone service is not well suited to sophisticated connections of calls, either local or long distance. Also, when analog signals are sent long distance, the quality of the signals always degrades. For these and other reasons, long-distance telephone service and private branch exchanges (PBX) for connecting telephones within large organizations began converting to digital signals in the 1960s. Essentially all long distance service is digital now, as is most PBX service.

For home telephones and other phones connected directly to the telephone company switching facility, “the last mile,” the circuit from the home to the switching facility, is usually still analog. The telephone uses analog signals. These are converted to and from digital signals at the switching facility. When these circuits are used for fax and computer purposes, digital information must be converted to and from analog at both ends, because the switching facility is always performing the conversion. Fax machines and computers use modems (MOdulator/DEModulators) to perform the conversion. The maximum achievable data rate for modems using analog telephone circuits appears to be about 34,000 bits per second. (Higher rates can be achieved in one direction, under some circumstances, but conferencing usually requires equal rates in both directions.)

Digital representations of sounds use numbers, usually called samples, to represent the loudness of the sound at successive instants. The signal-to-noise ratio is primarily determined by the range of numbers in a sample. Seven-bit samples are sufficient for speech, and 16-bit samples are sufficient for high-fidelity representation of music. Two samples per cycle of the highest frequency to be reproduced are sufficient to get good representation. Thus, for speech, roughly 8000 seven-bit samples per second are enough. For music, roughly 40,000 sixteen-bit samples per second are needed for high fidelity; Compact Discs use 44,100 sixteen-bit samples per second, and professional recording equipment uses 48,000 sixteen-bit samples per second.
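The sampling rule above turns directly into bit rates: the sample rate must be at least twice the highest frequency, and the bit rate is the sample rate times the bits per sample. This is a minimal sketch; the rates are per channel (a stereo Compact Disc carries two such channels), and the 44,100 figure is the standard Compact Disc sample rate.

```python
# Bit rates implied by "two samples per cycle" of the highest frequency,
# multiplied by the number of bits in each sample. Rates are per channel.

def min_sample_rate(max_frequency_hz: int) -> int:
    """Two samples per cycle of the highest frequency to be reproduced."""
    return 2 * max_frequency_hz


def bit_rate(samples_per_second: int, bits_per_sample: int) -> int:
    """Bits per second for one channel of uncompressed samples."""
    return samples_per_second * bits_per_sample


print(min_sample_rate(3_500))   # speech band: 7000 samples/s, so 8000 leaves margin
print(min_sample_rate(20_000))  # full audible band: 40000 samples/s
print(bit_rate(8_000, 7))       # speech: 56000 bit/s
print(bit_rate(44_100, 16))     # Compact Disc, one channel: 705600 bit/s
print(bit_rate(48_000, 16))     # professional, one channel: 768000 bit/s
```

The speech figure of 56,000 bits per second is exactly the restricted telephone channel rate discussed next.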

Digital telephone systems are designed to handle connection and transmission of many channels of 56,000 (8000 x 7) or 64,000 (8000 x 8) bits per second, each channel representing the sounds of a telephone conversation.3 In the United States, 56,000 bit per second channels were used originally, but the trend worldwide is toward 64,000 bit per second channels. A 64,000 bit per second channel is referred to as a B-channel (B for “bearer”); a 56,000 bit per second channel is called a restricted channel. A typical telephone line in urban areas is capable of transmitting a pair of B-channels. If video is going to travel on the telephone network, it should fit within a few of these B-channels. As we will see in discussing the components of video signals, transmitting video within a reasonable number of B-channels is a significant challenge.
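Sizing a connection in these units is a matter of rounding up: a stream needs a whole number of channels, and each telephone line (as described above) carries a pair of B-channels. The sketch below illustrates that arithmetic; the 384,000 bit per second demand is just an example figure.

```python
B_CHANNEL_BPS = 8_000 * 8    # 64,000 bit/s "bearer" channel
RESTRICTED_BPS = 8_000 * 7   # 56,000 bit/s restricted channel
B_CHANNELS_PER_LINE = 2      # a typical urban line carries a pair, per the text


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b, avoiding floating-point rounding."""
    return -(-a // b)


def b_channels_needed(bits_per_second: int, channel_bps: int = B_CHANNEL_BPS) -> int:
    """Smallest whole number of channels whose combined rate covers the demand."""
    return ceil_div(bits_per_second, channel_bps)


def lines_needed(b_channels: int) -> int:
    """Telephone lines needed when each line carries a pair of B-channels."""
    return ceil_div(b_channels, B_CHANNELS_PER_LINE)


demand = 384_000                                  # example rate, bit/s
print(b_channels_needed(demand))                  # 6
print(b_channels_needed(demand, RESTRICTED_BPS))  # 7
print(lines_needed(b_channels_needed(demand)))    # 3
```

The same demand needs one more channel when only restricted channels are available, which is one practical reason the 64,000 bit per second convention matters.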

A picture on a television screen consists of many small dots, called pixels (“picture elements”). These are intended to be small enough that only the composite picture is seen, not the dots, but the pixels are readily visible if one looks closely at the screen. For North American broadcast systems,4 there is a maximum of roughly 360 to 400 pixels per row. There are roughly 480 visible rows broadcast, but most televisions show slightly more than half the rows. For videoconferencing (worldwide), a standard picture has a horizontal resolution of 352 pixels and a vertical resolution of 288 pixels. A single 352 x 288 picture is usually referred to as a “frame.”

A 352 x 288 frame equals 101,376 pixels. To directly represent a full-color pixel, 24 bits are required (8 bits for each of the primary colors of light: red, green, and blue). Thus one picture could take 101,376 x 24 = 2,433,024 bits. Motion requires 15 to 30 picture frames per second, so full-color, full-motion standard resolution video could require up to 73 million bits per second, well over a thousand B-channels! Fortunately, there are bridges across this apparent chasm.
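The uncompressed figures above follow from multiplying pixels per frame, bits per pixel and frames per second. This sketch reproduces them and also computes the compression ratio a coder would need to fit that stream into two or six B-channels.

```python
WIDTH, HEIGHT = 352, 288   # standard videoconferencing frame, from the text
BITS_PER_PIXEL = 24        # 8 bits each for red, green and blue
B_CHANNEL_BPS = 64_000


def raw_video_bps(fps: int, width: int = WIDTH, height: int = HEIGHT,
                  bits_per_pixel: int = BITS_PER_PIXEL) -> int:
    """Uncompressed rate: pixels per frame x bits per pixel x frames per second."""
    return width * height * bits_per_pixel * fps


def compression_ratio(raw_bps: int, b_channels: int) -> float:
    """How much a coder must shrink the stream to fit in the given B-channels."""
    return raw_bps / (b_channels * B_CHANNEL_BPS)


raw = raw_video_bps(30)
print(WIDTH * HEIGHT)                    # 101376 pixels per frame
print(raw)                               # 72990720 bit/s, about 73 million
print(-(-raw // B_CHANNEL_BPS))          # 1141 B-channels if sent uncompressed
print(round(compression_ratio(raw, 2)))  # 570, i.e. roughly 570:1 for two B-channels
print(round(compression_ratio(raw, 6)))  # 190, i.e. roughly 190:1 for six B-channels
```

Ratios in the hundreds are why the coding techniques described next are essential rather than optional.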

By discarding less important information (for example, using far fewer than 24 bits of color per pixel) and coding the information (for example, sending only the differences between frames, not the entire pictures), it is possible to send a tiny fraction of those 73 million bits and still get good results. A pair of B-channels, across a telephone circuit, gives acceptable results for many uses. With today's coding technology, six B-channels (three telephone circuits) are enough to get very good results. Using more than six B-channels, say 12 or more, provides excellent results.
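The idea of “sending only the differences between frames” can be shown with a toy example. This is a deliberately simplified sketch, not how production videoconferencing coders work (they are far more elaborate): a frame is assumed to be a flat list of brightness values, and `encode_difference` and `apply_difference` are hypothetical helpers. A nonzero `threshold` also illustrates discarding less important information, since small changes such as noise are simply not sent.

```python
from typing import List, Tuple

Frame = List[int]               # hypothetical: one brightness value per pixel
Update = List[Tuple[int, int]]  # (pixel index, new value) pairs


def encode_difference(previous: Frame, current: Frame, threshold: int = 0) -> Update:
    """Keep only pixels whose value changed by more than `threshold`."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current))
            if abs(new - old) > threshold]


def apply_difference(previous: Frame, update: Update) -> Frame:
    """Decoder side: copy the last frame and overwrite the changed pixels."""
    frame = list(previous)
    for index, value in update:
        frame[index] = value
    return frame


prev = [10, 10, 10, 10, 10, 10, 10, 10]
curr = [10, 10, 80, 82, 10, 10, 11, 10]  # a small moving object plus slight noise

lossless = encode_difference(prev, curr)
print(lossless)                                  # [(2, 80), (3, 82), (6, 11)]
print(apply_difference(prev, lossless) == curr)  # True

lossy = encode_difference(prev, curr, threshold=2)
print(lossy)                                     # [(2, 80), (3, 82)]
```

When little in the scene moves, as with people seated at a conference table, most pixels are unchanged and the update is much smaller than the frame.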

The most aggressive coding techniques have led to products intended for use with modems and analog telephone lines, using about 20,000 bits per second for the video. These products use lower pixel resolution, 176 x 144 or lower, and low frame rates. These products are becoming available in 1996 as a bundled aspect of personal computers sold for home use, so they are likely to be present in large quantities. It is unknown how well low resolution and frame rates will be accepted for home usage. For some families, having even low-quality video will be a wonderful benefit if it allows members to see each other, whereas for other situations the resolution and frame rate will be too limited to be considered valuable. It is not likely that low resolution and frame rates will be considered sufficient for “serious” applications.
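One way to see how aggressive these modem-rate products are is to compute the average number of coded bits available per pixel. The 20,000 bits per second and 176 x 144 resolution come from the text; the frame rates below are assumptions, since the text says only that they are low.

```python
VIDEO_BPS = 20_000        # video share of a modem connection, from the text
WIDTH, HEIGHT = 176, 144  # lower resolution mentioned in the text


def bits_per_pixel(bps: float, fps: float, width: int = WIDTH,
                   height: int = HEIGHT) -> float:
    """Average coded bits available for each pixel of each frame."""
    return bps / (width * height * fps)


for fps in (2, 5, 10):    # assumed frame rates
    print(f"{fps:>2} frames/s: {bits_per_pixel(VIDEO_BPS, fps):.3f} bits per pixel")
```

At an assumed 5 frames per second, about 0.16 bits are available per pixel, compared with 24 bits per pixel for uncompressed full color, so the coder must discard or predict nearly everything.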

Coding techniques can also be used to reduce the bandwidth required for audio, but not by such dramatic factors. Instead of using a full B-channel for audio, as implied earlier, speech quality audio can be fit into as little as one tenth of a B-channel.
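A short calculation makes the contrast between audio and video coding concrete. This sketch takes “one tenth of a B-channel” for speech and, for video, the uncompressed 352 x 288, 24-bit, 30 frames per second figure squeezed into a pair of B-channels; both inputs come from the text.

```python
B_CHANNEL_BPS = 64_000

audio_coded_bps = B_CHANNEL_BPS / 10           # "one tenth of a B-channel"
audio_ratio = B_CHANNEL_BPS / audio_coded_bps  # relative to a full B-channel

raw_video_bps = 352 * 288 * 24 * 30            # uncompressed figure from the text
video_coded_bps = 2 * B_CHANNEL_BPS            # squeezed into a pair of B-channels
video_ratio = raw_video_bps / video_coded_bps

print(f"audio: {audio_coded_bps:.0f} bit/s, about {audio_ratio:.0f}:1")
print(f"video: {video_coded_bps} bit/s, about {video_ratio:.0f}:1")
```

Speech compression of about 10:1 is modest next to the roughly 570:1 needed for video, which is the sense in which audio coding gains are “not by such dramatic factors.”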

The primary alternative to using the telephone network directly is to use the LANs and other networks designed for computer communication. The most overused phrase of 1994 was “The Information Superhighway,” but we will abuse the phrase a few more times to depict the bumps in that road to videoconferencing! The telephone network is based on circuit-switching, which means that once a telephone call (or videoconference) is established, circuits (B-channels) are dedicated to the call. Most computer networks are based on packet-switching, which means that packets (packages) of data travel on the same network, much like boxes on a conveyor belt or on trucks on the highway. As long as the packets flow smoothly, a computer network is a very good highway for audio and video data, providing the capacity equivalent of many B-channels. However, there are almost always traffic jams on the information highway. When the jams are minor, audio and video can get through in time and things work well. When the jams are major, conversation is halted. Improving computer networks to manage traffic jams and improving videoconferencing technology to mask the effects of those traffic jams are major goals of current development.
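One common way for a receiver to mask minor traffic jams, offered here as an illustration rather than as the method of any particular product, is a playout (jitter) buffer: each packet is held until a fixed delay after it was sent, so moderately delayed packets still play on time, and only packets delayed beyond that budget are lost. The packet trace below is hypothetical, and the sketch assumes sender and receiver clocks are aligned.

```python
from typing import List


def late_packets(send_times_ms: List[float], arrival_times_ms: List[float],
                 playout_delay_ms: float) -> List[int]:
    """Indices of packets that arrive after their scheduled playout time.

    Each packet is played at its send time plus a fixed playout delay. This
    assumes aligned clocks, which a real system would have to arrange.
    """
    return [i for i, (sent, arrived) in enumerate(zip(send_times_ms, arrival_times_ms))
            if arrived > sent + playout_delay_ms]


# Hypothetical trace: one audio packet every 20 ms, with variable network delay.
sent = [0, 20, 40, 60, 80, 100]
arrived = [45, 62, 150, 101, 126, 140]  # packet 2 hits a "traffic jam"

print(late_packets(sent, arrived, playout_delay_ms=50))   # [2]
print(late_packets(sent, arrived, playout_delay_ms=120))  # []
```

The design trade-off is visible in the two calls: a longer playout delay absorbs bigger jams but adds latency to every exchange, and beyond a point that delay itself disrupts conversation.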

In 1996, a plethora of Internet telephony products and prototypes were introduced. As Internet telephony becomes practical and standardized, its progress will benefit Internet videoconferencing. We discuss this further in Chapters 2 and 13.

1.4 Representative System Characteristics

If (obviously, a “big if”) adequate telephone or computer networks are available, then videoconferencing is practical for many individuals and organizations. Videoconferencing is capable of saving travel time and money, and enables communication that otherwise would not take place.

Figure 1.1 - Roll-about Videoconferencing System

Figure 1.1 shows a representative roll-about videoconferencing system circa 1996. This is a good example to start with, because from it we can consider what desktop and larger room systems have in common with it and how they extend it. A roll-about system is a medium-scale system intended for use by small groups in typical meetings. It is transportable from room to room, as long as the room has appropriate connections to telephone or local area networks.

The cabinet under the television monitor houses a personal computer and additional equipment to support conferencing. In most cases, the additional equipment is installed inside the personal computer. The camera on top of the monitor is motorized to enable convenient positioning (pan, tilt, zoom in/out) by the participants.

The core technology is based on an industry standard personal computer, with added support for audio and audio coding, motion video and coding, and communication protocols and interfaces. We reap two major benefits by beginning with the PC. First, by taking advantage of the mass production and low cost of the PC, many of the conferencing functions can be provided using relatively low-cost PC hardware and software. Second, the integration of the PC makes its normal capabilities directly available to the conference participants.

The television monitor is very appropriate to motion video, but has relatively few pixels per inch (roughly 30) compared to a computer monitor (roughly 75 pixels per inch). With a group system and this relatively low resolution (pixels per inch) monitor, it is usually appropriate to devote the full monitor screen to motion video from the remote site(s). To display shared presentation materials, shared marker boards, shared computer applications, etc., either the monitor must be switched away from motion video or we must resort to overlaying video with these alternate images. (This is analogous to showing both a forecaster and a weather map on a newscast.)

For an individual, a desktop system is both more manageable in physical size and more functional. The user is usually sitting much closer to the monitor than with a group system. Rather than dedicating the full screen to motion video from a remote site, the remote site video is shown in a window, of perhaps 352 x 288 pixels out of a total of 1024 x 768 pixels. The leftover screen pixels can be used for a local site (“preview”) video window, shared presentation materials, computer applications, and other images. Figure 1.2 depicts a personal computer display with a collection of such windows.
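The screen-budget claim above is easy to quantify. This small sketch computes what share of a 1024 x 768 display a single 352 x 288 video window occupies, and how many such windows fit in a simple non-overlapping grid; both sizes come from the text.

```python
from typing import Tuple

Size = Tuple[int, int]     # (width, height) in pixels

SCREEN: Size = (1024, 768)  # desktop display, from the text
VIDEO_WINDOW: Size = (352, 288)


def area(size: Size) -> int:
    width, height = size
    return width * height


def tiles_that_fit(screen: Size, window: Size) -> int:
    """Non-overlapping windows of this size in a simple grid layout."""
    return (screen[0] // window[0]) * (screen[1] // window[1])


print(f"{area(VIDEO_WINDOW) / area(SCREEN):.1%}")  # 12.9% of the screen
print(tiles_that_fit(SCREEN, VIDEO_WINDOW))        # 4
```

A remote-video window thus uses only about an eighth of the display, leaving most of the screen for a preview window, shared materials and applications.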

Figure 1.2 - Desktop Windows Example

For a larger scale room system, two or more monitors are used to allow for display of multiple sources of video, shared presentation materials, and computer applications. Figure 1.3 shows such a system.

Figure 1.3 - Large Scale Videoconferencing System

1.5 Applications

There are many current applications of videoconferencing, which will be discussed in coming chapters, especially in Chapter 6. The purpose of this section is to give a quick summary of some of the characteristics of current and near term use.

Business meetings. Larger companies with multiple locations have videoconferencing rooms at each of their locations. They may even have videoconferencing systems in most of their conference rooms. The systems used are typically midscale, larger than the roll-about shown in Figure 1.1 but not as complete as the system in Figure 1.3. The rooms and videoconferencing equipment are scheduled as part of the overall meeting scheduling. Use for intracompany meetings is probably much more prevalent than for meetings involving other companies. One explanation for the prevalence of internal meetings is that the participants are familiar with each other: people seem to be more comfortable using videoconferencing with people they already know than as part of a first meeting. Once companies become familiar with videoconferencing capabilities, they use it extensively. A large financial institution installed systems in two cities 1,500 miles apart. The company anticipated use for certain emergencies, and dedicated communication circuits were already in place. The systems were soon in use six hours per day for collaboration among groups that had been unable to get travel funds to visit each other. One large multinational company reportedly spent $500,000 on long-distance charges for videoconferencing in 1994. That figure suggests that the company keeps dozens of videoconferencing facilities busy at least half the business day.
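The closing estimate above depends heavily on the hourly line charge. The following rough sketch converts the reported annual spend into facilities kept busy half the day. The 250 business days, the 8-hour day and the candidate hourly rates are assumptions, and the sketch counts one facility per billed hour; if each billed call ties up a room at both ends, the counts double.

```python
ANNUAL_SPEND = 500_000    # dollars in 1994, from the text
BUSINESS_DAYS = 250       # assumption
HOURS_IN_USE_PER_DAY = 4  # "at least half" of an assumed 8-hour business day


def facilities_busy(rate_per_hour: float) -> float:
    """Facilities kept busy half the day if all spend went to line charges."""
    conference_hours = ANNUAL_SPEND / rate_per_hour
    return conference_hours / (BUSINESS_DAYS * HOURS_IN_USE_PER_DAY)


for rate in (100, 50, 20):  # assumed hourly line charges
    print(f"${rate}/hour -> about {facilities_busy(rate):.0f} facilities")
```

At $100 per hour the spend covers only about 5 facilities, but at rates well under $100 per hour, as the earlier cost discussion suggests, or with two rooms per call, the figure reaches the dozens.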

Distance Learning. Most state universities in the United States have multiple campuses, typically a primary campus and several additional campuses, and most of these have videoconferencing facilities for instructional purposes. It is often the case that a single instructor can conduct a class simultaneously across several of the campuses by videoconference. The precedent has been set for decades by broadcast lectures sent from a central site. There are obvious limitations with the broadcast approach, for example, in interaction between the instructor and students, shared use of marker boards, and so on. Current videoconferencing approaches can overcome these limitations and also provide capabilities analogous to traditional classroom facilities. For example, student response terminals can be used not only to give the effect of “raising hands,” but also to communicate specific responses. The same approaches apply outside of universities, of course, in corporate, government, and other learning and training environments.

Professional Conferences. Is a professional conference a business meeting or a learning event? In many cases it is both of these and more. The number of conferences is both daunting and tempting for many of us because there are many more interesting conferences than we can attend. (We can't spend full time going to conferences.) Fortunately, it is more and more common for major portions of technical conferences and similar meetings to be available, at least with audio and video, on the Internet. With Internet video availability, I can easily (and discreetly) attend the most interesting portions of a technical conference without leaving my office.

Telemedicine. Medical practice is tending toward higher degrees of specialization, along with increasing numbers of general practitioners. In rural or remote areas, there may be no physicians at all. Videoconferencing is being used to allow specialists and generalists to collaborate more effectively in diagnosis and treatment, not only by allowing physicians at different sites to view a single patient, but also by allowing them to share radiographic and other diagnostic information and instrumentation. Even in urban areas, videoconferencing systems are useful for patient monitoring, potentially allowing patients to remain at home. Again, it is not just motion video that would be communicated in a patient/physician visit, but diagnostic information (heart rate, blood pressure, temperature, etc.) from instruments that can be jointly managed during the visit.

Financial. There is an obvious surge toward electronic funds transfer, automated teller machines and home banking services via telephone and personal computer. On the other hand, some services (for example, opening/closing accounts, loan application, and so forth) still seem to require human intervention, possibly with more senior personnel. In many instances, these activities can be handled at a branch location via videoconferencing kiosks. In brokerage firms, traders often have a plethora of computing and information devices on their desks. One of these devices is likely to be a videoconferencing system, used for contact with other brokers, with clients, and with sources of information.

Product Support. Remote support of products by telephone is routine in many industries, covering everything from household appliances to manufacturing systems. In many cases, the customer problems are much more readily resolved if the support person can see the product and/or the customer can see visual examples from the support person. For sufficiently expensive products, manufacturers find it worthwhile to include videoconferencing equipment with the products to better enable remote support.

Legal. Some preliminary legal proceedings, such as arraignments and depositions, are handled by videoconference. The New York District Attorney's office uses desktop videoconferencing systems for arraignments. Rather than requiring the arresting officer to appear in person in court, the officer participates by videoconference, lessening travel, scheduling and other difficulties.

Employment Interviews. Travel for employment interviews is often a significant impediment for both applicant and employer, especially for initial interviews. By using videoconferences for initial interviews, the employer is able to better evaluate a pool of candidates, then have on-site interviews as warranted.

Sales Kiosks. Direct sales of many kinds of merchandise via telephone and television have been a significant trend in recent years. The customer and the salesperson often desire better communication than is achievable with a telephone call, and desire an interactivity not possible with the television shopping networks. Videoconferencing kiosks allow the centralization efficiency of telemarketing, with added communication benefits for both parties.

1.6 The Future

Communication Infrastructure. Better telephone service, better data communication, the entertainment potential of video on demand, and interactive home video are all demanding “faster, better, cheaper” communication infrastructure. The deployment trend toward pervasive, high-capacity communication (the “superhighway”) is likely not as rapid as predicted in the merger and partnership frenzy of 1993, but also seems inevitable. Since adequacy of communication infrastructure is the biggest issue in implementation of videoconferencing, continuing improvements in the infrastructure are a major catalyst for videoconferencing.

System costs. Semiconductor and computing performance, and their cost-performance ratio, continue to improve without foreseeable barriers. These trends have a twofold benefit to the effectiveness and cost of videoconferencing equipment. First, the capabilities and costs of specialized hardware follow the general trend. Second, the capabilities of mass-produced computers more adequately match the needs of videoconferencing, reducing the need for specialized equipment.

New Applications. As availability and capability improve, many new applications will be found. Telecommuting will become a reality for a noticeable fraction of the workforce. Entertainment and social events will be augmented by, or even based on, videoconferencing.

New Approaches. The increase in communication and computing capability will stimulate improvements in providing the illusion of shared space. For example, with current technology, conferences among multiple sites typically allow only one or a few of the sites to be seen concurrently, often in an unrealistic fashion. Better technology will enable more sites to be visually active, and may allow “virtual reality” approaches to presenting the many sites as a shared space.


Footnotes

1. The first words of many multiway telephone conferences are “Does everyone have a copy of the faxed charts?”

2. For most people, any such reluctance goes away with experience.

3. In the computer industry, it is normal to use “K” (kilo) and “M” (mega) to represent 1024 and 1048576, respectively. In the communications industry, it is normal to use these to represent 1000 and 1,000,000, respectively. For example, a 56,000 bit per second line would often be referred to as “56K.” In general, we will limit our use of these abbreviations, so as to avoid confusion.

4. There are many differences in numbers of visible pixels, based on the broadcast standard in use (three standards are in use by different countries) and the design of the television receiver.