6.1 General Background
Recently, a great amount of effort has been spent integrating audio and video into computer applications. However, these multimedia applications present unique computational challenges. One of the most important obstacles that must be overcome in the development of useful multimedia applications is that realistic video and audio have storage and bandwidth requirements at the limits of current computers and networks. In order to lessen the effect of these bottlenecks, extensive research has been done to find the best compression techniques for these forms of data.
The Moving Picture Experts Group (MPEG) has developed some of the most popular multimedia compression algorithms. These international standards are used in a variety of applications, including digital television (MPEG-2), the popular MPEG-1 Audio Layer III (commonly called MP3), and the most widely used video compression scheme found on the Internet, MPEG-1 video. MPEG-1 was designed to provide VCR quality playback from early CD-ROM drives, but it has since found many different uses.
MPEG encoding is a lossy process. In an effort to reduce the size of the final bitstream, the encoder searches the input data for patterns. Based on the behavior of the human sensory system, the information that would be least missed is omitted. This is a very effective way of compressing the data, but the process is computationally intensive. Because of this, real time encoders that produce reasonable quality output must be built from dedicated hardware and cost anywhere from several thousand dollars to tens of thousands of dollars. By contrast, decoding can be carried out on a modest home computer. In most cases, a video will be encoded once and decoded many times, so the cost of encoding is not a major obstacle. However, in some important applications like video conferencing, the high prices are prohibitive.
How MPEG-1 Works
Digital video is represented as a series of frames, and each frame is represented as a matrix of pixels. In order for a person to perceive the video as continuous, roughly 20 or more frames per second must be displayed. Films are displayed at 24 frames per second, and NTSC television at about 30. MPEG-1, the most popular video format used with computers, is the form of compression implemented in this project. It commonly uses a frame size of 352 by 240 pixels, although this is not explicitly required by the standard. This roughly corresponds to television quality resolution.
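To make the storage and bandwidth problem concrete, the following minimal sketch estimates the uncompressed data rate for the frame size and frame rate mentioned above. The function name raw_video_rate and the figure of 3 bytes per pixel (24-bit color) are assumptions made for illustration; they do not come from the MPEG standard or from this project's code.

```python
def raw_video_rate(width, height, fps, bytes_per_pixel=3):
    """Return the uncompressed data rate in bytes per second.

    bytes_per_pixel=3 assumes 24-bit color; this is an illustrative
    assumption, not a value taken from the MPEG-1 standard.
    """
    return width * height * bytes_per_pixel * fps


# Frame size and frame rate quoted in the text.
rate = raw_video_rate(352, 240, 30)
print(f"{rate} bytes/s, about {rate * 8 / 1e6:.1f} Mbit/s")
# prints "7603200 bytes/s, about 60.8 Mbit/s"
```

Under these assumptions the raw stream needs about 7.6 megabytes per second. MPEG-1 was designed with bit rates on the order of 1.5 Mbit/s in mind, which gives a sense of how much compression is required.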
There are three kinds of MPEG video frames: I, P, and B. I (Intra) frames are encoded using information entirely within the frame. The process closely resembles the popular JPEG still image compression method. P (forward Predicted) frames are encoded using information both from within the frame and from the previous I or P frame. Compression is about 10 times better than for I frames, but the encoding cost is high. B (Bi-directionally predicted) frames are encoded using information from within the frame, from the previous I or P frame, and from the following I or P frame. B frames are generally compressed to about one sixtieth the size of I frames, but encoding performance suffers even further. B frames are never used as a basis for the encoding of other frames. The time it takes to decode each of the three frame types is roughly the same.
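The rough size ratios quoted above can be combined to estimate how much a mix of frame types saves compared with an all-I stream. The sketch below is only a back-of-the-envelope calculation: the ratios (1/10 for P, 1/60 for B) are the approximate figures from the text, and the 15-frame pattern is a hypothetical example chosen for illustration, not a pattern prescribed by the standard or by this project.

```python
from fractions import Fraction

# Approximate frame sizes relative to one I frame, as quoted in the text.
RELATIVE_SIZE = {"I": Fraction(1), "P": Fraction(1, 10), "B": Fraction(1, 60)}


def relative_stream_size(pattern):
    """Total size of a frame pattern, measured in units of one I frame."""
    return sum(RELATIVE_SIZE[frame_type] for frame_type in pattern)


def gain_over_intra_only(pattern):
    """How many times smaller the pattern is than the same number of I frames."""
    return Fraction(len(pattern)) / relative_stream_size(pattern)


pattern = "IBBPBBPBBPBBPBB"  # hypothetical pattern, for illustration only
print(relative_stream_size(pattern))   # prints 47/30
print(gain_over_intra_only(pattern))   # prints 450/47, roughly 9.6
```

With these figures, 15 frames in the hypothetical pattern occupy about as much space as 1.6 I frames, which is between nine and ten times smaller than encoding every frame as an I frame.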
The MPEG standard defines no set sequence of I, P, and B frames that must be used to encode a video. However, several patterns have become de facto standards. A stream encoded strictly with I frames can be encoded in real time on the last several generations of computers, but the compression is not high enough to be useful for most purposes. Streams encoded with many B frames provide excellent compression, but compression this high is well out of the reach of consumer hardware. A very popular compromise is:
In this sequence, each set of two B frames depends on the I or P frame that precedes it and the P frame that follows it. As depicted in Figure 6.1, the fourth frame must be encoded before the second and third frames. This causes a small minimum delay between when frames are received and when they are displayed. All of the P frames depend on the initial I frame, either directly or through the P frames before them. Each triplet can be encoded independently of following triplets. The sequence in its entirety can be encoded independently of the sets that precede it and follow it. In real time, this sequence would be repeated 1.5 to 2 times per second. One final note: the choice of sequence usually has a negligible effect on the performance of the decoder.
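The reordering described above, in which a reference frame is encoded before the B frames that precede it in display order, can be sketched as follows. The function coding_order and the short pattern "IBBPBBP" are hypothetical and chosen only to illustrate the idea; they are not taken from the project's encoder.

```python
def coding_order(display_pattern):
    """Reorder frames from display order to encoding order.

    A B frame needs the I or P frame that follows it as a reference,
    so that reference is emitted first. Returns a list of
    (display_index, frame_type) pairs, with indices starting at 0.
    """
    order = []
    pending_b = []
    for index, frame_type in enumerate(display_pattern):
        if frame_type == "B":
            # Hold B frames back until their future reference is seen.
            pending_b.append((index, frame_type))
        else:
            order.append((index, frame_type))
            order.extend(pending_b)
            pending_b.clear()
    # Trailing B frames have no later reference inside this pattern;
    # they are appended unchanged.
    order.extend(pending_b)
    return order


# Hypothetical pattern; frame numbers are printed starting from 1.
print([index + 1 for index, _ in coding_order("IBBPBBP")])
# prints [1, 4, 2, 3, 7, 5, 6]
```

The output shows the fourth frame being handled before the second and third, which is the source of the small delay mentioned above: the encoder cannot emit a B frame until the later reference frame has arrived.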
6.2 Technical Problem
Current consumer hardware, such as the latest Intel Pentium processors, cannot perform MPEG-1 encoding at useful resolutions in real time. Because hardware encoders are so expensive, there is currently no cost-effective way to achieve real time performance. The idea of this project is to adapt a software encoder to run on several machines in parallel. By dividing the workload among several computers, higher performance can be gained at a lower cost.
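One simple way to divide the workload, given that a whole sequence can be encoded independently of its neighbors, is to hand complete sequences to machines in turn. The sketch below illustrates that idea only; it is not the scheduling scheme actually used in this project, and the function names, the sequence length of 15 frames, and the count of three machines are all assumptions.

```python
def split_into_sequences(frame_count, sequence_length):
    """Split frame indices into consecutive, independently encodable sequences."""
    return [
        range(start, min(start + sequence_length, frame_count))
        for start in range(0, frame_count, sequence_length)
    ]


def assign_round_robin(sequences, machine_count):
    """Hand out sequences to machines in turn; returns one list per machine."""
    assignments = [[] for _ in range(machine_count)]
    for number, sequence in enumerate(sequences):
        assignments[number % machine_count].append(sequence)
    return assignments


# Hypothetical workload: 90 frames, 15 frames per sequence, 3 machines.
sequences = split_into_sequences(90, 15)
for machine, work in enumerate(assign_round_robin(sequences, 3)):
    print(machine, [(seq.start, seq.stop - 1) for seq in work])
# prints:
# 0 [(0, 14), (45, 59)]
# 1 [(15, 29), (60, 74)]
# 2 [(30, 44), (75, 89)]
```

Because no sequence refers to frames outside itself, each machine can work without waiting on the others. A real implementation would also have to account for network transfer time and for reassembling the output in order.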
6.3 Operating Environment
The parallel software encoder does not require any specific environmental conditions for operation. It can be used at home, at the office, in the production studio, or just about anywhere one could find three networked personal computers. It is as permanent or as portable as the hardware on which it operates.
6.4 Intended User and Uses
One group that could make use of this product is web content providers. MPEG-1 video is the most widely used video format on the Internet, and not all of the people who create web content have access to current encoders. Hobbyists, who currently have no options for producing MPEG-1 video in real time, could assemble the hardware needed to run this software on a limited budget. Finally, artists interested in distributing original films or animations could use this product for that purpose.
The MPEG-4 standard has recently been marketed as a solution for streaming motion video, with applications such as broadcasting live motion content via the web or teleconferencing. As is to be expected, the cost of this new technology puts it out of reach for many consumers. This project's parallel software encoder, based on the MPEG-1 format, provides a simplified alternative.
For the purposes of this project, however, only video compression was taken into consideration. Although it is likely that the parallel encoder could be extended to handle audio as well, audio was excluded so that video output could be produced closer to real time.
6.5 Assumptions and Limitations
One of the biggest advantages of a software-based encoder is configurability. However, in order to achieve the highest performance possible, this implementation must be closely tied to the underlying hardware. This implies that the customization that the application provides is somewhat limited. Despite these limitations, the software is likely sufficient for the vast majority of applications.
The software makes two important assumptions. The first is that the user is quite familiar with MPEG concepts. The second is that the user is an experienced Unix user who is capable of performing basic administrative tasks such as editing text files and running applications from a command line.
Furthermore, it was assumed that adequate hardware could be found and used for the development and testing of the project, and the financial budget reflects this assumption. The required hardware is widely available and is not expected to be difficult to obtain.