CHAPTER 2

The Audience Measurement Business

Ratings data are used in advertising, programming, financial analysis, and policy making. These activities have enormous economic and social consequences. But where do the data come from? Who decides what to measure and report? What determines the quality and availability of that information? One answer has to do with the research methods that are used to produce the data, a topic we discuss at length in the next chapter. But there are other considerations as well. Audience measurement is a business. Sometimes it is done as a nonprofit activity, but often the firms involved are intent on making money. Either way, they must be responsive to their clients, while operating within a budget. They are also subject to public scrutiny and regulatory oversight. Ultimately, economic and political considerations can affect the data as much as research methods. In this chapter, we trace the evolution of the audience measurement business. Doing so will help readers better understand current measurement practices and anticipate how they might change.

THE BEGINNING

Even the first “broadcaster” wanted to know who was listening. After more than 5 years of research and experimentation, an electrical engineer named Reginald A. Fessenden broadcast the sound of human voices on Christmas Eve in 1906. He played the violin, sang, recited poetry, and played a phonograph record. Fessenden promised anyone listening that he would be back on the air again for New Year’s Eve, and he asked that they write him a letter—an early attempt at “audience research.” Apparently, he got a number of responses from radio operators, many of them on ships at sea. They were astonished to hear more than Morse code on their headphones. Other early station operators asked for letters from listeners as well. Frank Conrad, who in 1920 launched the first U.S. radio station (KDKA), even played records requested by his listeners.

A need to know the audience soon became more than just a question of satisfying the operator’s curiosity. AT&T, the American telephone monopoly, hoped to develop radio as a business. By the early 1920s, it demonstrated that charging clients a toll to make announcements over its station could be an effective way to fund the medium. “Toll broadcasting,” as it was called, soon led to the practice of selling commercial time to advertisers.

By 1928, U.S. broadcasting was sufficiently advanced to provide listeners with consistent, good-quality reception. Many people had developed the habit of listening to radio, and broadcasters in cooperation with advertisers were developing program formats “suitable for sponsorship” (Spaulding, 1963). Although there was some public controversy over whether radio should be used for advertising, the Great Depression, which began in 1929, encouraged radio station owners to turn to advertisers for support. But for such a system to work, broadcasters had to be able to authenticate the size and composition of their audiences. Without that information, it was hard for broadcasters and advertisers to negotiate the value of commercial minutes.

Unfortunately, that kind of information was hard to come by. Unlike newspapers, which could document their circulation with audits, radio listening left very few traces. The first radio stations used primitive techniques to estimate the size of their audience. Some counted fan mail; others simply reported the population or number of receivers sold in their market. Each of these methods was unreliable and invited exaggeration. The networks were somewhat more deliberate about audience measurement. In 1927, NBC commissioned a study to determine not only the size of its audience but also the hours and days of listening. The company also sought information on the economic status of listeners, foreshadowing the use of “demographics” that is now so much a part of audience research. In 1930, CBS conducted an on-the-air mail survey, offering a free map to all listeners who would write to their local stations. CBS compared the response with the population of each county and developed its first coverage maps. But none of these efforts offered the kind of regular, independent measurement of the audience that radio would need to sustain itself.

Advertiser support was, more than any other factor, responsible for the emergence of the audience measurement practices that we have today. It is not surprising, therefore, that many of the methods for gathering ratings data were developed and institutionalized in countries that relied on commercial broadcasting. Initially, the United States, Australia, and Canada were more dependent on this method of funding than were the European countries. It was only as commercial broadcasting became more prevalent in Europe that more precise systems of audience measurement were put in place. In the United Kingdom, this happened in the 1950s. Other European countries followed a decade later. As Barrie Gunter, a well-known British audience researcher, noted:

Even in those countries which did not acquire commercial channels until fairly recently, a television audience measurement system nevertheless emerged modeled on those in countries with commercial channels. (2000, p. 122)

Many audience measurement systems were developed in the United States and later adapted for use around the world. So that is where we begin.

THE EVOLUTION OF AUDIENCE MEASUREMENT

The history of audience measurement is a story of individual researchers and entrepreneurs, of struggles for industry acceptance, as well as an account of the media industries themselves. It is also a story of research methods. Most major audience measurement companies rose to prominence by perfecting and promoting their own brand of research. And most major changes in the structure and services of the industry have also been tied to research methods. For this reason, we trace the evolution of audience measurement by organizing it around data-gathering techniques, all of which are still in use today.

Telephone Interviews

From 1930 to 1935, the revenues and profits of U.S. radio networks nearly doubled, all at a time when the country and most other businesses were in a deep economic depression. Because many American families did not have money to spend on other diversions—and because radio was indeed entertaining—the audience grew rapidly. An important stimulant to that growth was the emergence of a system for providing audience estimates that advertisers could believe. The first such system depended on another technological marvel—the telephone.

Then, as now, advertisers were the driving force behind ratings research. They helped create the first ratings company to conduct regular surveys. In 1927, a baking powder company hired the Crossley Business Research Company to survey the effectiveness of its radio advertising. Two years later, Crossley conducted a similar survey for Eastman Kodak, using telephone interviews to ask people if they had heard a specific program. At the time, the telephone was an unconventional tool for conducting survey research, but it seemed well suited for measuring something as far-flung and rapidly changing as the radio audience.

Archibald Crossley, the company president, was a well-known public opinion pollster. He suggested to the Association of National Advertisers (ANA) that a new industry association might use the telephone to measure radio listening. His report, entitled The Advertiser Looks at Radio, was widely distributed, and ANA members quickly agreed to pay a monthly fee to support regular and continuous surveys of radio listening. The American Association of Advertising Agencies (AAAA) also agreed on the need for regular radio audience measurements.

This new service, officially called the Cooperative Analysis of Broadcasting, or CAB, began in March 1930. CAB reports were generally referred to in the trade press simply as “the Crossley ratings.” Even the popular press began to note the rise or fall of specific programs or personalities in the ratings. Initially, only advertisers paid CAB for its service but, before long, advertising agencies began to subscribe. The networks had access to the reports as well, using them to sell time and to make programming decisions, but their use was “unofficial.” Not until 1937 were NBC and CBS allowed to become subscribers, thus sharing the cost and the data.

In the early years, Crossley revised his methods and expanded the amount of information he provided a number of times. By the 1935–1936 season, surveys were conducted in the 33 cities that had stations carrying CBS and the two NBC networks. Calls were placed four different times during the day, and respondents were asked to “recall” their radio listening over the previous 3 to 6 hours. Hence, Crossley’s method of measurement was known as telephone recall. Monthly and, later, biweekly reports were published that gave audience estimates for all national network programs. Further, three times a year, more-detailed summaries provided information about station audiences hour by hour, with breakdowns for geographic and financial categories.

But CAB’s methods had serious limitations. Telephone recall surveys could not reach radio listeners who did not have telephones. That limitation was less serious in the early years of the service, because the first families to purchase radios were from higher-income households that were likely to have telephones. By the end of the 1930s, when the growth of homes with radios began to outpace the growth of homes with telephones, CAB had to compensate by altering its sampling procedures to include more low-income households.

The most serious limitation to the CAB method, however, was that it required listeners to remember what they had heard. Relying on memory was a source of error. As a result, a new method, called a telephone coincidental, gained favor among researchers. Coincidentals asked people what they were listening to at the time of the call. George Gallup, another soon-to-be famous pollster, was one of the first to conduct a nationwide telephone coincidental for the advertising agency Young and Rubicam.

Research comparing the telephone recall and coincidental methods, conducted in the early 1930s, yielded the following findings:

The results showed that some programs, which were listened to by many listeners, were reported the next day by only a few. In general, dramatic programs were better remembered than musical programs. However, the rank correlation between the percentage of listeners hearing 25 (half-hour) programs and the percentage reporting having heard them was about .78. This is a measure of the adequacy of the Crossley survey as compared with the simultaneous telephone survey. (Lumley, 1934, pp. 29–30)

The telephone coincidental provided a methodological advantage that opened the door for CAB’s first ratings competitor. This happened when Claude Hooper and Montgomery Clark quit the market research organization of Daniel Starch in 1934 to start Clark-Hooper. George Gallup assisted them in arranging for their first survey. Hooper later wrote, “Even the coincidental method which we have developed into radio’s basic source of audience size measurement was originally presented to us by Dr. George Gallup” (Chappell & Hooper, 1944, p. vii). In the fall of that year, Clark-Hooper launched a syndicated ratings service in 16 cities.

Ironically, Clark-Hooper was first supported by a group of magazine publishers who were unhappy with the fact that radio was claiming an ever-increasing share of advertiser dollars. They believed that Crossley’s recall technique overstated the audience for radio. Although it could be expected that coincidental ratings would capture certain unremembered listening, the publishers hoped that Clark-Hooper would show that many people were either not home or doing something else besides listening to the radio. In fact, the first Clark-Hooper results did show lower listening levels than those of CAB.

In 1938, Clark and Hooper split, with the former taking the company’s print research business. With great faith in the future of radio, Hooper went into business for himself. His research method was simple. Those answering the telephone were asked:

•  Were you listening to the radio just now?

•  To what program were you listening?

•  Over what station is that program coming?

•  What advertiser puts on that program?

Respondents were then asked to report the number of men, women, and children who were listening when the telephone rang.

Hooperatings, as his audience estimates came to be called, were lower than CAB’s for some programs but higher for others. As Hooper would argue later, people were better able to remember programs that were longer and more popular and had been on the air for a longer period of time. Respondents were also much more likely to recall variety programs; they were most likely to forget listening to the news (Chappell & Hooper, 1944, pp. 140–150). Over time, the industry began to regard C. E. Hooper’s coincidentals as more accurate than CAB’s recall techniques.

But methodological superiority was not enough. As the “creature” of the ANA and AAAA, CAB was well entrenched with the advertising industry. Recognizing that CAB served the buyers of radio time, Hooper decided to pursue the broadcast media, and he established a service to supply both the buyer and the seller. CAB might see fit to ignore networks and stations, but Hooper would seek them out as clients and provide them with the kinds of audience research they needed. This strategy was perceptive, for today the media account for the overwhelming majority of ratings service revenues.

Hooper also worked hard for the popular acceptance of Hooperatings. To achieve as much press coverage as possible, each month he released information about the highest rated evening programs. This went not only to the trade press but to popular columnists as well. In this way, C. E. Hooper, Inc. became the most visible and talked about supplier of audience information for the industry. Radio comedians even began to joke about their, or the competition’s, Hooperatings.

In addition to promoting popular consciousness of program ratings, Hooper was also responsible for establishing many of the traditions and practices of contemporary audience research. He instituted the “pocketpiece” format for ratings reports, which became the hallmark of Nielsen’s national U.S. ratings, as well as concepts like the “available audience” and “sets in use.” He also began to report audience shares, which he called “percent of listeners,” and the composition of the audience in terms of age and gender. Thus, by the end of the 1930s, the basic pattern of commercial audience research for broadcasting was set.

Hooper and his company were efficient and aggressive. He regularly conducted research to try to make his methods more accurate or to add new services, especially to help the networks and stations. He was also relentlessly critical of the CAB method that still depended on recall. As a part of this battle, in 1941, Hooper hired Columbia University psychology professor Matthew Chappell to study recall and memory. Two years later, they wrote a book trumpeting the advantage of telephone coincidentals.

Hooper’s aggressiveness paid off. Just after World War II, he bought out CAB, which was on the verge of collapse. For a brief time, C. E. Hooper was the unquestioned leader in U.S. ratings research. But even as Hooper reached his zenith, the broadcast industry was changing. The new medium of television was about to alter the way people used their leisure time. A new methodology and company were ascendant as well. Although he continued to offer local measurement of radio and television, in 1950, Hooper sold his national ratings service to A. C. Nielsen. As Hugh Beville, sometimes called the “dean of broadcast audience research,” noted:

Unfortunately, Hooper never saw that television was the big future of broadcasting. Had he retained network television when he sold the radio service to Nielsen in early 1950, he could have prospered. Instead, the day after the deal was announced Hooper held a press conference in which he said that “to make the deal attractive [we] threw in national television ratings.” Hooper had almost scornfully thrown away the ticket to the future of his company. (1988, p. 63)

Of course, with the wisdom of hindsight, we now know that television, not radio, was the “future of broadcasting.” But in 1950, that was not at all clear. Hooper expected very little of television. Nielsen was prescient enough to take a risk and position his company to become one of the dominant suppliers of television audience measurement (TAM) around the world.

Today, telephone interviews are a common data-gathering technique for marketing researchers and public opinion pollsters. Most ratings companies also use telephones in one capacity or another to identify respondents or secure their cooperation. Many still consider telephone coincidentals the “gold standard” for measuring broadcast audiences, although they are too limiting and expensive to be used on an ongoing basis. In fact, since the late 1990s, telephones have fallen out of favor as the principal means to measure day-to-day media usage.

Personal Interviews

Face-to-face, personal interviews were often used in early radio surveys. Beginning in spring 1928, market researcher Daniel Starch used personal interviews in studies commissioned by NBC. And even after the first ratings services had come into existence, CBS commissioned Starch to provide a series of reports in the 1930s. CBS argued that this provided more accurate information because Hooper’s “telephone calls obviously miss all non-telephone homes—which becomes an increasing distortion as one gets into the smaller communities.” Because CBS had fewer, often less-powerful affiliated stations than NBC, the network thought it could only benefit from this sort of audience research (CBS, 1937).

In the late 1930s, while Crossley and Hooper argued over different methods of telephone data collection and Nielsen worked to perfect his metering device, the personal interview was still the most accepted method of collecting sociopsychological behavioral information. One man in particular, Sydney Roslow, who held a doctorate in psychology, became intrigued with the technique while interviewing visitors at the New York World’s Fair in 1939. With the encouragement of Paul Lazarsfeld, a pioneer in early audience studies, he started to adapt these techniques to the measurement of radio listening.

In the fall of 1941, he began providing audience estimates, called “The Pulse of New York,” based on a personal interview roster recall method that he developed. When respondents were contacted, they were given a roster of programs to aid in recalling what they had listened to in the past few hours. Because Hooper, and later Nielsen, concentrated on network ratings, Roslow’s local service expanded rapidly—especially with the tremendous expansion of stations after World War II. By the early 1960s, Pulse was publishing reports in 250 radio markets around the country and was the dominant source for local radio measurement.

In Australia, which had advertiser-supported radio as early as the 1930s, personal interviews were also an important means of data collection. The method was particularly appealing because, at the time, so few Australian households had telephones. In the 1940s, two competing ratings companies, the McNair Survey and the Anderson Analysis of Broadcasting, both adopted aided-recall interviews, similar to Pulse, although within a couple of years, Anderson moved to a diary technique. These competing services produced somewhat different audience ratings and appealed to different constituents. In the absence of any formal mechanism to audit their procedures, they also served as a check on one another’s results. In 1973, the firms merged and settled on diaries as the method of data collection.

Still, personal interviews had some advantages over the alternatives, particularly telephones. They could include out-of-home listening (e.g., automobile and work) and measure radio use during hours not covered by the telephone coincidental—Hooper was limited to calls from 8 a.m. to 10:30 p.m. Further, they provided demographic details and information on many minority and foreign-language stations popular with those less likely to have telephones.

Because ratings based on personal interviews reported audiences that were hard to see with other methods, they helped reshape U.S. radio. Pulse’s emphasis on measuring audiences in the metro area, versus Nielsen’s nationwide measurement of network programs, contributed to the rise of “Top 40” and popular music format stations. These became popular with many local advertisers who were only interested in the number of listeners in their marketing area. Thus, Pulse’s method was a boon to the growth of rock formats, just as more and more local stations were coming on the air, and more and more network programs and personalities were transferred to television or oblivion.

As was the case in Australia, though, by the 1970s another method took control of local radio ratings. The American Research Bureau (ARB), which we describe in the sections that follow, used its success with television diary techniques to move into radio. As a subsidiary of a large computer company, ARB had superior computing power that aided in the timely production of market reports. It also appears that the rock and ethnic stations favored by the interview method were not as aggressive in selling to advertising agencies, so agencies came increasingly to accept the diary technique being promoted by news and “easy listening” stations. In 1978, Pulse went out of business.

Today, personal interviews are no longer a mainstay of the audience measurement business, although a few operations still use interviewers to gather information or personally place diaries with respondents. Bona fide personal interviews are expensive, and traditional questionnaires based on recall have a hard time tracking media use in a highly fragmented digital media environment. Nonetheless, they can be an important way to study audiences. For example, Mediamark Research (MRI) conducts a survey of 26,000 American consumers. In the first wave of data collection, personal interviewers visit people’s homes to gather a range of demographic and media usage data. By doing so, they are able to present respondents with various cards depicting media outlets to aid in the data collection process. While these data do not function like an audience ratings currency, they are wed to extensive information about product purchases, and so help guide the allocation of advertising expenditures.

Diaries

In the 1920s, many radio set builders and listeners were not interested in programs at all. Instead, they were trying to hear as many different and distant stations as possible. To keep track of those stations, they kept elaborate logs of the signals they heard and when they heard them. They noted information such as station call letters, city of origin, slogans, and program titles. Despite this early form of diary keeping, and the occasional use of diaries by radio ratings firms, the diary method did not become an important tool of commercial audience research until the rise of television.

The first systematic research on diaries was done by Garnet Garrison. In 1937, he began to “experiment developing a radio research technique for measurement of listening habits which would be inexpensive and yet fairly reliable” (Garrison, 1939, p. 204). Garrison, for many years a professor at the University of Michigan, noted that at the time the other methods were the telephone survey, either coincidental or unaided recall, personal interviews, mail analysis or surveys, and “the youngster automatic recording.” His method, which he called a “listening table,” borrowed something from each because it could be sent and retrieved by mail, included a program roster, and was thought to be objective. His form provided a grid from 6 A.M. to midnight divided into 15-minute segments and asked respondents to list station, programs, and the number of listeners. He concluded that:

With careful attention to correct sampling, distribution of listening tables, and tabulation of the raw data, the technique of “listening tables” should assist materially in obtaining at small cost quite detailed information about radio listening. (Garrison, 1939, p. 205)

CBS experimented with diaries in the 1940s but apparently thought of the data as applicable only to programming and not to sales. Diaries were used to track such things as audience composition, listening to lead-in or lead-out programs, and audience flow and turnover. In the late 1940s, Hooper also added diaries to his telephone sample in areas “which cannot be reached practically by telephone.” This mixture of diary and coincidental was never completely satisfactory. Indeed, one of the reasons Hooper lost ground to Nielsen was that the telephone method was, for the most part, confined to large metropolitan areas where television first began to erode the radio audience. Hence, Hooper tended to understate radio listenership.

It was not until the late 1940s that diaries were introduced as the principal method of a syndicated research service. As director of research for the NBC-owned station in Washington, DC, James Seiler had proposed using diaries to measure radio for several years. The station finally agreed to try a survey for its new television station. NBC helped pay for several tests, but Seiler set up his own company to begin a regular ratings service.

He called the company American Research Bureau (ARB), and in Washington, just after the war, its name sounded very official, even patriotic. ARB issued its first local market report in 1949. Based on a week-long diary, which covered May 11–18, it showed Ed Sullivan’s Toast of the Town Sunday variety program with an astonishing rating of 66.4. By fall, the company also was measuring local television in Baltimore, Philadelphia, and New York. Chicago and Cleveland were added the next year. The company grew slowly at first—as both television and the diary research methodology gained acceptance. In 1951, it merged with another research company, called Tele-Que, that had begun diary-based ratings on the West Coast, thus adding reports for Los Angeles, San Diego, and San Francisco.

Through the 1950s, ARB emerged as the prime contender to Nielsen’s local television audience measurement, especially after 1955, when it took over the local Hooper television ratings business. ARB expanded, and by 1961 it was measuring virtually every television market twice a year, and larger markets more often. The networks and stations responded by putting on especially attractive programming during these “sweeps” periods when diaries were in the field. In 1973, ARB changed its name to Arbitron. Its head-to-head competition with Nielsen would last for another two decades but ultimately fall victim to television industry economics. As television stations’ budgets tightened in the more competitive media environments of the late 1980s and early 1990s, stations could no longer afford to buy two ratings services. The balance tipped in Nielsen’s favor, and it became the de facto currency in local television. In November 1993, Arbitron ended its television measurement business.

Radio was a different story. For reasons we discuss in the following section, Nielsen ended its radio measurement operations in the early 1960s, at which point Arbitron began using diaries to provide local radio reports. These, as we have seen, eventually put Pulse out of business. Another company, Statistical Research Inc. (SRI), eventually filled the void left by Nielsen’s national radio service. SRI was formed in 1969 by Gerald Glasser, a statistics professor at New York University, and Gale Metzger, former director of research for the Nielsen Media division. Three years later, the company took over operation of a collaborative industry research effort called Radio’s All Dimension Audience Research (RADAR), which continued to produce reports on radio network audiences. Harkening back to the days of CAB, SRI used recall telephone techniques to collect its data.

For many years, Arbitron was the undisputed provider of local radio ratings. For a time, a company called Birch/Scarborough Research, which also used a telephone recall technique, posed a challenge. But, once again, the industry was unwilling to adequately fund a second, competitive service. In 1992, Tom Birch stopped producing ratings and sold the more “qualitative” Scarborough service to Arbitron. In 2001, Arbitron acquired the RADAR brand from SRI, establishing it as the dominant supplier of radio ratings in the United States, a position it enjoys to this day.

Despite their limitations, which we discuss in the following chapter, diaries remain an important tool for audience measurement around the world. In the United States, Australia, Russia, Asia, and many European countries, they are still used to measure radio listening, although in some larger markets, more expensive “portable” meters have supplanted them. Even in television, where meters are widely accepted as a more accurate means of measurement, diaries are still in use. For smaller markets, which cannot justify the expense of metering systems, diaries remain the best option for measuring audiences. In fact, at this writing, Nielsen uses diaries to produce television ratings in the majority of local U.S. markets. While that may change, diaries are likely to be in use for some time to come.

Meters

From the earliest days of commercial radio, broadcasters and advertisers recognized the potential advantages of making a simultaneous, permanent, and continuous record of what people actually listened to on the radio. The technical problems involved in developing such a system were solved in the 1930s, and meters were in common use by the late 1940s. When these meters finally arrived, however, they had a profound and lasting impact on the ratings business.

While a student at Columbia University in 1929, Claude Robinson—later a partner with George Gallup in public opinion research—patented a device to “provide for scientifically measuring the broadcast listener response by making a comparative record of … receiving sets … tuned over a selected period of time” (Beville, 1988, p. 17). The patent was sold to RCA, the parent company of NBC, but nothing more is known of the device. Despite the advantages of a meter, none had been perfected, leading Lumley (1934) to report:

Although the possibilities of measurement using a mechanical or electrical recording device would be unlimited, little development has taken place as yet in this field. Reports have been circulated concerning devices to record the times at which the set is tuned in together with a station identification mark. None of these devices has been used more than experimentally. Stanton, however, has perfected an instrument which will record each time at which a radio set is turned on. (pp. 179–180)

The reference was to Frank N. Stanton, then Lumley’s student, who would later become the president of CBS. For his dissertation, Stanton built and tested 10 devices “designed to record set operation for [a] period as long as six weeks” (Lumley, 1934, p. 180). On wax-coated tape, one stylus marked 15-minute intervals while another marked when the set was turned on. The device did not record station tuning but was used to check against listening as recorded on questionnaires. Stanton, by the way, found that respondents tended to underestimate the time they spent with the set on.

In 1930 and 1931, Robert Elder of the Massachusetts Institute of Technology conducted radio advertising effectiveness studies that were published by CBS. In 1933–1934, he and Louis F. Woodruff, an electrical engineer, designed and tested a device to record radio tuning. The device scratched a record on paper by causing a stylus to move back and forth as the radio tuner was moved across the dial. Elder called his device an audimeter and sought a patent. Discovering the previous Robinson—now RCA—patent, he received permission from RCA to proceed. The first field test used about 100 of the recorders in the Boston area. In 1936, Arthur C. Nielsen heard a speech by Elder describing the device and apparently began negotiating to buy the rights to the technique immediately.

Trained as an electrical engineer, Nielsen had opened a business in 1923 to test the efficiency of industrial equipment. The business survived but did not prosper. Ten years later, a pharmaceutical client suggested to a Nielsen employee that what they really needed was information on the distribution and turnover of their products. In response, Nielsen developed a consumer survey based on a panel of stores to check inventory in stock. The business grew rapidly, a food index was added, and the company thrived. The A. C. Nielsen Company was on its way to becoming the largest marketing research firm in the world. But it was the acquisition of the Elder–Woodruff audimeter that would ultimately make Nielsen’s name synonymous with audience measurement.

With his engineering background, and the profits from his successful indices, Nielsen redesigned the device. There were field tests in 1938 in Chicago and North Carolina to compare urban and rural listening. By 1942, the company launched the Nielsen Radio Index (NRI), based on some 800 homes equipped with his device. Nielsen technicians had to visit each home periodically to change the paper tape in the device (Figure 2.1), which slowed data collection. However, the company also provided information about product purchases, based on an inventory of each household’s “pantry.” Having already established a good reputation with advertisers, Nielsen began to make progress in overtaking the dominant ratings supplier, C. E. Hooper.


FIGURE 2.1. The First Audimeter (1936)

Source: Nielsen Company. Reprinted by permission.

During the 1950s, Nielsen continued to expand his ratings business and to perfect the technology of audience measurement. As we noted, in 1950 he acquired Hooper’s national ratings service. In the same year he initiated the Nielsen Television Index (NTI), the company’s first attempt to measure that fledgling medium. By the middle of the decade, he launched the Nielsen Station Index (NSI) to provide local ratings in both radio and television. His engineers perfected a new version of the audimeter that recorded tuner activity on a 16-mm film cartridge. More importantly, the cartridge could be mailed directly to Nielsen sample households and then mailed back to Nielsen headquarters, thereby speeding the rate at which data could be collected. Nielsen had also begun to use diaries for gathering audience demographics. To improve their accuracy, he introduced a special device called a “recordimeter,” which monitored hours of set usage, and flashed a light to remind people to fill in their diaries.

The 1960s were a tumultuous decade for America and for audience measurement companies. In an atmosphere charged by quiz show scandals on television, reports of corruption and “payola” in the music industry, as well as growing social unrest, the U.S. Congress launched a far-reaching investigation of the ratings business. Recognizing the tremendous impact that ratings had on broadcasters and concerned about reports of shoddy research, Oren Harris, chairman of the House Committee on Interstate and Foreign Commerce, orchestrated a lengthy study of industry practices. In 1966, the Harris Committee issued its report. Although it stopped short of recommending legislation to regulate audience measurement, the investigation had a sobering effect on the ratings business, an effect still evident today in the scrupulous detail with which methods and the reliability of ratings are reported and in the existence of what is now called the Media Rating Council (until 1982, the Broadcast Rating Council; from 1982 to 1998, the Electronic Media Rating Council).

As the premier ratings company, Nielsen was particularly visible in the congressional hearings, especially its radio index. In response, Mr. Nielsen personally developed a new radio index that would be above criticism. Unfortunately, potential customers resisted the change because of the increased costs associated with data collection. Angered by this situation, in 1964 Nielsen withdrew from national radio measurement altogether. In fact, a year earlier, Nielsen had discontinued local radio measurement, leaving Pulse unchallenged.

As we noted in the previous section, for the better part of four decades, Nielsen and Arbitron were in direct competition to provide local television ratings. In large markets that could bear the expense, Nielsen used household meters supplemented with diaries. Elsewhere, both companies depended on diaries. What was then the ARB, however, made limited use of meters as well. They even tried to one-up Nielsen by developing a meter whose contents could be retrieved over a telephone. In 1957, ARB placed meters in 300 New York City households and began to provide “instantaneous” day-after ratings. Generally speaking, this move met with the approval of advertisers and the media because it meant Nielsen might face more effective competition. Unfortunately for ARB, Arthur Nielsen and his engineers had patented almost every conceivable way of metering a set. ARB’s new owner, a firm named CEIR, was forced to pay Nielsen a fee for the rights to the device. Nevertheless, this spurred Nielsen to quickly wire a New York sample with meters and, later, in 1973, to introduce a storage instantaneous audimeter (SIA) as the data-collection device for its full national sample. By doing so, Nielsen was able to retrieve data over telephone lines and produce household ratings that were referred to as “overnights.”

The most important shortcoming of a conventional household meter like Nielsen’s was that it provided ratings users with no information about who was watching. As advertisers became more sophisticated about targeting their messages to particular types of viewers, household information seemed less and less useful. Diaries were used to fill that void, but they were slow and error prone and had to be reconciled with the metered data. A better solution would be to collect “people” information in direct conjunction with metered measurement.

By the 1980s, commercial broadcasting was beginning to make headway in Europe. With it came an increased interest in newer, more accurate audience measurement systems to replace the “hotchpotch of incompatible meter systems, and conventional diary and recall operations” (Gane, 1994, p. 22). In the early 1980s, Audits of Great Britain (AGB) installed “peoplemeters” in a sample of Italian households. These devices not only monitored television set activity; they also allowed people to indicate who was watching. By 1984, AGB was operating a panel of peoplemeters in the United Kingdom as well. At about the same time, Telecontrol, a Swiss company, installed peoplemeters in Switzerland and the former West Germany. By the end of the decade, most of Western Europe was using peoplemeters to measure television viewing. The United States, too, would follow suit.

Since the 1950s, Nielsen had been the sole supplier of national television network ratings in the United States. These data were produced by combining household meters with diaries. AGB hoped its peoplemeter might allow it to compete with Nielsen for the U.S. market. The company secured funding from the industry, including advertisers and the media, and within a couple of years it had sufficient support to install peoplemeters in Boston and begin a field test of the system. Nielsen, which was by then a major international marketing firm, had been monitoring the competition and developing its own peoplemeter. It announced plans to test and implement a national peoplemeter service and in 1987 began basing its national “NTI” services on a sample of households equipped with peoplemeters. AGB held on for a time, but with equivocal support from the industry, especially the broadcast networks, its position became untenable. In 1988, it ended its U.S. operations.

In the 1990s, with $40 million in funding primarily from broadcast networks, the industry considered yet another alternative to Nielsen as the provider of national television ratings (Terranova, 1998). It was called “Systems for Measuring and Reporting Television,” or SMART. The SMART project was run by SRI, the company that, at the time, provided national radio network ratings. It used a peoplemeter-like box to measure viewing but offered certain technical differences from Nielsen’s system, including the ability to detect a bit of electronic code that identified the source being viewed. It also had a more inclusive definition of who should be counted in the audience, which appealed to the networks. The SMART project began testing by wiring 500 households in Philadelphia. To go national, though, SRI needed another $60 million in funding from the industry. Advertisers, who saw SMART as the creature of the networks, declined to support the effort. At that point, the networks stopped funding the project and it, too, ended its operations.

Peoplemeters are today the preferred method of television audience measurement around the world. In addition to North America and Europe, they are in widespread use in Asia and Latin America. Sometimes they are operated by large international corporations that have a presence in many countries, like Nielsen, which now partners with AGB, or the GfK Group, which owns Telecontrol. Sometimes firms operate in a single country, like CSM, which measures television audiences in China in collaboration with Kantar Media, or Mediametrie, which is a free-standing French ratings company. But if our history of audience measurement teaches us anything, it is that nothing stays the same for very long.

Peoplemeters are, themselves, subject to continuing modifications and improvements. There are now portable peoplemeters (PPMs), which panel members carry with them throughout the day. The fact that many households now receive television via cable or satellite service has encouraged some to use data from digital set-top boxes (STBs) to monitor viewing. In effect, this turns those boxes into household meters. During the 1990s, as Internet use grew, companies like comScore and NetRatings, now owned by Nielsen, started recruiting panels that used people’s own computers to monitor and report their site visits. In effect, this turned their machines into a kind of peoplemeter. In fact, the Internet has augured even more radical changes in audience measurement. Because more and more media are served to people via computers, it is possible to track everyone who visits a site or downloads media. This has opened the door to a very different strategy for measuring audiences, which we will describe in the section that follows.

THE AUDIENCE MEASUREMENT BUSINESS TODAY

Understanding how ratings research has evolved over the years can give us a better understanding of today’s business arrangements, the kinds of data that are available, and what we might expect in the future. Our abbreviated history of worldwide audience research is not an end in itself but rather a way to learn lessons we might apply going forward. We have seen that audience measurement can be a competitive business. While the accuracy of research methods is important, so are the desires and willingness to pay of the clients. We have also seen that measurement systems can affect the operation of the media themselves. As a result, they can be the subject of industry negotiations and politics. All of this is still true today.

In this section, we will begin by noting the challenges posed by the new media environment, then briefly review the measurement strategies that are being used to address those challenges, and, finally, comment on a number of institutional factors that shape the nature of syndicated audience data.

Challenges of the New Media Environment

The media environment has always affected the business of audience measurement. As radio stations proliferated and people began listening in cars, ratings research was forced to adapt. In the twenty-first century, newer media systems and the demands of advertisers have presented audience measurement companies with serious challenges. Three changes have been particularly troublesome. First, the ever-increasing number of media outlets has fragmented audiences into smaller groups. Second, “nonlinear” systems, like video on demand (VOD), have given users greater control over when media are delivered. Third, new platforms like smartphones have also given people greater control over where they use media. These developments and the difficulties they pose set the stage for any discussion of the merits of contemporary audience measurement systems.

The first problem results from the sheer proliferation of media. People now have hundreds of television channels and millions of websites from which to choose. Each of these options claims a bit of public attention that was once concentrated on just a few outlets. Nowhere have long-term trends in fragmentation been more evident than in the declining viewership of broadcast television. Figure 2.2 depicts the steady erosion of broadcast audiences in the United States. The dark bars show the combined primetime share of ABC, CBS, and NBC beginning with the 1985–1986 television season. In that year, the “Big-3” accounted for almost 70 percent of all the time American households spent watching television. By the 2008–2009 season, their combined market share had dropped below 30 percent—a “+7” number that included live and delayed viewing. Over the same span of years, the number of television channels available to the average household, as indicated by the ascending line, increased sevenfold. As of 2008, the last year for which Nielsen reported numbers, the average U.S. household could watch 130 channels. This change is not unique to the United States. As we noted in chapter 1, in just 10 years, India went from having 5 active television channels to 500. Filling up those channels are new broadcast networks and an avalanche of cable and satellite networks, each one claiming a piece of the audience.


FIGURE 2.2. History of Audience Fragmentation: Network Shares and Average Channels per TVHH by year

Source: Data from Nielsen Media Research. Adapted from Webster (2005).

The result is that established outlets have seen their audiences shrink. In the early 1980s, primetime shows with ratings of less than 15 were routinely cancelled (Atkin & Litman, 1986). That is, unless you captured 15 percent of U.S. households, you could expect to be taken off the air. Today, while “media events” like World Cup soccer or the Super Bowl still attract large audiences, successful primetime television programs typically have audience ratings in the low single digits. Cable network ratings are often just a fraction of a ratings point. And most website audiences are smaller still.

Shrinking audiences present a problem for audience measurement, especially when you take into account the advertisers’ desire to reach even more narrowly defined segments of the audience. What would seem like very large samples are quickly pushed to their limit. For example, if you had a national panel of 20,000 television households, a program rating of 1 would mean 200 homes were watching that show. If the market the advertiser is after (e.g., young men) is present in only 10 percent of those homes, your program rating for that demographic could be based on just 20 individuals. Unless sample sizes keep pace with fragmentation, estimates of such tiny audiences can be swamped by sampling error, a topic we discuss in the following chapter.
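The arithmetic in the example above can be sketched in a few lines. The panel size, rating, and demographic share come from the text; everything else is illustration.

```python
# Fragmentation plus narrow targeting quickly shrinks the number of
# homes behind a published rating.
panel_size = 20_000      # national panel of TV households (from the example)
rating = 1 / 100         # a program rating of 1 means 1% of households viewing

viewing_homes = panel_size * rating
print(int(viewing_homes))            # 200 homes watching the program

demo_share = 0.10                    # target demographic present in 10% of homes
demo_base = viewing_homes * demo_share
print(int(demo_base))                # the demo estimate rests on just 20 homes
```

With only 20 homes behind the estimate, a handful of households tuning in or out can swing the reported demographic rating dramatically, which is why fragmentation strains even large panels.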

The second problem developed more recently. Historically, radio and television have been “linear” delivery systems, with broadcasters controlling the schedule. Ratings companies could figure out what you were exposed to by keeping track of the station you used and when you used it. Now, “nonlinear” media like VOD, DVRs, and the World Wide Web allow people to watch what they want, when they want it. This presents still more challenges. To begin with, it exacerbates the problem of fragmentation. It is difficult enough to measure audiences when viewing is spread across 500 linear channels; now people have thousands of additional choices at their fingertips. Without fixed schedules to rely on, measurement companies have to find different ways to know exactly what programs and/or commercials are actually being watched. Moreover, time-shifting programs—and avoiding commercials in the process—complicates what used to be a straightforward metric. If measures of exposure are the currency used to transact business, what kind of exposure should it be? In 2007, the national currency in U.S. television shifted from program ratings to what are called “C3” ratings, which measure exposure to commercial minutes, counting live viewing plus up to 3 days of delayed viewing.
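The C3 counting rule can be illustrated with a simplified sketch. The viewer names and times below are hypothetical, and a real C3 rating averages audiences across all commercial minutes; here we just test which exposures fall inside the 3-day window.

```python
from datetime import datetime, timedelta

# Hypothetical exposure records for a single commercial minute:
# each entry is (viewer, time_watched). Under a C3-style rule, playback
# counts only if it occurs within 3 days of the original airing.
aired = datetime(2013, 2, 4, 20, 0)
exposures = [
    ("ann", aired),                        # watched live
    ("bob", aired + timedelta(days=2)),    # time-shifted: still counts
    ("cal", aired + timedelta(days=5)),    # too late: excluded
]

C3_WINDOW = timedelta(days=3)
c3_audience = [viewer for viewer, watched in exposures
               if watched - aired <= C3_WINDOW]
print(c3_audience)   # ['ann', 'bob']
```

Changing `C3_WINDOW` is all it would take to compute a “live plus 7” style figure instead, which is exactly why the choice of window is a matter of industry negotiation rather than research method.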

The third problem is that media content moves across different technological platforms. Increasingly, people control not only when but also where media use occurs. Television can be seen on living room sets, monitors in public places, tablet computers, smartphones, or various websites (with or without ads). People can read news the old-fashioned way or on any number of electronic devices. All of these points of contact might be occasions for marketers to reach potential customers and so are of increasing importance to advertisers. It would make sense to track each person’s exposure to a program or commercial across platforms. Unfortunately, most audience measurement operations have specialized in a single medium. One company would do radio but not television; another would measure Internet use but not print. And even when one company measured different media, their data collection was typically segregated into separate samples (e.g., one for television and one for the Internet). True “single source” panels that would measure all these activities precisely have often proved unduly burdensome on respondents or prohibitively expensive. Nonetheless, finding a way to track people across platforms remains an important challenge for audience measurement.

Measurement Strategies

In the face of all these challenges, how can any measurement operation hope to produce accurate, useful estimates of people’s media use? The first thing to note is that no system of audience measurement is perfect. Beginning with the earliest attempts to audit newspaper circulation, we have about 100 years of experience measuring audiences. In that period of time, no system of measurement has been without its flaws. The same will undoubtedly be true 100 years from now. Each approach has certain strengths and weaknesses. Our history of audience measurement highlighted at least some of those tradeoffs. With that in mind, we will briefly comment on the two basic measurement strategies currently in use. The first is old; the second is new. Each has distinct advantages and drawbacks.

Every audience measurement operation we described in this chapter depended on drawing a sample of people or households, securing their cooperation, and using those data to make inferences about the larger population. This is a user-centric measurement strategy. Information is collected from users who agreed to answer questions, fill out diaries, or accept meters. The big advantage of this strategy is that people willingly provide you with information about themselves. You can learn a lot about their individual characteristics. As a result, estimates of media use can be associated with demographics or whatever traits you have measured. But to provide accurate estimates, you have to construct an adequate sample.

There are two ways in which the sample can be problematic. The best samples are usually drawn at random. Unfortunately, not everyone you would like to study will cooperate. If the people who decline to participate are systematically different from those who do participate, you can produce biased estimates. The second problem, as we noted earlier, is that the sample simply is not big enough to compensate for audience fragmentation. Of course, the ratings company could always increase the size of the sample. But that comes at a cost, one that ratings users will have to pay. Eventually, you reach a point of diminishing returns that makes larger samples unrealistic.
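The point about diminishing returns follows from the standard error of a sample proportion, which shrinks only with the square root of the sample size. The panel sizes below are illustrative, not from any actual service:

```python
import math

# The standard error of a sample proportion is sqrt(p * (1 - p) / n).
# Because it shrinks with the square root of n, halving the error
# requires quadrupling the panel (and roughly quadrupling the cost).
rating = 0.01  # a program with a rating of 1

for n in (5_000, 20_000, 80_000):
    se = math.sqrt(rating * (1 - rating) / n)
    print(f"n={n:>6}: standard error = {se * 100:.3f} rating points")
```

Each fourfold jump in panel size buys only a halving of the error, so beyond some point the marginal accuracy is not worth what clients would have to pay for it.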

The newer strategy avoids the problem of sample size. Much of the digital media we use is served to us through computers. These machines, or “servers,” can easily track what web pages are requested, what videos are streamed, and what ads are being served. They do not see just a sample; they see every such action. If you aggregate those actions, you have another way to measure the audience. This is a server-centric measurement strategy. And it has the potential to create a census of the population in real time. Having data on millions of users solves the problem of sample size.
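As a toy illustration of the server-centric idea (the user IDs, pages, and log format here are hypothetical), aggregating a server’s request log yields page views and unique visitors for the entire user base, with no sampling at all:

```python
from collections import Counter

# A hypothetical request log: every (user, page) pair the server handled.
log = [
    ("u1", "/home"), ("u2", "/home"), ("u1", "/news"),
    ("u3", "/home"), ("u2", "/news"), ("u1", "/home"),
]

# Page views: every request counts, including repeat visits.
page_views = Counter(page for _, page in log)

# Unique visitors: each user counts once per page.
visitors = {}
for user, page in log:
    visitors.setdefault(page, set()).add(user)

print(page_views["/home"], len(visitors["/home"]))   # 4 views, 3 unique visitors
```

Notice what the log cannot say: it records that “u1” visited, but nothing about that person’s age or gender, which is precisely the limitation of server-centric data.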

But this approach has drawbacks as well. First, having lots of data does not mean you have a genuine census. For example, millions of homes have digital STBs, which are now being used to measure television audiences. But millions of other homes do not have STBs. As is the case with sample information, our concern is that those with boxes could be systematically different from those without, which would bias the estimates. More importantly, server-centric information often cannot tell you much about who is being served. Was the person visiting that web page a man or a woman? Was the person watching that video young or old? Server-centric information might not be able to tell you much about the characteristics of the audience. While there are ways to estimate those traits, user-centric approaches typically gather more precise information about individual users.

In practice, both strategies are now extensively used. Researchers have developed ways to compensate for the weaknesses of each, which we will describe in the next chapter. Increasingly, measurement companies are trying to “fuse” data from different sources to have the best of both worlds (see, for example, Nielsen, 2009, 2011).

But current audience measurement practices are not just about research methods. Often they are the combined result of what the methods make possible, what the clients want, and what they are willing to pay for. For example, reliable C3 ratings can only be produced if you measure audiences with meters. But peoplemeters were in place long before the United States made C3 the new “currency.” The change was ultimately a matter of ratings users reaching a new consensus. Internet measurement presents even greater challenges in consensus building. Servers collect so much information that they make many audience metrics possible (e.g., clicks, page views, unique visitors, impressions). While each has potential value, settling on an agreed-upon currency for transacting business can be difficult. Next, we consider some of the broader institutional factors that shape the audience measurement business.

Institutional Factors

While quality of audience ratings is important, that alone does not determine whether a measurement system will be adopted by the affected industries. As we have shown, there are many ways to measure audiences. Different methods can produce different results. Those differences often operate to the advantage of some ratings users and the disadvantage of others. These considerations, as well as the cost of implementing different methods, all affect the kinds of research services that are available in a given market. Here, we describe the blend of economic and institutional factors that affect the market for audience measurement. We then describe the ways in which different countries organize audience research.

Syndicated audience ratings reports are, like many forms of information, characterized by high “first copy” costs. The machinery needed for state-of-the-art measurement is expensive. Metering technology must be designed, tested, and manufactured. Systems for acquiring and processing the data must be in place. Investors and prospective clients have to be ready to make financial commitments. Whatever the costs, they must be incurred before the first report is produced. Further, the data provided by an incumbent ratings company are often deeply engrained in the institutional practices of advertisers and the media, making users resistant to change. These “facts of life” make it difficult for competitors to enter the market. As the World Federation of Advertisers (WFA) noted:

The money required for two adequate television panels would fund one good panel. It is also wasteful in terms of agency resources in that multiple data sets have to be purchased and reconciled. (2008, p. 5)

This means that audience ratings are often provided by one dominant firm and, once it is in place, it is not easily changed. Yet, as we have seen, change does occur. But how? It depends, in part, on the actions of the ratings companies themselves. But it also depends on the laws and institutional arrangements in different countries.

The most common way to manage change is through a “joint industry committee” (JIC). In such an arrangement, the principal ratings users (e.g., media, advertisers, and agencies) create a committee that specifies what audience measurement services are required. They put a contract, or tender, out for bids. Typically, established ratings companies like Nielsen, GfK Telecontrol, or Kantar Media will respond to the tender. In this system, differences of opinion about the merits of one method versus another are typically worked out “behind closed doors.” The JIC then selects the service it considers to be the best and agrees to fund the measurement operation for however many years the contract is in force. This basic model, in which users not only pay for but control measurement, varies from country to country. In the United Kingdom, for example, the BBC, ITV, and others created a nonprofit entity called the Broadcasters’ Audience Research Board (BARB). It commissions specialists, like Kantar Media, to provide television audience measurement services. OzTAM in Australia is similarly structured but uses Nielsen to run its peoplemeter panel. In 1985, the French created a company called Mediametrie, which is owned by French broadcasters and ad agencies. Rather than commissioning outsiders, it has developed its own measurement services. In any of these JIC arrangements, though, the result is a single firm providing a single currency. In fact, that is one of the reasons they exist (WFA, 2008).

In the United States, antitrust laws make JICs of questionable legality. While ratings services can seek accreditation from an industry board called the Media Rating Council, independently owned ratings companies have always been free to challenge one another for contracts with ratings users. Often they will do so by touting the superiority of some new research method. That was true when Hooper finally prevailed over CAB and it is true today. But neither change nor innovation comes easily.

As we have seen, dislodging an incumbent ratings company is a time-consuming, expensive, and risky proposition. Experience at providing ratings in another market, using arguably superior methods, and even attracting widespread encouragement from prospective clients does not guarantee success. That was true of AGB’s failed attempt to enter the U.S. market and SMART’s unsuccessful effort to challenge Nielsen. Philip Napoli, a business professor at Fordham, summarized the “reality” of the situation:

Early indications of support for an upstart do not always represent a genuine interest in competition or even a desire to have the new measurement firm supplant the incumbent as the agreed-upon standard. Rather, the apparent show of support for a new measurement firm actually is an effort to compel the incumbent to improve or alter its measurement techniques or technologies. Typically, the incumbent responds to this threat, and the advertiser or media firm stops underwriting the new firm. (2003, pp. 27–28)

While it may be hard for a newcomer to introduce innovations, it can be a challenge for even a well-established incumbent. As the pioneer in metered measurement, Nielsen is constantly devising new ways to collect data. But even making what might seem like uncontroversial improvements can meet powerful resistance. The best example of this is Nielsen’s efforts to introduce peoplemeters into local U.S. markets.

As we noted earlier, since the late 1980s peoplemeters have been the preferred method of television audience measurement around the world. Any reputable audience researcher would acknowledge that peoplemeters are more accurate than diaries, especially in the current digital media environment. Yet, when Nielsen wanted to replace a diary/meter method with peoplemeters in local markets, some Nielsen clients objected. When a similar change was made nationally in 1987, broadcast ratings declined, and local broadcasters feared the new system might likewise benefit cable at their expense. While advertisers and the cable industry were typically supportive, broadcasters were generally unhappy with the change. In Boston, the first test market, broadcasters initially refused to subscribe to the service. As Nielsen began rolling out local peoplemeters (LPMs) in other large markets, broadcast industry resistance escalated.

LPM technology was accused of undercounting minority viewers. If that were true, so the argument went, minority-oriented programming would lose advertiser support and eventually be canceled. The most vocal and visible of the public interest groups making this case was called Don’t Count Us Out (DCUO), a collection of advocacy groups formed as the rollout began. Nielsen was surprised by the ferocity of the resistance. While LPMs did have slightly higher “fault rates” in minority households, the problem was relatively minor and easily fixed.

Less easy to fix was Nielsen’s public relations problem. It turned out that one of Nielsen’s own corporate clients was secretly behind the protests. News Corporation, the owner of Fox television stations, believed that LPMs would generate lower ratings for its stations. To slow or stop the introduction of LPMs, News Corporation organized DCUO, spent nearly $2 million orchestrating news conferences, running inflammatory ads, and operating telephone banks (Hernandez & Elliot, 2004). The supposed threat to minority interests attracted the attention of then-Senator Hillary Clinton and other national figures and organizations. This, in turn, prompted the U.S. Congress to hold hearings. Legislation was proposed but never passed into law.

Nielsen now has LPMs in more than two dozen of the largest U.S. markets, with plans for more. But the introduction of LPMs should serve as a cautionary tale to any incumbent ratings provider. Even if that firm is the sole supplier of a ratings currency, especially if it operates under a JIC contract, its ability to introduce changes in its methods for gathering and reporting data is limited. It functions within a complex web of competing industry interests and political agendas that are not always apparent. These can have a powerful effect on how the audience measurement business operates.

RELATED READINGS

Balnaves, M., O’Regan, T., & Goldsmith, B. (2011). Rating the audience: The business of media. London: Bloomsbury.

Bermejo, F. (2007). The internet audience: Constitution and measurement. New York: Lang.

Beville, H. M. (1988). Audience ratings: Radio, television, cable (Rev. ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Buzzard, K. S. (2012). Tracking the audience: The ratings industry from analog to digital. New York: Routledge.

Chappell, M. N., & Hooper, C. E. (1944). Radio audience measurement. New York: Stephen Daye.

Gunter, B. (2000). Media research methods: Measuring audiences, reactions and impact. London: Sage.

Kent, R. (Ed.). (1994). Measuring media audiences. London: Routledge.

Lumley, F. H. (1934). Measurement in radio. Columbus, OH: The Ohio State University Press.

Napoli, P. M. (2003). Audience economics: Media institutions and the audience marketplace. New York: Columbia University Press.

Napoli, P. M. (2011). Audience evolution: New technologies and the transformation of media audiences. New York: Columbia University Press.

Turow, J. (2011). The daily you: How the new advertising industry is defining your identity and your worth. New Haven: Yale University Press.
