CHAPTER 4

The Deliberative Justice Experiment

THERE IS NO SHORTAGE OF commentary about women’s presence in decision-making groups. And given the many studies that have accumulated on that topic, readers may wonder whether in fact they need this one. Surely by now we know that the presence of more women results in more participation and representation?

Well, actually, no. As we reviewed in chapter 1, studies of gender composition have come to an inconclusive end. For example, the most comprehensive study of women’s participation in meetings, Frank Bryan’s mammoth study of Vermont town meetings, found that the higher the percentage of women attenders, the lower women’s share of speakers. Bryan left no rock unturned in Vermont; he examined every variable under the sun, including the most venerable variables known to cause political participation.1 Yet the explanation for women’s relative quiescence remains elusive. Studies on an entirely different continent yield similarly puzzling findings. As noted in chapter 1, the variables that typically elevate disadvantaged groups’ speech in Indian village meetings fail to elevate women’s speech. And so it goes with other studies, and whole literatures, on the topic of women’s participation and representation. There is still much to learn about women’s speech, and in particular when, and why, women’s numbers matter, and how institutions shape women’s voice and representation.

Part of the challenge of answering the lingering questions about women’s voice and authority is methodological. For example, many of the studies that take up the question of gender do not disentangle group gender composition from individual gender or do not involve sufficient variation in gender composition to allow for a full understanding of group-level factors. In particular, there are many ways that gender composition and decision rule could be correlated with other possible causes of women’s participation and substantive representation at a meeting. An observational study that tries to control on all these factors is likely to fail. Certain types of places or certain types of groups may, for example, be more or less likely to draw in women—or certain types of women—as participants and to shape the extent to which women exercise voice and authority. These tendencies complicate our ability to understand the effects of gender composition on its own. Similarly, certain types of groups may be more likely to choose unanimous decision rules, and this endogeneity will complicate any attempt to isolate the independent effects of institutional processes. In addition, the vast majority of attempts to study the characteristics of groups that matter for equality end up either with too few groups or inadequate measures of the causes, outcomes, discussions, or social interactions we are studying. In the chapter appendix we take up these issues in detail by cataloging a number of methodological gaps in existing studies.

WHAT WE DO IN OUR EXPERIMENT

We designed an experiment to remedy the shortcomings of previous studies. We wished to include a larger number of groups than the typical study includes. We wanted to assemble a dataset with many groups in each of several different gender compositions. Without enough variance on one of our key variables, we could not study its effects properly. We also wanted to avoid conflating gender composition, or decision rule, with other factors that may be correlated with them and that may also affect the outcomes we examine. Those factors include group size, the political context (more liberal or conservative), the level of women’s mobilization, the attitudes correlated with gender but not an essential part of gender itself, and more (as we elaborate below). In addition, we wanted to examine a wider range of outcomes from women’s presence and from decision rule than those we found in existing studies. Among those outcomes are measures of what actually goes on during deliberation, and what happens as a consequence of the conditions of discussion and the content of discussion. Therefore, we wanted to record in minute detail every utterance by every speaker and link it to the speaker’s attitudes and preferences before and after the discussion. A laboratory experiment offered a means to achieve these research aims.

This design would allow us to ask questions such as: how much do women and men speak? How much do women and men talk about issues of distinctive concern to women? Do men use interruptions to establish their status in the group? Do women use them to create a warmer tone of interaction in the group? Do women express their preferences during discussion? And how does what happens during the discussion affect the decisions the group ultimately makes? We also needed postdeliberation measures to ask questions such as: how does all this affect women’s and men’s sense of their own worth, and sense of other members’ worth, after deliberation? Is women’s authority affected by gender composition and decision rule?

Finally, we wanted to be able to say that it is gender, and not something correlated with gender, that produces the results of gender composition. And that it is gender, and not something that women just happen to have more or less of than men, that is doing the work. To that end we needed a way to measure the person’s views before deliberation starts and see if the effects we observe really do fall out by the person’s gender and not by the person’s preferences. (In addition, measuring views before deliberation can tell us if women express these views as much as men.) We needed to assess the person’s political views and hold those constant when examining differences between men and women.

Since our goal is to home in on the individual’s gender and not end up spuriously studying the effects of factors that tend to go hand in hand with the person’s gender, we pause here to spell out the difference between these concepts. Gender is a constellation of factors that is not reducible to the substance of political attitudes or to incidental demographic characteristics that men may have more of, such as income, or that women may have more of, such as education or age. By political attitudes we mean a person’s level of liberalism or conservatism, and the kinds of attitudes that tend to feed into this political ideology, especially the person’s level of egalitarianism. A general adherence to, or rejection of, egalitarian values is particularly pertinent for our study, since our respondents were asked to consider such matters as what is fair in redistributing income in society (and among themselves), how much the well off should be taxed in order to help the poor, and what are the respective obligations of the individual to get along on their own and of the society to help those who cannot help themselves. These are all questions that are intimately linked to the opinions that our groups will discuss: the appropriate level of government spending, how much to tax and who should be taxed, and how much income inequality is acceptable or desirable (Bartels 2008; Feldman and Steenbergen 2001). For that reason, when we construct multivariate models, we add statistical controls for egalitarian attitudes in order to get at pure gender effects from being a woman versus a man. Where it makes sense to do so, we also control on the person’s predeliberation preferences about the specific principles of income redistribution that groups were instructed to discuss and decide, and whether this preference was in the majority for that group.2 Adding these controls helps us to separate out political attitudes and ideologies that are not core aspects of gender. As we elaborate in the next chapter, we will call these alternative hypotheses, respectively, Preferences and Efficiency.

In addition, we investigate whether demographic characteristics such as age, education, or income drive the results. As we will note throughout, we find that they do not, either at the individual or the group level. While these factors have not been experimentally manipulated, when we substitute the number of group members above the median income or age, or the number of college graduates, for our indicator of gender composition, we find that these demographic characteristics have no effect.3 With few exceptions, these indicators rarely even come close to statistical significance, either as additional controls or as substitutes for our measures of gender, so we have generally chosen to leave them out of our standard model in order to conserve statistical power. Most importantly, the results lead away from the conclusion that gender is a mere proxy for other demographic characteristics.

What that leaves us with is gender itself. At the level of the individual, and when it comes to stable characteristics of the person, we take gender to be a combination of the traits and attitudes that arise from gender socialization, such as the level of a person’s confidence, the person’s sensitivity to situations that send signals about one’s competence, aversion to conflict, the level of priority one places on social ties, and the effect of signals about the level of warmth and social connection in the group.

There is some overlap between what we consider correlated political attitudes on the one hand and the core aspects of gender on the other hand. Socialization into gender roles in part explains why women tend to seek a higher level of equality, to endorse more economically liberal policy, and to believe that it is government’s role to provide for those least well off in society. However, many men also adhere to these beliefs and pursue these preferences, and gender role socialization is not the main reason that people hold these beliefs. So while the total effects of gender do extend to political ideology and preferences, we will conduct a hard test of the effects of gender by bracketing the effects that work through these political attitudes and leaving only the remaining effects of gender socialization and gender roles.

In addition to ensuring that we are getting at the effects of gender per se, we want to be sure that the effects we attribute to gender composition, and to decision rule, could only be due to these and no others. That is the main reason we use a controlled experiment. Although our study requires an artificial setting to allow us to create controlled circumstances, its high internal validity is valuable despite the trade-off with external validity (McDermott 2011). Once we can be confident of the effects of gender composition under controlled conditions, we can then ask whether the results apply in more natural settings, as we do in the penultimate chapter.

The reader by now may be asking a simple but devastating question: how can you assign people to gender composition if you cannot assign gender to a person? Put differently, does this count as an experiment?4 Our general response to these questions is that yes, the design in this study incorporates the defining features of experimentation. In their classic discussion of experimentation, Kinder and Palfrey (1993) argue that experiments share an “interventionist spirit” in which researchers “intrude upon nature” in order to “provide answers to causal questions” (6). Morton and Williams (2010) agree, arguing that an experiment occurs “when a researcher intervenes in the data generating process (DGP) by purposely manipulating elements of the DGP,” where manipulating means “varies the elements of” (42). We varied the elements of the data generating process—specifically, the gender composition and decision rule for all groups in our sample. In addition, we use the hallmark of experiments as traditionally conceived: random assignment.5 However, because gender is not randomly assigned, we cannot treat gender composition as a purely randomized variable, and we must control on variables potentially correlated with it, as we explained above.

Finally, because ours is a study of gender inequality, we must decide how to measure that inequality. In most of what follows, we have in mind not the total participation or influence of men or women in the group, but participation or influence per capita. In other words, we examine the participation or influence of the average man or woman in the group. The problem with looking at the total (or in other words, combined or summed) female percentage of talk or influence is that rising numbers of women could increase women’s total proportion of talk or influence even if the average woman speaks far less than the average man.

For example, consider a hypothetical group containing four women and one man—that is, with 80% women—and imagine that the women collectively speak for 70% of the discussion. If we rely on women’s total percentage of discussion, then women appear to dominate the discussion, since together they take up a large majority of it. We would then conclude that the conditions that produced this percentage have created a large advantage for women over men. But when we look at the average woman’s percent of discussion, we come to the opposite conclusion. In the same hypothetical group, the average woman takes 17.5% of the conversation (that is, 70% divided by 4), while the man takes almost double that (30%). These rates yield a female/male ratio of 58%, indicating a high degree of gender inequality. In other words, the average in this example shows us that women actually participated much less than their share of members, and that their relative participation is actually far less than men’s. This is the definition of equality that we have in mind when we hypothesize about a gender gap in participation or influence.
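The arithmetic in this hypothetical group can be reproduced in a few lines. The sketch below is only an illustration of the per capita measure described above; the function name and its arguments are our own labels for this example, not part of the study’s analysis code.

```python
def per_capita_shares(n_women, n_men, women_total_share):
    """Return (average woman's share, average man's share, female/male ratio).

    women_total_share is the fraction of all talk taken by the women combined.
    Assumes the group contains at least one woman and at least one man.
    """
    men_total_share = 1.0 - women_total_share
    avg_woman = women_total_share / n_women
    avg_man = men_total_share / n_men
    return avg_woman, avg_man, avg_woman / avg_man


# The hypothetical group from the text: four women, one man, women hold 70% of the talk.
avg_woman, avg_man, ratio = per_capita_shares(n_women=4, n_men=1, women_total_share=0.70)
print(f"average woman: {avg_woman:.1%}")    # 17.5%
print(f"average man:   {avg_man:.1%}")      # 30.0%
print(f"female/male ratio: {ratio:.0%}")    # 58%
```

A ratio of 100% would indicate that the average woman and the average man take equal shares of the discussion; values below that indicate a gap that disadvantages women.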

Focusing on the average participation of men and women, respectively, also gives us a standard metric that allows us to see how the patterns of participation, including the gender gaps, change across different gender compositions. We can easily compare what women do in groups where women make up 20% of the group to what they do in groups where women are 40%, 60%, or 80% of the discussants because we are adjusting for the mere fact that more individuals produce more activity.

A focus on average or per capita participation is the approach we will pursue for most of the book. But this is not to say that patterns of total participation are meaningless or unhelpful, either normatively or empirically. When we turn to the question of group decision making in chapter 9, we will bring total participation back in as we explore the relationship between participation patterns, the content of discussions, and the group’s eventual choices. In explaining those choices, we will want to know whether the average gender gap, the total gender gap, or both matter most.

To be clear, when we analyze participation or influence, whether total or average, our definition of equality is when participation or influence of a given gender is proportional to its numbers in the group. When it is not—and especially when the patterns of advantage or disadvantage are systematic, as we will show—then we will conclude that the standard of equality has been violated.

In sum, we designed our study to do the following: (1) generate a sufficient number of groups in various gender compositions to create adequate variance in this independent variable;6 (2) test for the predicted interactive effects of gender composition with decision rule; (3) use random assignment to create exogenous gender composition and decision rule variables so as to gauge their unbiased effects; (4) measure the actual level of speech participation of individuals and match it to their individual characteristics, including their gender; (5) measure, and control for, egalitarianism and other aspects of political ideology, both of the person and of the group; and (6) assess the effects for different types of women, and the effects on various types of outcomes (speech patterns, speech content, agenda setting, and influence), accurately measured.

THE DELIBERATIVE JUSTICE EXPERIMENT

Our experiment followed the basic procedure of a previous study by Frohlich and colleagues (Frohlich, Oppenheimer, and Eavy 1987; Frohlich and Oppenheimer 1990 and 1992). We told participants that they would be performing tasks to earn money and that the money received would be based on a group decision about redistribution, with the decision based on unanimous rule or majority rule.7 Prior to group deliberation they were not told the nature of the work.8 This uncertainty about the work task meant that participants did not know whether they were likely to earn a great deal of money or only a little, and therefore what their individual interests would be in a decision about income redistribution. Following Frohlich and colleagues, we instructed participants to make a collective decision that would apply not only concretely and immediately to themselves and their group but also hypothetically to society at large, so we could generalize beyond the lab situation to the decisions people make about redistribution in politics.

Between August 2007 and February 2009, students and community members were recruited, randomly assigned to one of the decision rule and gender composition conditions, and brought together for two hours. Potential participants were asked to take part in a two-hour experiment investigating “how people make decisions about important issues.” Recruitment was conducted through a wide variety of methods including e-mails to students,9 postcards to purchased random lists of community members, online advertisements, flyers posted on and off campus, and direct contact to local community groups.

The experiment occurred in three parts. In the first part, participants answered a pretreatment questionnaire about their social and political attitudes. (Question wording for key survey measures can be found in the chapter appendix.) After completing this initial questionnaire, participants were given a brief handbook that introduced them to some basic principles of redistribution that would be the focus of their group discussion later. The principles were as follows:

1.  Maximize the Floor: The lowest income in the group is raised to 80% of the group’s average. We told participants that maximizing the floor meant “giving those who have the least the most help possible.”

2.  No Taxes or Redistribution: Everyone gets exactly what they earn. Participants were told that this principle gives “everyone the greatest incentive to work hard and produce more” because with this principle, no redistribution would occur at all.

3.  Set a Floor Constraint: The group decides on an exact dollar amount to which the lowest incomes will be raised. The instructions for this principle emphasized the creation of a safety net “ensuring [that] everyone has enough to get by” and that the exact level of the safety net could be decided by the group.

4.  Set a Range Constraint: The group decides on an exact dollar amount that is the highest allowable difference between the highest income and the lowest income. With this principle, the instructions emphasized the goal of “reducing the extremes between rich and poor.”

For each principle, the participant handbook explained the values and purposes that motivate some people to prefer it as a method of redistribution. (See the online appendix for a full description of these principles and other instructions.) As they considered each principle, participants were instructed to “think about values that you hold. For example, think about how to promote equality of opportunity, how to reduce the gap between rich and poor, how best to provide for the poor, or how to reward talent and hard work.” They were also asked to think about how the principles would affect them, depending on the income they might ultimately earn: “Will you be happy keeping only what you earned in a low income situation? Will you be happy with a guaranteed minimum income? Will you be happy giving up your wages in a high income situation? Which principle is most fair or just for the group as a whole?”

During this initial period of instruction, we also gave participants examples of how the principle could be applied in the context of the experiment. For example, we explained that if a group were to choose a range constraint of $0, the group would be choosing to distribute the money earned by the group such that each member would earn the exact same amount. We also showed participants several different sample distributions of income and how choosing different principles would affect the way income was redistributed. After reading the material, each participant completed an eleven-question quiz to test their basic understanding of the principles. Participants who missed questions were given another opportunity to respond and were told the correct answer. This was to ensure that all group members began the discussion with a roughly equal minimal level of understanding of the principles. Finally, participants were asked to privately disclose their preferences about how best to redistribute income or whether redistribution was an important goal at all.
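To make the four principles concrete, the following sketch applies each of them to an invented five-person income distribution. It is an illustration under stated assumptions rather than the procedure used in the lab: the text does not specify how transfers were funded, so the flat tax on income above the floor and the proportional compression toward the mean are our assumptions, and the function names and sample incomes are hypothetical.

```python
def no_redistribution(incomes):
    """Principle 2: everyone keeps exactly what they earned."""
    return list(incomes)


def floor_constraint(incomes, floor):
    """Principle 3: raise every income below `floor` up to `floor`.

    ASSUMPTION: the transfer is funded by a flat tax on the portion of each
    income that lies above the floor.
    """
    shortfall = sum(floor - x for x in incomes if x < floor)
    surplus = sum(x - floor for x in incomes if x > floor)
    if shortfall > surplus:
        raise ValueError("the group's earnings cannot fund this floor")
    rate = shortfall / surplus if surplus else 0.0
    return [floor if x <= floor else floor + (x - floor) * (1 - rate)
            for x in incomes]


def maximize_floor(incomes):
    """Principle 1: the lowest income is raised to 80% of the group's average."""
    mean = sum(incomes) / len(incomes)
    return floor_constraint(incomes, 0.8 * mean)


def range_constraint(incomes, max_gap):
    """Principle 4: the highest and lowest incomes differ by at most `max_gap`.

    ASSUMPTION: incomes are compressed proportionally toward the group mean,
    which keeps the group's total earnings unchanged.
    """
    mean = sum(incomes) / len(incomes)
    gap = max(incomes) - min(incomes)
    scale = 1.0 if gap <= max_gap else max_gap / gap
    return [mean + (x - mean) * scale for x in incomes]


# Invented annual incomes for one five-person group.
earned = [10000, 20000, 30000, 40000, 100000]
print(no_redistribution(earned))
print(floor_constraint(earned, 25000))
print(maximize_floor(earned))
print(range_constraint(earned, 0))   # a $0 range gives every member the same amount
```

Under these assumptions every principle leaves the group’s total earnings unchanged; the principles differ only in how that total is divided.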

In the second part of the experiment, participants were brought together in groups of five and were instructed to conduct a “full and open discussion” and to choose the “most just” principle of redistribution. They were asked to choose a principle that would be applied to money they and their group members earned during the experiment but that could also function as a principle for society at large. The only requirement was that they deliberate for at least five minutes; on average, the groups we analyzed spent just over twenty-five minutes (SD = 11) in discussion. All instructions, other than those pertaining to the specific decision rule to be used by the group, were exactly the same across conditions. During deliberation, each participant was recorded on a separate audio track, and the full conversation was also recorded on a master track that included all participants.10 This allowed us a precise and detailed account of what each person in the group said during group discussion. To avoid self-clustering in the deliberative area, participants were seated randomly around the table. The experimenter opened discussion by asking, “Would someone like to start by explaining which principle they believe to be most just and why?” The experimenter then remained in the room during the deliberation to manage the recording equipment and answer clarification questions about the distribution principles or other aspects of the process, but did not otherwise moderate or direct the discussion.

Participants appeared to take their deliberations seriously. Group deliberations typically extended well beyond the five-minute minimum, sometimes lasting as long as an hour or more. Consistent with the instructions, group discussions nearly always explored how group members’ choices about principles would work outside the experimental setting. The discussions touched on meaningful topics related to redistribution, including the nature of equality, the needs of the poor, the importance of incentivizing work, the possibility of economic mobility, the fairness of various systems of taxation, and the value of charity. (A sample transcript is in the online appendix.)

For example, in a discussion on the East Coast, a female participant describes why she thinks a generous floor amount for the poor is important:

From my experience in American society though, the problem comes in that we—there are so many things you can’t predict, like you can’t predict how much money a family needs to survive. And if that’s the case, then there are people who are still living within pretty minimal surroundings, and other people who are living quite extravagantly, and they’re saying “well, this is a living income here.” But the living income so often doesn’t seem to be a living income for a family.

Later in the discussion, another participant explores the possibility of a range constraint and wonders out loud about the work ethic of the very rich:

The range constraint appeals to me because there’s such a disparity between the richest and the poorest, but it might flatten things too much in terms of giving people an incentive to work really hard, who are already up there. … I mean the implication might be that people who are poor don’t work really hard, and that’s just not true, or at least not in a lot of cases, and a lot of rich people don’t work hard. I mean it’s like you can’t assume that because somebody has a lot of money that they work hard, and because they don’t, they don’t work hard.

In a Western discussion, the group members similarly explore the balance between providing an incentive to work and meeting the needs of the poor. One participant expresses concern about “loopholes” that would allow the poor to receive income without sufficient effort:

But I just, I see a lot of loopholes, and with, like, other systems and things like when, when the poor are, like, blaming their situation in life that they’re born in, like, oh I’m poor and this, I’m not going to do much because I’m poor. If, you know, I just—I don’t know. I just have an issue with that kind of thing.

Another participant agrees that work is important, but reminds the group that in the larger society, it is also sometimes the case that individuals work hard but still face hardships:

No, I kind of agree. I think that there—like you said, there’s strong incentive, but … if there’s no floor or if there isn’t anything, … then if they don’t make enough to actually support themselves, that’s an issue too. And we’ve seen that today just with our society.

At this point, a third group member chimes in that many poor people work very hard, and a fourth explores whether the circumstances of one’s birth should have such a significant effect on one’s life chances:

I grew up in a family where we were taught that, you know, how hard you work is how much you get, but what I’ve learned is that’s not always the case. Like, depending on what circumstances you’re born into, like, it affects a ton of your life … I’ve just learned that, like, you can’t help where you’re born and that has such a huge effect on the rest of your life. And so it’s really not fair that … how hard you work should determine how much you get paid, like, some people just can’t help it.

As these examples show, the groups did much more than wonder about how they could maximize their pay in the context of the experiment. Often they explored the various principles of redistribution in great detail, thinking together about the concept of justice, the meaning of poverty, and how best to encourage self-sufficiency and also care for others. The conversations moved back and forth between the various principles they were considering, their own experiences in life, and the principles they wanted to encourage in society at large. As they worked through these issues, they expressed preferences for different principles and different levels of generosity for the safety net (poverty line), frequently exploring the implications of choosing different levels of support for the poor.

When the group members indicated they were ready to stop the conversation and vote on a preferred principle, the experimenter stepped in to conduct the formal vote, which occurred by secret ballot.11 If no principle of redistribution received sufficient support (either majority or unanimous, depending on the decision rule to which the group had been assigned), participants were told to return to discussion until they felt ready to vote again. Participants were instructed that if they were not able to agree on a principle of redistribution after four votes, the experimenters would assign a principle to the group. Groups were not told what the assigned principle would be, and no groups asked what principle would be chosen for them. Most importantly, all groups reached agreement, so experimenters never assigned a principle.
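The voting procedure just described can be summarized as a short routine. This is a sketch for illustration only: the function names are hypothetical, and we assume that “majority” means more than half of the ballots cast, which the text does not state explicitly.

```python
from collections import Counter

MAX_VOTES = 4  # after four failed votes, the experimenters would assign a principle


def winning_principle(ballots, rule):
    """Return the principle with sufficient support, or None if the vote fails.

    ballots: one principle name per group member (cast by secret ballot).
    rule: "majority" or "unanimous".
    ASSUMPTION: "majority" means more than half of the ballots.
    """
    if rule not in ("majority", "unanimous"):
        raise ValueError(f"unknown decision rule: {rule!r}")
    principle, count = Counter(ballots).most_common(1)[0]
    needed = len(ballots) if rule == "unanimous" else len(ballots) // 2 + 1
    return principle if count >= needed else None


def run_votes(rounds, rule):
    """Work through successive ballots; stop at the first one that succeeds.

    Returns (chosen principle or None, number of votes taken).
    """
    taken = 0
    for ballots in rounds[:MAX_VOTES]:
        taken += 1
        choice = winning_principle(ballots, rule)
        if choice is not None:
            return choice, taken
    return None, taken


# A 4-1 split decides the matter under majority rule but not under unanimity.
split = ["floor constraint"] * 4 + ["range constraint"]
agreed = ["floor constraint"] * 5
print(run_votes([split], "majority"))
print(run_votes([split, agreed], "unanimous"))
```

The example shows why the decision rule matters for a lone dissenter: the same 4-1 ballot ends deliberation under majority rule but sends a unanimous-rule group back to discussion.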

If the group chose either a range constraint or a floor constraint, they had one additional decision to make: what would the constraint be? Groups choosing these principles had to specify an exact dollar amount for those constraints and had to agree on these amounts according to their assigned decision rule. For example, if the group chose a floor constraint, they were asked to specify the income level below which no one would be allowed to fall. Because we asked them to think not only about the specific income they would be earning in the context of the experiment but also about rules that could be applied to society at large, we asked them to interpret the floor as “the minimum income a household is guaranteed each year.” In other words, groups choosing a floor amount had to specify what the poverty line in terms of an annual salary should be. In the instructions for deliberation, we asked groups to think about dollar amounts that could actually be used in society at large.

After deliberation came the third and final part of the experiment. Participants returned to private computer terminals and answered a set of questions about their assessments of the discussion, the group, its members, and their own experience. (Question wording for key survey measures is available in the chapter appendix.) They again had an opportunity to express their private preferences about principles. Participants then performed several rounds of “work”—correcting as many spelling errors in a block of difficult text as they could find within a two-minute time limit (replicating Frohlich and colleagues’ choice of procedure). Participants earned money according to their performance on the spelling task, and these earnings were distributed to group members according to their chosen distribution scheme. During the work period, earned incomes were given on a scale of annual incomes. At the end of the experiment, these were converted to a payment scale that ranged between $10 and $70. At the end of the task period, participants responded to a series of questions on attitudes and beliefs and were debriefed.

We use a 6 × 2 between-subjects experimental design, randomly assigning individuals to one of six gender compositions (that is, to a group that ranged from zero to five women) and one of two decision-rule conditions, unanimous rule or majority rule.12 Gender composition was randomly assigned to dates on the schedule of experimental sessions, and subjects who signed up to attend on that date were assigned to the corresponding gender composition condition. This ensured that group types did not cluster on particular days of the week and that participants had a roughly equal probability of being assigned to each group type. Thus each man or woman had the same probability of being assigned to a given gender composition. This satisfies the random assignment assumption, which is not that each treatment is equally likely to be assigned to a given person, but rather that each person is equally likely to be assigned to a treatment (Morton and Williams 2010). We recruited more than five participants for each session, and the alternates helped ensure that we could fill the day’s assigned type of gender composition. Randomization of decision rule was achieved by the roll of dice prior to each session. Randomization checks and propensity score analyses find that individuals were assigned by a random process and groups are equivalent on relevant covariates.13 In other words, we find that our groups are comparable in terms of their basic attitudes about politics and their demographic characteristics, such as income or education. Further details on the procedure, subjects, and other methodological matters are in the online appendix.
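The logic of this assignment scheme can be sketched in code. The sketch below is a simplified illustration, not the scheduling procedure itself: it draws each date’s composition independently, and the mapping from a die roll to a decision rule (even for unanimous, odd for majority) is our assumption, as are the function and field names.

```python
import random

GROUP_SIZE = 5
COMPOSITIONS = range(GROUP_SIZE + 1)   # number of women in the group: 0 through 5
RULES = ("majority", "unanimous")


def schedule_sessions(dates, seed=None):
    """Assign a gender composition and a decision rule to each session date.

    Participants who sign up for a date receive that date's condition, so the
    condition is fixed before anyone knows who will attend.
    """
    rng = random.Random(seed)
    plan = []
    for date in dates:
        n_women = rng.choice(COMPOSITIONS)
        die = rng.randint(1, 6)                      # roll of a die before the session
        # ASSUMPTION: even roll -> unanimous rule, odd roll -> majority rule.
        rule = "unanimous" if die % 2 == 0 else "majority"
        plan.append({
            "date": date,
            "n_women": n_women,
            "n_men": GROUP_SIZE - n_women,
            "rule": rule,
        })
    return plan


for session in schedule_sessions(["session 1", "session 2", "session 3"], seed=2007):
    print(session)
```

Because the condition is attached to the date rather than to the person, any man or woman who signs up faces the same chance of landing in a given composition, which is the sense of random assignment described above.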

We have data on 470 individuals in ninety-four deliberating groups. Table 4.1 summarizes our experimental conditions and the number of groups and individuals in each condition. Although our statistical power is still somewhat limited, our research design includes a much larger sample of groups than is typical in group research.

The experiment was conducted at two different sites—a small town near the mid-Atlantic coast and a medium-sized city in the Mountain West. Both locations include universities that recruit students from all different parts of the country. In the regressions we control on site because subjects are assigned randomly within but not across sites. As is common in controlled experiments, our goal was not a nationally representative sample but one with sufficient variance, and the sample did vary on relevant characteristics such as socioeconomic status and political attitudes, though our participants tend to be relatively highly educated, with most having at least some college experience (online appendix table B1).

Because our interest is in individuals and groups, we account for both, and we examine the data at the individual level and at the group level. We employ individual-level analysis in order to control on the characteristics of men and women that are correlated with gender but not core aspects of gender. As discussed earlier, those tend to be the person’s level of egalitarianism, and as needed, their liberalism, their prediscussion preferences over the group decision, and whether they are with the preference majority in their group. We also control on group-level characteristics, typically, the number of egalitarians. Because we can rely on the virtues of random assignment to condition, we sometimes report the raw experimental means without any controls, to give a basic sense of the effects. When we add individual- or group-level controls, we generally employ standard OLS regression, or probit with limited dependent variables, though we have also checked our results with other models, including, for example, both random- and fixed-effects models. Those alternative approaches yield results that are almost always identical to the results we report here, so we have opted for the most easily interpretable methods whenever possible. When we examine individuals rather than groups as the unit, we use robust standard errors clustered by the group. This sets the bar for finding statistically significant effects higher, but it has the virtue of adjusting for the fact that individuals share the experience of being in the same group and are therefore not fully independent of each other. We chose cluster robust standard errors over other approaches, such as hierarchical linear models, because clustering allows us to achieve roughly the same ends with fewer strong modeling assumptions. 
When we use group-level data—discussing what groups do or what a variable characterizing the group does—we use the number of groups to test for significance.14 Otherwise we use the number of individuals. When we report predicted values to show the effect of a causal variable, we estimate the values of the dependent variable from the model in each of our experimental conditions and allow the other variables in the model to take on their observed values.15 Descriptive statistics, scaling and wording for all variables, are in table B2 in the online appendix.16 Question wording for key survey measures can also be found in the appendix for this chapter.
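To make the idea of clustering by group concrete, the following minimal sketch fits a bivariate OLS regression and computes cluster-robust standard errors with the basic sandwich estimator. It is not the code behind our estimates: the function name `ols_cluster_se`, the variable names, and the twelve toy observations are hypothetical, and the sketch omits the small-sample correction that statistical packages commonly apply.

```python
from collections import defaultdict


def ols_cluster_se(x, y, groups):
    """Bivariate OLS (y = a + b*x) with cluster-robust standard errors.

    Basic sandwich estimator, no small-sample correction.
    Returns ((a, b), (se_a, se_b)).
    """
    n = len(y)
    # X'X for the design matrix with columns [1, x], inverted by hand.
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

    # Coefficients: (X'X)^-1 X'y
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    a = inv[0][0] * sy + inv[0][1] * sxy
    b = inv[1][0] * sy + inv[1][1] * sxy

    # Sum the score vectors X_g' u_g within each cluster (deliberating group).
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, g in zip(x, y, groups):
        u = yi - (a + b * xi)
        scores[g][0] += u
        scores[g][1] += u * xi

    # "Meat" of the sandwich: sum over clusters of outer products of scores.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]

    # Variance matrix = (X'X)^-1 * meat * (X'X)^-1
    tmp = [[sum(inv[i][k] * meat[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    var = [[sum(tmp[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    return (a, b), (var[0][0] ** 0.5, var[1][1] ** 0.5)


# Toy data (hypothetical): three groups of four members each.
group_id = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
female = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]
talk_share = [0.10, 0.30, 0.35, 0.25, 0.20, 0.20, 0.35, 0.25,
              0.25, 0.25, 0.30, 0.20]

coefs, ses = ols_cluster_se(female, talk_share, group_id)
```

With a binary regressor, the slope is simply the difference between the mean outcome for women and the mean outcome for men; clustering leaves the coefficients unchanged and affects only the standard errors, because the residuals are summed within each group before they enter the variance estimate.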

Table 4.1: Experimental Conditions and Sample Size


EXTERNAL VALIDITY

The strength of experiments, of course, is their strong internal validity (Kinder and Palfrey 1993). But what about external validity? What does a lab study have to do with the natural world? We address a number of subsidiary questions on this theme.17

A fair question is whether it is worth the bother of studying gender compositions that do not occur much in the real world. For example, perhaps in today’s America, most groups that discuss matters of common concern have fairly balanced numbers of men and women. If most discussion groups are fairly balanced, we do not need to know what happens in nonbalanced compositions. Perhaps we especially do not need to study homogeneous gender groups. A different version of this question is whether there are too few groups where women are a majority to bother looking at fine distinctions between 60% female, 80% female, and 100% female. Our first answer is that we saw in chapters 2 and 3 that women and men are actively involved in organizations that hold meetings, and that the most common setting is one in which women make up about 20% of the participants. Still, groups vary widely in their gender composition, and there are even enough groups where men encounter only men, and women only women, to be worth studying.

To further address this question, we can look at several additional sources. First, Bryan’s study of recent New England town meetings finds that the percentage of women ranged from approximately 18% to 70% of attenders.18 Second, a survey of American city councils provides some useful information. The average number of women on a city council is one. In addition, 24% of councils are all male (Nelson 2002). So in at least some settings of interest to us, the typical group is not evenly balanced and women are a small minority. Moreover, male enclaves are common enough to study. Third, further evidence comes from Hannagan and Larimer, who studied a random sample of appointed commissions in Iowa (2011a). In their sample, evenly balanced groups were a minority, and enough groups lay at the extremes of the gender composition distribution to make it worth our while to study the full distribution.19 As these numbers show, official committees range across the entire spectrum of gender composition. The real world contains enough groups with balanced numbers, with high or low gender compositions, and with male and female enclaves.

We can also see whether the groups we studied under controlled conditions share some important features with natural groups. For one, we might ask about the length of deliberation. Our groups deliberated on average for about half an hour. In the juries that Gastil and colleagues studied, the median juror spent about one hour deliberating (2010). Similarly, state legislative hearings in Kathlene’s study lasted just over an hour (1994). The school board meetings we study in chapter 10 lasted on average about two and one-half hours. This is longer than our study’s median deliberation time, but not a qualitatively different experience such as one might find in, say, a full-day meeting.

A key feature of the group is size. We assembled groups of five members. The average size of US city councils is six members (Nelson 2002, 2–3). The average size of school boards is five to seven, as we document in chapter 10. Jury size is similar, as is the size of local boards Hannagan and Larimer (2011a) studied. We conclude that size is not a problem for external validity in our study.

Our experiment examined the behavior of non-Latino whites. We made this choice with difficulty. We do not intend to imply that studying nonwhites is less important. In fact, nonwhites do take an active part in civic groups. Black politics in particular is characterized by vibrant organizations, especially churches.20 Among the significant experiences that African Americans encounter in churches is activity in church committees (Brown and Brown 2003). The gender composition of these groups is well worth studying, especially since some findings suggest that women may not derive the same political participation boost as men do from church involvement (Robnett and Bany 2011). However, we have reason to believe that the gender dynamics in predominantly nonwhite groups may be sufficiently different that a rigorous study requires full attention to these groups. In other words, the research question as applied to nonwhites requires a full replicate of nonwhite groups, something we cannot do with our existing resources.

Moreover, we find that many official boards and other groups are not as integrated along lines of race or ethnicity as we would prefer. This is an unfortunate but real consequence of the continued high levels of segregation and underrepresentation of people of color in official decision-making situations. There are very few nonwhites on official deliberating groups. Town council members are overwhelmingly white (as well as either lawyers or business people) (Nelson 2002). To be exact, approximately 6% of council members are nonwhite (calculated from Nelson 2002, 3, table 4). School board members are also overwhelmingly white (see chapter 10). We will be in a position to comment on the dynamics of school boards with some nonwhite members, and on racial dialogue groups that contain nonwhites, but since these samples of nonwhites are extremely small, our conclusions about gender composition cannot be applied with enough confidence to nonwhite members.

Our experimental sample is varied in age and education. We made substantial efforts to recruit from the towns at large. Nevertheless, our sample is heavily laden with young people and college students, and the lowest levels of educational achievement (those with less than a high school education) are not well represented.21 We take seriously the concerns about relying on students (Sears 1986; but see Druckman and Kam 2011). We are reassured, however, by several considerations. As we noted in chapter 2, the problem of women’s underparticipation affects young women and college students no less than others. The gender gap is not going away. In fact, the college setting is a revealing place in which to look at why young women are not leveraging their greater educational achievements. And as we indicated above, we also replicate our main findings with a control for age, education, and income. Furthermore, we will address head-on the concern that our findings may not apply to people at middle age or beyond when we analyze school boards.

An additional possible concern about our experiment is its procedure. We are asking people to make a decision about redistributing resources that they earn in a study. What does that have to do with politics? Our answer is that the decision was explicitly framed as one that would apply to society. And our attempt is to go beyond the typical survey questions about preferences on political issues to understand how citizens puzzle through these issues in a collective setting. In this sense, our experimental approach was closer to the sorts of actual decisions people make in public meetings than an individual-level survey can achieve. At real-world meetings, people are often speaking to decisions that directly affect them. At local council or town meetings the decisions have a direct bearing on the well-being of those who show up to comment. The town council decides which neighborhood’s garbage gets picked up and how often. The zoning council decides if the local land developer can build ten stories or three, netting more or less profit for him and obscuring the views of the neighbors to a greater or lesser degree. Not all these decisions involve concrete resources, but many do.

School boards, whose meetings we study in chapter 10, make just such decisions. How much to tax the population of our school district? How much do we spend on the needs of disabled or poor children in our district? Or, take the case of civic associations and their meetings. Again, these meetings are likely to revolve around issues such as how much to ask different church members to contribute to the church and whether there should be a sliding scale for these fees. The PTA must decide whether to fund the eighth grade trip to Washington, DC, or staff the playground before school hours to help care for kids whose parents work (Eliasoph 1998). These are the types of decisions we wish to analyze. Asking people abstract questions about their support or opposition to assisting the poor seems a more remote way to assess the types of decisions that people make in their meetings. At the same time, we want the decision to reflect more than the immediate and relatively trivial matter of the study’s take-home pay; this is why we asked them to make the decision as if it applied to society at large. Again, we can lay possible concerns to rest in our chapter on school boards, by replicating the core results with a sample of actual groups making real decisions.

In addition, we are reassured that some of our basic findings are replicated elsewhere. As we mentioned earlier, Bryan found that despite nearly equal attendance rates, women speak less than men. To be exact, Bryan found that the percentage of female attenders who speak at least once (34%) is about two-thirds the percentage of male attenders who speak at least once (52%). This is similar to the findings we shall report in the next chapter. It is also similar to the results we report in our replication chapter on school boards. (We shall have more to say about external validity in the concluding chapter.)
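The "about two-thirds" figure follows directly from the two percentages Bryan reports, as this short calculation shows. The variable names are our own and purely illustrative.

```python
female_speaking_rate = 0.34  # share of female attenders who speak at least once
male_speaking_rate = 0.52    # share of male attenders who speak at least once

# Women's rate of speaking expressed as a fraction of men's rate.
ratio = female_speaking_rate / male_speaking_rate
print(f"{ratio:.2f}")  # prints 0.65
```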

As we are about to detail, our experimental evidence reveals pervasive gender gaps in various dimensions of women’s voice. We present these results with some reassurance that they are not merely products of an odd setting, unusual participants, or strange procedures. The control we can exercise with our experiment is thus useful in its payoff and not too demanding in its cost. And just as importantly, our experimental approach allows us real insight into how the gender gaps can change by altering the institutional features of the group.

 


1 He examined the town’s Democratic vote share, its vote for Vermont’s ERA, its turnout levels, other measures of town’s liberalism/conservatism, the town’s size, its political economy and women’s place in it, and so on.

2 The online appendix (http://press.princeton.edu/titles/10402.html) provides the question wording and response options we used to measure egalitarian attitudes.

3 Neither demographic characteristics nor political preferences have been experimentally manipulated, the implications of which we discuss in the online appendix.

4 An extended discussion of this question can be found in the online appendix section on methodology.

5 Specifically, our approach uses what Cynthia Farrar, Don Green, and their colleagues call a “passive” experimental design—one that randomly assigns individuals to the discussion group based on their demographic, ideological, or other preexisting characteristics, and observes the outcomes (Farrar et al. 2009, 617–18). While individual gender cannot be manipulated, a group’s gender composition can be. So, too, can other features of the group, such as the decision rule.

6 We stratified by gender to avoid a balanced gender mix in most groups.

7 A control condition of no deliberation was included for other purposes.

8 In the Frohlich and Oppenheimer study, this was meant to simulate the Rawlsian veil of ignorance, designed to prompt people to consider principles of justice.

9 At the northeastern university, recruitment e-mails went first to students who had volunteered for previous experiments in the lab, and later to the entire student body. At the western university, several random samples of the entire student body were obtained and used.

10 Our software measures the participation of each member precisely; see the online appendix for further details.

11 All group members had to agree that they were ready to end the conversation.

12 Because experimental sessions were run over an extended period of time, there is no correlation between gender composition type and the day of the week or the time of the session. Typically, we conducted one experimental session at each site per day, and sessions started at similar times of the day. If fewer than five participants showed up, the session was canceled and participants could sign up for subsequent sessions. See the online appendix for additional details.

13 Demographics such as education, income, age, partisanship, and student status had no significant relationship with gender composition and rule. We performed three tests on each set of propensity scores: a two-sample t-test, a Wilcoxon-Mann-Whitney test, and a Kolmogorov-Smirnov test. Full details of our randomization checks are available in the online appendix.

14 When we have a clear directional hypothesis, we often employ one-tailed tests of statistical significance. In other cases, where our expectations are less clear or our hypotheses are more speculative, we employ a two-tailed test.

15 This procedure yields very similar results to alternative approaches in which we hold the other values in the model constant at their means or medians.

16 For the sake of completeness, we report the sample means for all our variables, but many of these variables are affected by the experimental conditions themselves, as predicted. So the overall sample mean is not substantively meaningful. For example, the overall gender gap in talk time is the mean across very different experimental conditions, and large gender gaps in some conditions are canceled out by small or nonexistent gaps in others. In addition, some of these experimental conditions—such as majority-rule groups with a minority of women—are far more common outside the lab setting than others, and the means for those more common settings should be used if one wishes to generalize from our sample to the population at large.

17 How far and exactly to which types of people and situations we can generalize is a question addressed in the conclusion chapter. Here we address the question of whether our experiment is sufficiently realistic to apply to natural settings.

18 The distribution of percentage of women attenders is normal with a center at 47% (Plot 1); this average percentage has held steady over the course of the 1990s (figure 2). This tells us that at least in New England town halls, most meetings are evenly balanced, but that there are enough groups both under and over this balance point that they are worth studying. Figure X-E at http://www.uvm.edu/~fbryan/newfig%20X-E.pdf; http://www.uvm.edu/~fbryan/Chap_IX_links.html.

19 See Hannagan and Larimer 2011a, table 1. They found roughly the following (with a membership from four to eleven):

• All-male: three groups (10%)

• Small minority female (under 33%): seven groups (23%)

• Even split: five groups (17%)

• Large minority female (around 40%): six groups (20%)

• Majority female: eight groups (27%)

• All-female: one group (3%)
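The percentages in this note can be checked against the group counts with a short calculation. This is only an arithmetic check on the figures listed above; the dictionary keys are shortened labels of our own.

```python
# Group counts as listed in note 19 (Hannagan and Larimer 2011a, table 1).
counts = {
    "All-male": 3,
    "Small minority female": 7,
    "Even split": 5,
    "Large minority female": 6,
    "Majority female": 8,
    "All-female": 1,
}

total = sum(counts.values())
# Each category's share of all groups, rounded to the nearest whole percent.
shares = {label: round(100 * n / total) for label, n in counts.items()}
```

The counts sum to thirty groups, and the even-split category accounts for five of them, which is why evenly balanced groups are a minority in this sample.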

20 Calhoun-Brown 1996; Harris-Lacewell 2004; Harris 1999; McClerking and McDaniel 2005; Patillo-McCoy 1998.

21 Fifty-two percent of our sample are students.
