Introduction

‘Educationalists’ do not emerge from the womb fully formed. Like the learners they study, they are learners themselves. So, the invitation to select my ‘best’ works to show the development of my ideas over more than thirty years as a university researcher and academic – or over forty years if my ten years as a school teacher are added – faced me with a dilemma. As one matures, one tends to believe that one’s best work comes later in a career, as the ideas are increasingly refined. Yet earlier work is important to show the germination of such ideas and initial struggles with them.

The problem of deciding criteria for such judgements of worth and choice is a familiar one in the field of assessment research and scholarship, and I deal with it substantively in this volume (see Chapter 3). With regard to my own dilemma, however, I decided to select papers from all stages in my career: from the time when I was a research assistant at the Open University to my present role as President of the British Educational Research Association. I used three criteria. First, the papers should, in my judgement, be worth reading. Second, they should demonstrate significant steps in my work, often associated with key projects (see Table 0.1). And third, they should reflect developments in the field as a whole. This last is deeply connected with changes in the education policy context.

Reading my own papers again was a strange experience. Although I thought I knew what was in all of them, I found myself reading some of my early work almost as another person. I was sometimes surprised by what I discovered. Somewhat disconcertingly, I found, on the one hand, positions that I no longer hold, or no longer hold as strongly. For instance, in my early work, I was methodologically committed to qualitative, naturalistic, interpretative inquiry. Now I am far less purist and see the need for multi-method, inter-disciplinary, integrated approaches to the investigation of complex, multi-layered problems. This shift could be explained in a number of ways: partly as a response to a shift in the field of educational research as a whole and, specifically, to a shift in patterns of funding; and partly as a result of my own experience of working on a progressively widening canvas with a much larger range of researchers holding multiple perspectives – and acquiring new skills.

On the other hand, my re-reading of my early work has led me to see where my thinking has not changed, or at least where certain concerns and orientations have persisted. For example, my interest in assessment, evaluation and research has always been motivated by a concern to investigate and promote the contribution of the processes associated with them for the benefit of learning, development and innovation. This is what makes them educational. Insofar as I have also been interested to engage with the policy context, this is because of its powerful influence in constraining or affording such educational development.

This central interest in what makes, or can make, assessment, evaluation and research educational has led me to deliberate on problems of explanation: what models, or theories, of learning, innovation and change provide adequate accounts of the features, processes and outcomes of these phenomena? This theme appears in both the first and last papers in this collection, and others in between. However, my answers to these questions develop and change over time.

I have chosen to divide the book into three parts. Each part includes a set of papers clustered around a broad theme. The parts, and the papers within, are presented more or less chronologically in terms of publication, although there is some overlap. This has enabled me both to present a sequence in the development of my thinking and, perhaps more importantly, to make transparent some developments in the field of educational research, policy and practice over three decades, with special reference to evaluation and assessment.

Before I outline the contents of the papers, I should say something about the words ‘assessment’ and ‘evaluation’ because I am aware that UK and US usage differs. In the UK the term ‘assessment’ is widely used for all those activities that involve eliciting evidence of student learning and drawing inferences as a basis for decisions. In the US, these processes are often referred to as ‘measurement’ (for the collection of evidence) and ‘student evaluation’ (for the drawing of inferences and making judgements). However, in the UK, and elsewhere, the term ‘evaluation’ is more often used for procedures for collecting evidence and judging the worth of educational programmes and institutions. It is not often used in the context of judging the performance of students. For the purposes of this book, I have adopted UK usage.

Part 1, with its focus on Educational Evaluation, therefore refers to the evaluation of educational programmes and schools. It contains four papers relating to work conducted mainly at the Open University in the 1980s. The previous decade had witnessed a flowering of curriculum innovation stimulated by funding from the Schools Council and the Nuffield Foundation. As a school teacher I had direct experience of these, having taught both the Schools Council’s Humanities Curriculum Project, directed by Lawrence Stenhouse, and the Schools Council York General Studies Project. Most of these projects had evaluations attached to them.

The challenges of designing evaluations of innovations in non-traditional fields of study stimulated a good deal of debate, especially around the validity and appropriateness of traditional outcome measures for detecting and evaluating less tangible outcomes in areas such as the teaching of controversial issues or moral education. In 1977, an influential group of evaluators published a seminal book, Beyond the Numbers Game (Hamilton et al., 1977), which contained a manifesto arguing for ‘illuminative evaluation’, grounded in a social-anthropological approach, with the capacity to describe and interpret the ‘instructional system’ and ‘learning milieu’ in a way considered crucial to judgement and decision making. This book stimulated considerable debate within and beyond this group of evaluators for many years to come. The discourse featured strongly in the ‘learning milieu’ that I encountered when, as a serving schoolteacher, I took a part-time higher degree, supervised by Helen Simons, at the Institute of Education, London. A little later, it influenced my contribution, as a research assistant, to the Open University Course E364, Curriculum Evaluation and Assessment in Educational Institutions.

Helen Simons, a colleague of Barry MacDonald’s on the evaluation of the Humanities Curriculum Project, had begun to develop his notion of democratic evaluation in the context of school self-evaluation. The idea that evaluation could be extended beyond projects, and applied to schools as a whole, had also begun to emerge in the late 1970s, partly as a response to the William Tyndale School debacle and the ensuing Auld Inquiry. The Auld Report laid much of the blame on the Local Education Authority (LEA) for not paying sufficient regard to what was happening in the school. As a result, a number of LEAs, led by the Inner London Education Authority, quickly produced self-evaluation schemes by which they expected schools to account for their provision, procedures and outcomes. Many of these had the appearance of bureaucratic measures, which stimulated Simons, and others, to propose alternatives better informed by evaluation theory.

This, then, is some of the background to the two papers that constitute Chapters 1 and 2 in this book. Chapter 1 is my attempt to clarify the debate specifically about school self-evaluation and to tease out some of the key conceptual and theoretical issues. It draws on the work of some of the most influential writers of the time – Simons, MacDonald, Elliott, Hoyle, Sockett and Stenhouse. But, most importantly for the present collection, it begins my exploration of the tensions between evaluation for development and evaluation for accountability: whether these tensions can be resolved and whether the two purposes can be accommodated within a single system. It also references another recurring theme – to do with underpinning assumptions regarding models of effective change.

Chapter 2 attempts to put some empirical flesh on these conceptual bones by reproducing a paper reporting an investigation of the emerging practice of self-evaluation in schools. An analysis of the evidence led to a tentative typology of what I termed ‘authority-based’ or ‘responsibility-based’ school self-evaluation – resonant with MacDonald’s classification of ‘bureaucratic’ and ‘democratic’ approaches to project evaluation. The debate continues to this day about whether imposed or school-controlled self-evaluations are more effective in bringing about school improvement.

One of the interesting features of this paper, from my position today, is that the evidence I was offered then also contained examples of student self-assessment schemes. At the time I paid little attention to these, but the reference indicates an early movement towards involving students in their own assessment.

Although broadly sympathetic to the ‘new’ paradigm of evaluation that emphasised description, portrayal and ‘illumination’, I remained troubled by the lack of attention to the act of judging itself, especially who should do it and on what basis. Evaluation is about the act of ascribing value, and the formulation of criteria for judging worth is crucial. However, in much of education, deciding what constitutes success is not straightforward. Chapter 3 reproduces a paper that discusses this issue by exploring aspects of Personal, Social and Health Education. It identifies three different approaches to generating criteria based on whether the focus of the programme is on improving defined student outcomes (outcome criteria), improving student experiences (process criteria) or improving the environment for learning and action (context criteria). Each has different theoretical and methodological implications and raises questions about whether they can be combined. The conclusion here, as in other chapters, is that ‘fitness for purpose’ is the overriding consideration.

The final paper in this section, Chapter 4, is a reflection on the outcomes of a major national evaluation of pilot Records of Achievement (RoA) schemes that I was involved in from 1985 to 1990. Although the then Conservative government funded these schemes, they arose out of much grassroots activity in LEAs and schools during the early part of the decade. Although one goal was the development of a tangible record, to celebrate achievement across a wide spectrum of learning activity, RoA schemes also paid attention to recording processes, involving teachers and students in dialogue, and to ways of enhancing the quality of assessments through moderation by teachers. In this sense, the RoA schemes were highly significant and can be seen as the direct precursor of much of the work on formative assessment and assessment for learning that followed (and is dealt with in Part 2 of this book).

It was this experience that eventually encouraged me to shift the main focus of my work from the methodology of school and programme evaluation to the substantive issues of assessment for student learning. It seemed to me that many of the concepts and approaches readily transferred, and the thinking I had been doing about evaluation and models of change had parallels in tackling problems in the relationship between assessment and theories of learning.

Chapter 4 does not, however, deal with these particular issues in depth. Rather, it focuses on how the evaluation that we conducted was received and the extent to which it became the victim of changes in political priorities. The Conservative government stayed in power throughout 1985–1990, but Keith Joseph, the Secretary of State for Education at the inception of the schemes, was succeeded by three others – Kenneth Baker, John MacGregor and Kenneth Clarke – before a decision was made on a national policy for RoA. By that time the original intentions had been so watered down, and the lessons learned by schools so misunderstood, that all that remained was the ‘wine list’ – a decision that all 16 year olds should leave school with a National Record of Achievement (NRA) document in a maroon plastic folder. Soon that disappeared too, as priority was given to the introduction of National Curriculum Assessment (NCA). The relevance of the paper for today is to reinforce MacDonald’s notion that, whilst evaluations will always be vulnerable to rapid changes in the political environment, they can, as long as reports are placed in the public domain, contribute to an ‘informed citizenry’. We were told that our findings had informed the Zeitgeist, and that cannot be undone.

Recently I experienced a distinct sense of déjà vu as I observed the reception of the 2011 ‘Expert Panel’ report on a framework for a new national curriculum (DFE, 2011). I was commissioned, with Tim Oates, Andrew Pollard and Dylan Wiliam, to support the Coalition Government’s review of the National Curriculum in England through the exercise of our individual and collective insight and knowledge, especially in relation to international and domestic research and evidence. On publication of our report, the Secretary of State, Michael Gove, thanked us for our efforts and initiated a further period of consultation. But, at the time of writing, it looks as if few of our recommendations will be implemented. It may be some time before we are in a position to reflect properly on events. However, in the light of my previous experience with the evaluation of RoA schemes, I knew it would be important to ensure that background documentation is in the public domain.1 Even if evaluations become of little interest, or too challenging, to those who commission them, the full details should be made available to those who ultimately pay for them (taxpayers), to professionals and colleagues who might find them of value, and to citizens who elect their representatives.

In Part 2 of this book, attention passes to my work on assessment and learning, which occupied much of my research time in the following two decades. This is particularly associated with my work at the University of Cambridge Faculty of Education and my membership of the UK Assessment Reform Group (ARG) from 1992 to 2010, when the ARG retired.

Chapter 5 derives from the time when I was working as an evaluator of the RoA schemes and records and analyses evidence of some of the tensions between formative and summative purposes in classroom assessment processes. Very like the tensions between development and accountability in school self-evaluations, the nature of the processes, and the degree of control exercised by the participants over them, were crucial. Central to that control was the nature of the verbal interaction. If the objective was to produce a public account then a process of negotiation was appropriate; but exploratory dialogue was more appropriate if the goal was learning and understanding.

The terms ‘formative’ and ‘summative’ have been in the discourse at least since Scriven published The Methodology of Evaluation in 1967, and they have been applied to the purposes of assessment ever since. However, in the UK, these categories have to some extent been superseded by ‘assessment for learning’ and ‘assessment of learning’ following the publication, in 1999, of the ARG’s booklet Assessment for Learning: Beyond the black box. This reformulation was a deliberate attempt to choose terms with a more self-evident meaning – that teachers, policy-makers and the public might more readily understand.

Chapter 6 is included in this collection because it reproduces the first half of the paper that, as a result of his searches, Jim Popham (2011: 273) believes to be the first public naming and articulation, in 1992, of Assessment for Learning (AfL). Flattering as it may be to have a term that has become so influential attributed to me, it was the teacher Paul Barraclough, mentioned in the paper, who used it first. He gave it as the title of the case study he wrote of his work with Gary. Like many other researchers in similar circumstances, I merely noted it, thought it captured a phenomenon well, and began using it.2

Chapter 7, from my book Using Assessment for School Improvement, refers both to formative assessment/assessment for learning and summative assessment/assessment of learning and develops the distinctions and relationships much more fully. I had the benefit of access to the draft of Black and Wiliam’s (1998) seminal review of research on assessment and classroom learning, so I was able to draw on their findings in writing my own book, published in the same year. This chapter also provides evidence of the beginning of my interest in trying to interrogate assessment practices for the implicit or explicit theories of learning that underpin them. I judged that if AfL was genuinely to contribute to learning then it had to respond to our current best understanding of how people learn.

My experience of working with the RoA schemes in the 1980s had taught me that, in addition to the interactions between teachers and students in classroom assessment processes, the moderation of judgements based on evidence is crucial to enhancing their dependability and hence their credibility. With the introduction, in 1990, of an element of teacher assessment in National Curriculum Assessment arrangements, especially at Key Stage 1 (for 7 year olds), the need to develop systems for quality assurance in assessment became urgent. Chapter 8 describes work carried out in the six counties of the East Anglian region to develop such systems based on group moderation by teachers supported by LEA advisers. This was innovative and offered real possibilities for development and implementation across the system as a whole. Unfortunately all this was swept aside by the introduction of more statutory tests, and the provision of optional tests for teachers to use, which had an enormous backwash effect. Partly this was in response to demand from teachers, who became anxious about ‘making the grade’ in a high-stakes environment where their own reputations (and jobs), and those of their schools, increasingly depended on results and league table positions. But partly it was a response to teachers’ concerns that teacher assessment and systems for moderation meant a higher workload, rather than providing a valuable professional development opportunity that would benefit their students’ learning and outcomes.

The rapidity of changes in policy, and the ever greater pressure for results to meet government targets, had the effect of transforming the National Curriculum Assessment system as originally envisaged – to meet diagnostic, formative, summative and evaluative purposes – into one where evaluative purposes dominated, reinforced by an increasingly draconian inspection regime. This was justified by Government on the basis that higher test results represented better learning, despite commentators arguing that this causal link was not established by research, especially research on the validity of assessments. Chapter 9 traces these changes over a decade, and the effects they were having on student experience. It anticipates the ‘levelling off’ of the rise in performance scores and argues that, in the future, teachers will need to work ‘smarter not harder’ if real improvements in learning are to be achieved. In this paper too, assessment for learning, as part of pedagogy, is argued to hold the key.

The final chapter in this section, Chapter 10, was written much later but it returns to the theme introduced in Chapter 7, to do with the relationship between assessment and underpinning assumptions about theories of learning. This was the third of a series of papers developing this theme, and especially my concept of Third Generation Assessment, grounded in a sociocultural perspective on learning. This model contrasts with assessment based on behaviourist assumptions of learning and performance (First Generation), or assessment based on purely cognitivist approaches (Second Generation) that fail to recognise sufficiently the social character of learning practices. This chapter traces the intellectual roots of these ideas and also, in this version, provides examples, from a US high school, and a UK infants’ school, of what Third Generation Assessment might look like in practice.

Third Generation Assessment holds potential for greater authenticity and validity by its ‘situatedness’, but this characteristic demands that the assessors are close to the action. The closest analogy is with apprenticeship, in which the ‘master’ and ‘guild’ have a central role in accrediting learning and accomplishment. This is challenging for systems of mass education where external testing has become the norm; it demands an alternative based on self- and peer-assessment, teacher assessment and group moderation. However, earlier chapters in this book provide pointers to how this might work. An OECD study of Queensland (Sebba and Maxwell, 2005), where (to date) there is no state-mandated testing, also provides evidence of how this might successfully be implemented. Yet the Queensland case study also shows that, without serious attention to professional development for teachers, the problems will remain. Policy makers, in particular, need to be convinced that substantial investment in PD for teachers will give returns in terms of improved teaching and better learning outcomes.

Part 3 of this book draws on my most recent research, conducted both at the University of Cambridge and at the London Institute of Education, between 2001 and the present time. Most of this has been associated with the Teaching and Learning Research Programme (TLRP), which was the largest programme of educational research ever funded in the UK, involving over 100 investments and more than 700 researchers across the four countries of the UK, and establishing many international links. It had twin goals: to investigate how the learning of all learners across the life course might be improved; and to build research capacity across the UK. From 2002 to 2007 I was Deputy Director of the TLRP as a whole. But I was also privileged to be Director of one of the biggest school-based projects within the Programme: ‘Learning How to Learn in classrooms, schools and networks’ (LHTL).

The LHTL project has produced more than 50 publications. Given its scope and complexity, it is difficult to provide a succinct summary of its aims, design, methods and findings. However, in Chapter 11, I have re-edited two papers to provide a reasonably brief and accessible account of the project, and the implications for policy and practice that arise from it. The project started from the premise that assessment for learning (AfL) practices are basically tools for promoting learning how to learn (LHTL), with the ultimate goal of creating learners who are autonomous and well equipped to deal with the demands of living and working in the fast-changing global environment of the twenty-first century. The project team knew from previous studies that, although AfL held huge promise, implementation was difficult and ‘scaling up’ the innovation across a whole system would be extremely challenging. Therefore, the intention was to investigate the conditions within and across schools that would enable fundamental changes in assessment and pedagogy to take root, become embedded and spread across classrooms, schools and networks. By involving a large team of researchers with different specialisms, we were able to draw on different skills, knowledge and experience. As a consequence, team meetings became a learning environment for us all. We found that the processes of learning how to learn are much the same for students, for teachers, for schools and for those who support and research them.

Chapter 12 shifts focus on to the TLRP as a whole and is an edited version of a very long paper that pulls together what the Director, Andrew Pollard, and I distilled from a review of all projects and cross-programme thematic initiatives. My specific responsibility was for school-based projects, so this paper, and the one from which it derives, focuses on findings and implications for the schools sector.

Although the projects within the programme were all expected to ‘sign up’ to a common set of broad aims, their specific research questions, designs, methodologies and theoretical perspectives were not prescribed from the centre. Neither did the TLRP Directors’ Team have much of a role in selecting them, because they were chosen by a steering committee of various stakeholders using standard Economic and Social Research Council procedures for judging the quality of individual bids. Therefore, TLRP projects were diverse in size, focus and approach, and their outcomes were equally multifarious. As Andrew Pollard said on a number of occasions, the task of making sense of what TLRP had achieved as a whole was like trying to make sense and effective use of a hand of cards that one had been dealt. The instigation of a number of cross-programme thematic initiatives aided this process considerably; when it came to writing a summarising account, of which Chapter 12 is a version, the work of these thematic groups was drawn on substantially.

The key decision, however, was not to present an account of findings with a view to adoption, but to present the knowledge and insights that TLRP researchers had gained as a set of principles (rather like those that the ARG had developed earlier – ARG, 2002). Principles are capable of acknowledging the importance of contextualised judgement by teachers in effective implementation. Principles also provide a structure that allows the knowledge that TLRP produced to be refined and built on in resilient, realistic and practical ways. The ten principles cluster in four broad areas that reflect the multilayered nature of innovation in pedagogy: (1) educational values and purposes; (2) curriculum, pedagogy and assessment; (3) personal and social processes and relationships; (4) teachers and policies. Chapter 12 sets out the rationale, development, evidence and argument for them.

The recognition, inherent in the formulation of TLRP’s ten principles, that educational research produces not ‘blueprints’ for adoption but hypotheses to be tested further in the particular contexts of individual classrooms and schools, takes us straight back to Lawrence Stenhouse. Indeed Stenhouse is referenced in Chapter 12. It is significant, therefore, that I have chosen to conclude this collection with a paper revisiting his work in relation to curriculum research and development.

Chapter 13 reproduces a paper I was asked to write as a reflection on the enduring quality of Stenhouse’s ideas, for publication around the thirtieth anniversary of his death in 1982. The paper that is Chapter 1 in this book was first published in that same year, and reveals how far I was influenced by his ideas at that time. The connection goes back ten years earlier, however, to the time when, as a schoolteacher, I was ‘trained’ to teach HCP (the Humanities Curriculum Project). This was probably the most powerful professional development experience of my life and its impact has remained with me ever since.

Re-reading his work, and particularly his seminal book, An Introduction to Curriculum Research and Development, I was struck by how much of what Stenhouse wrote was, inevitably perhaps, a direct response to the ‘state of the art’ at the time, especially developments in the USA concerning teaching defined by behavioural objectives. Some of his ideas now seem dated because events have moved on. However, it is worth reflecting on whether we would be in a very different place now if educational trends had not been subjected to his incisive critique and his creative proposals for alternatives.

In Chapter 13, I examine particularly the relevance for today of his ideas in relation to the themes in which I am most interested – curriculum, pedagogy and assessment. What I had not realised in my earlier reading was the extent to which Vygotsky (in addition to Dewey and Bruner) was an inspiration to Stenhouse. He was developing, in the UK, what one might call a sociocultural theory of education long before it acquired the currency that it now has. For this reason, and for others, Stenhouse has remained a key influence on my thinking throughout my own career in education, both as a classroom teacher and as an educational researcher.

This returning to my roots seems an appropriate way to round off this collection of my ‘works’. It has been a huge honour to be asked to produce such a volume, and I hope that my selection will be worthy of the accolade associated with the series. More importantly, I hope it will be of some value to a new generation of educators and researchers.

At the time of writing this introduction, I have reached the age of sixty-six and have no plans, or desire, to get involved in any more projects of the scale of those that have consumed me in recent years. I realise that many of my respected colleagues remain active and productive until well past seventy. But apart from a few small pieces of ‘unfinished business’, I hope to find more time for other things. (Is there a gender dimension here?) When the novelist Hilary Mantel was asked recently whether, at the age of sixty, she had any plans to retire, she said, ‘I have no retirement date in mind, but I hope I have the grace to stop before I begin to repeat my best effects. I look forward to the end of ambition and the end of desire, in the way my grandmother looked forward to a cup of tea after a tiring day.’ (Guardian Review, 2 June 2012: 2)

I think I am about ready for that nice cup of tea.

Notes

1 See: http://www.bera.ac.uk/content/background-michael-gove%E2%80%99s-response-report-expert-panel-national-curriculum-review-england. Accessed 14 August 2012.

2 In 1986, Harry Black wrote a chapter in Nuttall (1986) entitled ‘Assessment for Learning’. In this he argued that new moves towards criterion-referenced assessment might inform the development of in-school assessment policy for diagnostic purposes.

References

ARG (1999) Assessment for Learning: Beyond the black box. (Cambridge, UK: University of Cambridge School of Education).

ARG (2002) Assessment for Learning: 10 Principles. (Cambridge, UK: University of Cambridge Faculty of Education).

Black, H. (1986) ‘Assessment for Learning’. In D. Nuttall (ed) Assessing Educational Achievement. (London: Falmer Press): 7–18.

Black, P. and Wiliam, D. (1998) ‘Assessment and classroom learning’, Assessment in Education: Principles, Policy and Practice, 5(1): 7–73.

DFE (2011) The Framework for the National Curriculum: a report of the Expert Panel for the National Curriculum review. (London: Department for Education). (Authored by M. James, T. Oates, A. Pollard and D. Wiliam.)

Hamilton, D., Jenkins, D., King, C., MacDonald, B. and Parlett, M. (eds) (1977) Beyond the Numbers Game. (London: Macmillan).

Popham, J. (2011) Classroom Assessment: What teachers need to know (Sixth Edition). (Boston: Pearson).

Scriven, M. (1967) The Methodology of Evaluation. AERA Monograph Series on Curriculum Evaluation, No. 1: 39–89. (Chicago: Rand McNally).

Sebba, J. and Maxwell, G. (2005) ‘Queensland, Australia: an outcomes-based curriculum’. In Formative Assessment: Improving Learning in Secondary Classrooms. (Paris: OECD).

Stenhouse, L. (1975) An Introduction to Curriculum Research and Development. (London: Heinemann).
