Chapter 8
Systems for Moderation by Teachers to Enhance the Quality of Assessment

Edited version of James, M. (1994) ‘Experience of quality assurance at key stage one’. In W. Harlen, (ed.) Enhancing Quality in Assessment (London: Paul Chapman): 116–138.

Introduction

This chapter explores experience of moderation and audit in the context of National Curriculum Assessment (NCA) in England and Wales during the period from the publication of the report of the Task Group on Assessment and Testing (DES/WO, 1988a) to the publication of the final report of the government-commissioned review of National Curriculum and assessment carried out by Sir Ron Dearing in 1993 (Dearing, 1994). For reasons partly to do with the phased introduction of the National Curriculum, partly to do with the different characteristics of education at the various Key Stages, and partly, one suspects, for political reasons, the quality assurance structures and procedures that had been tried out by the time of the Dearing Review exhibited considerable differences across Key Stages. At Key Stage 1 (KS1) a system of ‘moderation’ involving teams of visiting moderators appointed by local education authorities (LEAs), backed up by local assessment training, was established in 1991. At KS3, however, a very different system of ‘quality audit’ was introduced in 1992, though it was effectively ‘put on hold’ by the teachers’ boycott in 1993. ‘Quality audit’ required schools to send samples of test scripts and other evidence of assessments to GCSE examining groups, as designated audit agencies, for ‘audit’ and ‘endorsement’. By the beginning of 1994 the issue of moderation or audit at KS2 had still to be resolved and the Dearing Report (1994) recommended no decision until the new School Curriculum and Assessment Authority (SCAA) had carried out a further review of the options.

Thus the most extensive experience of quality assurance procedures in relation to NCA was, by the end of 1993, to be found in KS1 infants’ schools and classrooms. Therefore, although quality assurance in NCA is the general context, this chapter takes as its particular focus an examination of the accumulating evidence of experience at Key Stage 1. In this sense it is a specific case study of KS1 nested within the broader case of NCA. But it is more specific still, because the themes and issues discussed have particular reference to evidence from schools and LEAs in the East Anglian region of England between 1990 and 1993. Whilst this might seem to locate it within a specific time and place, the issues raised have much wider interest. These studies are, at the time of writing, the only independent research specifically concerned with moderation practice in the National Curriculum. Remarkably, given the concern for quality in assessment, this area has even been neglected by Her Majesty’s Inspectors (HMI). Their report on assessment, recording and reporting at Key Stages 1, 2 and 3 for 1992/3 had only very mixed and inconclusive observations to offer on quality control and the audit process, based on ‘some second-hand evidence’:

The audit process was not formally inspected this year, but HMI obtained some second-hand evidence through discussions with teachers. The extent and quality of the audit process appeared to be uneven. Many schools had visits of varying duration and it appears that different auditors interpreted their duties differently. For some schools, it was apparent that the audit involved careful scrutiny of a selection of pupils’ standard tasks and teacher-assessed work. Evidence from other teachers indicated that the audit consisted of a brief discussion of how things had gone, a look at the pupils’ results and a query from the auditor as to whether there were any problems. If the audit process is to continue, there is a need for more consistency of approach.

(HMI, 1994, para. 12)

The evidence available from our studies is much more robust than this and leads us to somewhat different conclusions, notably that, although substantial problems remain, schools and local authorities in the East Anglian region, at least, are moving towards greater consistency of approach.

In order to put these findings into perspective, this chapter begins with a short description of the research which informs it. This is followed by a brief resumé of relevant elements of national policy and a discussion of the assumptions inherent in central arrangements and requirements for assessment and moderation. An account of the implementation of policy in schools follows, together with some of the issues that have arisen at the level of practice. Throughout, special attention is given to dilemmas that arise from demands that exceed the available resources: how to meet needs for both specificity and comprehensiveness in the assessment training of teachers and moderators; how to marry the functions of audit and support in the role of the moderator; how to meet the dual criteria of validity and reliability in assessments and how to determine where the responsibility for each lies; and how to balance accountability and development purposes and the associated mechanisms of quality control on the one hand and quality assurance on the other. Some explanations for what emerges are offered in relation to the influence of existing professional cultures on the development of strategies for assuring assessment quality. On this analysis, the attempt to introduce new systems amounts to bringing about cultural change and must, therefore, be framed according to a realistic timescale and with attention to the human dimensions that cultural change entails. To regard putting in place a quality assurance system as little more than a technical or bureaucratic exercise involving the design of structures, the delineation of roles and the distribution of resources is to court the kind of reaction that led to the teachers’ boycott in 1993.

In terms of TGAT’s original classification of moderation options, three years’ experience and a national review appear to have done little to provide a clear way forward and the confusion continues to be rooted in a tension between quality assurance of assessment processes and quality control of assessment results. The chapter concludes with a discussion of the possibility of incorporating a greater degree of quality control within quality assurance systems with the twin aims of supporting professional and educational developments and gaining the confidence of a sceptical wider community in assessment outcomes.

The research

The following description and analysis of experience in schools is based largely on the outcomes of three linked evaluation projects carried out at the University of Cambridge Institute of Education, on behalf of LEAs in East Anglia from 1990. The first study was an evaluation of KS1 assessment training in Bedfordshire LEA in 1990/1 during the first (supposedly ‘unreported’) run of NCA in the core subjects (English, mathematics and science) for all children in Year 2. The research involved observation of assessment training sessions and the conduct of the 1991 assessments in schools supported by interviews with LEA advisers, moderators and teachers. The results of this study were written up in a report to the LEA (Conner and James, 1991).

The second study involved four LEAs (Essex, Hertfordshire, Norfolk and Suffolk) and focused specifically on moderation and the obligation placed upon LEAs by the then School Examination and Assessment Council (SEAC) to promote consistent standards of assessment within and across LEAs. The data were again collected through observation of training sessions but also by ‘shadowing’ moderators as they carried out their tasks. Accompanying moderators on their visits to schools also provided opportunities to observe and talk with teachers. The outcomes of this study were made available in the form of a report to the LEAs (James and Conner, 1992) and a journal article (James and Conner, 1993).

The third study continued to monitor assessment practice in schools and LEAs in the light of the 1992/3 Assessment Order for KS1 and to extend the analyses generated by the research undertaken in 1991/2. This time, however, the group of LEAs was extended to six (Bedfordshire, Cambridgeshire, Essex, Hertfordshire, Norfolk and Suffolk). In order to counteract any bias that might have been introduced in the earlier study by shadowing moderators and taking LEA arrangements as a starting point, a decision was taken to make case studies of schools (two or three schools in each LEA) the primary focus of this research. This change in approach was also a response to the shift in emphasis introduced by the Department for Education (DfE Circular 12/92) which gave headteachers a statutory duty to see that their school’s assessment standards conformed to national standards. The report of this study was presented to the LEAs at the end of September 1993 (Conner and James, 1993).

The national policy context

On the basis of the TGAT Report (DES/WO, 1988a) the then Department of Education and Science (DES) began to put in place a national assessment system with the following espoused characteristics:

  • serving several purposes — formative, diagnostic, summative and evaluative;
  • combining moderated teacher assessment (TA) with standard assessment tasks or tests (SATs);
  • aggregated and reported at the end of Key Stages — at the ages of seven, eleven, fourteen and sixteen;
  • criterion-referenced — in relation to attainment targets (ATs);
  • based on a progressive ten-level scale for attainment in each of the subjects with Level 1 being the lowest.

TGAT recognised many of the difficulties inherent in each of these dimensions but managed to persuade the Secretary of State that this was a viable framework in all but one respect – moderation. In devoting four pages of its main report to the issue, TGAT recognised the importance of quality assurance in assessment and saw moderation as having a key role in it. Moderation, according to TGAT, had twin functions: ‘to communicate general standards to individual assessors and to control aberrations from general standards by appropriate adjustments’ (DES/WO, 1988a, para. 68). In other words, both a training function and an audit function were recognised at this early stage. TGAT went on to examine the options before making a judgement about the most appropriate system of moderation for national assessment. It outlined the pros and cons of reference tests, moderation by inspection and group moderation before recommending the last on the basis of its ‘communication’ potential to enhance the professional judgement of teachers and to allow their judgements to inform the development of the National Curriculum.

In view of what has happened subsequently it is worth going back to the TGAT Report to look at what was said about each of these alternatives. With respect to reference tests, it was thought that SATs could be used as reference tests against which to scale teachers’ ratings. This would be economical but it would require the tests to reflect precisely the same features as the teachers’ ratings and would almost inevitably encourage ‘teaching to the test’ since the test alone would determine the level of reported attainment. Moderation by inspection would similarly emphasise external control and would suffer from incompleteness of data because visiting moderators usually have access only to tangible outcomes of tasks, or to processes by special arrangements which can introduce an element of artificiality. TGAT’s preferred group moderation, based on practice in GCE and CSE examinations at sixteen-plus, emphasised the development of collective judgement through discussion and the exchange of samples of children’s work at meetings. The ‘pattern-matching technique’ described by TGAT in some detail emphasised the function of the moderation meetings as scrutinising differences between teacher ratings and scores on standard tasks (STs), and between the distribution of pupils’ scores in a class or school and the LEA and national distributions. ‘The general aim would be to adjust the overall teacher rating results to match the overall results of the national tests’ (DES/WO, 1988a, para. 74) unless there was a good reason not to do so, in which case these reasons would be reported to the LEA and to SEAC. This almost exclusive emphasis on harmonising school results with national distributions is curious, given the commitment to criterion-referencing, although it is understandable because the only existing experience was with norm-referenced systems. Little attention at this time was given to the function of group moderation in interpreting criteria and in establishing the kind of performance that would count as demonstration of attainment, although the TGAT supplementary reports (DES/WO, 1988b, p. 5) mentioned the use of assessed tasks to serve as national standards. Of course, the moment it proposed a ten-level scale, with expectations that, for instance, the majority of seven-year-olds would attain Level 2, a normative dimension was introduced, which supports the view that you only have to scratch the surface of criterion-referenced assessment to find norm-referenced assessment lurking beneath (Angoff, 1974, cited in Wiliam, 1993; Black, Harlen and Orgee, 1984).
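TGAT’s description of the ‘pattern-matching technique’ is abstract, so a minimal sketch may help to fix ideas. The following Python fragment is purely illustrative and is not TGAT’s actual procedure: the function names, the class data, the national distribution and the 0.15 tolerance are all invented assumptions. It simply shows the kind of distributional comparison a moderation meeting might make before deciding whether teacher ratings needed adjustment.

```python
# Minimal, hypothetical sketch of the distributional comparison behind
# TGAT-style group moderation: compare a school's teacher-assessed level
# distribution with a national distribution and flag large divergences.
# All data and the 0.15 tolerance are invented for illustration.

from collections import Counter

def level_distribution(levels):
    """Proportion of pupils at each level of the ten-level scale."""
    counts = Counter(levels)
    total = len(levels)
    return {level: counts[level] / total for level in sorted(counts)}

def flag_divergence(school_levels, national_dist, tolerance=0.15):
    """Return the levels at which the school's proportion differs from
    the national proportion by more than the tolerance -- the cases a
    moderation meeting would scrutinise before any adjustment."""
    school_dist = level_distribution(school_levels)
    flags = {}
    for level in sorted(set(school_dist) | set(national_dist)):
        gap = school_dist.get(level, 0.0) - national_dist.get(level, 0.0)
        if abs(gap) > tolerance:
            flags[level] = round(gap, 2)
    return flags

# Example: a Year 2 class rated generously at Level 3, against a
# hypothetical national picture in which most seven-year-olds reach Level 2.
teacher_ratings = [2, 3, 3, 3, 2, 3, 3, 2, 3, 3]
national = {1: 0.2, 2: 0.6, 3: 0.2}
print(flag_divergence(teacher_ratings, national))  # {1: -0.2, 2: -0.3, 3: 0.5}
```

The point of the sketch is that any adjustment is triggered by divergence between distributions, not by re-examining individual children’s work – which is precisely why the approach sits uneasily with criterion-referencing.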

In June 1988, when announcing his acceptance of most of the TGAT Report, the Secretary of State specifically rejected TGAT’s recommendations for moderation on the basis that they were too ‘complicated and costly’ and that there was insufficient control to ‘safeguard standards’. According to Daugherty (1994), ministers adopted the convenient tactic of ‘wait and see what emerges from developments’ because they had no coherent alternative to offer at this time. It therefore fell to SEAC to come up with something different. In 1989 it proposed a model for the moderation of KS1 assessment, with which it was most concerned at the time, based on the establishment of local moderating groups (though not groups of teachers as TGAT had proposed). These would receive and compare TA and ST scores, although ST scores were to be ‘preferred’ for reporting purposes. If a teacher queried the ST score then a local moderation procedure (which became known as ‘reconciliation’) would be invoked, probably involving an external moderator who would visit schools to look at the evidence. Not surprisingly, since it is difficult to imagine who else might possess the structures, experience and personnel needed to work with up to 20,000 infants’ and primary schools, SEAC proposed that local education authorities should manage the local moderating groups and appoint local moderators. Daugherty notes that, in this ‘rather different method’, group moderation by teachers had effectively been replaced by ‘moderation by inspection’. He also claims that the greatest significance in these new proposals should be attached to the shift away from TGAT’s conception of moderation as a professional process of teacher and curriculum development, as well as control, towards moderation as an administrative process concerned primarily with bureaucratic regulation of assessment practices. Groups of teachers meeting together still figured in SEAC’s plans but these were to have a role in training teachers for national assessment and agreement trialling.

Understandably, given its growing interest in reducing the role and power of the LEAs, the DES was cautious in its acceptance of SEAC’s model but, since no other feasible alternative readily presented itself, these proposals became the basis of the moderation system that has subsequently evolved for KS1.

Experience in schools

From the perspective of most schools, consistency of judgements with national standards was, initially at least, of lesser interest than the need to be fair to the children and to promote their learning. Schools inevitably had a very local perspective, sometimes not extending even to the LEA, let alone the nation. Whilst some teachers have in the course of time revised their perspectives, particularly after the school has received children from elsewhere with assessed levels with which they disagree, or when middle-class parents have put pressure on them to achieve quantities of Level 3s, they have remained first and foremost teachers, not examiners. Therefore inter-LEA consistency has not really been ‘their’ issue, which some might see as unfortunate since the administration and marking of national assessments has depended on them. Their stated interests have been primarily in the area of formative and diagnostic assessment (the first two of TGAT’s purposes) for its potential to assist them with their teaching, although the teachers’ conceptions of formative assessment were mostly unsophisticated and articulated more with curriculum coverage than with any coherent theory of how children learn and how assessment might contribute. Despite these espoused interests, they have, however, been constrained to concentrate on the summative elements by the sheer scope of the STs and the complexity of the recording system. Consequently our evidence suggests that Year 2 teachers were in two minds about the 1993 teachers’ boycott and the Dearing Review that it stimulated. On the one hand they wanted a simpler system; on the other hand they recognised that they had learned a great deal about assessment and they did not want all their new learning to be devalued or lost.

Over the three years covered by our studies we have witnessed a shift in focus from an almost exclusive concern with the mechanisms of administering the STs and recording results towards wider concerns associated with teacher assessment, the collection of evidence, the sharing of judgements and whole school policy. I have little doubt that this is attributable to the influence and training provided by the LEAs. On the other hand, the STs still feature largely and dominate the thinking and practice of teachers in the latter half of Year 2. Some teachers have said that they have learned things about their children from the STs but, in our experience, most have felt that STs merely confirmed what they already knew. Thus the formative potential of this aspect of national assessment has been limited. This is as might be expected because early on in the development of the system the roles of ST and TA appeared to become separated with SEAC ascribing the formative purpose to teacher assessment and the summative purpose to STs.

Although many of the public statements from central agencies implied that the STs themselves were unproblematic, teachers and moderators have been much exercised by issues of validity and reliability arising from them, although they rarely used these terms themselves. For this reason much of the moderation effort, within schools in agreement trials and by the external moderator, has been directed towards moderating the STs. (In TGAT’s proposals it was the teacher assessments, not the STs, that were expected to require moderation!) Reliability has been threatened in two particular ways: by the scope available for variations in the presentation of tasks and by the scope for interpreting the assessment criteria in different ways.

As far as consistency in the presentation of tasks was concerned, we observed variation in the interpretation by teachers of the guidance offered by SEAC and the LEAs, variation in the presentation of tasks between teachers and schools, and variation by some teachers in the mode of presentation from one group or individual child to another. In the 1992 study we noticed a teacher changing the format of her presentation in the light of her experience of using the material and as a result of the responses of the children. She said, ‘I usually get the hang of it by the third time I’ve done it, after that it gets boring. . . . I don’t know if it affects the children’s reactions.’ In 1993, when many of the STs came in a ‘pencil-and-paper’ format, the scope for such variation should have been reduced but, as one teacher pointed out, the options available to administer the tests either to groups or individuals probably advantaged or disadvantaged different children in different ways.

Problems about reliability in terms of the consistent application of criteria for judgements of attainment were also evident. These stemmed chiefly from the scope for interpretation in the statements of attainment. For example, ‘spell correctly, in the course of their own writing, simple monosyllabic words they use regularly which observe common patterns’ (English: spelling: Level 2) permitted a range of interpretation around the qualifiers ‘own’, ‘simple’, ‘regularly’ and ‘common’ (see also Wiliam, 1993). Some of these problems could have been overcome if criteria had been clarified in such a way as to make them unambiguous. However, to do this in the context of the National Curriculum attainment targets would either have led to a vast increase in the already too numerous statements of attainment or to trivialisation of the learning tasks. Some revision of the statements of attainment by SEAC during the period 1991 to 1993 indicated that it was grappling with this issue, but the solution that it came up with at that time appeared to rest on selected criteria which would be key to judgements about whether a child had achieved a particular level in key attainment targets. Thus whether or not a child had been ‘told’ fewer than eight words in a 100-word passage became critical to attainment of Level 2 in reading; similarly, whether or not a child had punctuated two sentences with full stops and capital letters became critical to attainment of Level 2 in writing. Whilst these reasonably unambiguous criteria undoubtedly enhanced the reliability of teachers’ judgements on the STs, teachers were unhappy about what this did to the validity of assessments, which they articulated in terms of ‘fairness’ to children.
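To make the gate-keeping effect of such ‘key’ criteria concrete, here is a minimal sketch of how they operate as hard thresholds. It is an illustration only, not SEAC’s algorithm: the function names and example data are hypothetical, and only the two thresholds (fewer than eight words told in a 100-word passage; two punctuated sentences) come from the text above.

```python
# Illustrative sketch only: 'key criteria' acting as hard gates on Level 2.
# The two thresholds mirror those described in the text; the function
# names and example data are hypothetical simplifications.

def reading_gate_level_2(words_told_in_100_word_passage: int) -> bool:
    """Level 2 reading gate: the teacher supplied fewer than eight
    words while the child read a 100-word passage."""
    return words_told_in_100_word_passage < 8

def writing_gate_level_2(punctuated_sentences: int) -> bool:
    """Level 2 writing gate: at least two sentences marked with full
    stops and capital letters, whatever the story's other merits."""
    return punctuated_sentences >= 2

# A story strong on structure, sequencing and character, but with only
# one correctly punctuated sentence, fails the gate: the criterion
# overrides all other evidence, which is the 'unfairness' teachers reported.
print(writing_gate_level_2(punctuated_sentences=1))            # False
print(reading_gate_level_2(words_told_in_100_word_passage=5))  # True
```

The reliability gain is obvious, since the gate is unambiguous, but, as the next paragraph describes, it was bought at a cost to construct validity.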

If one takes the example of the writing attainment target, many teachers and moderators encountered examples of children’s writing that fulfilled all the other stated criteria concerning story structure, the sequencing of events and the introduction of characters – and even some criteria not required at Level 2, concerning the quality of ideas and the use of vocabulary – yet these children could not be assessed as Level 2 because they had not used punctuation as required. This seemed ‘unfair’, especially when children could read their stories with intonation that indicated an understanding of sentences. What teachers were querying was the conception of ‘writing’ as defined by the criteria. In other words they were questioning the ‘construct validity’ of the assessment. Since teachers felt they were not in a position to influence changes in the definition of the criteria, a certain amount of ‘teaching to the test’ was inevitable; evidence for this in 1993 was provided by the appearance in children’s writing of full stops of golfball proportions.

Although some of these problems seemed little nearer resolution than they were four years earlier, the fact that teachers were aware of them says something about the way in which their understanding of assessment had increased. This is all the more significant in that infants’ teachers, unlike secondary teachers, had very little experience with formal assessment before the introduction of national assessment. Year 2 teachers may now be among the most sophisticated assessment practitioners. There is evidence that this understanding is, at last, transferring from an obsession with STs to a consideration of teacher assessment in general and to school assessment policy in particular. It would be wrong to suggest that development is uniform across all schools. Our evidence suggests that schools are still at very different stages in understanding and practice depending on development priorities within the schools and commitment to assessment by senior managers. However, in East Anglian schools, development appears to be happening in certain directions encouraged by the LEAs. A number of common features are becoming evident.

Firstly, there is increasing recognition of the need to collect evidence of children’s performance in order to support judgements. The tick-lists of statements of attainment that so characterised the early forms of record-keeping, though not entirely supplanted, are being supplemented or replaced by portfolios of work for individual children. Teachers are still anxious about the amount of evidence they need to collect but LEAs have mostly recommended a slim portfolio containing a limited number of fully annotated pieces of evidence (perhaps one for each attainment target) to illustrate that the processes of assessment have been carried out competently and to provide material for discussion of judgements in agreement trials and with parents.

Secondly, whether or not they can be described as agreement trials, teachers have begun to meet more regularly for debate about judgements in relation to children’s work. These meetings occurred within schools but also, sometimes, across schools in school cluster meetings or in LEA training. Despite growing familiarity with this kind of procedure, teachers nevertheless experienced difficulty in challenging the judgements of others. In giving an account of such a meeting, one teacher said: ‘Nobody wanted to say that they disagreed, especially when they thought that one of their colleagues has assessed too highly. Teachers aren’t like that, are they?’ The reasons for this difficulty therefore appear to be cultural. Teachers are easily threatened, especially at the present moment in history, and avoid situations that make them more vulnerable. Acceptance of the need for critical examination of judgement in a public forum will entail a certain amount of cultural change, which inevitably takes time.

Thirdly, in order to provide some consistency in the tasks presented to children and to focus their discussions of standards, teachers were beginning to create resource banks of assessment tasks, including some of the ‘better’ STs, to use as a normal part of their teaching and learning and as a basis for some of their teacher assessments.

Fourthly, in addition to individual portfolios, schools were being encouraged by LEAs to develop school portfolios of evidence of children’s work agreed at the various levels. These would relate both to the material produced by SEAC, entitled Children’s Work Assessed, and to equivalent collections that were being put together at LEA level. These portfolios act as a reference for teachers but also as a way of communicating standards to parents. By comparing their own child’s work with the school portfolio parents are able to draw conclusions about the progress their child is making.

Together these components of emerging practice at school level, with the addition of school visits by external moderators to support schools in the process of making their judgements and to contribute to the development of consistent approaches, might form the basis of a viable quality assurance system. The question is whether it would satisfy the demands for accountability at national level and from the wider community. In a letter to us (dated 29 July 1993) in response to our sending him a copy of the interim report of our 1993 study, Sir Ron Dearing indicated that a professional quality assurance model, on its own, would probably be insufficient:

I was interested to read your reference to tension between support for a professional model of quality assurance and a bureaucratic model of quality control. I agree with you that polarisation of this kind is too stark. Quality assurance is a highly desirable approach, but it is far from easy to achieve across the totality of a system as huge and as diverse as schools, and from what I have experienced elsewhere, it does not dispose of the need for an element of quality control to verify that the systems of quality assurance are delivering acceptable standards.

(Dearing, 1993a)

The implications of the Dearing Review

In April 1993 Sir Ron Dearing, the appointed Chair of the new body SCAA, was asked by John Patten, the Secretary of State, to conduct an urgent review of the National Curriculum and its assessment. In July he presented an interim report (Dearing, 1993b) and in December a final report (Dearing, 1994). Both were accepted by government in their entirety and there was much in both reports that was well received by policy-makers and professionals alike. However, the proposals for promoting quality in national assessment were interesting, not least because of the substantive differences between the two reports and the questions left unanswered – or even unasked.

There was in Dearing’s proposals no suggestion that national tests (STs) should be dispensed with, but rather that they should be limited to the core subjects of English and mathematics at KS1, at least in 1994, with less time apportioned to them. According to the interim report, they should also be slimmed down by reducing any diagnostic and formative element they contained (Dearing, 1993b, para. 5.23). On a number of occasions, Dearing stated that the national tests were to be exclusively summative and teacher assessment principally formative, although teacher assessment could be expected to contribute to summative assessment in science at the end of KS1 and to other summative assessments at other points in a Key Stage. However, Dearing was also concerned that teacher assessment should have ‘equal standing’ with national tests and recommended that the two sets of ratings ‘should be shown separately in all forms of reporting and in school prospectuses’ (Dearing, 1993b, para. 5.28). This suggests that teacher assessment will, in a significant way, have a summative purpose even at the ends of Key Stages, which raises a question about how the two sets of ratings are expected to relate to each other. Dearing used several paragraphs in his interim report to address this issue and his observations are worth quoting:

Reference has been made to teacher assessment. It has often been argued during the Review that this is a more valid and more efficient way of gathering summative data on pupil and school performance. It can take place continuously through the year or at times which are well-fitted to stages of learning. In that sense, it is said to be more efficient because the tests disrupt normal teaching and consume scarce financial resources.

But teacher assessment needs to be moderated if parents and teachers across the system, for example in receiving schools, can be confident about the standard being applied. The monitoring of national performance or of performance across a locality depends upon information related to a national standard. . . . Effective moderation is therefore necessary. Moderation, by peer groups of teachers or by external audit, with schools’ procedures and the outcomes evaluated, perhaps through school inspections, has however, its own significant opportunity costs. It cannot readily produce the same consistency of standards as national tests. The statutory tests are, therefore, a valuable means of moderating teacher assessment. If well conceived and well conducted, they also provide reliable information related to a national standard.

(Dearing, 1993b, paras 5.13 and 5.14)

It appeared that what Dearing was proposing, at this time, was that the combination of ‘group moderation by teachers’ and ‘moderation by external audit’, which was effectively the basis of the models emerging in the LEAs and schools that we have studied, should be supplemented if not replaced by a form of moderation by ‘reference test’ on the grounds of cost and reliability. This was a cause of some concern because, for reasons outlined earlier, ‘moderation by reference test’ was the least favoured of TGAT’s original three options (DES/WO, 1988a).

Interestingly, Dearing’s final report made no further mention of this particular proposal and one might infer that the consultation on the interim report raised significant objections to the idea. Instead, the final report indicated a ‘pulling back’ from any firm recommendation pending the results of a number of projects commissioned by SCAA to develop systems of quality assurance in and across schools. It appeared that the work carried out in some LEAs over the previous three years was being ignored.

All that Dearing was prepared to say at this point on the moderation of teacher assessment was that:

One of the possibilities currently being canvassed is that OFSTED and the Office of Her Majesty’s Chief Inspector of Schools in Wales should contribute to the moderation of teacher assessment during their four-yearly inspections of schools (five-yearly in Wales). On the assumption that schools in a locality come together to form groups to moderate their assessments and that these groups are large enough for one school in the group to be inspected each year, this might enable each school in the group to benefit from regular external advice about standards of assessment.

(Dearing, 1994, para. 9.2)

On a more positive note, Dearing had clearly appreciated the value of a number of the developments outlined in this chapter and recognised the need for SCAA to continue to support moderation by teachers by the production of exemplar material (Dearing, 1994, para. 9.4) and the provision of ‘high quality standard test and task material which can be used flexibly, and on a voluntary basis’ (Dearing, 1994, para. 9.5). He also commended the growing practice of keeping ‘a small number of samples of work for each child to demonstrate progress and attainment’ (individual children’s portfolios) as well as the development of school portfolios to assist the process of audit moderation (Dearing, 1994, Appendix 6).

Taken together, Dearing’s two reports convey a somewhat confused picture of what moderation procedures might look like in the future. They appear to make reference to all three TGAT options: moderation by reference test, moderation by inspection and group moderation by teachers. What is clear, however, is that the emphasis has shifted away from quality assurance of assessment processes and towards quality control of assessment outcomes. As Dearing put it, ‘The purpose of audit-moderation is to verify the accuracy of the assessment judgements made by the school and promote consistency between schools’ (Dearing, 1994, Appendix 6, para. 25).

Quality control within quality assurance

Dearing was correct, of course, in his observation that current models of moderation, as exemplified in our research, are costly. They require the provision of extensive training of teachers, and the development of materials, procedures and practices hitherto unknown to most schools. However, they are only excessively costly if such elaborate systems serve solely to produce accurate assessment results, that is, to fulfil the quality control function. If this were the case then they would not have enjoyed the kind of support among teachers that our evidence indicates. The reason for advocating such models is not that they can turn teachers into good examiners but that they can help them become better teachers. Many teachers believed that they had developed professionally over the last three years and that they had acquired skills of observing and analysing children’s learning and refining their judgements in relation to both assessment and curriculum. If the quality assurance systems described above are seen in these terms, then they are delivering much more than dependable assessment results and the costs can be attributed to a much wider range of educational purposes. Undoubtedly the achievement of these purposes needs to be improved but if the quality assurance system is an appropriate vehicle then it may well be worth the investment.

The question of whether a professional model of quality assurance can deliver dependable assessment results which will be credible to a sceptical public is, in some ways, more difficult to answer. It is possible to argue that reliable outcomes will flow ipso facto from the putting in place of sound assessment processes and procedures. Thus quality assurance focusing on assessment processes and tasks should guarantee the quality of assessment results. On the other hand, there is little hard evidence that this idealised view has ever been fully realised in practice. What may be preferable, therefore, is the conceptualisation of a quality assurance system that, instead of being distinguished from quality control, actually incorporates it. This could be a way of confronting the problem of professional quality assurance systems being necessary but not sufficient to engender confidence in the quality of assessment outcomes.

There is a sense in which the East Anglian LEAs were already going along this road: by the provision and encouragement of agreement trials at intra-school and inter-school level, by the provision of sample materials to indicate common standards and by strengthening the audit role of local moderators. If this ‘quality control within quality assurance’ could be made more robust – and its costs justified in terms of educational improvement and professional development – then it might just be possible to argue against the introduction of predominantly ‘external’ quality control systems, whether in the form of OFSTED inspections or the development of reference tests for scaling purposes. Unfortunately the latter will have an appeal, on grounds of simplicity and cost, to many of those outside education. Not only will these arguments have to be confronted, but politicians, particularly, will need to be made aware that the question of the competence of OFSTED inspectors to judge school assessments in relation to national standards will need to be addressed. Similarly, they should be disabused of any idea that the creation of simple ‘rigorous’ tests that are valid and reliable in the context of the National Curriculum is an easy task. The experience of the past three years suggests that this is not so.

References

Angoff, W. H. (1974) Criterion-referencing, norm-referencing and the SAT, College Board Review, Vol. 92, pp. 2–5.

Black, P., Harlen, W. and Orgee, A. (1984) Standards of Performance: Expectations and Reality, APU occasional paper. DES: London.

Conner, C. and James, M. (1991) Bedfordshire Assessment Training, 1991: An Independent Evaluation. Cambridge Institute of Education.

Conner, C. and James, M. (1993) Assuring Quality in Assessments in Schools in Six LEAs, 1993. Cambridge Institute of Education.

Daugherty, R. (1994) National Curriculum Assessment: A Review of Policy 1988–93. Falmer Press: London.

Dearing, Sir R. (1993a) Personal communication, 29 July.

Dearing, Sir R. (1993b) The National Curriculum and its Assessment: Interim Report. NCC and SEAC: London and York.

Dearing, Sir R. (1994) The National Curriculum and its Assessment: Final Report. SCAA: London.

DES/WO (1988a) Task Group on Assessment and Testing: A Report. DES/WO: London.

DES/WO (1988b) Task Group on Assessment and Testing: Three Supplementary Reports. DES/WO: London.

HMI (Her Majesty’s Inspectorate) (1994) Assessment, Recording and Reporting: Key Stages 1, 2 and 3: Fourth Year 1992–93. OFSTED: London.

James, M. and Conner, C. (1992) Moderation at Key Stage One across Four LEAs, 1992. University of Cambridge Institute of Education.

James, M. and Conner, C. (1993) Are reliability and validity achievable in National Curriculum assessment? Some observations on moderation at Key Stage 1 in 1992, The Curriculum Journal, Vol. 4, no. 1, pp. 5–19.

SEAC (Schools Examination and Assessment Council) (1991) National Curriculum Assessment. Assessment Arrangements for Core and other Foundation Subjects. A Moderator’s Handbook 1991/2. SEAC: London.

Wiliam, D. (1993) Validity, dependability and reliability in National Curriculum assessment, The Curriculum Journal, Vol. 4, no. 3 (Autumn), pp. 335–50.
