LearnHub




The Testing Dilemma

One of the most common problems facing teachers today is testing. Are we teaching to the test or teaching to the standards? How can we use the information provided by the mandated tests? How can we tell what our students’ strengths are from the mandated test results? How can we use the mandated test results to improve our students’ performance? Why are we giving our students these mandated tests if we are not going to be able to use the information gathered to provide better instruction? Why do we have a mandated curriculum if we have standards and good testing? These questions push us as teachers to want to know more about testing.

Let’s start with the basics. There are two basic test formats: norm-referenced and criterion-referenced testing. The terms norm-referenced and criterion-referenced have to do with how test scores are interpreted. Norm-referenced tests are those associated with the familiar bell-shaped curve, which is referred to in the phrase “grading on a curve.” In norm-referenced testing, grades or scores are based on a comparison of the test-takers to one another, provided that the target group is well defined and the questions are not biased toward any one test-taking group. So, for example, suppose I was taking a 100-point test that alone would determine whether I would be accepted into a prestigious university. I might be excited to learn that I have received a score of 95 percent (i.e., 95 out of 100 answers were correct). But if acceptance into the university were based on a norm-referenced decision that only the top 10 percent of the students would be accepted, then this score alone would not tell me whether I’ll be accepted. If 10 percent or more of the students who took the exam scored 96 percent or better on the test, then in comparison with those students, I would not be admitted to the university.
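The admission example can be sketched in a few lines of Python. This is only an illustration: the function name `admitted`, the two cohorts, and the rule that ties count in my favor are all assumptions made up for the sketch, not any university's actual policy.

```python
def admitted(my_score, all_scores, top_percent=10):
    """Norm-referenced decision: am I within the top `top_percent` of test-takers?

    `all_scores` includes my own score. Ties are resolved in my favor,
    which is an assumption; a real admissions office would set its own rule.
    """
    # Number of places available, e.g. 10 places for 100 test-takers.
    slots = len(all_scores) * top_percent // 100
    # Number of test-takers who scored strictly higher than I did.
    higher = sum(1 for s in all_scores if s > my_score)
    return higher < slots


# Hypothetical cohort A: 12 of 100 students scored 96, so a 95 misses the cut.
cohort_a = [96] * 12 + [95] + [80] * 87
# Hypothetical cohort B: only 5 of 100 scored 96, so the same 95 makes the cut.
cohort_b = [96] * 5 + [95] + [80] * 94

print(admitted(95, cohort_a))  # False
print(admitted(95, cohort_b))  # True
```

The same score of 95 leads to opposite decisions because the outcome depends on how everyone else performed, not on the score itself.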

Let’s take a different, exaggerated outcome to clarify the concept of norm referencing and its use in determining a school’s performance and its teachers’ salaries. Suppose the mean score for your school (the mathematical average of the students’ scores at your school) was 86 percent. While there is room for improvement, that mean score would surely be acceptable in most situations, but not necessarily in a strictly norm-referenced context. Again, the decision would depend on the performance of the students at the other schools. If your school’s 86 percent mean score turned out to be on the lower end of the bell curve, with most other schools’ mean scores in the mid to high nineties, then your school’s score would not even be in the middle of the bell curve. If your school’s extra funding and teachers’ salaries were dependent on the school’s mean score using norm referencing, your school would not have received extra funding and its teachers would not have been given a raise, even though their students performed well.

Norm-referenced tests often use score reports that are interpreted in terms of percentiles. Percentiles are not the same as percentages or percents. A percentile is a particular place in the distribution of many scores on a norm-referenced test. So if your score report tells you that you scored at the ninety-sixth percentile, this means only that you scored better than 96 percent of the people who took the test. However, the percentile figure itself doesn’t tell you what your actual score was (i.e., being at the ninety-sixth percentile doesn’t necessarily mean that you got 96 percent of the items right, and vice versa). This is a simplified explanation. I have not described how scores on the bell curve are determined, but the explanation is accurate.
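To see how a percentile differs from a percent-correct score, here is a minimal sketch. It assumes the simple definition used above (the percent of test-takers who scored strictly below you); published tests may use slightly different formulas, and the two score lists are invented for illustration.

```python
def percentile_rank(score, all_scores):
    """Percent of test-takers who scored strictly below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)


# Hypothetical hard test: 70 percent correct is better than 96 of 100 people.
hard_test = [50] * 96 + [70] + [90] * 3
# Hypothetical easy test: 96 percent correct is better than only 19 of 100.
easy_test = [60] * 19 + [96] + [99] * 80

print(percentile_rank(70, hard_test))  # 96.0
print(percentile_rank(96, easy_test))  # 19.0
```

On the hard test, getting 70 percent of the items right lands at the ninety-sixth percentile. On the easy test, getting 96 percent right lands at only the nineteenth.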

Norm-referenced testing is a common approach, but does it give teachers the information they need about their students? I think you’ll find that the answer is no. Norm-referenced scores do not tell you how many questions a student answered correctly. They do not tell you which categories the student does well in. They do not give you an idea of which goals or objectives the student has met. So what is the alternative? Let’s consider criterion-referenced testing. If norm-referenced testing means that a score is interpreted against other test-takers’ scores, then what does criterion-referenced testing mean? You have probably discerned that in criterion-referenced testing, a given score is interpreted relative to a pre-set goal or objective (the criterion), rather than to the performances of other test-takers. Criterion-referenced tests also have to define the target population and make sure the questions are not biased toward any given sub-group of test-takers.

A clear example of a criterion-referenced test is the California driver’s license test. This exam involves both a paper-and-pencil test about traffic laws and a behind-the-wheel test of actual driving skills. A passing score level has been pre-set by the California Department of Motor Vehicles, based on professional judgments regarding the knowledge and skill required to be a safe driver. A set grading system has been established, and graders have gone through training to make sure that each grader uses the same criteria in the same way. Using the criterion-referenced score interpretation system, a test-taker’s performance is judged against the criterion, not against the performance of other test-takers.
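A criterion-referenced decision can be sketched the same way. The cut scores below (`min_written_correct` and `max_driving_errors`) are invented for illustration and are not the Department of Motor Vehicles’ actual passing levels. The point is only that each applicant is compared with the pre-set criterion and with no one else.

```python
def passes(written_correct, driving_errors,
           min_written_correct=30, max_driving_errors=15):
    """Criterion-referenced decision: compare one applicant to fixed cut scores.

    Both thresholds are hypothetical placeholders, not real DMV values.
    """
    return (written_correct >= min_written_correct
            and driving_errors <= max_driving_errors)


# Hypothetical applicants as (written answers correct, driving errors).
applicants = [(34, 3), (31, 10), (30, 15)]

# Every applicant who meets the criterion passes; there is no quota.
print([passes(w, d) for w, d in applicants])  # [True, True, True]
print(passes(29, 0))  # False: below the written cut score, however others did
```

Unlike the norm-referenced case, everyone can pass (or everyone can fail), because no one’s result depends on anyone else’s.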

For years, testing (especially in the United States) has been dominated by the norm-referenced approach to interpreting scores. In my opinion, the norm-referenced approach is seldom the appropriate way of determining students’ abilities, and the information gathered does not give parents, teachers, administrators or government officials what they need to determine if a student is performing at a particular level, if the teaching is effective, and/or if a school or school district produces students whose performance meets grade level.

Norm-referenced testing was developed as a selection tool. Thus, norm-referenced tests cover a myriad of information within each subject or category area. This grading system always yields scores that are distributed evenly on the “bell curve.” Since there is always an evenly distributed curve, there are always scores at the top, middle, and bottom levels. In contrast, criterion-referenced tests are based on performance demonstrated against pre-set performance goals and objectives, a set grading process and set performance levels. Thus, there is no need for evenly distributed scores. We can have most scores within the pre-set performance criteria.

It seems to me that, as teaching professionals, we need to require that the correct type of test is given to our students; that the results provided by the test meet the needs of those who are using them; that the information gathered is usable; and lastly that we, the teachers, get to use that information to enhance our students’ performance.

Chris Babowal
babceo@sbcglobal.net
Skype:babceo
