Sunday, February 10, 2013

Standardized Testing: Is it ALL bad?

Today's guest blogger is Gloria Mitchell, a BerwynCARES board member and parent of two girls. She is pursuing a teaching degree and is a regular contributor to this blog.
A friend of mine recently sent me a link to an online petition to the White House to end high-stakes testing in the American education system.  I signed it, and I’d like to share why, in case readers of this blog would like to sign, too.
First, let me say that I didn’t sign it because I think all standardized tests are worthless. As an involved parent and now a preservice teacher, I know that there are worthwhile assessment tests that can help parents and teachers evaluate a child’s skills in order to determine the level and style of instruction that will help him or her learn best.
I didn’t sign it because I think bad teachers should be allowed to keep their jobs. I believe our country needs more good teachers: more intelligent, well-informed, well-educated, creative and caring people who work to bring out the best in every child. That’s why I want to be a teacher and am committed to becoming an excellent one.
I didn’t sign it because I think evidence is unimportant in evaluating the work of schools. Evidence is tremendously important. 
But we need to know that the data schools gather and share is based on valid tests, and just as importantly, we need to ensure that people who use the data are making valid interpretations and therefore worthwhile policy recommendations. These recommendations can have real impact on thousands or millions of teachers, children, and families.
With that in mind:
We should be careful not to base decisions on interpretations of achievement test data that go beyond what the tests actually measure. A fourth-grade reading test measures whether a student reads at a fourth-grade level. It is not designed to measure whether the student’s teacher is good or whether the school he or she attends is good. But administrators and policymakers would like to have some objective means of evaluating teachers and schools, and because achievement tests are scored objectively, it is tempting to try to convert the achievement scores for students into “effectiveness” scores for teachers and schools.
Some researchers have attempted to construct “value-added” measures of teaching: estimates of how a teacher’s students perform on achievement tests compared with how the same students would perform with a hypothetical average teacher. This sounds useful, but different methods of calculating “value-added” can lead to different results, and any of the methods may be prone to errors, as may the tests themselves. The Seattle teachers who are currently protesting their district’s use of MAP tests pointed out that the margin of error on the test was in some cases greater than the achievement gains their students were supposed to show.
For more on this, high school math teacher Gary Rubinstein has some incisive things to say about the use of achievement test scores as school/teacher evaluations on his blog.
We should keep in mind that we want and need outcomes from education that are not assessed on achievement tests. What do we overlook when achievement test results are used as a stand-in for educational outcomes in general? A multiple-choice test is by nature a test of convergent thinking (choose the right answer from a given set). Creativity relies on divergent thinking (generate new possibilities or many answers). It is interesting to note that while American schoolchildren’s scores on the National Assessment of Educational Progress have gone up, their scores on the Torrance Tests of Creative Thinking have gone down. Since creativity, as we are beginning to notice, drives innovation and thereby economic growth, this may give us cause for concern.
More broadly, we might ask whether educational testing distorts the process it is designed to measure, namely teaching and learning. If we create high-stakes assessments of one range of subjects and skills, do we in effect force schools and teachers to reduce or eliminate time for untested subjects (history, geography, fine and applied arts, music, foreign language, physical education) and untested skills (social skills, self-regulatory skills, public speaking, creative problem-solving) in favor of the tested ones? Do we force teachers to spend more time on test-taking skills, and if so, what are we crowding out in order to make that time? What about the time spent administering the tests? 
We should remember that just because a test can be scored objectively does not mean the test itself is objective. Educational research attempts to describe and to quantify hypothetical constructs. A machine may be able to tell us whether a student picked answer A, B, C, or D, but human beings are the ones who must decide whether a given test adequately measures intangibles like “comprehension,” “analysis,” and so on.
This is of particular concern when researchers are trying to design one test that will be valid for many populations. One example of the questionable validity of reading comprehension tests comes from a researcher at the University of Wisconsin. She found that a group of boys who performed at or below grade level when tested on a passage from a grade-level social studies text all performed at or above grade level when tested on an article about video games, even though the article was written three to six grade levels above their purported reading levels and was much harder than the social studies text. Did their reading comprehension skills improve overnight? Not exactly. Analysis showed that their performance on the second test was primarily a result of increased persistence: the boys “self-corrected” much more frequently when reading about a topic that interested them.
Researchers in the social sciences use the term construct-irrelevant variance to describe what happens when results are affected by factors other than the performance ability of the test subject. Construct-irrelevant variance reduces test validity and undermines the usefulness of test results. Educational testing is rife with it, from math tests that inevitably test English reading skills, to test formats that do not work equally well for all learners, to test questions that reflect cultural biases.
Should kids be tested in school? Yes, to the extent that the tests are reliable and valid, and the results are used for valid purposes.
Should standardized test results be used to evaluate schools? 
As I write this, my daughter is at school taking a standardized test. She has been promised that she may attend a pizza party tomorrow if her scores are higher now than they were in the fall. Her scores in fall were quite high, and she rightly perceives that it may be hard for her to improve on them. She’s nervous. 
Who or what is being evaluated here?
What’s wrong with this picture?