Does Testing Make the Grade?

Even with broad public interest in raising academic standards and holding schools accountable, there’s a backlash against high-stakes testing.

In public schools across the country, there’s a new skill being taught as early as the primary grades: how to use a number-two pencil to fill in the “bubble form,” an answer sheet for multiple-choice tests that can be instantly scanned and scored by computer. In North Carolina, some second-graders have been practicing this task and other test-taking strategies in anticipation of the upcoming school year, when all third-graders in the state will take a single, multiple-choice, end-of-grade (EOG) test that will likely determine whether they are promoted to the fourth grade or held back a year.

At a time when some top colleges are questioning how much weight to give standardized-test scores in admissions decisions, standardized testing for K-12 students is gaining influence nationwide. A study conducted by Jay P. Heubert M.A.T. ’74, associate professor of education and adjunct professor of law at Columbia University, finds that a rapidly growing number of states are engaged in “high stakes” testing, requiring students to pass standardized tests as a condition of grade-to-grade promotion. In some states, including North Carolina, test scores are also being used to determine teacher bonuses and school rankings. According to The Chronicle of Higher Education, at least twenty-seven states now use standardized tests in promotion and funding decisions, and President Bush has recently proposed to withhold federal funds from schools that repeatedly fail to meet minimum standards.

Standardized testing as a diagnostic measure of overall student knowledge and skill is nothing new, but “high stakes” assessment is a relatively recent and highly controversial phenomenon. While national surveys have repeatedly found that a majority of Americans favor raising academic standards and holding schools accountable for student achievement, a serious backlash against high-stakes testing, particularly in the early grades, has begun to emerge.

In Florida, one group of teachers sent back the end-of-year bonuses they received as a result of their students’ higher test scores, saying that one-time, standardized paper-and-pencil tests are an affront to their daily, professional assessment of individual student achievement. In Michigan, Ohio, and Massachusetts, parents have organized boycotts of state-mandated exams. In Milwaukee, parents, teachers, and others stonewalled a plan to expand standardized testing to kindergarten, first, and second grades. In Durham this May, representatives of at least four different parent groups from across the state spoke out at a hearing designed to alert legislators to their concerns about high-stakes testing. The groups favor fair testing, but reject the use of standardized assessments as the primary indicators of a child’s curriculum mastery. 

Across the nation, critics have argued that testing reduces teacher and student creativity, focuses too much on basic skills rather than higher-order thinking, and confines teachers to “teaching to the test” rather than covering a more broadly based curriculum.

The roots of the current standards movement go back to 1981, when the Reagan administration convened a study commission that went on to issue A Nation at Risk in 1983. The report sounded the alarm that “a rising tide of mediocrity” in the nation’s public education system would eventually lead to erosion of the nation’s workforce competitiveness.

For its part, North Carolina had been measuring school performance by district as early as 1978, but in the early 1990s the state legislature mandated a more rigorous form of statewide testing designed to measure individual school performance. In response, the North Carolina Department of Public Instruction introduced “The New ABC’s of Public Education” in May 1995. The plan requires that all North Carolina public school students take end-of-grade tests that are tied directly to North Carolina’s curriculum or “Standard Course of Study.” (In some states, more general achievement tests not directly linked to curricular objectives and content are administered.) 

After North Carolina and Texas were designated as the two pacesetter states in the advancement of school accountability at the 1999 National Education Summit, both major party candidates referred repeatedly to North Carolina’s accomplishments during the 2000 presidential debates. George W. Bush in particular hailed high-stakes testing as the most effective means to end social, or unearned, promotion and restore confidence in the nation’s public education system. 

Until this year, North Carolina’s ABC’s test scores have been used only to measure overall school performance in reading comprehension and math skills among students in third through eighth grades. Scores from writing tests administered in the fourth, seventh, and tenth grades are also entered into the complex formula that has been used to measure and reward individual school performance and to determine teacher bonuses. At the high-school level, End of Course (EOC) tests are administered in Algebra I; Algebra II; Biology; Chemistry; Economic, Legal, and Political Systems; English I; English II; Geometry; U.S. History; Physical Science; and Physics.

Today, certified teachers in North Carolina can receive an end-of-year bonus of $1,500 if their school meets the “exemplary” or highest-growth standard in their scores (set by the state at 10 percent above the statewide average growth). Teachers in schools that meet a predetermined “expected growth standard” earn a $750 bonus. Below these two designations are “adequate performing” schools where at least 50 percent of students are performing at grade level, and “low performing” schools that do not meet their goals or have less than 50 percent of students performing at grade level. No teacher rewards are associated with the latter two designations, though the fifteen lowest-performing schools statewide are targeted to receive special assistance from advisory teams deployed by the state.

Does this business incentive model work for teachers? Brett Jones, who has taught at Duke for the last two years in the education department, is now working on a book that considers the unintended consequences of high-stakes testing, based on his study of twenty-three teachers in Durham County. “There is some evidence that teachers are expecting more of their students and are working harder,” he says. But he suspects that down the road, we may see teachers opting to work in higher-performing schools where the bonuses may be less challenging to come by. The bonus system also may make it more difficult to recruit both new and veteran teachers to work in low-performing schools, a subject yet to be investigated fully, says Jones.

North Carolina’s school officials, bolstered by their success with overall school performance measurements, have just raised the stakes another notch. They are now phasing in the use of EOG and EOC scores in individual student promotion decisions.

Beginning with the school year just ended, individual students’ EOG test scores in reading, mathematics, and writing are the “gateway” to student promotion for all fifth-graders in North Carolina. Next year, the third- and eighth-grade gateways will take effect.

At the high-school level, some EOC exams are also, for the first time, being weighted as a full 25 percent of a student’s year-long grade. So a student could go into the state exam with a low “C” average, perform very poorly on the EOC, and actually flunk the course, earning no credit for a year’s work in that subject area. (In prior years, high-school EOC tests were only marginally weighted in a student’s final grade.) Effective with the graduating class of 2004-05, students will also have to pass an exit exam of essential skills to earn a high-school diploma.
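
To make the arithmetic of a 25 percent exam weighting concrete, here is a minimal sketch. The 70-point passing mark, the coursework average of 78, and the function names are illustrative assumptions, not North Carolina’s actual grading rules.

```python
def final_grade(coursework_avg: float, eoc_score: float, eoc_weight: float = 0.25) -> float:
    """Blend a year-long coursework average with an end-of-course exam score."""
    return (1 - eoc_weight) * coursework_avg + eoc_weight * eoc_score


def min_eoc_to_pass(coursework_avg: float, pass_mark: float, eoc_weight: float = 0.25) -> float:
    """Lowest exam score that still brings the blended grade up to the passing mark."""
    return (pass_mark - (1 - eoc_weight) * coursework_avg) / eoc_weight


if __name__ == "__main__":
    # Hypothetical numbers: a "low C" of 78 and a passing mark of 70.
    print(final_grade(78, 40))      # blended grade after a very poor exam
    print(min_eoc_to_pass(78, 70))  # exam score needed to stay at the passing mark
```

Under these assumed numbers, a student with a 78 coursework average who scores 40 on the exam finishes at 68.5, just below the assumed passing mark, and would need at least a 46 on the exam to earn credit.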

Public-instruction officials have repeatedly emphasized that students in grades three, five, and eight who do not pass the EOG the first time may retake the test several days later to determine if the initial score was a fluke or if the student simply had a bad test day. Students who do not pass in the second round, however, must participate in remedial work over the summer and are then retested before the next school year begins. If they do not pass the test on the third attempt, they are held back.

Parents or teachers who believe a student has been unfairly judged may then request a portfolio review of the student’s overall work during the year just ended. Under state policy, this evaluation must be conducted by an independent committee of teachers and a principal from another school, who then recommend pass or fail. Ultimately, the final decision about promotion rests with the principal of the school that the child attends.

David Malone Ph.D. ’84, assistant professor of the practice in Duke’s education department, teaches a number of undergraduates who are working toward their Master of Arts in Teaching (M.A.T.). As part of their training, M.A.T. candidates enter local classrooms to practice teaching. In addition, Malone oversees a grant-funded project involving some 100 Duke undergraduates, many of whom are also students in his educational psychology class. These students tutor fourth- and fifth-grade students from Durham for an hour twice a week, helping to prepare them for the EOG exams.

Young student balancing books on his head

“Now that the gateway ABC’s are finally here,” Malone says, “people in North Carolina are asking a lot more questions about each individual school’s resources, the inequities we have in early childhood opportunities for individual students, and what happens when a class loses a teacher during the year and has a disproportionate number of days with substitute teachers who are not familiar with test preparation. With so many of my undergraduates witnessing the system firsthand, it seems like testing is all we ever talk about around here.”

Malone had his students take a North Carolina benchmark or practice exam designed to test fifth-grade reading comprehension. “Out of my forty undergraduates in Education 118,” he says, “only half got all the questions right. Their critical-thinking skills were too sophisticated for some of the questions. Taking this test quickly demonstrated to them that it is very difficult, even for a team of experts, to construct a fair assessment of a kid’s reading ability with a multiple-choice test.” Moreover, says Malone, new test questions and multiple versions of the tests must be constantly developed and field-tested. To maintain confidentiality of content and to avoid cheating, parents and teachers alike are not permitted to view the EOG and EOC multiple-choice tests, either before or after their administration.

“Not only was the reading comprehension test we took culturally biased, in my opinion, and highly ambiguous in places,” says Malone student Dennis Davis, a Duke senior from Greensboro, North Carolina, “but it was clear to me that on one particular question, I could imagine how the fifth-grade kid I tutored would completely project his own dysfunctional family experience into the answer. There’s no way he would give the ‘correct’ answer. Heck, I didn’t get it right, either!” The question to which Davis refers asked students to judge what might be an absent father’s response to a family disaster that took place in Colonial America.

“Non-instructional factors account for a significant amount of the variance among EOG test scores when schools or districts are compared,” says Steven Pfeiffer, adjunct professor of psychology and education and executive director of Duke’s Talent Identification Program for gifted middle- and high-school students. “Factors such as parents’ educational background, type of community, and poverty level account for more than 50 percent of the difference in test scores.” According to Pfeiffer, “an appropriate use for this kind of testing” is “for screening purposes to determine if schools or groups of students may need additional support and in what specific areas.”

But what happens when tests become a measure of individual student performance for promotion decisions, as they have now in North Carolina?

Early in the school year, the North Carolina School Psychology Association issued a position statement arguing strongly against the use of EOG tests in individual promotion decisions because, they revealed, the test itself has not been statistically validated as an accurate measure of individual student performance. The test has only been validated as a screening tool for students in the aggregate, along the lines that Pfeiffer says are useful.

Using tests not validated for the measurement of an individual student’s performance is a common problem nationwide, Pfeiffer explains. “The tests simply lack a level of precision—an acceptable level of scientific or technical rigor—necessary for making decisions on individuals. Norm-referenced EOG tests were not developed and were never intended to measure the quality of learning or instruction. And decisions that affect a student’s life or educational opportunities should never be made on the basis of a single test score, no matter how reliable or valid. To ensure fairness, students should have multiple opportunities to display their skill or competence, particularly on decisions that carry serious consequences such as promotion and graduation.” That North Carolina allows students to take the EOG test multiple times is not the same as multiple measures of performance, Pfeiffer says. 

Marvin Pittman is the assistant superintendent of North Carolina Schools with the special charge of helping North Carolina close the much-discussed “achievement gap” between poor and minority students and their more affluent, generally white, counterparts. “My goal is not to make anyone love this [testing] policy,” Pittman says. “We know that a single measure can’t do it, and we’re looking at other ways to measure performance. The State Board of Education understands that you must look at other areas. Portfolio assessment [evaluating a range of examples of students’ work] may be the way to go, but we’re not there yet. As we have looked at the research, we really don’t see anyone doing this very well. In the interim, we are using the EOG. The overriding part of this policy is identifying where we need to do extensive interventions to help create better-performing students and schools.” 

Joe Johnson ’70, M.A.T. ’71, Ed.D. ’78 is superintendent of the Wilkes County schools, in the foothills of North Carolina. “Right now, we can honestly say that our students are reading, doing math, and writing better than they ever have,” he says. “Because of the tests, the dialogue between teachers, students, and home has increased. We are now obliged to give parents more information. One downside of the process, however, is that administering the test falls to the guidance counselors, which means they have less time to attend to individual needs of students, their academic concerns, and any problems that might be happening at home. At a time when you are increasing the awareness of individual children and the importance of learning, you are, ironically, removing one of the people who should be most involved. Testing takes enormous effort and resources, and we need more state resources for administrative help with the tests.” 

In North Carolina, early in the school year the State Department of Public Instruction did ask schools to identify students they expected to have trouble passing the EOG. They obtained an additional $31 million from the state legislature to reduce class sizes and hire more teachers in low-performing schools, and to provide tutoring and special Saturday classes to assist at-risk students. For the upcoming school year, says Pittman, they have requested an additional $39 million for the same purpose.

Still, the North Carolina School Psychology Association argues that while “retention with extensive remediation has been effective with certain groups of children, promotion with similar remediation is more effective and has fewer negative effects.” 

“The single strongest predictor of whether students will drop out of school is whether they have been retained in grade,” writes Columbia’s Jay Heubert, citing several recent studies. “Those retained in grade even once are much likelier to drop out later than are students not retained, and the effects are even greater for students retained more than once. Moreover, much of the increase in dropout rates shows up only years later, and the harm is thus largely invisible at the time the retention occurs.”

Joseph DiBona, an associate professor of education at Duke, puts it this way: “Those kids who fail are going to need lots more help to pass the tests on the third try or after repeating a grade, which will be very costly. We’re in a terrible box. We have let people pass through the system for years, but after the testing is completely phased in at the lower grades in North Carolina, we may see these same kids, when they get to high school, simply drop out in frustration.” 

Fred Jones ’81 is the assistant principal at Jordan High School in Durham. “This is a very costly policy in terms of having to serve any given student for an extra year or more,” he says. “And at the secondary level, we are nervous that we may begin to see students who have been retained two or three times, and may finally be entering high school at age sixteen or seventeen. State law prohibits our serving any student over twenty-one, so we may have students coming into ninth grade who are simply not eligible to graduate unless the law is changed.” 

According to the North Carolina School Psychology Association, the cost of retaining 60,000 students in grades K-12 each year in North Carolina has previously been in the neighborhood of $360 million. Increasing the education budget much further may be prohibitive. North Carolina is already strapped for funds—in part from the widespread devastation in eastern North Carolina created in 1999 by Hurricane Floyd, the continuing erosion of the state’s tax base due to the decline in tobacco sales, and the wholesale movement offshore of much of the state’s textile and furniture manufacturing. How costly it will be to make good on remediation promises and support the expansion of testing to grades three and eight next year is unclear. At the same time, if the tests do not actually increase intervention, and in some cases, retention, then how can the new system be considered more rigorous? 

The possibility that even more students could be held back a grade appeared to be on the minds of some North Carolina lawmakers in April, a month before this year’s testing began, when some legislators called for an easing of standards.

As it turned out, when this year’s fifth-grade scores came in, as many as 98 percent of students passed the fifth-grade EOG in math in some schools. Officials then admitted they had probably set the bar too low after an eleventh-hour revision to the math exam, which did not permit sufficient field-testing. According to the Raleigh News & Observer, on last year’s fifth-grade math test, students had to answer 35 to 63 percent of the questions correctly to pass, depending on the version of the test administered. This year, the passing measure ranged from 28 to 34 percent of correct answers. Says Susan Wynn, principal of Durham’s Lakewood Elementary, “I’m not sure how much of a student’s actual knowledge we were measuring if answering only 28 percent of the questions correctly was passing. We know that students can actually get 25 percent of the questions correct by random chance.”
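
Wynn’s point about random chance can be checked with a short calculation. The sketch below assumes four answer choices per question (which is what a 25 percent chance rate implies) and a hypothetical 80-question test; the actual test length is not given here, and the function names are illustrative.

```python
from math import comb


def needed_correct(num_questions: int, cut_percent: int) -> int:
    """Smallest number of correct answers that meets the cut (integer ceiling)."""
    return (cut_percent * num_questions + 99) // 100


def expected_correct_by_guessing(num_questions: int, num_choices: int) -> float:
    """Average number of correct answers from pure random guessing."""
    return num_questions / num_choices


def prob_pass_by_guessing(num_questions: int, num_choices: int, cut_percent: int) -> float:
    """Binomial probability that pure guessing meets or exceeds the cut score."""
    p = 1.0 / num_choices
    need = needed_correct(num_questions, cut_percent)
    return sum(
        comb(num_questions, k) * p**k * (1 - p) ** (num_questions - k)
        for k in range(need, num_questions + 1)
    )


if __name__ == "__main__":
    # Hypothetical test: 80 questions, 4 choices each.
    for cut in (28, 35):
        print(cut, needed_correct(80, cut), prob_pass_by_guessing(80, 4, cut))
```

With these assumptions, a 28 percent cut means 23 correct answers out of 80, only three more than the 20 a pure guesser would expect to get right.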

Last year, approximately 20 percent of all fifth-graders failed the reading test while some 17 percent failed the math test. “We were anticipating that 15 percent of the students would not be able to meet the standard in math,” said Lou Fabrizio, the state official who oversees North Carolina’s testing.

Steve Schewel ’73, Ph.D. ’82, visiting assistant professor in the Hart Leadership Program, part of Duke’s Terry Sanford Institute of Public Policy, chairs the board of a progressive think tank called The Common Sense Foundation, based in Raleigh. Common Sense has been highly critical of the state’s testing system; it has distributed some 10,000 parent handbooks on the consequences of high-stakes testing. With funding from the Z. Smith Reynolds Foundation, it also convened a commission to collect parent and teacher reaction through a series of hearings across the state. Schewel himself has a fifth-grader at Durham’s E. K. Powe Elementary. 

“When state officials say, ‘Our target was a 15 percent failure rate, and too many fifth-graders passed,’ it suggests to me that the state can too easily manipulate the tests for some political purpose,” Schewel says. “Are we aiming for a certain failure rate so that we can recreate the low-pay workforce we have now, identifying the kids who will eventually work at McDonald’s, and tracking them from the third grade on? Is this test just a tool to replicate our social stratification?” 

While Schewel is speaking halfway tongue-in-cheek, he is certain that North Carolina’s agenda for testing is primarily business-driven. He points to the fact that the chair of North Carolina’s State Board of Education, Phil Kirk, also happens to be the president of North Carolina Citizens for Business and Industry. The state’s largest business group, it calls itself “the state’s chamber of commerce.”

“Kids that fail these tests are having their self-confidence destroyed,” says Schewel. “You only have to walk through a school on test day or the few days leading up to it to witness the anxiety associated with EOG.” The Common Sense Foundation has heard from parents about children crying uncontrollably, throwing up on the tests, and physically abusing themselves with number-two pencils.

“The level of stress is way too high for a third- or fifth-grader,” Schewel says, “way beyond what children of that age should experience.” He concedes that test results have confirmed how parental income is correlated to student achievement. “Testing has now put a number on that,” he says, “and everyone is talking about the achievement gap. We can’t ignore those students anymore. But I’d still rather see more teaching and less testing.” 

“If I could make the tests go away, I would,” says assistant principal Fred Williams. “The whole process denigrates the professionalism of teachers, the majority of whom here at Jordan have an advanced degree; some have Ph.D.s. We have reasonably rigorous licensing procedures. To have a two-hour paper-and-pencil test count more than a teacher’s year-long observation of a student doesn’t seem reasonable.” Nevertheless, Williams says that teachers have teamed up to think more creatively about how to present the curriculum, create mock test questions, develop review formats, and assess student progress in advance of the EOC tests. “Still,” he says, “I know in my field of U.S. history, the coverage of content is much more superficial now. The tests require breadth over depth. There’s no opportunity to take some extended time to explore the New Deal or the civil-rights movement or the beginnings of the women’s movement in the nineteenth century.”

On the other hand, says Ike Thomas ’69, M.A.T. ’70, Ph.D. ’83, principal of Northern High School in Durham, “The EOC keeps that U.S. history teacher from spending nine weeks on the Civil War just because that’s the teacher’s favorite period in history. What we’re trying to do with EOC is create consistency and focus and keep teachers from operating as independent contractors. Still, we need to keep asking, ‘Is it a fair test?’”

Thomas says he has sometimes witnessed wildly different scores from year to year, suggesting that “some years the test is different, not the kids, so comparisons of groups from one year to the next may not always be useful. But the part we find most burdensome are the field tests.” He says, “Every year we are testing for future tests. Our students tested two of four parts of the new exit exam this year.”

Wilkes County superintendent Joe Johnson says he has railed at times against what he sometimes thinks of as “psychometric madness” or an overemphasis on testing. “We have to remember these are children, not numbers, we’re talking about.” Likewise, at the Common Sense Foundation’s hearing on fair testing in Durham in mid-May, parents, teachers, and organizers from at least five counties gave testimony about how much classroom time is given over to practice tests, field tests, and then the actual tests themselves. “As jobs in our society become more automated, so does our school system,” said David Freeman Ph.D. ’01. “Are we treating our kids like machines because that’s the workforce we want?”

Larry Holt, a parent of two children in the Durham public schools, testified on behalf of his daughter, who had reported to him that some of her teachers were not spending a lot of time with students who were struggling with the material. “Teachers hurried to get ready for the tests,” said Holt. “But, as my daughter pointed out, different teachers teach at different rates, and students learn at different rates. I believe testing is taking away from the positive motivating experience that education ought to be. She says she doesn’t mind tests to measure performance, but how much is too much?”

“Students become the unwitting victims of an over-focus on accountability,” says TIP’s Steven Pfeiffer. “They can lose their enthusiasm and passion for learning. Instruction can quickly over-emphasize memorization of facts to the relative neglect of important and enjoyable higher-level skills, such as the critical examination, synthesis, and evaluation of ideas, group problem-solving, transforming ill-defined problems, imagination, and creative discovery.”

Based on his study of twenty-three teachers in Durham County, Duke researcher Brett Jones says teachers are definitely doing things differently in the classroom. “They are focusing on skill and drill and the lower objectives because multiple-choice tests generally do not involve higher-order thinking. We also found that teachers are spending less time in science and social studies in the primary grades since these areas are not tested.”

“When I was a Duke student and later, when I started teaching, I thought the importance of my profession was based on the abstract principle that learning is valuable, regardless of its usefulness,” says superintendent Joe Johnson. “Now we’ve moved to the point of view that learning is good for making money. That’s how we have changed socially. That economic focus, which may be good, has nevertheless caused us to lose the notion that learning by itself is worthwhile.”

Principal Ike Thomas has decided to retire from public education this year. “One of the things that continues to concern me,” he says, “is the perception that public education used to be wonderful, and now it isn’t. If we go back fifty years, a lot of kids didn’t get to school at all. We didn’t worry because of manufacturing and farm jobs that were available to them. But today, we’ve made a commitment to educate all of our children. When you set the bar that high, the shortcomings are more evident. Our schools are doing more for more people than we ever have, and we’re not getting credit for it.”

Superintendent Johnson agrees. “One of the most disappointing things in my career has been the decline in the status, the trust, and respect that people have for public educators,” he says. “Sure, some have misbehaved, but in our nation, there has been a general erosion of confidence in government, and public education has been one of the key focuses.”

If testing ultimately fails as a remedy to upgrade our schools or becomes too costly to maintain, do we then face the disintegration of public education in favor of a voucher system and the privatization of schools? 

Howard Machtinger, director of the Teaching Fellows Program at the University of North Carolina at Chapel Hill, speaking at the Durham hearing on fair testing, said, “Right now, I think the present administration doesn’t really believe in a public sector, only the free market. Meanwhile, we are facing a major crisis in recruiting and retaining teachers. All the talk about testing has obscured this unfolding crisis.” 

“What I would like to see at the state and federal level,” says Pfeiffer, “is a group of authorities brought together—experts in curriculum, child development, and measurement—to come up with the best way to promote equity and excellence in public schools. We need more dialogue in the public arena about what should be the goals of public education. I think the violence we are seeing in our schools speaks to the shared responsibility we have in helping kids deal with painful conflicts and emotions in this culture. Accountability is secondary to what we hope public education will provide our future citizens, which includes getting along with others, respecting the environment, strategic thinking, and problem solving instead of this emphasis on traditional nineteenth-century academic skill areas. We simply have not had that debate yet.”

“Ultimately,” says Brett Jones, “I’d say what we have in North Carolina is a good working draft of a testing system. But for the kids in the system right now, it’s not a draft, it’s a final exam. What we’re doing is a little like building the airplane while you fly it. We are focusing on education more in this country, and on student learning, but good teachers were doing that before high-stakes testing began.” 

Eubanks ’76 is a frequent contributor to the magazine
