Adding Value to the Value-Added Debate

Because blogging isn't part of my day job, it's basically impossible for me to be anywhere close to first out of the box on the issues of the day. Add to that being the parent of two small children (my most important job, right up there with being a husband), and I'm sometimes frustrated at not being able to weigh in on these issues quickly.

That said, here is my attempt to distill some key points and share my opinions (to add value, if you will) on the debate raging over the Los Angeles Times's decision to publish the value-added scores of individual teachers in the L.A. Unified School District.

First of all, let me address the issue at hand. I believe that the LA Times's decision to publish the value-added scores was irresponsible. Given what we know about the unreliability and variability of such scores, and the likelihood that consumers of those scores will take them at face value without fully understanding all of the caveats, this was a dish that should have been sent back to the kitchen.

Although the LA Times is not a government or public entity, it does operate in the public sphere, and it has responsibilities as such an actor. Its decision to label LA teachers as 'effective' and 'ineffective' based on suspect value-added data alone is akin to an auditor secretly investigating a firm or agency without an engagement letter and then publishing findings that may or may not hold water.

Frankly, I don't care what positive benefits this decision by the LA Times might have engendered. Yes, the district and the teachers union have agreed to begin negotiations on a new evaluation system. Top district officials have said they want at least 30% of a teacher's review to be based on value-added measures, and they have wisely said that the majority of the evaluation should depend on classroom observations. Some have argued that such a development exonerates the LA Times. In my mind, any such benefits are purloined and come at the expense of sticking it (rightly in some cases, certainly wrongly in others) to individual teachers who mostly are trying their best.

Oh, I know, I know. It's not about the teachers anymore. Their day has come and gone. "It's about the kids" now, right? But you know what? The decisions we make about how we license, compensate, evaluate and dismiss teachers affect them as individual people: as husbands and wives, as mothers and fathers. Those decisions also affect who may or may not choose to enter the profession in the coming years. If we mistakenly catch a bunch of teachers in a wrong-headed value-added dragnet, driven by missionary zeal and a head-in-the-sand conviction that numbers don't lie, we will be doing a disservice to teachers and kids alike. And if we start slicing and dicing teachers left and right, who exactly will replace them?

(1) Value-added test scores should not be used as the primary means of informing high-stakes decisions, such as tenure and dismissal.
One key piece of evidence was released just this week by the well-respected, nonpartisan Economic Policy Institute. The EPI report, co-authored by numerous academic experts, states:

Student test scores are not reliable indicators of teacher effectiveness, even with the addition of value-added modeling (VAM).
Though VAM methods have allowed for more sophisticated comparisons of teachers than were possible in the past, they are still inaccurate, so test scores should not dominate the information used by school officials in making high-stakes decisions about the evaluation, discipline and compensation of teachers.
Neither parents nor anyone else should believe that the Los Angeles Times analysis actually identifies which teachers are effective or ineffective in teaching children because the methods are incapable of doing so fairly and accurately.
Analyses of VAM results show that they are often unstable across time, classes and tests; thus, test scores, even with the addition of VAM, are not accurate indicators of teacher effectiveness. Student test scores, even with VAM, cannot fully account for the wide range of factors that influence student learning, particularly the backgrounds of students, school supports and the effects of summer learning loss. As a result, teachers who teach students with the greatest educational needs appear to be less effective than they are.

Other experts and organizations, including Mathematica Policy Research, Rick Hess, and Dan Goldhaber, have offered important cautions as well.

The findings of the IES-funded Mathematica report were “largely driven by findings from the literature and new analyses that more than 90 percent of the variation in student gain scores is due to the variation in student-level factors that are not under the control of the teacher. Thus, multiple years of performance data are required to reliably detect a teacher’s true long-run performance signal from the student-level noise…. Type I and II error rates [‘false positives’ and ‘false negatives’] for teacher-level analyses will be about 26 percent if three years of data are used for estimation. In a typical performance measurement system, more than 1 in 4 teachers who are truly average in performance will be erroneously identified for special treatment, and more than 1 in 4 teachers who differ from average performance by 3 months of student learning in math or 4 months in reading will be overlooked. In addition, Type I and II error rates will likely decrease by only about one half (from 26 to 12 percent) using 10 years of data.”

Hess has “three serious problems with what the LAT did. First … I'm increasingly nervous at how casually reading and math value-added calculations are being treated as de facto determinants of ‘good’ teaching…. Second, beyond these kinds of technical considerations, there are structural problems. For instance, in those cases where students receive substantial pull-out instruction or work with a designated reading instructor, LAT-style value-added calculations are going to conflate the impact of the teacher and this other instruction…. Third, there's a profound failure to recognize the difference between responsible management and public transparency.”

Goldhaber, in a Seattle Times op-ed, says that he “support[s] the idea of using value-added methods as one means of judging teacher performance, but strongly oppose[s] making the performance estimates of individual teachers public in this way. First, there are reasons to be concerned that individual value-added estimates may be misleading indicators of true teacher performance. Second, performance estimates that look different from one another on paper may not truly be distinct in a statistically significant sense. Finally, and perhaps most important, I cannot think of a profession in either the public or private sector where individual employee performance estimates are made public in a newspaper.”

Using multiple measures to inform teacher evaluation seems like the right approach, including the use of multiple years of value-added student data (one thing the LA Times DID get right). That said, the available research suggests that states (particularly in Race to the Top) that have proposed basing 50% or more of an individual educator's evaluation on a value-added score may have gone too far down that path. LA Unified officials have said (LA Times, 8/30/2010) that they want at least 30% of a teacher's review to be based on value-added measures and that the majority of the evaluation should depend on observations. That might be a more appropriate stance.

(2) Embracing the status quo is unacceptable.
As reports such as The New Teacher Project's The Widget Effect have chronicled, current approaches to teacher evaluation are broken. They don’t work for anyone involved. Critics of VAM cannot simply draw a line in the sand and declare, "This will not stand!" If not this, then what? Certainly not the current system! Fortunately, organizations such as the American Federation of Teachers and the Hope Street Group have offered, or are developing, thoughtful solutions to this problem. [Disclosure: I participated in Hope Street's effort, and my New Teacher Center colleague Eric Hirsch serves on AFT’s evaluation committee.] Sadly, LA Unified and the LA teachers union, along with the LA Times, are culpable for bringing this upon the city's teachers: the district and the union refused to analyze or use the value-added data already available to them. Their adherence to the status quo created a void that the LA Times sought to fill ~~in order to sell more newspapers~~ in a wrong-headed attempt to inform the public.

(3) The ‘lesser of two evils’ axiom should not be invoked.
Education Sector's Chad Aldeman has argued that all the factors we currently use to select and sort teachers are worse than a value-added-only alternative. Even if you agree, our current arsenal does not meaningfully inform high-stakes decisions in the first place (apart from entry tests with largely low passing scores and the aforementioned impossible-to-fail evaluations). That is, of course, a condemnation of the current system's inability and/or unwillingness to differentiate between teachers, but it is also a recognition that, in all but a few promising places, we haven't struck the right balance or developed value-added systems capable of informing such high-stakes decisions.

(4) Don't lose sight of the utility of value-added data to inform formative assessment of teaching practice.
If one of the takeaways from research is that value-added data shouldn't be used to drive high-stakes decisions, it is helpful to think about using these data to inform teacher development. Analysis of student work, including relevant test scores, is an important professional development opportunity in which all teachers, especially new ones, should regularly engage. Systems such as the NTC’s Formative Assessment System provide such a tool in the states and districts with which it works on teacher induction. Sadly, this is not the norm in American schools, though it is built into high-quality professional development approaches, as Sara Mead wisely discusses in her recent Ed Week blog post. As I noted under (2), LA Unified missed an opportunity to use such data to inform its educators in this way. In the LA Times value-added series, several teachers bemoaned the fact that they had never seen such data until it was published in the newspaper.

(5) Valid and reliable classroom observation conducted by trained evaluators is critical.
Other elements of an evaluation system are even more important than value-added methodology, if for no other reason than that the majority of teachers do not teach tested subjects. Unless we, God forbid, develop multiple-choice assessments for more and more subjects and grade levels, we're going to need valid and reliable ways of assessing the practice of educators whose work cannot be measured by value-added student achievement scores. Despite some of the criticisms lobbed at the District of Columbia's new IMPACT evaluation system, classroom observation is at the heart of DC’s approach to teacher evaluation. Further, the Gates Foundation’s ongoing teacher effectiveness study holds great promise.

(6) We've got to get beyond this focus on the 'best' and 'worst' teachers.
How about we focus on strengthening the effectiveness of the 80-90% of teachers in the middle? We know how to do that through comprehensive new teacher induction and high-quality professional development, but we lack the collective will to pull it off and invest in what makes a difference. The roadblocks are similar to those that have kept student outcomes out of teacher evaluations: both raise discomfort, require a change in prevailing (often mediocre) practices, demand greater accountability, and necessitate viewing teaching not as a private activity but as a collective endeavor. I keep making this point about the importance of a teacher development focus within the teacher effectiveness conversation because I see too few reform advocates taking it seriously. Take off the blinders, folks. It is not primarily about firing teachers.

(7) Teacher effectiveness is contextual.
Teaching and learning conditions affect an individual educator’s ability to succeed. It is entirely possible that an individual teacher's value-added score is determined more by the teaching and learning conditions at their school site (supportive leadership, opportunities to collaborate, classroom resources) than by their individual knowledge, skills and practices. In Seinfeldian terms, teachers are not necessarily 'masters of their domain.' The EPI report makes this point. So do my New Teacher Center colleagues through statewide teaching and learning conditions surveys. So do Duke University economist Helen Ladd (also a co-signer of the EPI report) and the University of Toronto’s Kenneth Leithwood.