
Combined measures better at gauging teacher effectiveness, study finds

This is a reprint of an article that originally appeared in Education Week.


by Stephen Sawchuk

Student feedback, test-score growth calculations, and observations of practice appear to pick up different but complementary information that, combined, can provide a balanced and accurate picture of teacher performance, according to research released by the Bill & Melinda Gates Foundation.

A composite measure of teacher effectiveness drawing on all three sources, and tested through a random-assignment experiment, predicted fairly accurately how much high-performing teachers would boost their students’ standardized-test scores, concludes the series of new papers, part of the massive Measures of Effective Teaching study launched three years ago.

“If you select the right measures, you can provide teachers with an honest assessment of where they stand in their practice that, hopefully, will serve as the launching point for their development,” said Thomas J. Kane, a professor of education and economics at the Harvard Graduate School of Education, who headed the study.

Basing more than half a teacher’s evaluation on test-score-based measures of student achievement seemed to compromise it, the researchers also found.

Another piece suggests that teachers should be observed by more than one person to ensure that observations are reliable.

The findings are among dozens from the final work products of MET. Together, they are billed as a proof point for the three measures the foundation has spent years studying.

Even as they praised the project’s other insights, some scholars debated the strength of the findings from the random experiment. One glitch: Teachers and administrators didn't always comply with the randomization component, making it harder to interpret the findings.

“We can only be certain that it’s a valid predictor of future test scores for those teachers who complied with the assignments,” said Jonah E. Rockoff, an associate professor of finance and economics at Columbia Business School, who has studied teacher-quality issues using economic techniques. Mr. Rockoff was not involved in the study, but reviewed early drafts of the randomization.

Taken as a whole, the final MET findings provide much food for thought about how teacher evaluations might best be structured. But they are not likely to end a contentious, noisy debate about evaluation systems, and they are almost certain to be intensely debated, in part because of Gates’ separate support for advocacy organizations that have already staked out positions on teacher evaluations.

(The Gates Foundation also provides support for coverage of business and innovation in Education Week.)

Weighing measures

The $45 million study, in progress since 2009, is one of the largest and most extensive research projects ever undertaken on the question of how to identify and measure high-quality teaching. It involved some 3,000 teachers in six districts: Charlotte-Mecklenburg, N.C.; Dallas; Denver; Hillsborough County, Fla.; Memphis, Tenn.; and New York City.

Earlier studies released by the MET project had examined three potential measures of teacher quality: observations of teachers keyed to teaching frameworks, surveys of students’ perceptions of their teachers, and a value-added method, which attempts to isolate teachers’ contributions to their students’ academic achievement. Researchers examined the relationship of each measure to students’ scores on state standardized tests; to their scores on a more complex, project-based series of tasks; and to their perceptions of their teachers’ instructional strengths and weaknesses.

Each of those measures, the earlier papers stated, had positive and negative traits; some were more reliable over time but less predictive of how much teachers would improve their students’ achievement.

One of the four new papers examines different ways of weighting those three measures. It found that composites relying most heavily on state standardized-test scores appeared to be counterproductive: they tended to be volatile and were also the least predictive of how students taught by those teachers would fare on the more cognitively challenging tasks.

Yet weighting schemes that put the most emphasis on teacher observations were the least predictive of gains on the state test scores, it says.

In all, the study says, composites that use a more equal mix of components, with between a third and half of the weight based on value-added, couple better correlations with the outcome measures and improved reliability.

In a way, the findings indicate that there is no one “best” way to weight the measures; instead, that decision will depend on what policymakers most value, whether state test scores or other outcomes.
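The weighting question the researchers studied can be illustrated with a minimal sketch. A composite is just a weighted average of the three standardized component scores; the weights and scores below are hypothetical examples, not the study's actual data or formula.

```python
def composite_score(observation, survey, value_added,
                    weights=(0.25, 0.25, 0.50)):
    """Weighted average of three standardized (z-scored) measures.

    The more equal-mix composites discussed in the study put between
    a third and half of the weight on value-added; the 25/25/50 split
    here is one hypothetical example of such a scheme.
    """
    w_obs, w_survey, w_va = weights
    # Weights should sum to 1 so the composite stays on the same scale.
    assert abs(w_obs + w_survey + w_va - 1.0) < 1e-9
    return w_obs * observation + w_survey * survey + w_va * value_added

# A teacher slightly above average on all three measures:
score = composite_score(observation=0.4, survey=0.2, value_added=0.3)
print(round(score, 2))  # 0.3
```

Changing the weight tuple is exactly the policy choice the paper describes: shifting weight toward value-added tracks state test scores more closely, while shifting it toward observations sacrifices that predictiveness.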

Randomization

From the beginning, one of the foundation’s key goals was to subject promising measures to “validation” through a randomized experiment.

Though infrequently conducted in K-12 education because of logistical problems and expense, random assignment allows researchers to eliminate sources of bias not accounted for using traditional statistical techniques.

The Gates project, with its reach across six districts and thousands of teachers, offered an unusual chance to test the ideas at a scale not seen previously.

For the randomization, researchers in 2009-10 generated estimates of teachers’ performance based on composite measures using data from the surveys, prior test scores, and observation scores. Within individual schools, the study randomly assigned a class of students to each of the participating teachers in particular grades and subjects. After a year, then, researchers compared those teachers’ actual performance to the estimates.

The results were examined in groups based on the teachers’ predicted performance.

In general, the groups of teachers identified as being more effective did in fact prove more effective and produced results on par with what the measures had predicted. They also improved student performance not just on traditional standardized tests but also on the deeper, project-based tasks.

“Because of the random assignment, we can be confident that we identified a subgroup of teachers who caused achievement to happen,” Harvard’s Mr. Kane said. “It’s sort of a big deal to be able to say that.”

Student attrition and other factors, including the refusal of several schools to carry out the randomization despite agreeing to do so, led to relatively high rates of noncompliance. About 66 percent of students in Dallas stayed with their assigned teacher, but only 27 percent of students in Memphis did.

To account for the noncompliance, researchers used a statistical technique known as “instrumental variables” to adjust the results. The technique is widely used in the social sciences.
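The logic of the instrumental-variables adjustment can be sketched in its simplest form, the Wald estimator: the random assignment serves as the instrument, and the effect among complying students is the difference in mean outcomes divided by the difference in compliance rates. All numbers below are made up for illustration; the study's actual adjustment is more elaborate.

```python
def wald_iv_estimate(assigned, treated, outcome):
    """Wald (instrumental-variables) estimator.

    assigned: 1 if randomly assigned to the higher-rated teacher, else 0
    treated:  1 if the student actually ended up with that teacher
    outcome:  the student's test-score gain

    Returns the estimated effect for compliers:
    (difference in mean outcomes) / (difference in treatment rates).
    """
    def mean(xs):
        return sum(xs) / len(xs)

    y1 = mean([y for z, y in zip(assigned, outcome) if z == 1])
    y0 = mean([y for z, y in zip(assigned, outcome) if z == 0])
    d1 = mean([d for z, d in zip(assigned, treated) if z == 1])
    d0 = mean([d for z, d in zip(assigned, treated) if z == 0])
    return (y1 - y0) / (d1 - d0)

# Hypothetical data: 4 students assigned, 4 not; imperfect compliance
# (one assigned student switched out, one unassigned student switched in).
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
treated  = [1, 1, 1, 0, 0, 0, 0, 1]
gains    = [8, 6, 7, 3, 2, 3, 1, 6]
print(wald_iv_estimate(assigned, treated, gains))  # 6.0
```

The key point, and the source of the scholars' caveat quoted below: only the variation created by the random assignment is used, so the estimate is informative about compliers, not necessarily about the teachers and students who switched.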

Scholars had different opinions about how far those findings could be extrapolated to the K-12 field at large.

“These results could still be based on a very selective group of teachers,” said Jesse M. Rothstein, an assistant professor of economics at the University of California, Berkeley, who has often been critical of the MET findings. “I would love to see a lot more investigating of just who was and wasn’t complying, and why they were left out.”

Douglas N. Harris, a professor of economics at Tulane University, in New Orleans, added that the study didn’t address some other potential sources of bias. For example, it’s possible that bias in the value-added estimates for each individual teacher might have been averaged out in the group estimates. (The averaging was done in order to obtain a sufficient sample size, a limit of the random-assignment method.) But most school districts and states using value-added approaches are using individual, not group-level results, he noted.

The study’s authors also acknowledge that the experiment is limited to comparisons of teachers within, but not across, schools.

“There are a lot of ways in which there could be a nonrandom assignment of students to teachers,” Mr. Harris said. “They’re studying some elements of that, but not others.”

Teacher observations

In yet another new finding, the researchers dug deeper into observations of teachers. Using a subset of 67 teachers in the Hillsborough County, Fla., district, they investigated ways to improve the consistency of the scoring of their lessons, including by using more frequent, shorter observations and multiple raters.

The researchers found that having different raters score observations of teachers’ practice may be a key component of observation systems. Raters’ first impression of a teacher’s practice tended to influence how they scored additional lessons taught by that same teacher, the study found.

Nearly all teachers scored in the middle categories on the framework studied, the four-tiered Framework for Teaching, a popular tool created in 1996 by consultant Charlotte Danielson, rather than at the top or bottom ones. The researchers struggled to interpret that finding.

“It could be that observers are simply uncomfortable making absolute distinctions between teachers,” that paper says. “It could be that the performance-level standards need to make finer distinctions. Or it could simply be that underlying practice on the existing scales does not vary that much.”

Mixed reception?

Nearly every work product released by the MET researchers thus far has been contested to some degree by observers, and the most recent results are likely to be no exception.

“They see this as proof that the more equally weighted, combined measure is superior, but they omit all discussion of the expense and difficulty of collecting the classroom observations and student surveys,” said Jay P. Greene, a professor of education policy at the University of Arkansas. Mr. Greene contends that earlier reports from Gates have veered too far into advocacy.

By contrast, the American Federation of Teachers, whose leader has had an on-again-off-again rapport with Mr. Gates and with the MET project, seemed to embrace the final studies.

“The MET findings reinforce the importance of evaluating teachers based on a balance of multiple measures of teaching effectiveness, in contrast to the limitations of focusing on student test scores, value-added scores, or any other single measure,” AFT President Randi Weingarten said in a statement.


Comments (7)

Submitted by Education Grad Student (not verified) on January 10, 2013 2:25 pm
The methodology of the Measures of Effective Teaching study is flawed. Read the review here at the National Education Policy Center: http://nepc.colorado.edu/think-tank/bunkum-awards/2011 and http://nepc.colorado.edu/thinktank/review-learning-about-teaching.
Submitted by Education Grad Student (not verified) on January 10, 2013 2:05 pm
The best literature about what is the best way to measure effective teaching will be in a peer-reviewed journal, such as one that the American Educational Research Association publishes.
Submitted by Lawrence A. Feinberg (not verified) on January 10, 2013 3:53 pm
Here’s a critique of the above Gates findings: “The 50 Million Dollar Lie,” Gary Rubinstein’s Blog, January 9, 2013. “Last year I spent a lot of time making scatter plots of the released New York City teacher data reports to demonstrate how unreliable value-added measurements are. Over a series of six posts which you can read here I showed that the same teacher can get completely different value-added rankings in two consecutive years, in the same year with two different subjects, and in the same year with the same subject, but in two different grades.” Read more: http://garyrubinstein.teachforus.org/2013/01/09/the-50-million-dollar-lie/
Submitted by bye bye (not verified) on January 10, 2013 10:10 pm
The number of effective teachers in a given school or district shall be inversely proportional to the frequency with which teachers in that school or district are pestered by clueless suits attempting to apply some guru's scale to quantify their effectiveness. And again, we are suffering from the delusion that there are droves of "effective" people itching to fill schools with their overabundance of "effectiveness," while unfortunately kept at bay by the "ineffective" people currently inhabiting the premises. Guess what? Right now, we have the most able bodies that are willing. If we generate facts, figures, scales, and such in order to send them away, we will be rewarded with less able, less willing bodies to fill the gaps we create. Why not try something new? Create more desirable working conditions (one start would be fewer clueless suits attempting...). Increase demand for the job. Then maybe, the "effective" people who have NOT been waiting in the wings will come rushing to fill the gaps that we have every year. Stop driving the best people we've got away and start ATTRACTING the hypothetical "better people."
Submitted by Anonymous (not verified) on January 13, 2013 12:17 pm
So the Gates study finds that all the methods he has been pushing for years, in particular use of standardized tests to measure teacher effectiveness, are best. Who knew? Lisa Haver
Submitted by tom-104 on January 13, 2013 3:29 pm
Jersey Jazzman has an interesting comment about this: Our Nation's Goofy Education Debate http://tinyurl.com/a2yzzbn



Public School Notebook

699 Ranstead St.
Third Floor
Philadelphia, PA 19106
Phone: (215) 839-0082
Fax: (215) 238-2300
notebook@thenotebook.org

© Copyright 2013 The Philadelphia Public School Notebook. All Rights Reserved.
Terms of Usage and Privacy Policy