By Liana Heitin for Education Week
U.S. performance in reading, math, and science has remained stagnant since 2009 as other nations have plowed ahead, according to new results from a prominent international assessment.
Nineteen countries and education systems scored higher than the United States in reading on the 2012 Program for International Student Assessment, or PISA, up from nine systems when the test was last administered in 2009. Germany and Poland, for instance, have seen steady gains on the reading assessment over time and are now ahead of the United States.
In mathematics, 29 nations and other jurisdictions outperformed the United States by a statistically significant margin, up from 23 on the 2009 test, the results released Tuesday show. The nations that eclipsed the U.S. average included not only traditional high fliers like South Korea and Singapore, but also Austria, the United Kingdom, and Vietnam.
In science, 22 education systems scored above the U.S. average, up from 18 in 2009.
“While we’re standing still, other countries are making progress,” said Jack Buckley, the commissioner of the National Center for Education Statistics, which issued the U.S. report on PISA.
The global assessment compares reading, math, and science “literacy”—or knowledge and application of skills—among 15-year-olds internationally. For the first time, the report also includes separately reported results for public school students in three American states: Connecticut, Florida, and Massachusetts.
Massachusetts, long a top-performing U.S. state, demonstrated especially strong performance on the global stage: It scored better than the average for leading industrialized nations in all subjects.
Mitchell D. Chester, the education commissioner for Massachusetts, said the new PISA data “helped reinforce that our students are performing among some of the better-performing nations in the world, and it also made clear to me that we shouldn’t be complacent.”
Shanghai tops rankings
Among the 65 participating education systems, the highest performer in all three subjects was Shanghai, though the methodology that treats the Chinese city as a stand-alone system has raised eyebrows.
Overall, U.S. performance in reading and science was on par, as it was three years ago, with the average for the 34 industrialized nations in the Organization for Economic Cooperation and Development. And once again, U.S. scores were below the OECD average in math.
“It’s a policy question whether one should be OK with average,” Mr. Buckley said. “I’d be more willing to tolerate our position if I saw that we were improving.”
The United States continued to have its strongest showing in reading, though there was no measurable change from its 2009 scores. On the PISA scale of 1 to 1,000, the nation scored 498 in reading, statistically similar to the OECD average of 496 and well below Shanghai’s 570.
Massachusetts scored 527 in reading, outperforming all but three education systems. Connecticut came in just behind its neighbor state. Florida’s score was not statistically different from the U.S. average.
While Americans’ reading scores were flat, 10 education systems have surpassed the United States in the subject since 2009, including Ireland, Chinese Taipei (Taiwan), Poland, Estonia, the Netherlands, and Germany.
The 2012 reading results seem “particularly dramatic,” Mr. Buckley said, because several countries that were tied with the United States in 2009 made just enough improvement to statistically edge ahead.
Countries that have demonstrated a “strong multiyear history of sustained improvement” include Germany and Poland, he noted. “That’s a very different trajectory than the U.S. has observed,” Mr. Buckley said.
In math, the United States scored 481, measurably lower than the OECD average. Poland, Vietnam, Austria, Ireland, the United Kingdom, Latvia, and Luxembourg all overtook the United States by statistically significant margins in the math standings for 2012, while Norway dropped behind.
‘Room for improvement’
In Massachusetts, about one in five students were “top performers” in math, scoring at levels 5 and 6 (on a scale with six levels of performance). The same proportion scored below level 2, or the “baseline proficiency” level. By comparison, more than half of Shanghai 15-year-olds scored at the top two levels in math and just 4 percent scored at the bottom level.
“One of the things that concerns me is the gap between our top and bottom performers,” said Mr. Chester of Massachusetts. “While our aggregate results are very strong, there’s much room for improvement in bringing up our scores in the bottom.”
In science, the average for U.S. students was statistically similar to the OECD average and not measurably different from the 2009 results. Massachusetts and Connecticut both scored higher than the United States as a whole, while Florida scored lower.
Some of the most-anticipated results among policymakers are those from Finland, which became a darling of the education policy world after posting strong results on PISA in 2003. Recent results on the Trends in International Mathematics and Science Study, or TIMSS, another international exam, have called Finland’s reputation into question. In math, for example, the performance of Finland’s 8th graders on the 2011 TIMSS was not measurably different from that of their counterparts in the United States and trailed several U.S. states that participated.
On the 2012 PISA, Finland scored above the U.S. and OECD averages in all three subjects, but its raw scores were all down from 2009, with the biggest drop in math. Finland ranked sixth among OECD countries in math for 2012. Three years earlier, it was among the top three math performers.
In discussing the results for Shanghai, the top performer on PISA, several experts offered the caveat that its results are not representative of China as a whole. Tom Loveless, a senior fellow at the Brookings Institution’s Brown Center for Education Policy, recently wrote in a blog post that “Shanghai has an economically and culturally elite population with systems in place to make sure that students who may perform poorly are not allowed into public schools.”
Twelve provinces in China took the 2012 PISA test, the OECD confirmed, but only the results from Shanghai, Hong Kong, and Macao were publicly released.
Mr. Loveless was especially critical of that action and suggested in an interview that the OECD “cut a special deal” with the Chinese government, allowing for “cherry-picked” results. In 2011, a Chinese website leaked the average PISA scores from 2009 for all 12 participating provinces. According to those results, China scored measurably above the United States in math and science, but significantly below the U.S. average in reading.
Mr. Buckley of the NCES said that juxtaposing results in Shanghai and Massachusetts—a top-performing U.S. state by most measures—is “a better comparison than Shanghai to the U.S.” In all three subjects tested, Massachusetts’ scores fell far behind those of the Chinese city.
“The Shanghai results suggest that even better things are possible for Massachusetts,” said Mr. Chester.
The OECD report also seeks to offer some insights concerning the impact of poverty and other socioeconomic factors on student achievement. It finds that the extent to which socioeconomic status predicts student performance in the United States is similar to the average for OECD nations. But the report identifies some countries in which achievement is not as closely tied to such factors, including Hong Kong, Estonia, and Japan.
“The large differences between countries/economies in the extent to which socio-economic status influences learning outcomes suggests that it is possible to combine high performance with high levels of equity in education,” the report states.
Making causal inferences
In a webinar last month for the Washington-based Education Writers Association, Andreas Schleicher, the OECD’s deputy director for education and skills, said that the new PISA results contradicted the widely held belief, based in part on a 2010 McKinsey & Co. report on teacher recruitment, that high-performing countries draw their teachers from the top third of the nation’s academic pool.
“Actually, I should tell you that’s not something we can see through PISA,” he said. “In fact, many of the countries doing really well on PISA get pretty well the average graduate. But they are very good in developing that talent, retaining that talent.”
Mr. Schleicher explained that the highest-performing countries also place a high value on education, have “universal education standards,” and use “personalization in addressing diversity” rather than “tracking and streaming students early on.” Above all, though, he argued, the best systems are “very good in getting the most-talented teachers to the most-challenging classrooms. ... They prioritize the quality of teachers over the size of classes.”
The OECD on Tuesday also released a nearly 550-page addendum report called “What Makes Schools Successful?” that aims to “share evidence of the best policies and practices and to offer our timely and targeted support.”
Many education experts warn, however, against making broad policy prescriptions based on PISA scores.
“These kinds of studies are really good at describing where we stand and maybe looking at trends,” said Mr. Buckley. “They’re not good at all at telling us why. The study design is not one that supports causal inference.”
Mark Schneider, a vice president of the Washington-based American Institutes for Research and a former NCES commissioner, said that, too often, stakeholders use PISA and other such tests “to confirm existing policy preferences,” which he calls a “serious problem.”
“People have their favorite policy prescriptions and plug PISA data into it,” he said. “It’s not clear to me what the logical foundation is for observing a sample of 15-year-olds and talking about preschool.”