Iris R. Weiss
February 4, 1997
In 1990, the National Science Foundation requested proposals from states to reform their science and mathematics education systems. While the solicitations defined the overall goals of systemic reform, the choice of reform strategies was left to the proposers. The openness of the solicitation partly recognized that states differ considerably in their needs and resources and would therefore require different approaches to reform. In large part, however, the lack of prescription reflected the reality that the "theory" underlying systemic reform was exceedingly thin: it indicated, for example, the importance of shared goals and policy alignment, but provided little guidance on how to achieve either.
If the theory of systemic reform and the NSF solicitation for translating the theory into practice were non-prescriptive about implementation, they were nearly silent on how one might evaluate those efforts. Proposers were told it was important to have a plan for formative and summative evaluation and to have that plan carried out by qualified personnel, but there was little guidance about the kinds of evidence to collect.
And so we embarked together, the states and the evaluators, they on figuring out what those exceedingly broad goals like policy alignment meant, and we on how to tell if they were accomplishing them; they on how to use the modest monies provided by NSF to impact the system both broadly and deeply, and we on how to use the small amount of money earmarked for evaluation to help document and improve their efforts.
The statewide systemic initiatives were followed in fairly rapid succession by the urban systemic initiatives, the rural systemic initiatives, and the local systemic change initiatives. My company and I have been involved in one way or another with the evaluation of each of them. Most of what I have learned about the evaluation of systemic reform in the last five or six years seems painfully obvious now, although it certainly wasn't at the time. It is clear that evaluation can be helpful throughout the systemic reform process: in the initial planning, in helping to fine-tune the initiative as it unfolds, and in documenting evidence of impact. But there are potential problems and numerous trade-offs throughout.
1. Assessing needs and documenting progress
Most systemic reform plans start with a description of the current status of the system. Whether called a needs assessment or collection of baseline data, evaluation early on can help document the status of the system prior to the reform initiative. The challenge in systemic reform is that you cannot stop with documenting teacher knowledge or the alignment of the curriculum with national standards or classroom practice or any other component of the system. Rather, you need to understand and describe both the various parts of the system and their interrelationships. Unfortunately, this task is not nearly as straightforward as it sounds, since we aren't always sure what aspects of the system "matter" or how to measure many of them. As a result, evaluators are essentially making it up as we go along, both in assessing initial status and in documenting progress. We know we need to look at multiple components of the system, but which ones, at what times, and in what depth is not at all clear.
2. Critiquing the program design
Perhaps because the goals of systemic reform are so broad, there are many more possible pathways for reform than are typically available to program developers who have more limited objectives. As a result, evaluators of systemic reform efforts are often asked to use evaluation methodologies to help critique and refine the strategy of the initiative, looking at the probable advantages and disadvantages of alternative courses of action. This process can be very helpful to an initiative in discovering mismatches between the goals and the proposed activities while there is still time to correct them. However, while the role of evaluator as critical friend seems appropriate enough, it is sometimes easy for the evaluator to slip into design consultation and technical assistance, which may or may not be appropriate. In the extreme, this process may lead to the evaluators evaluating the quality and impact of their own ideas, which is clearly not what the funders had in mind.
3. Challenge of meeting diverse information needs
The most difficult part of the evaluation of systemic reform is that different stakeholders have different information needs and expectations for the evaluation. The PIs may be primarily interested in assessing the quality of their pilot programs so they can improve them before scaling up. Others may want to go very quickly to looking for evidence of impact, which some may define as changes in classroom practice and others as changes in student performance.
Some stakeholders want the numbers surveys provide; others insist on going beyond self-report data even if it means observing only a small number of teachers. Most want both, and lots of them. Similarly, those who want to go directly to student measures differ markedly in what counts, with some looking for changes in whatever standardized tests or proficiency measures the state or district uses to assess students and others insisting on "authentic" measures.
Whatever the measures you choose, do you assess progress for all teachers/students in the system or only those who have been reached directly by the initiative? The former is more in keeping with the goals of systemic reform, but it seems like a waste of resources to look for impacts that are unlikely to have occurred. It is easy to get stuck jumping back and forth between the various options, and while getting prior agreement on an evaluation plan will in theory resolve these problems, in practice the pressure to collect more and more data to address emerging information needs can be intense, even irresistible.
4. The fallacy of "it's out there"
As PIs modify their initiatives based on experience and in response to emerging opportunities, evaluators need to keep pace with new evaluation strategies and instruments. Unfortunately, there is a lack of "tools" for the evaluation of systemic reform. While you rarely have the budget to do extensive instrument development, the evaluation instruments everyone says are "out there" are awfully hard to find, whether you are looking for classroom observation instruments or performance tasks that are aligned with reform goals, not to mention ways to assess the extent of policy alignment.
5. Deciding who gets what information when
It doesn't take long before you are awash in data. Finding the time to analyze and report it is difficult, especially because the initiative is ongoing and you and the PIs are reluctant to stop collecting data. Tensions abound. PIs want and need honest feedback in order to improve their programs, but at the same time want you to put the best spin possible on anything that appears in writing. We typically give the PI a draft report to check for inaccuracies and misinterpretations, but sometimes the feedback seems aimed less at verification and more at turning your analytic work into a public relations document. On more than one occasion I've had to remind a project that we were preparing a report, not negotiating a treaty. And on more than one occasion a PI has reminded me that they are engaged in an enterprise that is as much political as it is technical, and that critics can use even the smallest negative finding to undermine a promising initiative.
I can end with platitudes. Less is more. Don't collect more data than you can afford to analyze. In deciding on spin, remember you have to look at yourself in the mirror in the morning. Or I can end with the greater truth that the challenges and complexities involved in evaluating systemic reform have helped me and others deepen our understanding of how to bring about improved science and mathematics education on a broad scale. No one said it would be easy.