Systemic Evaluation: What we are learning and what we need to know
Norman Webb
National Institute for Science Education
February 25, 1997
Systemic evaluation is an evolving concept. Over the past six years, gathering information for judging the merit and worth of systemic initiatives has matured as a field. We have a greater understanding of the nature of systemic change and of the roles evaluation can serve. For the past two years, the Strategies for Evaluating Systemic Reform (SESR) Team of the National Institute for Science Education has endeavored to learn about the evaluation of large-scale reform from the evaluations of the state systemic initiatives (SSIs).
In thinking about evaluation, we have depended heavily on the work of those who are doing evaluations. We reviewed the evaluation plans of the state systemic initiatives of 15 states and Puerto Rico. We consulted reports from SRI International, Policy Studies Associates, the Consortium for Policy Research in Education, and Abt Associates Inc. Other sources of information include organizations engaged in doing, studying, or providing technical assistance on the evaluation of systemic initiatives, including Horizon Research, Inc.; Inverness Research Associates; Westat, Inc.; and the McKenzie Group.
What are we learning?
Critical functions of systemic initiatives help define roles for evaluation that are somewhat different from the more traditional roles of purely summative and formative evaluations.
Design. The design process of a systemic initiative is fluid and continuous. For many of the initiatives, what was set out at the beginning had to be changed, modified, and rethought, not once but several times. An important role served by many of the SSI evaluators was that of a critical friend who judged and helped shape the design of the initiative's strategy. This molding of the initiative over time, together with ongoing strategic thinking, elevated the design from a planning document to an outcome of a learning process. The designs have become as much a product of the initiatives as a guide for their work.
Management. Closely related to developing a design for the initiative is managing it and making decisions for it. Evaluators of the state systemic initiatives recognized how important it is for the initiatives' staff and others to have access to information about context, implementation, and outcomes for decision making. In systemic initiatives, many people make decisions, and the range of decisions is wide. Management teams faced issues of assembling a critical mass of stakeholders, developing effective networks of teachers, and planning for long-term sustainability while under pressure to demonstrate improved student learning. With limited resources, evaluators continuously faced the practical and conceptual issue of how much attention to give to producing information on process and capacity building versus documenting student and other outcomes. Some evaluators coped with this issue by working to identify interim impacts.
Leverage. Reform on the scale of the systemic initiatives engages many people at levels of detail that can obscure the larger picture of reform. Mounting an effort to saturate large, multifaceted education systems requires leveraging resources, aligning components, and strategically thinking about "going to scale." Evaluation of systemic initiatives requires some attention to where the initiative is heading along with what the initiative is doing. One role evaluators assumed was to study continuous improvement and how this improvement would project to the full system. State systemic initiatives' evaluators assumed a proactive role to leverage the systems in concrete ways. Some highlighted best practices that were used to promote early successes statewide. Others guided initiatives to position themselves to take advantage of unanticipated opportunities.
Verification. Systemic reform is based on a theory that assumes the highest level of education will be achieved if all of the pieces and components within a system are aligned and working in cooperation toward important common goals. One role of systemic evaluation is to verify this theory as applied to the system. This formidable task requires understanding the logic of the system: what the components are, how they are linked, and what their collective force is. Evaluators of state systemic initiatives attended to one piece of this task by developing and seeking answers to "linking" questions. They sought to establish paths from the intervention to local changes in practice and policy.
Student Learning. State systemic initiative evaluations varied in the attention they devoted to measuring student outcomes. This is due, in part, to the lack of instrumentation and of available information, the initiative's focus, and the evaluator's perspective. Some evaluators believed that developing a system's capacity for sustained improvement is critical to launching significant change, and that this requires time. They felt that any attempt to measure student outcomes and attribute them to the initiative in the first stages of its development was conceptually wrong and detrimental to the initiative's longevity. If positive outcomes were not produced, the initiative might be too fragile to withstand strong criticism, even though significant groundwork had been laid for important future changes in student outcomes. Not all evaluators held this view; others incorporated some assessment information in their evaluations. Some compared the scores of students whose teachers had participated in the initiative's training with those of a control group. Some used trend data from state assessments over a number of years. One initiative made student assessment one of its major components, designing a three-pronged approach: measuring long-term gains in student outcomes, developing and using pre- and post-tests in classrooms, and training teachers to use alternative forms of assessment. The system-wide assessments provided incentives for teachers to learn more about assessment and encouraged them to attend to student outcomes using the classroom assessment instruments that had been developed.
What do we need to learn?
Much is still to be learned about systemic evaluation and measuring change in large education systems. A few areas are listed here.
Equity. Assessing equity in student learning throughout the system is a critical concern and raises important questions for systemic evaluations. More attention needs to be given to developing ways of measuring a system's progress in achieving equity in student learning. Related questions include:
How can progress toward equity in a system be measured in concert with validating a high quality of mathematics and science learning throughout the system?
How can an evaluation effectively identify how well the efforts of a systemic initiative support helping every student reach his or her full potential?
What levels of disaggregation of data are necessary to study equity within a system?
Achievement Measures. Assessment technology is insufficient to measure all important knowledge of science and mathematics. Valid techniques that can be applied on a large scale still need to be developed to measure how students are able to reason, to solve complex problems, to build arguments, and to do scientific inquiry. "Habits of mind," meta-cognition, and dispositions are important qualities for pursuing science and mathematics, but are very difficult to measure. Some questions are:
What methods can be used to judge that a system is educating students to do challenging mathematics and science?
How can evidence be produced of how students are developing habits of mind and their competence to think deeply about mathematics and science?
System Saturation. Systemic evaluation, like systemic reform, must keep the entire education system within its view at all times. The most common approach to evaluating systemic initiatives was to study the components and parts of the system. Less is known about how to consider the interactions among components, their linkages, and any magnifying effects. More needs to be learned about tracking change attributable to the systemic initiative through a system. Some related questions are:
How can an evaluation strike a balance between gathering the immense amount of data that would be useful to know and the amount of information that can be effectively collected and analyzed at a reasonable cost?
How can an evaluation grasp the full extent of a system to make judgements about the effectiveness of a systemic initiative in instituting change?
How can evaluation produce information on the amount of resources, concentration of efforts, and strategy for change necessary to reach systemic reform for any given system?
How can the quality of classroom experiences and teachers' interactions with students be incorporated into a systemic evaluation effectively and without great expense?
Time Frame. It is unclear how much time must be allowed before various changes in an education system should become observable and sustainable. In judging the value of systemic reform, systemic evaluation must attend to the institutionalization of structures and functions that will sustain movement toward positive outcomes. Evaluation is frequently called upon to produce information and judgements before sufficient time has passed and enough effort has been expended to fully reach goals. More needs to be understood about how to identify and measure interim attainments and progress. We also need to understand more about what is a reasonable amount of time for a system to make significant changes. This, of course, will depend on a number of factors. Some related questions are:
What is the appropriate time frame for detecting change in student learning across the system that can be attributed to a systemic initiative and can be judged as sustainable?
What interim outcomes serve as strong indicators of progress toward systemic change?