about program effectiveness, while identifying in detail the data deficiencies that prevent
a comprehensive evaluation (Solomon, 1998). These deficiencies include the quality of
environmental data, the adequacy of techniques used to gather data, and the integrity of
the analytical techniques used to evaluate that data. GAO recommendations,
consequently, primarily focus on ways to improve the databases needed for future,
methodologically rigorous evaluations. These recommendations tend to have little
practical use in short-term decision-making.
The availability of information is a major consideration in designing a program
evaluation. For example, the availability of data often shapes whether the approach
selected is quantitative or qualitative. Quantitative evaluations often follow a quasi-
experimental or “scientific” design, using statistical methods to analyze data and
support conclusions. Such approaches can be expensive, time-consuming, and
impractical where data is not readily available, and may encourage researchers to focus
only on those dimensions of a problem that are operational and measurable. Evaluations
that support their conclusions with qualitative methods, such as interviews,
observations, and the review of written documents, are often more practical (Rich, 1998).
The status of qualitative approaches has risen in recent decades. As King et al. (1987:15)
explain: “The development of evaluation thinking over the past twenty years has led
away from the notion that the quantitative research study is the only or even the ideal
form for an evaluation.”
The efforts of the U.S. Environmental Protection Agency (EPA) highlight several
difficulties of program evaluation. Historically, the agency based its evaluations on
activity (or output) measures, such as the number of permits issued, and was roundly
criticized for not measuring what was really important: environmental outcomes (Mintz,
1995; NAPA, 1997). In this respect, the agency was primarily engaged in so-called
formative evaluations, which are most typical of young programs, are usually reliant on
output measures, and are most often conducted internally with the goal of improving
program performance. Critics, in contrast, were calling for so-called summative
evaluations, which are often more appropriate for mature efforts, use outcome data, and
are often conducted by (and/or for) outside entities responsible for making decisions
about continuing or discontinuing programs.³⁸ The agency responded in recent years by
focusing more on environmental outcomes such as water quality data, thereby balancing
process analyses with outcome assessments.³⁹
This, however, has led to a new
methodological problem: trying to link outcomes to EPA activities.
This experience highlights a great paradox in evaluation: focusing on means (i.e.,
processes and outputs) tells you little about ends (i.e., impacts and outcomes); focusing
on ends tells you little about means.
³⁸ The distinction is credited to Scriven (1967).
³⁹ One example of this commitment to better integrate outcome measures into program evaluations is the
National Performance Measures Strategy (NPMS). According to EPA, the strategy not only “includes
traditional measures, such as the number of inspections and enforcement actions conducted each year, it
also establishes new outcome measures for evaluating the behavioral and environmental results of our
activities. These measures include compliance rates for selected regulated populations, pollutant reductions
resulting from enforcement actions, behavioral changes stemming from compliance assistance, and average
time for significant violators to return to compliance” (EPA, 2000:3).