A Hierarchy of Program Evaluation Metrics

We're diving into researching data-based decision making for a couple of projects, which includes metrics for program evaluation. Evaluation is a well-trodden field -- a lot of people have said a lot of things about it... but I find a lot of it contradictory or not that helpful. I especially dislike the commonly used nomenclature of "outputs" and "outcomes." First, is it even possible to find two more confusingly similar words to represent two different concepts? Second, the terms aren't even used consistently and specifically -- there's a fair amount of overlap where one authority might call something (say, attendance at a workshop) an output when someone else would call that same thing an outcome.

The idea of outcomes has also been elevated to such a degree (often by funders) that nonprofits often think it's really important to get to outcomes as far downstream in their work as humanly possible. If tracking the percentage who completed a course, or who said it made a difference to them, is good, then it's even better to gather and track long-term satisfaction. And if that's good, then we should all be shooting for the holy grail: measuring what impact our programs have on people's lives and communities.

I disagree with this notion. It encourages nonprofits to try to fly before they can walk. They need to start by measuring the tactical things that are close at hand, and get that down, before moving to more complex measures. Even more, the idea that small nonprofits should be trying to measure impact writ large is insane to me -- trying to tease out the actual impact of one organization's actions, separating it from the impact of other actions and drivers, is work for professional researchers and, often, multi-million-dollar studies. There's no way that any but a huge nonprofit is -- or should be -- staffed to conduct this kind of study. When we (or, ahem, funders) encourage small nonprofits to try to measure this stuff, it can most often only result in bad research.

Well, none of us can ignore what funders think is important, or the commonly used vocabulary for such an important concept... but I can, as I'm prone to do, try to create more order and structure around it. Here's my look at a hierarchy of nonprofit-focused program evaluation metrics, and how organizations can think about their priority.

What do you think? Would you agree? What have I missed?

Comments

Love this post

Thank you so much for bravely writing about the ineffectiveness of the evaluation process and its metrics. I have never seen any sort of logic model result in actual improvement of programs, exhibitions, etc. It consumes an enormous amount of team time to tease out the minuscule differences between "outcomes" and "outputs" for little end gain. Funders require it, so we all do it. But we all hate it (except those who make a living out of evaluation), and we all know that it has no real, direct, bottom-line effect on quality.

 

Logic Models - UWisc Extension

With a little thought and practice, I think you can distinguish between program and project outputs and outcomes so they don't overlap. The University of Wisconsin Extension has lots of free training materials and resources for developing logic models. They are easy to understand and helpful. Using an analogy from their materials, if your need is to get rid of a headache, taking an aspirin is an output/activity and the extent to which the headache goes away is the outcome/impact.

I think even one-year projects can measure some short-term outcomes. Adapted from the above, those might include changes in knowledge, skills (pre- and post-testing), attitude, motivation, and awareness (pre- and post-survey). You list many across participation (pledge - a change in intent) and initial success (all).

You might be able to measure change in medium-term outcomes, like behaviors, practices (surveyed, or measured by reduction or increase in service delivery/need), and policies and procedures, if they occur during the time-frame of your report (annual or end of a two to three year project). And I agree that even over three to five years you might not measure a significant change in long-term outcomes (conditions - like poverty, or global climate change) or be able to attribute it to your program without, as you indicated, actual research.

Logic Models - UWisc Extension

Thanks, Lisa! I don't question that one *can* define outputs and outcomes so they don't overlap, but in practice, some sets of definitions put things like participation data in the "output" category and some in the "outcome" category. In fact, some would have it sometimes in one and sometimes in the other depending on its use. For instance, for your headache analogy, one could say that I advertise a program (the aspirin) and people show up (the headache goes away). Or I could say that a participant shows up (the aspirin) and they then change their behaviour (the headache). My point is just that these labels are not necessarily straightforward or easy to understand for an average nonprofit.

 



hierarchy of measures

Lisa wrote: "Using an analogy from their materials, if your need is to get rid of a headache, taking an aspirin is an output/activity and the extent to which the headache goes away is the outcome/impact." The person with the headache is "the client." Taking an aspirin is not an output for the client. Inducing a client to take an aspirin might be an output for the program. An outcome might be increased intestinal bleeding in the client (an undesired/unintended negative outcome) or cessation of the headache (or both). On the other hand, if the "program" is a drug company, then perhaps inducing a client to take an aspirin (implying the sale of an aspirin) might well be the desired outcome.
More generally, the fact that the "same thing" (in this case taking an aspirin) may be considered an output or an outcome can be vexing, unless viewed as a function of context.
If I am in charge of recruiting students for an afterschool program, then my outcome is the number of students who sign up for the program. From the perspective of the program as a whole, the number of students participating could be an output. From a different perspective, the incoming students might be seen as inputs, raw material needing to be transformed into students with better study skills as the outputs. Of course the program may declare that improved study skills is the outcome. But perhaps from the POV of the school administration what matters is whether potential dropouts are turned into graduates, and that is the outcome they need to see to continue to support the afterschool program.
Laura is right in saying programs need to measure the basics -- how many hours of service did they deliver, did staff X deliver it according to the organization's service model, did the clients "like it" (if not, they won't continue to participate), etc. And they need to do this from the start. Before leaping into questions of whether something produced an outcome, you really need to know what that "something" is. Thanks for letting me ramble.