
When to Use and When Not to Use Rigorous Impact Evaluations

To drive the uptake of rigorous impact evaluations (RIEs) in international development, it is important to provide guidance on when to conduct one. In short, if an evaluation seeks to answer questions of cause and effect, an RIE is likely the most valid and reliable evaluation design (Krämer et al., 2021). For an explanation of what an RIE is, please see here. It is equally important, however, to recognise situations where an RIE is not the right evaluation tool.


Below, we summarise some of the main factors influencing this decision.


Suitable Counterfactual


An RIE relies on a counterfactual approach, using a comparison/control group to estimate what would have happened in the absence of the development intervention. When no suitable counterfactual can be identified, an RIE is not the right tool. Common approaches to constructing a counterfactual include the following.

  • Randomised Controlled Trial (RCT): In an RCT, subjects are randomly assigned to either the treatment (intervention) group or the control group. Randomisation helps ensure that both groups are similar on average, which makes it easier to attribute any differences in outcomes to the intervention.

  • Regression Discontinuity Design: This method is applicable when eligibility for the intervention is determined by a cutoff on a continuous variable (e.g. income). Individuals who fall just on the ineligible side of the cutoff serve as the comparison group for those just on the eligible side, on the assumption that the two are otherwise similar.

  • Geographic or Temporal Comparisons: In some cases, it is possible to use geographic or temporal differences to create a comparison group. For instance, a nationwide policy can still have a counterfactual if the government opts for a phased rollout, since areas that receive the policy later can serve as the comparison group for areas that receive it earlier.

  • Difference-in-Differences/Propensity Score Analysis: In these methods, the comparison group is not (entirely) randomly assigned. As a rule of thumb, the assumption must hold that the comparison group would have responded to the treatment in the same way as the actual treatment group (Gertler et al., 2016).

  • Pre-Post: Measuring results before and after the intervention is usually not sufficient, because confounding variables may also influence the outcomes. Only when no other factor could plausibly have caused the observed change (which is rare), e.g. when measuring time spent collecting water after installing water pumps, is the “before” state sufficient as a comparison.

  • Synthetic Control Group: In some instances, it is possible to construct a synthetic control group by combining data from multiple control units into a weighted average that closely matches the characteristics of the intervention group (see, for instance, Abadie, Diamond and Hainmueller, 2010).
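
To illustrate why a comparison group matters, the following minimal Python sketch contrasts a simple pre-post comparison with a difference-in-differences estimate. All numbers are hypothetical, and the function name diff_in_diff is our own choice for illustration. The estimate is only meaningful under the assumption that both groups would have followed the same trend in the absence of the intervention.

```python
from statistics import mean


def diff_in_diff(treat_before, treat_after, comp_before, comp_after):
    """Difference-in-differences estimate from four lists of outcomes.

    Assumes (hypothetically) that, without the intervention, the treatment
    group would have changed by the same amount as the comparison group.
    """
    change_treated = mean(treat_after) - mean(treat_before)
    change_comparison = mean(comp_after) - mean(comp_before)
    return change_treated - change_comparison


# Hypothetical data: minutes per day spent collecting water.
treat_before = [60, 70, 65, 75]
treat_after = [30, 40, 35, 45]
comp_before = [62, 68, 66, 72]
comp_after = [57, 63, 61, 67]

# A pre-post comparison attributes the whole change to the intervention.
pre_post = mean(treat_after) - mean(treat_before)

# Difference-in-differences removes the change also seen in the comparison group.
did = diff_in_diff(treat_before, treat_after, comp_before, comp_after)

print(f"Pre-post estimate: {pre_post:.1f} minutes")
print(f"Difference-in-differences estimate: {did:.1f} minutes")
```

In this made-up example, the comparison group also spends less time collecting water over the same period, so the pre-post comparison overstates the reduction that can be attributed to the intervention.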


Sufficiently Large Sample Size (Evaluation Power)


An RIE requires a sufficiently large sample to detect an impact. It is not the right design if the number of project participants, and thus the sample, is too small to detect an impact statistically, even if one exists. We generally require a minimum statistical power of 80%, which depends, among other things, on the sample size and the assumptions made about the sample. One rule of thumb is a minimum of 250 participants. Another is that the smaller the expected effect size, the larger the required sample.
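
The relationship between effect size, power and sample size can be sketched with a standard normal-approximation formula for comparing the means of two groups. This is an illustrative sketch, not a substitute for a full power calculation. The function name and its default values are our own assumptions; it presumes individual-level randomisation, two equally sized groups, a two-sided test at the 5% significance level, and an effect size expressed in standard deviations.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate participants needed per group to detect a difference in means.

    effect_size: expected difference between groups, in standard deviations.
    power: probability of detecting the effect if it exists (assumed 80%).
    alpha: two-sided significance level (assumed 5%).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Hypothetical effect sizes: smaller effects need larger samples.
for d in (0.5, 0.2):
    print(f"effect size {d}: about {sample_size_per_group(d)} participants per group")
```

Under these assumptions, halving the effect size roughly quadruples the required sample. Designs that assign whole clusters (e.g. villages) rather than individuals typically need larger samples than this formula suggests.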


Methodological Risks


For an RIE to detect impact, there should ideally be no direct or indirect spillover between the treatment and the control/comparison groups. In addition, all units should ideally comply with their initial assignment, i.e. all those assigned to the treatment group should actually be treated, while none of the comparison units should receive the treatment. However, some form of non-compliance is a frequent occurrence in international development projects. Rigorous impact evaluations can still detect impact accurately in the face of non-compliance by estimating the Intention-to-Treat (ITT) effect, which compares the groups as originally assigned, rather than the Average Treatment Effect (ATE) (for more information, click here). Yet, in cases where risks such as spillover and non-compliance are too difficult to mitigate, an RIE may not be the best evaluation tool.
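
As a minimal sketch of the ITT idea, the following Python example compares outcomes by original assignment, regardless of who actually received the treatment. The data, the record layout and the function names are hypothetical and chosen only for illustration.

```python
from statistics import mean


def intention_to_treat(records):
    """ITT estimate: difference in mean outcomes by original assignment.

    records: list of (assigned_to_treatment, actually_treated, outcome).
    Actual treatment status is deliberately ignored here.
    """
    assigned = [y for was_assigned, _, y in records if was_assigned]
    not_assigned = [y for was_assigned, _, y in records if not was_assigned]
    return mean(assigned) - mean(not_assigned)


def take_up_rate(records, assigned_value):
    """Share of units in an assignment group that actually received treatment."""
    group = [treated for was_assigned, treated, _ in records
             if was_assigned == assigned_value]
    return sum(group) / len(group)


# Hypothetical records with non-compliance in both groups.
records = [
    (True, True, 12), (True, True, 14), (True, False, 9), (True, True, 13),
    (False, False, 8), (False, False, 10), (False, True, 12), (False, False, 10),
]

print("ITT estimate:", intention_to_treat(records))
print("Take-up among assigned units:", take_up_rate(records, True))
print("Take-up among comparison units:", take_up_rate(records, False))
```

Because the comparison follows the original assignment, the ITT estimate reflects the effect of offering the intervention rather than the effect of receiving it.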


Ethical Concerns


In most RIEs, human beings are the subject of analysis. This can raise ethical concerns. For instance, it may be unethical to provide an intervention to a treatment group, or to withhold it from the control group, for the sake of conducting an RIE. However, most development projects do not have the budget to serve everyone, so treatment is in any case restricted to a certain geographical area or population group.


Timing


RIEs typically focus on short- to medium-term outcomes. If the intended impacts of an intervention are expected to manifest only in the long term, an RIE may not be cost-effective. At the same time, sufficient time must elapse between the development intervention and data collection for an impact to be observable (Gertler et al., 2016). If too little time has passed, it may be better to wait for a more appropriate time to conduct the evaluation.


Other Constraints


There are situations in which RIEs are not the right evaluation tool, for example when they are not cost-effective, too resource-intensive, or politically infeasible. See an expansion on this argument here. Furthermore, an appropriate RIE design should test not just the impact but also the “why” behind the impact; that is, it should test how the intervention (input) is expected to affect final outcomes, including each impact pathway (assumption) from inputs to outcomes. This is referred to as the theory of change (or programme theory). Testing it requires mapping and understanding the theory of change, as well as a good grasp of the local context (White, 2009).


 

Bibliography


Abadie, A., Diamond, A. and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American Statistical Association, 105(490), pp. 493-505.


Gertler, P. J. et al. (2016). Impact Evaluation in Practice. World Bank, Washington, D.C.

Krämer, M., Jechel, L., Kretschmer, T. and Schneider, E. (2021). When to Do a Rigorous Impact Evaluation? German Institute for Development Evaluation (DEval). https://rie.deval.org/fileadmin/Redaktion/PDF/03_Methoden/RIE/DEval_2021_Guidance_Tool_When_to_do_an_RIE_final_.pdf


CAF, Development Bank of Latin America and the Caribbean (2019). 3 key aspects for a rigorous impact evaluation. https://www.caf.com/en/currently/news/2019/01/3-key-aspects-for-a-rigorous-impact-evaluation/


Gugerty and Karlan. (2018). Ten Reasons Not to Measure Impact—and What to Do Instead

