Evidence-based policy is very much in fashion at the moment in all departments of government. Of course it's a good idea; the main argument for it is summarised admirably by the name. But people who expect big things from evidence-based approaches ought to be really quite worried right now.
The reason is that the methodology used in a lot of evidence-based policy analysis is very similar to that used in experimental psychology. And at the moment, psychology is a subject with some very serious methodological problems.
It's being called the 'reproducibility crisis' and in summary, the problem is that large-scale and careful attempts to replicate some of the best-established and most important results of the last few decades are not finding the effects they were meant to find. This is even happening for effects like 'ego depletion' (the idea that resisting temptation requires effort and makes it harder to exercise willpower), which are the subject of dozens or even hundreds of research papers.
There appear to be two related problems. First, there is a knot of issues relating to methodology and the interpretation of statistical tests, which means that there is a systematic tendency to find too many statistically significant results. And second, it turns out that a lot of psychology results are just 'fragile' – they describe much smaller sets of individuals than hoped, and are very dependent on particular situations, rather than reflecting broad truths about humanity.
Both of these problems are likely to be shared by a lot of other areas. For example, the methodology of behavioural economics has a very big overlap with experimental psychology, and is likely to have many of the same reproducibility issues. So lots of 'nudge' schemes related to savings and pensions could be based on fragile results.
There's also a lot of methodological overlap with education research and even development economics. Based on my reading of key papers like Andrew Gelman's Garden of Forking Paths, any area of research which uses the treatment/response model of controlled experimentation from medical science, but which doesn't require its methods for data selection and analysis to be registered ahead of time, is likely to be producing fragile results. Moreover, in many areas the academic literature is so compromised by unreproducible results and publication bias that even large metastudies are not going to solve the problem.
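To see why pooling a biased literature doesn't rescue it, here is a toy simulation. It is a sketch under made-up assumptions, not a model of any real field: the true effect size, the sample sizes and the rule that only significant results in the expected direction get published are all invented for illustration.

```python
import math
import random
import statistics


def run_study(rng, true_effect, n_per_group):
    """Return (estimated effect, z statistic) for one simulated two-group study."""
    treated = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
    control = [rng.gauss(0, 1) for _ in range(n_per_group)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / n_per_group
                   + statistics.variance(control) / n_per_group)
    return diff, diff / se


def compare_literatures(true_effect=0.1, n_studies=2000, n_per_group=30, seed=1):
    """Average effect across all studies versus across 'published' ones only."""
    rng = random.Random(seed)
    all_estimates, published = [], []
    for _ in range(n_studies):
        diff, z = run_study(rng, true_effect, n_per_group)
        all_estimates.append(diff)
        # Assumed publication filter: significant and in the expected direction.
        if z > 1.96:
            published.append(diff)
    return statistics.mean(all_estimates), statistics.mean(published), len(published)


if __name__ == "__main__":
    all_mean, published_mean, n_published = compare_literatures()
    print(f"mean effect, every study run:    {all_mean:.2f}")
    print(f"mean effect, published studies:  {published_mean:.2f}")
    print(f"studies published:               {n_published}")
```

A metastudy can only average what made it into print. In this toy world the published average comes out several times larger than the true effect, and adding more published studies of the same kind does nothing to close the gap.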
To the extent that policy analysts have different institutional incentives from publishing academics, they are likely to be worse. And to the extent that 'natural experiments' and real-world analysis differ from laboratory science, it is also likely to be in the direction of generating more fragile results. This is a problem which ought to have the evidence-based policy community very worried.
So what's the solution? The answer can't be to pretend that evidence-based policy never existed, or that anecdote and political prejudice are a better way to make decisions. If we take a look at the way in which scientists are tackling the problem of published research, though, we can see what steps might be taken to ensure that evidence-based policy develops in the right direction.
Though there are two facets to the reproducibility problem, there is only one worth solving. The fact that people and communities are different and that this makes experimental results fragile in the human sciences is inconvenient, but it's a fact about the world, and dealing with it is the essence of policymaking itself. The problem of misleading statistical significance does have a solution. But you might not like it.
As I've said, the problem seems to be caused by the use of the treatment-response model, in a context where the choice of how to analyse the data is made after collecting it. As Gelman details, if this is what you do, then it is very easy to put your finger on the scales – even without any bad faith and ignoring institutional pressure to produce statistically significant results, it is very hard for a researcher to avoid concluding that the 'best' way to analyse a collection of facts is the way which seems to give them a logical structure.
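A crude simulation makes the point concrete. This is a simplified sketch, not Gelman's own argument or code: it assumes a treatment that does nothing at all, five independent outcome measures per study (a number picked purely for illustration), and a researcher who either commits to the first outcome in advance or reports whichever outcome happens to look best afterwards.

```python
import math
import random
import statistics


def two_sample_p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_studies=1000, n_per_group=50, n_outcomes=5, alpha=0.05, seed=0):
    """Return false-positive rates for (pre-registered, chosen-afterwards) analysis."""
    rng = random.Random(seed)
    prereg_hits = 0
    flexible_hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: there is no real effect.
            treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
            control = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(two_sample_p_value(treated, control))
        # Pre-registered: only the outcome named in advance counts.
        prereg_hits += p_values[0] < alpha
        # Flexible: the analysis is picked after seeing which outcome "works".
        flexible_hits += min(p_values) < alpha
    return prereg_hits / n_studies, flexible_hits / n_studies


if __name__ == "__main__":
    prereg_rate, flexible_rate = simulate()
    print(f"false positives, analysis fixed in advance:  {prereg_rate:.1%}")
    print(f"false positives, analysis chosen afterwards: {flexible_rate:.1%}")
```

With five independent looks at pure noise, theory says the chance of at least one 'significant' result is about 1 - 0.95^5, or roughly 23%, against the nominal 5%. Real forking paths are subtler than this, since the researcher may run only one test but choose it in light of the data, yet the direction of the bias is the same.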
This is not what happens in pharmaceutical research. In drug tests, all details of methodology have to be filed and registered before the trial begins. It is a convention which has been adopted over the years precisely to avoid this kind of bias. Now here's the bit that you're not going to like. How expensive and time-consuming is it to get a new drug to market? How many initial ideas have to be generated in order to get a single robust result that can be confidently expected to perform better than a placebo without unacceptable side effects? This analogy is taking us in a pretty scary direction for a philosophy of policy-making that was meant to provide a quick and easy way to find out what works. So the real 'reproducibility crisis' for evidence-based policy making would be: if you're serious about basing policy on evidence, how much are you prepared to spend on research, and how long are you prepared to wait for the answers?
Using evidence to inform policy is obviously correct. But it's not a silver bullet and it may have been heavily oversold in terms of the amount of policy it can realistically deliver. At the very least, we need to always remember that when an advisor says:
"Whichever way you look at the numbers," they mean: "Whichever way *I* look at *these* numbers... "