What works?

Comment: Using evidence to inform policy is obviously correct. But it may have been oversold in terms of what it can realistically deliver, says Dan Davies

22nd March 2016

Evidence-based policy is very much in fashion at the moment in all departments of government. Of course it's a good idea; the main argument for it is summarised admirably by the name. But people who expect big things from evidence-based approaches ought to be really quite worried right now.

The reason is that the methodology used in a lot of evidence-based policy analysis is very similar to that used in experimental psychology. And at the moment, psychology is a subject with some very serious methodological problems.

It's being called the 'reproducibility crisis' and in summary, the problem is that large-scale and careful attempts to replicate some of the best-established and most important results of the last few decades are not finding the effects they were meant to find. This is even happening for effects like 'ego depletion' (the idea that resisting temptation requires effort and makes it harder to exercise willpower), which are the subject of dozens or even hundreds of research papers.

There appear to be two related problems. First, there is a knot of issues relating to methodology and the interpretation of statistical tests, which means that there is a systematic tendency to find too many statistically significant results. And second, it turns out that a lot of psychology results are just 'fragile' – they describe much smaller sets of individuals than hoped, and are very dependent on particular situations, rather than reflecting broad truths about humanity.

Both of these problems are likely to be shared by a lot of other areas. For example, the methodology of behavioural economics has a very big overlap with experimental psychology, and is likely to have many of the same reproducibility issues. So lots of 'nudge' schemes related to savings and pensions could be based on fragile results.

There's also a lot of methodological overlap with education research and even development economics. Based on my reading of key papers like Andrew Gelman's 'Garden of Forking Paths', any area of research which uses the treatment/response model of controlled experimentation from medical science, but which doesn't require its methods for data selection and analysis to be registered ahead of time, is likely to be producing fragile results. Moreover, in many areas the academic literature is so compromised by unreproducible results and publication bias that even large meta-analyses are not going to solve the problem.

To the extent that policy analysts have different institutional incentives from publishing academics, those incentives are likely to be worse. And to the extent that 'natural experiments' and real-world analysis differ from laboratory science, the difference is also likely to be in the direction of generating more fragile results. This is a problem which ought to have the evidence-based policy community very worried.
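To see why pooling a biased literature doesn't rescue it, here is a minimal sketch in Python. Everything in it is an assumption made for illustration and is not drawn from any real dataset: the small true effect of 0.1, the 30 subjects per arm, the rule that a study is always published if it finds a significant positive effect, and the 10 per cent chance of publication otherwise. The function names `run_study` and `pooled_estimates` are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def run_study(rng, true_effect, n_per_group):
    """Simulate one two-arm study; return (estimated effect, two-sided p-value)."""
    treated = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
    control = [rng.gauss(0, 1) for _ in range(n_per_group)]
    estimate = fmean(treated) - fmean(control)
    # z-test with the outcome variance assumed known and equal to 1
    z = estimate / (2 / n_per_group) ** 0.5
    return estimate, 2 * (1 - NormalDist().cdf(abs(z)))


def pooled_estimates(n_studies=3000, n_per_group=30, true_effect=0.1,
                     null_publication_rate=0.1, seed=7):
    """Compare the average of all studies with the average of 'published' ones."""
    rng = random.Random(seed)
    all_estimates, published = [], []
    for _ in range(n_studies):
        estimate, p = run_study(rng, true_effect, n_per_group)
        all_estimates.append(estimate)
        # Assumed publication filter: significant positive results always
        # appear; everything else appears only occasionally.
        significant_positive = p < 0.05 and estimate > 0
        if significant_positive or rng.random() < null_publication_rate:
            published.append(estimate)
    return fmean(all_estimates), fmean(published), len(published)


everything, literature, n_published = pooled_estimates()
print(f"average over all {3000} studies run: {everything:.3f}")
print(f"average over {n_published} published studies: {literature:.3f}")
```

Under these assumptions the average of every study run sits close to the true effect, while the average of the published subset comes out well above it. Adding more studies of the same kind narrows the error bars around the wrong number; it doesn't move the number.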

So what's the solution? The answer can't be to pretend that evidence-based policy never existed, or that anecdote and political prejudice are a better way to make decisions. If we take a look at the way in which scientists are tackling the problem of published research, though, we can see what steps might be taken to ensure that evidence-based policy develops in the right direction.

Though there are two facets to the reproducibility problem, there is only one worth solving. The fact that people and communities are different and that this makes experimental results fragile in the human sciences is inconvenient, but it's a fact about the world, and dealing with it is the essence of policymaking itself. The problem of misleading statistical significance does have a solution. But you might not like it.

As I've said, the problem seems to be caused by the use of the treatment-response model, in a context where the choice of how to analyse the data is made after collecting it. As Gelman details, if this is what you do, then it is very easy to put your finger on the scales – even without any bad faith and ignoring institutional pressure to produce statistically significant results, it is very hard for a researcher to avoid concluding that the 'best' way to analyse a collection of facts is the way which seems to give them a logical structure.
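A crude way to see the size of the effect is to simulate it. The sketch below is my own simplification, not Gelman's method: it assumes a treatment that does nothing at all, five independent outcome measures per study, and a researcher who reports whichever outcome looks most striking once the data are in. The subtler, good-faith process Gelman describes is harder to model, but the arithmetic points the same way. All the parameters and the names `p_value` and `simulate` are hypothetical.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05  # conventional significance threshold


def p_value(treated, control):
    """Two-sided z-test for a difference in means.

    Assumes equal group sizes and a known outcome variance of 1.
    """
    n = len(treated)
    z = (fmean(treated) - fmean(control)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_studies=2000, n_per_group=50, n_outcomes=5, seed=1):
    """Return the false-positive rates for the two ways of analysing the data."""
    rng = random.Random(seed)
    preregistered_hits = 0
    chosen_afterwards_hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            # The treatment has no effect: both arms share one distribution.
            treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
            control = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(p_value(treated, control))
        # Registered in advance: only the first outcome is ever tested.
        preregistered_hits += int(p_values[0] < ALPHA)
        # Chosen afterwards: the most striking outcome is the one reported.
        chosen_afterwards_hits += int(min(p_values) < ALPHA)
    return preregistered_hits / n_studies, chosen_afterwards_hits / n_studies


preregistered_rate, chosen_afterwards_rate = simulate()
print(f"false positives, analysis registered in advance: {preregistered_rate:.3f}")
print(f"false positives, analysis chosen afterwards: {chosen_afterwards_rate:.3f}")
```

With five independent outcomes and nothing really going on, the chance that at least one clears the 5 per cent bar is 1 - 0.95^5, or about 23 per cent, against 5 per cent for the single test named in advance. The simulated rates should land near those figures.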

This is not what happens in pharmaceutical research. In drug trials, all details of methodology have to be filed and registered before the trial begins. It is a convention which has been adopted over the years precisely to avoid this kind of bias. Now here's the bit that you're not going to like. How expensive and time-consuming is it to get a new drug to market? How many initial ideas have to be generated in order to get a single robust result that can be confidently expected to perform better than a placebo without unacceptable side effects? This analogy is taking us in a pretty scary direction for a philosophy of policymaking that was meant to provide a quick and easy way to find out what works. So the real 'reproducibility crisis' for evidence-based policymaking would be: if you're serious about basing policy on evidence, how much are you prepared to spend on research, and how long are you prepared to wait for the answers?

Using evidence to inform policy is obviously correct. But it's not a silver bullet and it may have been heavily oversold in terms of the amount of policy it can realistically deliver. At the very least, we need to always remember that when an advisor says:

"Whichever way you look at the numbers," they mean: "Whichever way *I* look at *these* numbers..."

Homepage image by Slava via Creative Commons 2.0

We want our stories to go far and wide; to be seen by as many people as possible, in as many outlets as possible.

Therefore, unless it says otherwise, copyright in the stories on The Long + Short belongs to Nesta and they are published under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

This allows you to copy and redistribute the material in any medium or format. This can be done for any purpose, including commercial use. You must, however, attribute the work to the original author and to The Long + Short, and include a link. You can also remix, transform and build upon the material as long as you indicate where changes have been made.

Images

Most of the images used on The Long + Short are copyright of the photographer or illustrator who made them so they are not available under Creative Commons, unless it says otherwise. You cannot use these images without the permission of the creator.

This article was originally published on The Long + Short.
