I have been working on a post for some time now, in which I was planning to use web-scraping in R to gather sports-related data from webpages and then run some fancy analysis on it. But when I say ‘working on’, I mean that I’ve been playing around with the data, staring at a whole bunch of exploratory plots, and trying to come up with an angle for the analysis.
And, so far, I’ve come up with: nothing.
But perhaps some good can come out of this. The process of trying to come up with an idea to fit the data reminded me of this quote by Sir Ronald Fisher (I should admit that I know of the quote because a guy on the R mixed models mailing list has it in his signature):
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
In the past couple of years, I’ve started teaching week-long statistics workshops (*cough*), which are incredibly satisfying (if also incredibly draining!). I have also gone through that enlightening period where learning a little more makes you realise how much you don’t know, which opened up a whole slew of new things to learn (also, I have started listening to statistics-related podcasts, which possibly means that I need professional help). I’m lucky in that I’ve managed to figure out some of this stuff, which means that other people now come to me for help with their analyses. Questions range from ‘does my model look ok?’ all the way to ‘I have this data, how should I analyse it?’. The latter always brings that quote into sharper focus.
It’s for this reason – the idea that statistical analysis should be an intrinsic part of planning any project – that I was interested to read some recent articles on registering studies with journals prior to gathering the data. This idea stems mainly from the fact that too much of whether something gets published depends on it being a positive result, and the ‘spin’ of the results – with hypotheses often dreamed up post-hoc – affects which journal the study gets published in (obviously there’s more nuance than that, but maybe not that much more). By registering the design beforehand, you can go to a journal and say: this is the question, here are our hypotheses, here’s how we’re going to tackle it with an experiment, and here’s how we will analyse the data. The journal would then decide beforehand whether they think that’s worth publishing – whatever the result.
This is a little simplistic, of course – there would have to be the usual review process, and there would obviously be leeway for further analysis of interesting trends on a post-hoc basis – but it would enforce greater thinking about an analysis strategy prior to embarking on a study. Even the simple task of drawing out the potential figures that would come out of the data collection is crucial to the process, as they help to clarify what is actually being tested.
So – that post I was originally setting out to write? I have the data, but I still haven’t had any good ideas for how to use it. And maybe it’s that kind of backwards approach that we all need to stay away from.
Kaplan & Irvin (2015) PLOS ONE: ‘Likelihood of Null Effects of Large NHLBI Clinical Trials Has Increased over Time’
The Guardian: ‘Trust in science would be improved by study pre-registration’
The Washington Post: ‘How to make scientific research more trustworthy’
New York Times: ‘To get more out of science, show the rejected research’
FiveThirtyEight: ‘Science isn’t broken’ (This is a must-read)
My favourite statistics podcasts!