Saturday 22 August 2015

A 'trial' that doesn't quite 'stack up'

Proponents of colour to treat visual stress in poor readers frequently refer to this study. To me, it looks very much like a post-hoc data trawl, or exploratory analysis, which is at best hypothesis generating. The study is poorly written, and it is hard, and sometimes impossible, to extract the raw data.

Tyrrell R, Holland K, Dennis D, Wilkins A. Coloured overlays, visual discomfort, visual search and classroom reading. J Res Read. 1995 Feb;18(1):10–23.

A total of 60 children were included (see below). The main experimental group consisted of 46 children, categorised as above-average readers (10), average readers (18) and below-average readers (12), together with an additional group of six older children, aged 14-16, who were well below average readers.


Two control groups were also identified, matched with the six well-below-average readers in terms of reading age (the RA control group) and chronological age (the CA control group). Unfortunately, when it comes to the results, the CA and RA control groups are not compared with the well-below-average readers but with the whole group, rendering their RA and CA control status invalid.

Procedure
Subjects were tested on three occasions, and the testing must have been long and tiring. It involved tests from the scotopic sensitivity screening manual, choosing overlays, reading for 15 minutes with or without an overlay, and a visual search task. Given that the key outcome was slowing of reading in the final five minutes of a fifteen-minute reading task, it would be useful to know how long the whole process took.



Results.
So how about the results? Looking at table two below, it can be seen that, using the criterion of immediate benefit from an overlay, 100% of well-below-average readers, 75% of below-average readers, 56% of average readers and 40% of above-average readers had visual stress. That looks rather high to me. It is important to remember that the screeners were not blinded to the status of the readers, so there is room for bias here.
There are also some surprising data from Irlen's 1983 tests of perceptual difficulty: 75% of above-average readers had moderate perceptual difficulty, and almost nobody had low perceptual difficulty!

The key finding of this study, and one that does not really stand up to scrutiny, is shown in the table below. The column on the left is really a crossover comparison of performance with and without the chosen overlay. It is at high risk of bias because no placebo was used, and external validity is low because of the circumstances of testing: a fatiguing session in which a number of different variables were tested. The results show no overall improvement with overlays; rather, without overlays there was some slowing in the final five minutes. Note, however, that this is expressed in syllables per minute, not words. Even if you believe that the study is methodologically sound (which I do not), the results are unlikely to be educationally significant: the difference in words was very small indeed. Also note that the reading-age controls are clearly not comparable with the group who chose overlays.

Summary.

This is at best an exploratory study and, because of what appears to be a flexible post-hoc data analysis, it is at most hypothesis generating.
There are serious problems with 'internal validity'.
1) No placebo control group: the comparison is coloured overlay versus no overlay
2) Post-hoc data analysis: no pre-trial protocol available
3) Non-standardised reading test, expressed in syllables per minute, which exaggerates any effect (see the sketch after this list)
4) No overall improvement with overlays; reading possibly got less worse in the final five minutes
5) No proper comparisons made with the CA and RA control groups
6) Small groups and low statistical power
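To see why syllables per minute flatter a small difference, here is a minimal sketch. The numbers are purely illustrative and not taken from the paper: English prose averages roughly 1.4 syllables per word, so a slowing of a few syllables per minute corresponds to an even smaller, and rather unimpressive, number of words per minute.

```python
# Illustrative arithmetic only: the 1.4 syllables/word figure is a rough
# average for English prose, and the 7 syllables/min slowing is a made-up
# example; neither number is taken from the paper.
SYLLABLES_PER_WORD = 1.4

def syllables_to_words(rate_syllables_per_min: float) -> float:
    """Convert a reading rate from syllables/minute to words/minute."""
    return rate_syllables_per_min / SYLLABLES_PER_WORD

slowing_syllables = 7.0  # hypothetical slowing in the final five minutes
print(f"{slowing_syllables:.0f} syllables/min "
      f"= {syllables_to_words(slowing_syllables):.0f} words/min")
# 7 syllables/min = 5 words/min: the same effect looks smaller in words
```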

Problems with external validity
Although subjects were recruited from a school setting, which is good, the main problem is that reading was assessed in the same session as the Irlen perceptual tests, the overlay-selection procedure and the visual search task. It is perhaps not surprising that some subjects fatigued during the final five minutes of 15 minutes of reading aloud.

Overall, another epic failure of peer review!

What is medicine's 5 sigma?

Another piece exploring the same theme as the previous post: the poor quality, and high risk of false positive results, in much of the scientific literature. This time my source is an editorial from the Lancet, a leading general medical journal. You can read it all here

Richard Horton, the editor of the Lancet, attended a meeting at the Wellcome Trust in London. The theme of the meeting was the current state of the biomedical research literature and the endemicity of bad research. As one speaker put it, poor methods get results. Another stated that too many scientists sculpt their data to fit their preferred theory of the world, or retrofit their hypotheses to fit the data (see the previous post about arrows and targets).
The current estimate is that about half of published studies contain false positive results. So when proponents of the use of colour to treat visual stress (and any number of other conditions) say that research in the peer-reviewed literature has shown that ............ (you can fill in the blank), that is not enough. These studies have to be looked at in detail.
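To see why "about half" is plausible, here is a back-of-the-envelope sketch of the positive predictive value of a significant result. The prior probability and power below are assumed, illustrative numbers, not figures from the editorial.

```python
# A rough sketch (illustrative numbers, not from the Lancet editorial):
# if only a modest fraction of tested hypotheses are actually true, even
# honestly run studies produce many false positives among their
# "significant" findings.

def ppv(prior: float, power: float, alpha: float = 0.05) -> float:
    """Positive predictive value: P(hypothesis true | significant result)."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# 10% of tested hypotheses true, 50% power, alpha = 0.05:
print(f"PPV = {ppv(prior=0.1, power=0.5):.2f}")  # ~0.53: half the positives are false
# Underpowered studies (20% power) do even worse:
print(f"PPV = {ppv(prior=0.1, power=0.2):.2f}")  # ~0.31
```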

A number of factors were identified that put studies at high risk of producing false positive associations. These were:

1) Small sample sizes
2) Tiny effects
3) Invalid exploratory analysis
4) Flagrant conflicts of interest
5) Pursuing fashionable trends of dubious importance

Much of the research on visual stress in dyslexia ticks all of these boxes. While this does not remove the need to read each study in detail and with a critical eye, the statement that research in the peer-reviewed literature has shown that........... is not nearly as impressive as it sounds.
Onwards with more reviews

Sunday 16 August 2015

More negative studies, and why that is a good thing

An important study has appeared in the journal PLoS One showing that fewer trials in the field of cardiovascular medicine are reporting positive outcomes. Why is this good news? And what does it have to do with the treatment of visual stress?

First, some background. As many as 50% of published studies contain false positive results, and the situation is particularly bad in the fields of neuroscience and psychology. While some false positive results are inevitable, the numbers at present suggest systematic biases in the literature.
For example, in the field of fMRI, more positive findings are reported than the study designs can support. Trials have a certain power, or ability to detect significant differences, which depends on the sample size and the variability of the population being studied. Naturally, you would expect smaller studies, with less statistical power, to find fewer positive effects. Yet John Ioannidis has shown evidence of bias (in the statistical sense of the word) operating: small studies, with lower power to detect signal changes, are reporting as many positive findings as larger, more powerful studies. You can download the study here.
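To make the power point concrete, here is a minimal simulation. The effect size and sample sizes are assumptions chosen for illustration, not values from Ioannidis's paper.

```python
# A minimal simulation (illustrative effect size and sample sizes) showing
# that power to detect a real effect rises with sample size. If small
# studies report as many positives as large ones, something other than
# the data is driving what gets published.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulated_power(n, effect=0.3, alpha=0.05, trials=2000):
    """Fraction of two-sample t-tests that detect a true effect of
    `effect` standard deviations with n subjects per group."""
    hits = 0
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(effect, 1.0, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / trials

for n in (10, 30, 100, 300):
    print(f"n = {n:3d} per group: power ~ {simulated_power(n):.2f}")
# power climbs from around 0.1 at n=10 towards roughly 0.95 at n=300
```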
So how has this state of affairs come about? Not through fraud, as you might think, although that can happen. Human nature, and the subjective biases that affect us all, are probably the culprits.
The first problem is that studies reporting positive outcomes are more likely to be published: so-called publication bias. One of the things you have to do when reviewing the literature in a systematic fashion is to look out for unpublished material that may not have found a home because it reported negative findings.
Another important factor is a flexible approach to data analysis. If you do an exploratory study in which you measure multiple variables and analyse your data in multiple different ways, you are quite likely to find some positive results that pass the arbitrary criterion for statistical significance of p < 0.05. For example, you could stratify your groups in multiple different ways, or you could have multiple endpoints and not declare which was the primary endpoint.
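A quick sketch of the arithmetic behind this (the endpoint counts are illustrative assumptions): even when no real effect exists at all, the chance of at least one "significant" result grows rapidly with the number of independent analyses performed.

```python
# With k independent endpoints and no true effect, the chance of at
# least one p < 0.05 result is 1 - (1 - 0.05)^k. The endpoint counts
# below are illustrative, not taken from any particular study.
alpha = 0.05
for k in (1, 5, 10, 20):
    p_at_least_one = 1 - (1 - alpha) ** k
    print(f"{k:2d} independent endpoints: "
          f"P(>=1 false positive) = {p_at_least_one:.2f}")
# 1 -> 0.05, 5 -> 0.23, 10 -> 0.40, 20 -> 0.64
```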
This has been compared to the man driving past a barn who sees lots of targets with arrows in the centre and assumes the farmer must be a pretty good shot. Then, as he drives a little further, he sees another wall of the barn, where the farmer is painting targets around arrows he has already fired into the wall! It's a bit like that with trials if you allow a flexible post-hoc approach to data analysis.
This may be acceptable for early exploratory studies within a field, but such studies are at best hypothesis generating, and their results need to be confirmed by properly conducted RCTs.
To get round this problem, many funding bodies insist that researchers pre-register the trial, stating what the outcome measures will be, which subjects are going to be studied and what statistical tests will be used. This, in crude terms, is equivalent to painting the target on the wall before the arrow is fired. With this has come a reduction in positive findings, and that is a good thing: false positive studies waste resources and can endanger human life.
So what has this got to do with the treatment of visual stress? Well, many of the studies of treating 'visual stress' in dyslexia show the hallmarks of a flexible approach to data interpretation: dividing the subjects into multiple small groups, studying lots of different outcome measures, then pouncing on those that appear positive and ignoring the rest.
Before anyone starts to take these ideas seriously, we need a randomised controlled trial with a pre-registered protocol and outcome measures.