Monday 29 January 2018

Post publication peer review of visual stress papers.

In the early days of science, results would be presented 'live' at meetings and the author could receive immediate and sometimes pretty lively feedback on their findings and any shortcomings of their study.
With the growth of science, the number of scientists became too great to fit in any room and findings were instead published in print journals. Those papers are usually only accepted after they have been peer-reviewed by one or, more usually, two experts in the field rather than a room full of scientists. After that, there might be some correspondence relating to the paper in subsequent issues of the journal. Thereafter, papers remained fixed and immutable in the literature, free to radiate information or misinformation. That said, the majority of scientific papers are soon forgotten or superseded by subsequent research. The problem was that even if a subsequent debate about the findings had occurred, it might not be accessible to a researcher who found the paper years down the line, and authors were free to misrepresent the findings of a study through citation distortion, that is, by funnelling readers away from critical reviews and commentaries.

Traditional peer reviewing happens before a paper is published, and positive reviews are a necessary condition for publication in most scientific journals. There are a number of well-documented weaknesses of this process. First, it is not transparent, so readers may not have sight of the reviewers' concerns. Second, reviewers are human and often have to review complex studies, unpaid, in their own time and with myriad other pressures and deadlines. It is not surprising that serious problems can be overlooked. Finally, peer reviewers are sometimes reluctant to challenge authors who are in a position of power. Although the process is supposed to be anonymous, much scientific research occurs in a 'small pond' and the authors may be able to guess the identity of the peer reviewers.

All this means that publication in peer-reviewed journals is not a guarantee of quality or reliability, and the growth of journals on the open-access, author-pays model has meant that pretty much anything can get published. Nonetheless, even in 'respectable' journals there can be a problem with papers well past their 'sell by date' continuing to radiate misinformation and being cited for marketing purposes.

To put it another way, what if the two peer reviewers find no problems but the 'roomful of scientists' reading the paper after publication uncovers serious shortcomings? This is undoubtedly a problem in the visual stress literature.

Potential solutions
In the web-based publishing age, attempts have been made to rectify this problem so that journals, or at least the scientific literature, can to a greater extent be self-correcting. The challenge is to allow this while keeping the 'nutters' out. For example, the anti-vaccine movement could easily disrupt reputable publishing on the efficacy or safety of vaccines. The other problem is discouraging a 'gotcha' mentality. It isn't a crime, or evidence of wrongdoing, to publish findings that subsequently do not stand up to close scrutiny; indeed, it is part of the scientific process.
A number of attempts have been made to restore some kind of balance. These include:
Blogging. There are numerous scientific blogs out there that provide much-needed post-publication commentary on papers. Indeed, that is what this blog tries to do in a modest sort of a way. For a much better example, see Richard Lehman's humane and well-written blog that appears in the British Medical Journal. The problem, however, is that a reader discovering a questionable study might be unaware of online criticism in blogs. All the same, I like to think that in a small way they help to keep people honest.
PubPeer, the online journal club. This enables comments on any article in the published literature and allows direct feedback to authors, who can respond if they wish. Unfortunately, it is not easy to link these comments to the original paper. So, for example, if you searched for and found the original paper you might be unaware of an important dialogue on PubPeer.
Open Review. This is a tool on ResearchGate which allows authors to publish a more detailed review than is possible in the correspondence section of a journal. In practice, it doesn't really seem to have 'taken off'.
Retraction Watch. The most extreme form of post-publication peer review results in a retraction. This is usually reserved for scientific fraud, and I am certainly not alleging that in the case of any of the visual stress papers. That said, papers can also be retracted when honest mistakes come to light, and retraction can even be a sign of scientific integrity. Even retraction does not always solve the problem, because papers sometimes continue to be cited long after they have been retracted.

A suggestion
There are papers that are not so egregiously bad that they should be retracted but where major shortcomings in the handling of data, discussion or conclusions have come to light. Where those papers are still being cited or are used for marketing purposes, they should be modified after publication. In the days of paper journals, this was impossible. You cannot call journals back from the shelf, insert new pages, rebind the spine, and return them to the library. All this, however, could change with online publishing, which allows post-publication peer review in a way that is easily accessible to researchers and would automatically come to light on accessing the paper. I am not calling for some form of post-publication censorship. The original version and the amended version should both be available. And if the authors refuse? In a word, retraction.
In the next post: some visual stress papers that require modification or retraction.


Friday 12 January 2018

A systematic review of controlled trials on visual stress using intuitive overlays or the intuitive colorimeter

Or: A systematic review of the placebo effect tested by means of coloured overlays?


This 'systematic' review appeared in the Journal of Optometry in October 2016. You can download the article here from the Journal of Optometry website.

The review does not conform to procedures for conducting systematic reviews and I think that the Journal of Optometry has done a disservice to the field by allowing it to be published in its current form.
Systematic reviews have been discussed in some detail in a previous post of June 2016.
A systematic review does not look at results and statistical tests in isolation; instead it looks (in a systematic fashion) at the behaviours and practices that led to those results. This is because those practices can easily bias the outcomes of a trial. This is not bias in the everyday sense of prejudice but in the statistical sense. I like to think of it in terms of setting up a car properly. If the steering is not adjusted just right, the tyre pressures are uneven or the brakes are not set up properly, a car can easily veer off course without any intent by the driver to depart from a straight line. So it is with clinical trials. Biases can easily creep in and influence the results, usually in a way that produces false positive results. The general idea of a systematic review is to analyse the sources of bias and to exclude those studies at high risk of bias from the final analysis, or at least to prioritise those studies at low risk of bias.

Clinical trials, like cars, can easily veer off course if everything is not set up just right. The result is a false negative or, more commonly, a false positive outcome.
The sources of bias are usually analysed using a set of 'off the shelf' tools that look at a range of features of the trial, including randomisation, allocation concealment, blinding of participants and researchers, attrition bias and reporting bias. The most commonly used tools are those developed by the Cochrane Collaboration. The authors of this study used the Critical Appraisal Skills Programme (CASP) criteria. I have never seen these used for a systematic review, as opposed to appraising individual papers. I have experimented with them and they seem a little 'clunky' and difficult to tabulate. Nonetheless, they appear to contain the relevant domains of bias. At least they do if you choose the right set.

This brings us to the first problem. The domains of bias in tables 2 and 3 do not correspond to the domains of bias in the CASP criteria for randomised controlled trials, and the authors appear to have developed their own hybrid rating scale of unknown statistical validity. In response to criticism, the authors argued in a letter to the Journal of Optometry that there are eight CASP checklist tools and that no single checklist covered all the domains relevant to their papers. This is wrong: the p-values upon which Evans and Allen place so much importance arise almost exclusively from crossover studies, and for that reason the CASP tool for randomised controlled trials should have been used. Furthermore, notwithstanding their belief, the hybrid rating scale of Evans and Allen is untested in any type of study.

The next stage of a systematic review, having assessed and tabulated the risk of bias for the papers reviewed, is to select only those studies at the lowest risk of bias and base the analysis upon those studies. This is because studies at high risk of bias tend to overestimate treatment effects or even conclude there is a treatment effect when there is in fact none. This is irrespective of any p-value less than 0.05. Indeed, such values are pretty well meaningless if the study is at high risk of bias in one or more domains.

The authors adopt a different approach, that of 'vote-counting', in which all studies were included irrespective of the risk of bias. They then counted up the studies that seemed to support their argument or proposed treatment and concluded that, because 8 out of 10 studies reported p-values less than 0.05, the balance of probabilities favoured a treatment effect. This is wrong, and it is not the approach advocated by the Cochrane Collaboration among others. Small-scale studies at high risk of bias are cheaper and easier to produce and as a result usually outnumber studies at low risk of bias; consequently, the vote-counting approach overestimates treatment effects. It is fair to say that none of the studies included in the analysis of Evans and Allen would make the 'final cut' in a properly conducted systematic review.

A very simple counter-argument to this approach, which is in fact not at all far-fetched, is to consider a field in which there is one large study at low risk of bias, with a sample size of 500, which found no effect, and five small studies at high risk of bias with sample sizes of twenty patients each, all of which reported a treatment effect. A vote-counting approach would conclude 5:1 that there was a treatment effect, even though the larger and better study, with more patients than all the other studies put together, reported no effect.
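To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python using the hypothetical numbers above. The effect sizes for the five small studies are invented purely for illustration; the point is simply that a tally of studies and a sample-size-weighted summary can point in opposite directions.

```python
# Back-of-the-envelope illustration of why 'vote-counting' misleads.
# Hypothetical numbers from the example above: one large low-bias study
# (n = 500) finding no effect, and five small high-bias studies
# (n = 20 each) reporting an effect. Effect sizes are illustrative only.

studies = [
    # (sample size, observed effect size, reported a treatment effect?)
    (500, 0.00, False),   # large study, low risk of bias
    (20, 0.40, True),     # five small studies, high risk of bias
    (20, 0.35, True),
    (20, 0.45, True),
    (20, 0.30, True),
    (20, 0.50, True),
]

# Vote-counting: simply tally the studies 'for' and 'against'.
votes_for = sum(1 for _, _, positive in studies if positive)
votes_against = len(studies) - votes_for
print(f"Vote count: {votes_for} for, {votes_against} against")  # 5 for, 1 against

# Weighting each study by its sample size (a crude stand-in for a proper
# meta-analytic weight) tells a very different story.
total_n = sum(n for n, _, _ in studies)
positive_n = sum(n for n, _, positive in studies if positive)
pooled_effect = sum(n * effect for n, effect, _ in studies) / total_n
print(f"Patients in positive studies: {positive_n} of {total_n}")  # 100 of 600
print(f"Sample-size-weighted effect: {pooled_effect:.2f}")         # ~0.07
```

A proper meta-analysis would weight each study by the inverse variance of its estimate rather than raw sample size, but even this crude weighting shows that the five 'positive' votes rest on 100 patients out of 600, and the pooled estimate sits close to zero.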

An alternative (more plausible) explanation for the results
There is another way of looking at their data. All the studies that showed a 'positive result' were at risk of bias due to lack of masking of both participants and researchers. Their review could just as easily be 're-badged' as a systematic review of the placebo effect tested by means of coloured overlays. The study might then be criticised for failing to consider the putative disorder, visual stress. The review provides equally compelling evidence for the power of the placebo effect mediated by colour. Given the strong scientific foundation and evidence base for the placebo effect, compared with the foundations that underpin the visual stress hypothesis, I know which is the more likely explanation for their results.

Conflicts of interest 
One final but important concern is the incomplete declaration of financial interest at the foot of the paper, which states:

Professor Evans has received honoraria for lectures and has acted as an expert witness on this topic. He is an unpaid committee member and secretary of the not-for-profit Society for Coloured Lens Prescribers (www.s4clp.org).

Subsequently, when pressed, the authors gave a more complete declaration:

The authors have received honoraria for lectures on this topic. Bruce Evans acted (some years ago) as an expert witness on this subject. He is an unpaid committee member and secretary of the not-for-profit Society for Coloured Lens Prescribers (www.s4clp.org). Bruce Evans is Director of Research at the Institute of Optometry which is an independent charity that receives donations from i.O.O. Sales Ltd. which sells, amongst other products, Intuitive Overlays, the Pattern Glare Test, and the Wilkins Rate of Reading Test. He has an optometric practice in Essex in which he uses these items and the Intuitive Colorimeter and Precision Tinted lenses. The Institute of Optometry also uses these items in some of its clinics.

There is nothing wrong with any of this, but a full and frank declaration at the outset would have been better. Also a matter of concern is that very few of the papers reviewed (Evans and Allen were usually among the authors of the papers that they were themselves reviewing) contain a complete conflict of interest statement. This problem seems to be endemic within the visual stress literature.
In addition, it is not clear how these studies were funded. For example, who paid for the overlays? How were the study workers paid? Did participants or participating schools receive honoraria?
This matters: industry-sponsored studies are more likely to report positive results.

Conclusion
This pseudo-systematic review does not provide compelling evidence for the use of intuitive overlays and precision tinted lenses. All of the studies are at high risk of bias in one or, usually, multiple domains. Critically, the positive studies are universally at risk of bias due to lack of masking. While it is acknowledged that masking of participants is difficult, there is no reason not to mask researchers. Furthermore, a study comparing one coloured overlay with another would be less at risk of bias than one that compares a coloured overlay with no overlay or a clear sheet. Some attempt to mask participants would be better than none.