Original language | English |
---|---|
Pages (from-to) | A6-A9 |
Number of pages | 4 |
Journal | Journal of Clinical Epidemiology |
Volume | 117 |
DOIs | https://doi.org/10.1016/j.jclinepi.2019.12.004 |
Publication status | Published - 1 Jan 2020 |
Cite this
In: Journal of Clinical Epidemiology, Vol. 117, 01.01.2020, p. A6-A9.
Research output: Contribution to journal › Editorial › Academic › peer-review
TY - JOUR
T1 - Can the Schwartz and Lellouch concept of differentiating between explanatory and pragmatic studies be useful for diagnostic test studies?
AU - Tugwell, P.
AU - Knottnerus, J.A.
N1 - Peter Tugwell, Department of Medicine, Department of Epidemiology and Community Medicine, Canada Research Chair, University of Ottawa, Institute of Population Health, Ottawa, ON, Canada. J. André Knottnerus ([email protected]), Department of General Practice, Netherlands School of Primary Care Research, Maastricht University, P.O. Box 616, 6200 MD Maastricht, the Netherlands.

Bossuyt et al report on an intriguing extension of the definitions and characteristics differentiating pragmatic from explanatory studies, described by Schwartz and Lellouch [1] for randomized trials of interventions, to define a similar continuum for diagnostic accuracy studies. In this extension, explanatory studies aim to better understand the behavior of a test; in contrast, pragmatic ones aim to support recommendations or decisions about using the test in clinical practice. The key differentiating characteristics for diagnostic studies include the study eligibility criteria, the recruitment of patients, the reference standard, and the choice of statistical analysis. Distinguishing between an explanatory and a pragmatic approach can be helpful in discussions about test accuracy studies, with consequences for the design, analysis, and interpretation of studies. These authors demonstrate this with a case study comparing different designs in two accuracy studies of fecal calprotectin for distinguishing inflammatory bowel disease from irritable bowel syndrome. As the authors point out, the differences lie on a continuum rather than being dichotomous, and the concept would benefit from a group developing an assessment method, such as the PRECIS tool that helps differentiate explanatory from pragmatic clinical trials.

Another article on diagnostic tests, by Hultcrantz et al, is a concept paper from the GRADE group on how to define ranges for certainty ratings of diagnostic accuracy. Once again the concept of certainty is taken from previous work on certainty in assessing the effects of treatment interventions and extended here to diagnostic tests. Indeed, when diagnostic intervention studies comparing alternative diagnostic test strategies with direct assessment of patient-important outcomes are available (such as RCTs addressing the impact on survival after a screening strategy), the approaches for setting thresholds or ranges previously presented for interventions apply. In this paper, the authors explore the extension of these concepts when there are no studies that have directly compared the effects of alternative test strategies on downstream health outcomes.
They outline how modeling the impact of diagnostic accuracy on health outcomes can inform management decisions, drawing on knowledge about the course of the condition, treatment effectiveness, and the link between test results and clinical management. Ways are described of setting thresholds or ranges for rating certainty in diagnostic test accuracy, and what this would mean in the context of systematic reviews, HTA, and health care recommendations. These authors illustrate the different approaches with the example of a direct comparison of accuracy between two tests for cervical cancer screening, the human papillomavirus (HPV) DNA-PCR test and unaided visual inspection of the cervix with acetic acid (VIA).

That 'journalology' (i.e., the science and study of publication practices) is perhaps coming of age is supported by the popularity of training courses such as that described by Butcher et al. Journalology extends the study of bibliometrics (which, these authors point out, was not designed to measure the quality and value of research to colleagues) to develop strategies to improve the accessibility, accuracy, completeness, and transparency of publications; to address the challenges of 'research waste', such as the trade-off between demonstrating reproducibility and unnecessary duplication; to engage the different stakeholders who commission and do research and those who disseminate and use its results, including through social media; to address research integrity; to implement reporting guidelines; and to conduct metaresearch. Useful resources are available on the Equator website (https://www.equator-network.org/about-us/), which clinical epidemiologists will know for its impressive curating of reporting guidelines. The Equator courses now also offer in-person training. We would hope to see these topics become required competences in formal clinical epidemiology graduate programs and be appropriately evaluated. Five other articles in this issue are relevant to journalology.

A commentary by Puljak et al. focuses specifically on the importance of identifying and resolving discrepancies within individual study reports when authors abstract data for systematic reviews. It is well known that many manuscripts in leading medical journals and systematic reviews contain errors. These errors pose challenges to readers, reviewers, and guideline developers, and can propagate into systematic reviews; indeed, it has been shown that such extraction errors may influence the effect size [2]. Puljak et al. categorise discrepancies by the different parts of a manuscript, with examples of differences between (a) abstract and text; (b) sections within the full text; (c) text and figures; and (d) text and tables. They provide advice on how to deal with different types of discordance and how to report them when conducting systematic reviews.

One essential hallmark of a high-quality systematic review is a protocol published before the review is conducted; this has always been required by Cochrane, the Campbell Collaboration, and the Joanna Briggs Institute. The PROSPERO database was set up in 2011 as a database in which anyone can register systematic reviews free of charge. Rombey et al sampled 500 non-Cochrane systematic reviews and found that the number registered in PROSPERO has increased each year, with over 30% registered in 2018. This is encouraging but needs reinforcement by funders and journal editors.
Attention is also needed to ensure that the records are updated once the systematic review has been published.

Lai et al. report an update and extension of previous studies [3] looking at the rationale for secondary publications from controlled trials. 'Salami slicing' is the term used, somewhat pejoratively, in journalology to describe the excessive splitting of the results of studies and publishing each part as a separate paper when there is no good reason not to publish all the findings in one paper or report. Salami slicing is considered unethical in the publishing world because it wastes resources (reviewers' time, journal space, etc.), creates a disjointed literature (forcing readers to go on scavenger hunts to find essential information), and dilutes an important currency of academia (inflating the curricula vitae of some and risking the possibility that rewards will flow to those who 'play the game' rather than to those who contribute most effectively) [4,5]. Lai et al. followed the trial registration numbers of all 154 trials published in BMJ, JAMA, Lancet, and NEJM in 2014 and found that two-thirds had at least one secondary publication, and 15 trials had more than 5 secondary papers. Although they identified some good reasons for secondary publications, they also identified substantive problems: 20% reported only results already presented in the primary publication and thus provided no new information, and 35% reported results not pre-specified in the protocol. The authors suggest that if this practice cannot be stopped, the words 'secondary publication' should be included in the title to alert the reader.

The report by Saric et al suggests that published abstracts of systematic reviews, even from a prestigious meeting of leaders in the field, should not be relied on when making practice or policy decisions. They identified 193 abstracts describing systematic reviews from five World Congresses on Pain held from 2008 to 2016, and searched for corresponding full publications using PubMed and Google Scholar in April 2018. Forty percent of the systematic review conference abstracts were not subsequently published in a peer-reviewed journal. Especially concerning was the finding of some form of discordance in the main outcome results in 40% of the abstract-publication pairs, with even a different direction of effect in 13 pairs. Adherence to the PRISMA for Abstracts criteria was only 33%. The authors call on conference organisers to insist that submitted abstracts of systematic reviews meet these requirements.

Searching for clinical practice guidelines is one of the commonest searches of MEDLINE but is highly inefficient without validated search filters. Lunny et al report that even the five most frequently used published filters, although highly sensitive, have such poor specificity that 1,000 references need to be screened to find one relevant clinical practice guideline. They recommend that, for those with limited time and resources, the search be targeted using large repositories of guidelines such as the CMA Infobase, TRIP, and Epistemonikos.

Six articles address statistical and related methods issues. Li et al provide an excellent tutorial on forest plots. First named as such in the 1970s, and popularised by their adoption by Cochrane, forest plots are an important advance in displaying results so that comparisons across studies are easier.
Sometimes known as 'blobbograms', forest plots usually consist of point estimates with horizontal confidence limits arranged around a vertical line of no effect, providing measures of effect for the individual studies and the overall pooled analysis. A forest plot generally shows information on individual study identity, numbers or rates in the comparison groups, weighting of the individual studies, and point estimates with confidence intervals, as well as details of the pooled analysis, including the overall effect estimate and heterogeneity assessment. Forest plots can be constructed in many software packages, including SPSS, SAS, R, Stata, and RevMan (a minimal sketch is given below). Forest plots have many uses, and 17 examples are nicely presented in tutorial format by Li et al. The examples include meta-analyses, clinical trials, and observational studies and are classified by the PICO (Population or Subgroup, Intervention or Exposure, Control, and Outcome) framework.

It is now accepted by most in clinical epidemiology that systematic reviewers, guideline authors, and health care professionals at the point of care should never present relative effect estimates without absolute effect estimates, whether for disease burden, diagnosis, treatment, or prognosis. Foroutan et al tackle the problem of implementing this for dichotomous prognostic factors, for example the risk of death after a stroke in patients with different comorbidities. They provide an easily applied method to calculate the absolute risk of future events for prognostic groups (e.g., old and young), including worked examples for relative risks, odds ratios, and hazard ratios (a sketch of such conversions is given below). Their approach allows flexibility in the choice of effect measure, the prevalence of prognostic factors, and the outcome frequency in the population of interest, and thus supports effective incorporation of evidence addressing prognostic factors.

Avery and Rotondi report on the uptake of the STROBE-Respondent Driven Sampling (STROBE-RDS) guidelines originally proposed in this journal in 2015 [6]. PubMed lists over 150 papers per year that use this method to sample from well-connected but difficult-to-reach populations. Respondent-driven sampling has traditionally been used to study the HIV epidemic but is making inroads into other historically stigmatized outcomes such as violence and mental health. The use of 'seeds' (purposively selected members of the target population), dual-incentive recruitment techniques, collection of personal network information, and tracing of recruitment chains are defining characteristics of respondent-driven sampling (a toy illustration of recruitment waves is sketched below). Samples generated with respondent-driven sampling are not equivalent to simple random samples, largely because of non-random clustering among participants, but this can be addressed by including 4-5 waves of recruitment, so that such samples provide a more robust basis for making population inferences than convenience samples. These authors studied a random sample of 25 papers published in 2017 that used respondent-driven sampling to assess whether they satisfied the criteria listed in the STROBE-RDS reporting guideline. Deficiencies in the reporting of methods and statistics were found that could be avoided if editors insisted on the use of the STROBE-RDS criteria.

Mediation analysis is a popular statistical approach for investigating the pathways and mechanisms by which an intervention exerts its effect on an outcome in an RCT. Vo et al reviewed 98 RCTs published in 2017 and 2018 that reported the results of mediation analyses.
They concluded that the large variation in the methods used and in their quality, together with inadequate reporting of detail, make it difficult for readers to assess the credibility of the results. These authors call for the establishment of practical consensus guidelines for the conduct and reporting of mediation analyses.

Thresholds for clinical importance and the related minimal clinically important differences are the focus of many papers in this journal. Consensus on appropriate methods is now needed for the new class of patient-reported questionnaires that use computer adaptive testing based on item response theory. The appeal of these measures is that they allow one to administer different questions tailored to the (severity of) problems of individual patients on a given quality-of-life domain while still obtaining comparable scores across patients on the same metric. However, it is not yet agreed how to define the 'thresholds for clinical importance' of these scores. Giesinger et al. determined the threshold for clinical importance of the new adaptive version of their EORTC QLQ-C30 questionnaire by comparing, in 98 patients, the responses on the new questionnaire with three dichotomous anchor questions ("Has your symptom/problem limited your daily life?"; "Has your symptom/problem caused you or your family/partner to worry?"; "Have you needed any help or care because of your symptom/problem?"). This contrasts with the approach of the widely used PROMIS quality-of-life computer adaptive questionnaires, which have used case vignettes describing a range of possible severity levels for each domain to establish the threshold for that domain. More work is needed to establish consensus on the advantages and disadvantages of the different approaches, as well as to ensure that the results are interpretable by the reader.

Factorial randomised trials continue to be popular, probably because they allow evaluation of the different components of complex interventions and because of the savings in sample size requirements. However, their interpretation is challenging whenever there is concern about interactions between the different components. Kahan et al reviewed the quality of the analyses of 100 randomly selected factorial randomised trials published between January 2015 and March 2018. Interactions were common, but the analysis of these interactions was suboptimal. If this design continues to be used, improvement in the analysis and reporting of factorial trials is required to allow valid conclusions.

Two articles address risk of bias issues. Non-randomised studies are slowly establishing their place, beyond that in traditional epidemiology, in the evidence base for interventions in organisations such as Cochrane (formerly the Cochrane Collaboration), which now endorses a risk of bias tool for non-randomised studies that complements its risk of bias tool for randomised trials. Dhiman et al. applied this tool to 52 published non-randomised studies of interventions funded by the UK National Institute for Health Research between 2012 and 2017. Most were cohort studies, and 29% were before-and-after studies. Most studies had a serious or critical risk of bias, mainly due to confounding that was not addressed. The authors conclude that such papers add to the problem of research waste in the literature.
Stone et al report on the use of their bias adjustment approach, quality effects modeling, in which the risk of bias assessment is quantified and incorporated into the pooled results of a meta-analysis. The application is illustrated with a meta-analysis of the risk of sensitization of rhesus-negative women pregnant with a rhesus-positive fetus, based on eight studies comparing women receiving routine antenatal anti-D immunoglobulin with controls. They suggest this approach be considered as an option for bias adjustment in systematic reviews.

Finally, Speich et al. cast a spotlight on the importance of budget planning for RCTs, inadequate budgeting being one key contributing cause of the 30% of RCTs that have to be prematurely discontinued because of failure to recruit enough patients. They point out that in many cases clinical studies are underfunded, that the compensation paid to study centers participating in multicenter trials often does not cover the actual costs, and that this is a main reason why RCTs are discontinued. In their scoping review they identified 25 budget-planning tools downloadable from websites, but only two were validated and user-tested. They call for wider sharing and more user testing of such tools.
PY - 2020/1/1
Y1 - 2020/1/1
U2 - 10.1016/j.jclinepi.2019.12.004
DO - 10.1016/j.jclinepi.2019.12.004
M3 - Editorial
C2 - 31924314
SN - 0895-4356
VL - 117
SP - A6-A9
JO - Journal of Clinical Epidemiology
JF - Journal of Clinical Epidemiology
ER -
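As a purely illustrative aid to the forest plot tutorial summarised above (and not taken from Li et al or any particular package), the following minimal Python/matplotlib sketch draws the basic elements described in the editorial: study labels, point estimates with horizontal confidence limits, a vertical line of no effect, and a log scale for ratio measures. All study names and numbers are invented, and the pooled row is drawn with the same square marker rather than the conventional diamond to keep the example short.

```python
import matplotlib.pyplot as plt

# Hypothetical study labels, risk ratios, and 95% confidence limits
# (invented values for illustration only).
studies = ["Study A", "Study B", "Study C", "Pooled estimate"]
rr      = [0.80, 1.10, 0.70, 0.85]
lower   = [0.60, 0.85, 0.45, 0.72]
upper   = [1.07, 1.42, 1.09, 1.00]

fig, ax = plt.subplots(figsize=(6, 3))
ypos = list(range(len(studies)))[::-1]   # first study at the top

# One horizontal confidence interval centred on each point estimate
for y, est, lo, hi in zip(ypos, rr, lower, upper):
    ax.plot([lo, hi], [y, y], color="black")       # confidence limits
    ax.plot(est, y, marker="s", color="black")     # point estimate ("blob")

ax.axvline(1.0, linestyle="--", color="grey")      # vertical line of no effect
ax.set_xscale("log")                               # ratio measures on a log scale
ax.set_yticks(ypos)
ax.set_yticklabels(studies)
ax.set_xlabel("Risk ratio (95% CI, log scale)")
fig.tight_layout()
plt.show()
```

In practice, dedicated meta-analysis software (such as RevMan or the R packages mentioned in the tutorial) adds study weights, event counts, and a heterogeneity summary automatically; the sketch only shows the underlying geometry of the plot.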
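For the Foroutan et al summary above, the following is a minimal sketch of the commonly used conversions from a relative effect estimate plus a baseline (reference-group) risk to an absolute risk. It is not the authors' published method; the function names and the numeric inputs are illustrative assumptions only.

```python
# Common conversions from a relative measure plus a baseline risk to an
# absolute risk (a sketch, not Foroutan et al's published worked examples).

def absolute_risk_from_rr(baseline_risk: float, rr: float) -> float:
    """Absolute risk in the prognostic group from a risk ratio."""
    return baseline_risk * rr

def absolute_risk_from_or(baseline_risk: float, odds_ratio: float) -> float:
    """Absolute risk from an odds ratio, via the baseline odds."""
    odds = baseline_risk / (1 - baseline_risk)
    return (odds_ratio * odds) / (1 + odds_ratio * odds)

def absolute_risk_from_hr(baseline_risk: float, hr: float) -> float:
    """Absolute risk over a fixed time horizon from a hazard ratio."""
    return 1 - (1 - baseline_risk) ** hr

baseline = 0.10  # assumed 10% risk of the outcome in the reference group
print(absolute_risk_from_rr(baseline, 2.0))  # 0.20
print(absolute_risk_from_or(baseline, 2.0))  # ~0.18
print(absolute_risk_from_hr(baseline, 2.0))  # ~0.19
```

The three formulas deliberately diverge as the baseline risk grows, which is why presenting absolute estimates alongside the chosen relative measure matters for interpretation.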
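To illustrate the wave structure of respondent-driven sampling mentioned in the Avery and Rotondi summary, here is a toy simulation (not from the STROBE-RDS paper) in which a few purposively chosen seeds each recruit up to a fixed number of peers per wave. The seed count, coupon limit, and number of waves are invented assumptions.

```python
import random

# Toy illustration of respondent-driven sampling recruitment chains.
random.seed(1)

N_SEEDS = 3   # purposively selected initial participants ("seeds")
COUPONS = 3   # each participant may recruit up to this many peers
WAVES = 5     # 4-5 waves help reduce dependence on the initial seeds

wave = [f"seed{i}" for i in range(N_SEEDS)]
chains = {person: [] for person in wave}   # recruiter -> list of recruits
sample = list(wave)

for w in range(1, WAVES + 1):
    next_wave = []
    for recruiter in wave:
        # Each recruiter redeems a random number of coupons (0..COUPONS)
        for c in range(random.randint(0, COUPONS)):
            recruit = f"{recruiter}.{w}.{c}"
            chains[recruiter].append(recruit)
            chains[recruit] = []
            next_wave.append(recruit)
    sample.extend(next_wave)
    print(f"wave {w}: {len(next_wave)} recruits, total sample {len(sample)}")
    wave = next_wave
```

Tracing the `chains` dictionary reproduces the recruitment trees that, together with personal network size, are what RDS-specific estimators use to adjust for the non-random clustering noted in the editorial.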