The Impact of the Choice of Data Source in Record Linkage Studies Estimating Mortality in Venous Thromboembolism

Arlene M. Gallagher*, Tim Williams, Hubert G. M. Leufkens, Frank de Vries

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Linked electronic healthcare databases are increasingly used in observational research. The objective of this study was to investigate the impact of the choice of data source on estimates of mortality following venous thromboembolism (VTE), with a secondary aim of investigating the influence of the denominator definition. We used the UK Clinical Practice Research Datalink (CPRD) to identify patients aged 18 years or older with VTE. Multiple cohorts were identified in order to assess how mortality rates differed across a range of data sources. For each cohort, incidence rates per 1,000 person-years (/1000py) and relative rates (RRs) of all-cause mortality were calculated. The lowest mortality rate was found when only primary care data were used for both the exposure (VTE) and the outcome (death) (108.4/1000py). The highest mortality rate was found for patients diagnosed in secondary care (237.2/1000py). When linked primary and secondary care data were included for eligible patients and for the overlapping period of data collection, a mortality rate of 173.2/1000py was found. Sensitivity analyses varying the denominator definition provided a range of results (140.6-164.3/1000py). The relative rates of mortality by gender and age were comparable across all cohorts. Depending on the choice of data source, the population studied may differ. This may have a substantial impact on the main findings, in particular on incidence rates of mortality following VTE.
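The rates quoted in the abstract are expressed per 1,000 person-years. The sketch below shows the arithmetic behind such a crude rate and behind a simple ratio of two rates. The function names and the death and person-year counts are hypothetical and do not come from the study. The ratio of the two reported cohort rates is an unadjusted illustration only, not a result reported by the authors.

```python
def rate_per_1000py(deaths: int, person_years: float) -> float:
    """Crude incidence rate per 1,000 person-years (hypothetical helper)."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * deaths / person_years


def relative_rate(rate_a: float, rate_b: float) -> float:
    """Unadjusted ratio of two rates (hypothetical helper)."""
    if rate_b <= 0:
        raise ValueError("rate_b must be positive")
    return rate_a / rate_b


# Hypothetical counts, NOT taken from the study: 150 deaths over 1,000 person-years.
example_rate = rate_per_1000py(deaths=150, person_years=1000.0)

# Crude ratio of the highest and lowest rates quoted in the abstract
# (secondary care diagnosis vs. primary care data only). Illustration only.
crude_ratio = relative_rate(237.2, 108.4)

print(f"example rate: {example_rate:.1f}/1000py")
print(f"crude ratio of reported rates: {crude_ratio:.2f}")
```

A crude ratio like this ignores differences in age and sex structure between cohorts, which is one reason the choice of data source and denominator matters when comparing rates.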
Original language: English
Article number: e0148349
Issue number: 2
Publication status: Published - 10 Feb 2016
