Which people are most affected by changes to data linkage methodology?

An exploration of patient, organisational and spatiotemporal characteristics in administrative hospital data in England

Tony Stone, UCL / University of Sheffield

16 Nov 2023

Context

  • NHS England (which absorbed NHS Digital in 2023) operates various data services and collections for, and from, organisations receiving NHS funding in England
  • In 2021 NHS Digital changed how it assigns records to an individual within its Hospital Episode Statistics (HES) datasets, the first such change in 12 years
  • HES is a widely used administrative health data collection in England, used by:
    • Providers
    • Commissioners
    • Researchers (inc. within the UK’s national statistics institute)

What are the HES datasets?

  • event-based records at the patient-activity level
  • for specific types of care provision
  • funded by the English NHS -OR- taking place in an NHS hospital in England

Admitted patient care (APC)

  • Admitted to hospital (inc. day case admissions)
  • Since 1989 but NHS Number only mandated since 1997
  • Additional HES Critical Care (CC) dataset available

Outpatient (OP) appointments

  • Attended and unattended
  • Since 2003

Accident and Emergency (A&E)

  • Non-admitted unscheduled(-ish) care at
    • Emergency Departments
    • Urgent Care Centres
    • Minor Injury Units
    • Walk-in Centres
  • From 2007 to 2020

Submitted data used to identify patients within HES

  • NHS Number
  • Date of birth
  • Stated gender (although the field is named sex, and how it is recorded has likely changed over time)
  • Postcode of patient place of residence
  • Local patient identifier
  • Provider (NHS Trust) that issued the local patient identifier

What’s not available

  • Names
  • Address precision finer than postcode
    • typically around 15 properties per postcode, but up to 100

HES ID algorithm

  • deterministic
  • stepwise
  • clusters HES records into disjoint sets

[Figure: five HES records (a1 to a5) grouped into three disjoint clusters, labelled HES ID 1, HES ID 2 and HES ID 3]
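The slide describes the HES ID algorithm only as deterministic, stepwise and producing disjoint clusters; its actual match rules are not given here. A minimal sketch of that general shape follows, assuming hypothetical field names and hypothetical match keys (the `STEPS` tuples are illustrative, not NHS Digital's rules). Union-find keeps the clusters disjoint: any two records joined at any step end up in the same set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One HES activity record; field names are hypothetical."""
    record_id: str
    nhs_number: Optional[str]
    dob: Optional[str]          # ISO date string, e.g. "1980-01-31"
    sex: Optional[str]
    postcode: Optional[str]
    local_id: Optional[str]
    provider: Optional[str]


# Hypothetical match steps, applied in order. Two records are joined at a
# step when every field in that step is recorded and agrees.
STEPS = [
    ("nhs_number", "dob"),
    ("local_id", "provider", "dob"),
    ("dob", "sex", "postcode"),
]


def cluster(records: list[Record]) -> dict[str, int]:
    """Return record_id -> cluster number (an illustrative 'HES ID')."""
    parent = {r.record_id: r.record_id for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for step in STEPS:
        first_seen: dict[tuple, str] = {}
        for r in records:
            key = tuple(getattr(r, f) for f in step)
            if any(v is None for v in key):
                continue  # step only applies when all its fields are present
            if key in first_seen:
                union(first_seen[key], r.record_id)
            else:
                first_seen[key] = r.record_id

    # Relabel cluster roots as 1, 2, 3, ... in order of first appearance.
    labels: dict[str, int] = {}
    out: dict[str, int] = {}
    for r in records:
        root = find(r.record_id)
        labels.setdefault(root, len(labels) + 1)
        out[r.record_id] = labels[root]
    return out
```

Because union-find takes the transitive closure, step order does not change the final clusters in this simplified version; a real stepwise algorithm may attach extra conditions to later steps so that order does matter.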

Person ID assignment algorithm (MPS)

MPS: Master Person Service

Used across NHS Digital/England’s “directed” data collections.

  • deterministic
  • stepwise
  • based on data available at the point of algorithm execution
  • Links each HES record to at most one identity

[Figure: HES records linked to Person IDs: a1 and a2 → A; a3 → B; a4 and a5 → C]
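Unlike clustering, MPS links each record to at most one identity drawn from a reference set. A minimal sketch of that property follows, assuming a hypothetical in-memory identity table (`IDENTITIES`) and hypothetical match steps (`STEPS`); neither reflects the real MPS rules. Each step accepts only a unique candidate, and a record with no unique match at any step stays unlinked (`None`).

```python
from typing import Optional

# Hypothetical reference identities: Person ID -> demographic attributes.
IDENTITIES = {
    "A": {"nhs_number": "111", "dob": "1980-01-01", "postcode": "S1 1AA"},
    "B": {"nhs_number": "333", "dob": "1975-06-30", "postcode": "LS1 4AP"},
    "C": {"nhs_number": "222", "dob": "1990-05-05", "postcode": "N1 1ZZ"},
}

# Hypothetical deterministic steps, tried in order (strongest first).
STEPS = [("nhs_number", "dob"), ("dob", "postcode")]


def link(record: dict, identities: dict = IDENTITIES) -> Optional[str]:
    """Return the single matching Person ID, or None if no unique match."""
    for step in STEPS:
        if any(record.get(f) is None for f in step):
            continue  # cannot apply a step whose fields are missing
        candidates = [
            pid for pid, ident in identities.items()
            if all(ident.get(f) == record[f] for f in step)
        ]
        if len(candidates) == 1:
            return candidates[0]  # unique match: link and stop
        # zero or several candidates: fall through to the next step
    return None
```

Refusing ambiguous matches is what guarantees the "at most one identity" property in this sketch; the cost is that some records stay unlinked.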

Data source

  • HES datasets
  • April 2007 - March 2020 (13 years)
  • Patients aged 55 or under at the date of activity (so records must have a date of birth)
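The inclusion criteria amount to a simple record filter. The sketch below makes two assumptions: the period runs from 1 April 2007 to 31 March 2020 inclusive, and age is counted in completed years on the activity date. The helper names `age_on` and `in_scope` are hypothetical.

```python
from datetime import date
from typing import Optional

START, END = date(2007, 4, 1), date(2020, 3, 31)  # assumed inclusive bounds
MAX_AGE = 55


def age_on(dob: date, on: date) -> int:
    """Completed years of age on a given date."""
    # Subtract one if the birthday has not yet occurred in that year.
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def in_scope(dob: Optional[date], activity: date) -> bool:
    """True if the activity is in the study window and the patient is <= 55."""
    if dob is None:
        return False  # age cannot be derived without a date of birth
    return START <= activity <= END and age_on(dob, activity) <= MAX_AGE
```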

Amounts to:

  • 979 million records
  • 59 million distinct HES IDs
  • 55 million distinct Person IDs
  • 60 million distinct (HES ID, Person ID)-pairs

Uninformative, Simple, Merges, Splits and the Complex

Categorisation     Records   Distinct (HES ID, Person ID) pairs   HES IDs   Person IDs
"Uninformative"       9.3M                                   -          -            -
Simple              881.3M                               42.1M          -            -
Merge (HES ID)       69.0M                                6.5M          -         2.5M
Split (HES ID)       13.2M                                1.1M       0.4M            -
Complex               6.6M                                0.6M       0.4M         0.4M

[Figure: one example per category. "Uninformative": HES ID 1 linked to Person ID A by a single record. Simple: HES ID 1 linked to Person ID A by multiple records. Merge: HES IDs 1 and 2 both linked to Person ID A. Split: HES ID 1 linked to Person IDs A and B. Complex: HES ID 1 linked to Person IDs A and B, and HES ID 2 linked to Person ID B.]
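One plausible reading of these categories is inferred from the figure's examples, not taken from the analysis code. It treats HES IDs and Person IDs as the two sides of a bipartite graph and classifies each connected component by how many of each it contains. A sketch, with `categorise` a hypothetical helper that takes one (HES ID, Person ID) tuple per record:

```python
from collections import defaultdict


def categorise(records: list[tuple[str, str]]) -> dict[tuple[str, str], str]:
    """Map each distinct (HES ID, Person ID) pair to a category."""
    n_records: dict[tuple[str, str], int] = defaultdict(int)
    for pair in records:
        n_records[pair] += 1

    # Union-find over nodes tagged "H"/"P" so the two ID spaces never collide.
    parent: dict[tuple[str, str], tuple[str, str]] = {}

    def find(x: tuple[str, str]) -> tuple[str, str]:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for h, p in n_records:
        rh, rp = find(("H", h)), find(("P", p))
        if rh != rp:
            parent[rp] = rh

    # Gather the HES IDs, Person IDs and record count of each component.
    hes, per, recs = defaultdict(set), defaultdict(set), defaultdict(int)
    for (h, p), n in n_records.items():
        root = find(("H", h))
        hes[root].add(h)
        per[root].add(p)
        recs[root] += n

    out = {}
    for h, p in n_records:
        root = find(("H", h))
        n_hes, n_per = len(hes[root]), len(per[root])
        if n_hes == 1 and n_per == 1:
            cat = "Uninformative" if recs[root] == 1 else "Simple"
        elif n_per == 1:
            cat = "Merge"    # several HES IDs share one Person ID
        elif n_hes == 1:
            cat = "Split"    # one HES ID spread over several Person IDs
        else:
            cat = "Complex"  # many-to-many
        out[(h, p)] = cat
    return out
```

Under this reading, a single record gives no evidence either way about agreement between the two methods, which would explain why it is labelled "Uninformative" rather than Simple.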

Features investigated

First reported:

  • Age (grouped: Infants, Children, Young Adults, Adults)
  • Gender
  • Ethnicity
  • Index of deprivation quintile
  • Year of activity

And:

  • Total activity records (grouped into quintiles)
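Grouping by total activity can be sketched by ranking individuals on their record counts and cutting the ranking into five equal groups. This assumes totals are counted per Person ID and that ties are broken arbitrarily (here by ID) so the groups are equal in size. The source does not say how either was handled, and `activity_quintiles` is a hypothetical helper.

```python
from collections import Counter


def activity_quintiles(person_ids: list[str]) -> dict[str, int]:
    """Quintile 1 (least activity) to 5 (most), by records per Person ID.

    person_ids: one entry per activity record.
    """
    totals = Counter(person_ids)
    # Sort by total records, breaking ties by ID so the result is deterministic.
    ranked = sorted(totals, key=lambda pid: (totals[pid], pid))
    n = len(ranked)
    return {pid: 1 + (5 * i) // n for i, pid in enumerate(ranked)}
```

With heavily tied counts (many people with a single record), equal-sized groups mean people with identical activity can fall into different quintiles; cutting on count thresholds instead would keep ties together but give unequal group sizes.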

Results

Discussion

  • Changes act at the fringes (91% of records unchanged)
  • Most group-wise differences are small
  • Greatest differences likely due to:
    • location information (as key component in linkage)
    • improved data recording / verification by Trusts
  • Deprivation still a factor
  • Reason for activity not investigated in this work
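The 91% figure can be checked against the categorisation table. This assumes "unchanged" means the "Uninformative" and Simple categories; that is an assumption, but it reproduces the quoted figure.

```python
# Record counts (millions) by category, copied from the categorisation table.
records_m = {
    "Uninformative": 9.3,
    "Simple": 881.3,
    "Merge (HES ID)": 69.0,
    "Split (HES ID)": 13.2,
    "Complex": 6.6,
}

total = sum(records_m.values())  # ~979.4M, consistent with the ~979M reported
# Assumption: records whose grouping is unaffected by the change.
unchanged = records_m["Uninformative"] + records_m["Simple"]
print(f"{unchanged / total:.1%} of records unchanged")
```

This prints `90.9% of records unchanged`, which rounds to the 91% quoted.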

Acknowledgements

We gratefully acknowledge all the patients whose de-identified data are used in this research.

This work uses data provided by patients and collected by the National Health Service as part of their care and support. Source data can also be accessed by researchers by applying to NHS England.

This work is/was supported by ADR UK (Administrative Data Research UK), an Economic and Social Research Council (part of UK Research and Innovation) programme.

This research benefits from and contributes to the NIHR Children and Families Policy Research Unit, but was not commissioned by the National Institute for Health Research (NIHR) Policy Research Programme. The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.
