🤔 Stuck in the middle? 🃏

ECHILD: Record linkage evaluation

Tony Stone

6 Sep 2023

The ECHILD research database

  • Department for Education (DfE) [identity assigned by DfE]:
    • National Pupil Database (NPD): Education and social care records (collated from schools, awarding bodies, local authorities, other social care providers)
  • NHS England [identity assigned by NHS E]: Healthcare provided by
    • Hospitals
    • Community services, inc. maternity services (not Primary Care)
    • Mental health services
  • Office for National Statistics [identity assigned by NHS E]: Life-events data
    • Birth registrations
    • Death registrations
  • UCL [identities assigned by NHS E]:
    • Mother - child links (derived from hospital delivery records and birth notifications)

Schematic diagram of data flows

[Diagram: data flows between NHS England, DfE and ONS]

  • NHS England:
    • Demographics -> Linkage process
    • Linkage process -> Bridge file, Linkage metadata ❓
    • Health datasets -> ECHILD (UCL area), ECHILD (Other users)
    • Bridge file -> ECHILD (UCL area), ECHILD (Other users)
    • Linkage metadata ❓ -> ECHILD (UCL area) only
  • DfE:
    • NPD -> data, Demographics
    • Demographics -> Linkage process (NHS England)
    • data -> ECHILD (UCL area), ECHILD (Other users)
  • ONS:
    • ECHILD (UCL area)
    • ECHILD (Other users)

Where is UCL in all this?

Stuck in the middle (with respect to evaluation)

On one side:

  • Historically NHS England supplied very little linkage (meta)data
  • NHS England has limited capacity
  • UCL Agreement included “evaluate linkage quality and bias” but did not specify how

On the other:

  • Existing (internal) projects wish to understand changes in linkage quality and bias
  • Unspecified, future projects will wish to understand how their analyses are impacted by the linkage

What ECHILD / HOPE studies wish to evaluate

  • Impact of changes in linkage methods between ECHILD v2 and v1:
    • NHS England’s implementation of its Master Patient Service (MPS), which changed how the same patient is identified across NHS E datasets (esp. in the absence of an NHS Number)
    • Demographics from across all NPD tables vs just 1 [spring census] table
    • Increased data coverage (period) and expanded cohort:
      • Period: 1984-09-01 to 2023-03-31 vs 1995-09-01 to 2020-03-31
      • Cohort: Born 1984-09-01 to 2023-03-31 vs 1995-09-01 to 2020-03-31
  • Case ascertainment and bias within groups (e.g. more recent birth cohorts, children with Down’s syndrome)

How do we wish to do this?

Building on GUILD:

  • using a gold standard dataset to quantify false matches and missed matches [unavailable]
  • comparing characteristics of linked and unlinked data to identify potential sources of bias
  • using sensitivity analyses to evaluate how sensitive results are to changes in linkage procedure.
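The second GUILD step, comparing linked and unlinked records, can be sketched as below. This is a minimal illustration only: the records are made up, the field names `birth_cohort` and `linked` are hypothetical, and the standardised difference is one common summary among several that could be used.

```python
from math import sqrt

def linkage_rates(records, characteristic):
    """Proportion of records that linked, within each level of a characteristic."""
    counts = {}
    for rec in records:
        group = rec[characteristic]
        linked, total = counts.get(group, (0, 0))
        counts[group] = (linked + int(rec["linked"]), total + 1)
    return {g: linked / total for g, (linked, total) in counts.items()}

def share_in_group(records, characteristic, group, linked):
    """Share of linked (or unlinked) records that fall in the given group."""
    subset = [r for r in records if r["linked"] == linked]
    return sum(r[characteristic] == group for r in subset) / len(subset)

def standardised_difference(p_linked, p_unlinked):
    """Standardised difference between two proportions."""
    pooled_var = (p_linked * (1 - p_linked) + p_unlinked * (1 - p_unlinked)) / 2
    if pooled_var == 0:
        return 0.0
    return (p_linked - p_unlinked) / sqrt(pooled_var)

# Made-up toy records (not ECHILD data); field names are hypothetical.
records = (
    [{"birth_cohort": "earlier", "linked": True}] * 3
    + [{"birth_cohort": "earlier", "linked": False}] * 1
    + [{"birth_cohort": "recent", "linked": True}] * 2
    + [{"birth_cohort": "recent", "linked": False}] * 2
)

rates = linkage_rates(records, "birth_cohort")
p_linked = share_in_group(records, "birth_cohort", "recent", linked=True)
p_unlinked = share_in_group(records, "birth_cohort", "recent", linked=False)
std_diff = standardised_difference(p_linked, p_unlinked)

print(rates)
print(round(std_diff, 2))
```

A large standardised difference for a characteristic suggests that linked records are not representative of the full cohort on that characteristic, which is a potential source of bias.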

Also:

  • Ecological level comparisons: comparison with small area census counts by demographic characteristics.
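An ecological comparison of this kind might look like the sketch below. It assumes linked-cohort counts and census counts are available for the same strata; the area codes, the strata, all counts and the 0.9 flagging threshold are invented for illustration.

```python
def coverage_ratios(linked_counts, census_counts):
    """Ratio of linked-cohort count to census count for each stratum."""
    ratios = {}
    for stratum, census_n in census_counts.items():
        if census_n > 0:  # skip strata with no census population
            ratios[stratum] = linked_counts.get(stratum, 0) / census_n
    return ratios

# Hypothetical strata: (small area code, sex). All counts are made up.
census_counts = {
    ("AREA_1", "F"): 200,
    ("AREA_1", "M"): 210,
    ("AREA_2", "F"): 150,
    ("AREA_2", "M"): 160,
}
linked_counts = {
    ("AREA_1", "F"): 190,
    ("AREA_1", "M"): 200,
    ("AREA_2", "F"): 120,
    ("AREA_2", "M"): 152,
}

ratios = coverage_ratios(linked_counts, census_counts)

# Flag strata where the linked cohort falls well short of the census count.
# The 0.9 threshold is an arbitrary illustration, not a recommended cut-off.
low_coverage = sorted(s for s, r in ratios.items() if r < 0.9)
print(low_coverage)
```

Ratios well below 1 in particular strata point to groups that may be under-represented after linkage, although differences in timing and population definitions between the census and the cohort also affect the ratio.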

What we require

Linkage sensitivity analyses:

  • Information on the strength of a match (deterministic only) ✔️
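To show why match strength matters for sensitivity analyses, here is a minimal sketch. It assumes each link carries a hypothetical `match_rank` field, where 1 is the strictest deterministic step and higher values are weaker matches; the actual format of the metadata supplied by NHS England may differ. The `outcome` field and all values are made up.

```python
def estimate_by_max_rank(links, max_ranks):
    """Re-estimate an outcome proportion, keeping only links at or below each rank."""
    results = {}
    for max_rank in max_ranks:
        kept = [link for link in links if link["match_rank"] <= max_rank]
        if kept:
            proportion = sum(link["outcome"] for link in kept) / len(kept)
            results[max_rank] = (len(kept), proportion)
        else:
            results[max_rank] = (0, None)
    return results

# Made-up links: match_rank 1 = strongest match (hypothetical coding),
# outcome = 1 if the linked record has the outcome of interest.
links = [
    {"match_rank": 1, "outcome": 1},
    {"match_rank": 1, "outcome": 0},
    {"match_rank": 2, "outcome": 1},
    {"match_rank": 3, "outcome": 1},
]

results = estimate_by_max_rank(links, max_ranks=[1, 2, 3])
for max_rank, (n_links, proportion) in results.items():
    print(max_rank, n_links, proportion)
```

If the estimate shifts substantially as weaker matches are admitted, the analysis is sensitive to the linkage procedure and the choice of threshold should be reported alongside the results.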

NHS England’s change to internal linkage method (MPS):

  • Representative dataset with both old and new (MPS) linkage method applied ✔️

To evaluate the impact of using identifiers from all NPD tables, and of increased data coverage and expanded cohort:

  • Information on the distribution of identifiers across NPD tables and their individual strength of a match / non-match ❓ [TBC]

Wider Questions

  • What linkage (meta)data is currently made available by different state-backed linkage providers?

  • Beyond GUILD, what linkage (meta)data should routinely be made available by state-backed linkage providers?

  • Merge-split issues are likely to impact many routine data analyses (whether linkage is implicit or explicit). What metadata should we request from all data providers to evaluate this?