Department for Education (DfE) [identity assigned by DfE]:
National Pupil Database (NPD): Education and social care records (collated from schools, awarding bodies, local authorities, other social care providers)
NHS England [identity assigned by NHS E]: Healthcare provided by
Hospitals
Community services, inc. maternity services (not Primary Care)
Mental health services
Office for National Statistics [identity assigned by NHS E]: Life-events data
Birth registrations
Death registrations
UCL [identities assigned by NHS E]:
Mother - child links (derived from hospital delivery records and birth notifications)
Schematic diagram of data flows
Where is UCL in all this?
Stuck in the middle (wrt evaluation)
On one side:
Historically NHS England supplied very little linkage (meta)data
NHS England has limited capacity
UCL Agreement included "evaluate linkage quality and bias", but how was not specified
On the other:
Existing (internal) projects wish to understand changes in linkage quality and bias
Unspecified future projects will wish to understand how their analyses are impacted by the linkage
What ECHILD / HOPE studies wish to evaluate
Impact of changes in linkage methods between ECHILD v2 vs v1:
NHS England's implementation of its Master Patient Service (MPS), which changed how the same patient is identified across NHS E datasets (esp. in the absence of an NHS Number)
Demographics from across all NPD tables vs just 1 [spring census] table
Increased data coverage (period) and expanded cohort:
Period: 1984-09-01 to 2023-03-31 vs 1995-09-01 to 2020-03-31
Cohort: Born 1984-09-01 to 2023-03-31 vs 1995-09-01 to 2020-03-31
Case ascertainment and bias within groups (e.g. more recent birth cohorts, children with Down's syndrome)
How we wish to do this?
Building on GUILD:
using a gold standard dataset to quantify false matches and missed matches [unavailable]
comparing characteristics of linked and unlinked data to identify potential sources of bias
using sensitivity analyses to evaluate how sensitive results are to changes in linkage procedure.
Also:
Ecological level comparisons: comparison with small area census counts by demographic characteristics.
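As an illustration of comparing linked and unlinked records, a minimal sketch is given below. It assumes record-level data with a linked / not linked flag and a grouping characteristic; the field names (`birth_cohort`, `linked`), the function name and the toy records are hypothetical and not taken from ECHILD.

```python
from collections import defaultdict


def linkage_rate_by_group(records, group_key, linked_key="linked"):
    """Return {group: (n_linked, n_total, rate)} for a list of dict records."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_linked, n_total]
    for rec in records:
        group = rec[group_key]
        counts[group][1] += 1
        if rec[linked_key]:
            counts[group][0] += 1
    return {
        group: (n_linked, n_total, n_linked / n_total)
        for group, (n_linked, n_total) in counts.items()
    }


# Toy records, entirely made up for illustration.
records = [
    {"birth_cohort": "1995-1999", "linked": True},
    {"birth_cohort": "1995-1999", "linked": True},
    {"birth_cohort": "1995-1999", "linked": False},
    {"birth_cohort": "2015-2019", "linked": True},
    {"birth_cohort": "2015-2019", "linked": False},
    {"birth_cohort": "2015-2019", "linked": False},
]

rates = linkage_rate_by_group(records, "birth_cohort")
for cohort, (n_linked, n_total, rate) in sorted(rates.items()):
    print(f"{cohort}: {n_linked}/{n_total} linked ({rate:.0%})")
```

A marked difference in linkage rate between groups would flag a potential source of bias; it would not by itself show whether the cause is missed matches or false matches.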
What we require
Linkage sensitivity analyses:
Information on the strength of a match (deterministic only)
NHS England's change to internal linkage method (MPS)
Representative dataset with both old and new (MPS) linkage method applied
For evaluation of the impact of using identifiers from all NPD tables, and of increased data coverage and expanded cohort:
Information on the distribution of identifiers across NPD tables and their individual strength of a match / non-match [TBC]
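If a dataset with both old and new linkage applied were available, one simple sensitivity check would be to tabulate how each record's link changes between the two. The sketch below is illustrative only: it assumes each record carries a linked person identifier (or none) under each method and that those identifiers are comparable across methods, which may not hold in practice. All names and values are hypothetical.

```python
from collections import Counter


def compare_linkage(old_links, new_links):
    """Tabulate link status changes between two linkage methods.

    old_links / new_links: dict mapping record id -> linked person id, or None
    if the record was not linked under that method.
    """
    summary = Counter()
    for rec_id in old_links.keys() | new_links.keys():
        old = old_links.get(rec_id)
        new = new_links.get(rec_id)
        if old is None and new is None:
            summary["unlinked_both"] += 1
        elif old is None:
            summary["gained_link"] += 1
        elif new is None:
            summary["lost_link"] += 1
        elif old == new:
            summary["same_link"] += 1
        else:
            summary["changed_link"] += 1
    return summary


# Toy data, made up for illustration.
old = {"p1": "A", "p2": "B", "p3": None, "p4": "D", "p5": None}
new = {"p1": "A", "p2": "C", "p3": "E", "p4": None, "p5": None}

summary = compare_linkage(old, new)
for status in sorted(summary):
    print(f"{status}: {summary[status]}")
```

Repeating an analysis on the "same_link" records only, and again on all records, would be one way to see how sensitive results are to the change in method.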
Wider Questions
What linkage (meta)data is currently made available by different state-backed linkage providers?
Beyond GUILD, what linkage (meta)data should routinely be made available by state-backed linkage providers?
Merge-split issues are likely to impact many routine data analyses (whether linkage is implicit or explicit). What metadata should we request from all data providers to evaluate this?
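To make the merge-split question concrete, the sketch below takes one possible reading: a "split" is one identity in an earlier release mapping to several identities in a later release, and a "merge" is several earlier identities mapping to one later identity. That definition, the function name and the toy identifiers are assumptions for illustration, not a description of any provider's metadata.

```python
from collections import defaultdict


def find_merges_and_splits(assignments):
    """Detect merged and split identities between two ID assignments.

    assignments: iterable of (record_id, id_v1, id_v2) tuples.
    Returns (merges, splits):
      merges: {id_v2: [id_v1, ...]} where several v1 identities share one v2 identity
      splits: {id_v1: [id_v2, ...]} where one v1 identity maps to several v2 identities
    """
    v1_to_v2 = defaultdict(set)
    v2_to_v1 = defaultdict(set)
    for _record_id, id_v1, id_v2 in assignments:
        v1_to_v2[id_v1].add(id_v2)
        v2_to_v1[id_v2].add(id_v1)
    splits = {k: sorted(v) for k, v in v1_to_v2.items() if len(v) > 1}
    merges = {k: sorted(v) for k, v in v2_to_v1.items() if len(v) > 1}
    return merges, splits


# Toy data, made up for illustration.
assignments = [
    ("r1", "A", "X"),
    ("r2", "A", "Y"),  # identity A is split into X and Y
    ("r3", "B", "Z"),
    ("r4", "C", "Z"),  # identities B and C are merged into Z
    ("r5", "D", "W"),  # unchanged one-to-one mapping
]

merges, splits = find_merges_and_splits(assignments)
print("merges:", merges)
print("splits:", splits)
```

A check of this kind needs a record-level crosswalk between releases, which is one candidate for the metadata to request from providers.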