Extracting linkage keys
LinXmart is designed to provide linkage keys identifying records belonging to the same person within and across datasets.
Data extractions in LinXmart occur within each Linkage Project. Before a data extraction can occur, an Extraction Project must be created. An Extraction Project can be used to represent a snapshot in time of the linkage map for the Linkage Project, or it can be used as a mechanism for obtaining linkage keys for the Linkage Project that endure over time. A Linkage Project may have any number of snapshot-based Extraction Projects representing the linkage map at different points in time.
Snapshot Extraction Projects
This is the traditional type of Extraction Project used by LinXmart. Data extracted from this type of Extraction Project always represents a snapshot of the linkage map at a particular point in time: the date and time at which the Extraction Project was created. Extracted linkage keys are masked for each Extraction Project in order to prevent data from different Extraction Projects being joined/merged, which could potentially allow users of the linked data to gain more information than they have approval for. The masking used for linkage keys in each Extraction Project is unique to that Extraction Project, and the linkage keys will never match those in another Extraction Project. However, the masking within an Extraction Project is the same for all records across all event types, allowing researchers to match masked person identifiers within and across event types.
Record ID | Group ID in LinXmart | Masked Linkage Key for Extraction Project 1 | Masked Linkage Key for Extraction Project 2 |
---|---|---|---|
Record001 | 1 | 50E01D95563D | CFB1A0394AB15 |
Record002 | 1 | 50E01D95563D | CFB1A0394AB15 |
Record003 | 2 | 3092132E3128 | FFA10A0A679E1 |
Record004 | 3 | 2586F42C23A3 | 0F20EEC0C91C2 |
Linkage keys are masked between Extraction Projects, to prevent the potential pooling of data by users
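The property shown in the table can be illustrated with a small sketch. This is not LinXmart's masking algorithm, which is not described here; it assumes, purely for illustration, a keyed hash (HMAC-SHA256) with a different secret per Extraction Project. The function name `mask_linkage_key`, the secrets and the 12-character key length are all hypothetical.

```python
import hashlib
import hmac


def mask_linkage_key(group_id: int, project_secret: bytes, length: int = 12) -> str:
    """Hypothetical masking: keyed hash of the internal group ID, truncated."""
    digest = hmac.new(project_secret, str(group_id).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length].upper()


# Hypothetical per-project secrets (assumption, not a LinXmart setting).
secret_ep1 = b"extraction-project-1-secret"
secret_ep2 = b"extraction-project-2-secret"

# Record ID -> Group ID in LinXmart, as in the table above.
records = {"Record001": 1, "Record002": 1, "Record003": 2, "Record004": 3}

masked_ep1 = {rec: mask_linkage_key(gid, secret_ep1) for rec, gid in records.items()}
masked_ep2 = {rec: mask_linkage_key(gid, secret_ep2) for rec, gid in records.items()}

# Within one Extraction Project, records in the same group share a key.
print(masked_ep1["Record001"] == masked_ep1["Record002"])
# Across Extraction Projects, the keys for the same record differ.
print(masked_ep1["Record001"] == masked_ep2["Record001"])
```

Under this assumption, records in the same group receive the same masked key within a project, while the two projects' keys cannot be joined to each other.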
Data extractions occur at a point in time, the date and time at which the Extraction Project was created, regardless of any data added after this point. As the linkage map (the relationship between records and linkage keys) can change at any time due to new records being ingested or quality reviews being performed, it is crucial that the extracted linkage keys 'match up' and that inconsistencies are not introduced by the dynamic nature of the linkage map. LinXmart avoids this issue by extracting data as at a precisely defined point in time, specifically the point at which the Extraction Project was created. If new data has been added or quality review changes have been made since then, these are effectively ignored, and the linkage keys are extracted as they looked at that particular point in time.
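The point-in-time behaviour can be pictured as replaying a change history up to the creation time of the Extraction Project. The sketch below is an assumption about how such an "as of" lookup might work, not a description of LinXmart's internals; the `linkage_history` structure, the dates and the function name `linkage_map_as_of` are hypothetical.

```python
from datetime import datetime

# Hypothetical change log of the linkage map: (changed_at, record_id, group_id).
# Each entry means "from this time on, the record belongs to this group".
linkage_history = [
    (datetime(2023, 1, 10), "Record001", 1),
    (datetime(2023, 1, 10), "Record002", 2),
    (datetime(2023, 3, 5), "Record002", 1),  # quality review merges Record002 into group 1
    (datetime(2023, 4, 1), "Record003", 3),  # new record ingested later
]


def linkage_map_as_of(history, snapshot_time):
    """Replay changes in time order, ignoring anything after snapshot_time."""
    snapshot = {}
    for changed_at, record_id, group_id in sorted(history, key=lambda entry: entry[0]):
        if changed_at <= snapshot_time:
            snapshot[record_id] = group_id
    return snapshot


# A snapshot Extraction Project created on 1 February only sees the January state.
created_at = datetime(2023, 2, 1)
snapshot = linkage_map_as_of(linkage_history, created_at)

# Using the latest state instead picks up the merge and the new record.
latest = linkage_map_as_of(linkage_history, datetime(2023, 12, 31))
```

Here the snapshot omits both the later quality review and the later record, which is why an Extraction Project created too early will not reflect data added afterwards.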
Do not create an Extraction Project until all of the data has been added and linked!
Enduring Keys Extraction Projects
An Extraction Project can be created with the ability to produce enduring linkage keys. These linkage keys are tied to the Linkage Project rather than Extraction Project, so two "enduring key" Extraction Projects in the same Linkage Project will produce the same linkage keys. Additionally, these types of Extraction Projects are not confined to a point-in-time snapshot of the linkage map - extractions of linkage keys from these projects will always use the latest version of the linkage map.
Extractions of these types will also add the enduring linkage keys of records from linked Linkage Projects.
Adding an Extraction Project
Snapshot Extraction Projects are point-in-time snapshots of the linkage map belonging to a specific Linkage Project - these should only be created once the data required for the extraction has completed linking. Enduring Keys Extraction Projects can be created at any time.
Extraction Projects are created on the 'Project Details' page by clicking Add Extraction Project from the Extraction Projects pane at the bottom of the page.
The following fields are required:
Field | Description |
---|---|
Reference Id | This is a user-defined (and unique) reference ID given to the Project by the Operator. |
Description | This describes the research project or administrative/business purpose for which this extraction is being made. |
Enduring Linkage Keys | Specifies whether the Extraction Project uses a point-in-time snapshot of the linkage map, or produces enduring linkage keys and always uses the latest linkage map. |
Click Save to create the new Extraction Project. The Extraction Project Detail screen is then shown.
Once created, Extraction Projects cannot be edited or deleted.
Requesting a data extraction
An extraction can be requested for an Extraction Project by clicking on the Request New Data Extraction button/link on the Extraction Project Details page. This presents the user with a list of all Event Types used by the project. Data is extracted per Event Type.
Check the box next to each Event Type that requires extraction and click the Request button. A Job will be created for each Event Type to gather the required data and create the extraction.
Extraction output
Once the extraction for an Event Type is completed, the results will be available to download from the Extraction Project Details page. Click on the download icon in the Options column for the desired Extraction Result. Each requested Event Type will be available separately.
The format of the downloaded linkage map is a LinXmart Envelope. It is a zip file that contains two files:
- A manifest file, manifest.xml, that describes what the data is, with references to the Linkage Project, Data Provider and Event Type.
- The data file, data-extract.csv, a delimited text file that represents the linkage map for a specific Event Type.
The data file is in comma-delimited format with the following fields:
Field | Description |
---|---|
ProjectCode | The Code of the Linkage Project that the Extraction Request is a part of. |
SourceUniqueID | The original record identifier supplied on the records when ingested into the system. |
SourcePersonId | If a jurisdictional linkage key (i.e. a dataset specific person identifier) was provided on the dataset, this field is also output here. If not, this field is left blank. |
PersonKey | The linkage key for the record, specifically for this Extraction Project. Records with the same PersonKey belong to the same person; records with different PersonKeys belong to different people. |
LinkedProjectPersonKey | If the record is from a linked project, this will be the enduring linkage key for the record from that other Linkage Project. |
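As an illustration of working with a downloaded envelope, the following sketch reads data-extract.csv from the zip file and groups records by PersonKey. It assumes the CSV has a header row using the field names listed above, is UTF-8 encoded, and sits at the top level of the zip; none of these details is confirmed here. The function names `read_extract` and `group_by_person` are hypothetical.

```python
import csv
import io
import zipfile
from collections import defaultdict


def read_extract(envelope) -> list[dict]:
    """Return the rows of data-extract.csv from an envelope (path or file object).

    Assumes a header row and UTF-8 encoding (not confirmed by the documentation).
    """
    with zipfile.ZipFile(envelope) as zf:
        with zf.open("data-extract.csv") as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


def group_by_person(rows: list[dict]) -> dict[str, list[str]]:
    """Map each PersonKey to the SourceUniqueIDs of the records that share it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["PersonKey"]].append(row["SourceUniqueID"])
    return dict(groups)
```

For example, `group_by_person(read_extract("envelope.zip"))` (with a hypothetical file name) would return one entry per person, each listing the source record identifiers that belong to that person within the Extraction Project.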