Import Formats
The Import Format describes the layout of the data files to be received for a Data Source. Each Data Source has one Import Format associated with it, and all files or data received for a single Data Source are assumed to be in the same format. A Data Source's Import Format can be modified if the format changes over time.
The same Import Format can be used for different event types from the same or different Data Providers, provided the file layout is identical. Typically, new Import Formats are required for each new Event Type.
Import formats can be edited, but never deleted from the system.
Import Format Types
LinXmart currently accepts text files only, containing data in either delimited or fixed-width format. LinXmart expects every dataset to contain one record per row, and each row to include a field that holds a unique identifier. LinXmart can also ingest data from SQL tables or views. For these data source types, each column must be specified either by its index (ordinal value) or by its name.
By Index (Delimited) Format
A delimited file is one in which the fields in each record (such as name, address, date of birth, id, etc.) are separated by a single character, such as a comma or a tab.
A row of a comma delimited data file.
When configuring rows, each field has a specific Index value that determines which column it refers to. For delimited data files, such as the one above, this is simply the index of the column specified by the delimiters. For SQL tables, this corresponds to the Ordinal value of the column in the table.
Use this for SQL tables if you want to specify the column by its ordinal position.
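The By Index mapping can be illustrated outside LinXmart with a short Python sketch. It assumes the 1-based Index numbering described for the Index field later in this section; the field names, sample rows, and the `extract_by_index` helper are hypothetical and are not part of LinXmart.

```python
import csv
import io

# Hypothetical mapping of field name -> 1-based column index.
# Column 3 is deliberately left out, so it is ignored.
COLUMN_INDEXES = {"source_unique_id": 1, "surname": 2, "date_of_birth": 4}


def extract_by_index(row, column_indexes):
    """Pick fields out of one delimited row using 1-based indexes."""
    return {name: row[index - 1] for name, index in column_indexes.items()}


# Made-up sample data standing in for a comma delimited file.
sample = io.StringIO("1001,Smith,Jane,11/08/1986\n1002,Jones,Alan,02/01/1990\n")
records = [extract_by_index(row, COLUMN_INDEXES) for row in csv.reader(sample)]
print(records[0])
```

Columns that are not listed in the mapping are simply skipped, which mirrors the rule that fields not added to the Import Format are ignored.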
By Name (Delimited) Format
Delimited files and SQL tables can also use this setting to determine column mappings. Delimited files must have a header row for this to work.
A row of a comma delimited data file with a header row.
For SQL tables, the system will match the name of the Import Format Column with the name of the SQL table column.
Use this for SQL tables if you want to specify the column by column name.
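As a rough illustration of By Name matching, the following Python sketch looks up columns through the header row of a delimited file. The column names, the sample data, and the `ignore_missing` flag are assumptions made for the example; the flag only loosely imitates the Column Validation option and does not reflect how LinXmart implements it.

```python
import csv
import io

# Hypothetical Import Format column names to look up in the header row.
WANTED_COLUMNS = ["id", "surname", "dob", "postcode"]


def extract_by_name(reader, wanted, ignore_missing=True):
    """Return one dict per row containing only the wanted columns."""
    missing = [name for name in wanted if name not in reader.fieldnames]
    if missing and not ignore_missing:
        raise ValueError(f"columns not found: {missing}")
    present = [name for name in wanted if name not in missing]
    return [{name: row[name] for name in present} for row in reader]


# Made-up sample file with a header row and no "postcode" column.
sample = io.StringIO("id,surname,given,dob\n1001,Smith,Jane,11/08/1986\n")
records = extract_by_name(csv.DictReader(sample), WANTED_COLUMNS)
print(records)
```

With `ignore_missing=False` the same call raises an error instead of silently skipping the absent `postcode` column.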
Fixed Width Format
A fixed width data file is one in which each field is determined by a start and end position in the record. In these files, a field is given a set number of characters (e.g. 20), and for values shorter than that length, the remainder is filled with spaces. LinXmart removes these spaces before processing the field.
Two rows from a fixed width file.
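The fixed width layout can be sketched in Python as slicing each record by position and dropping the padding. This is an illustration only: the layout, the field names, and the assumption that positions are 1-based and inclusive are not taken from LinXmart.

```python
# Hypothetical layout: field name -> (start, end) positions,
# assumed here to be 1-based and inclusive of the last character.
LAYOUT = {
    "source_unique_id": (1, 6),
    "surname": (7, 26),
    "given_name": (27, 46),
}


def parse_fixed_width(line, layout):
    """Slice one record by position and remove trailing padding spaces."""
    return {
        name: line[start - 1:end].rstrip(" ")
        for name, (start, end) in layout.items()
    }


# Build a made-up record: a 6-character ID and two 20-character padded fields.
line = "001001" + "Smith".ljust(20) + "Jane".ljust(20)
record = parse_fixed_width(line, LAYOUT)
print(record)
```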
Creating an Import Format
Import formats can only be created while adding or editing a Data Source by clicking Add next to the Import Format field.
The following fields must be completed when creating a new Import Format through the Web Interface.
Field | Description |
---|---|
Name | Each Import Format must be given a unique name, and will thereafter appear in the drop-down list of previously created import formats. |
Column Positions | For delimited files or SQL tables, select By Index or By Name. Alternatively, select Fixed Width as explained above. |
Column Validation | Gives you the option to validate the existence of all columns in the data source, or to ignore those that are not found. |
Click on Save and you will be prompted to add columns to this Import Format.
Adding Columns to an Import Format
A description is required for each column in the input data that is to be stored in LinXmart. To add a column, click the Add New Column for this Import Format button.
The following fields will need to be completed:
Field | Description |
---|---|
Linkage Field | This informs LinXmart which of its linkage fields will correspond to this column. The drop-down list contains all of the linkage fields that are available. These are listed in Appendix A. |
Name | This field’s value will default to the Linkage Field name. The operator may override this with a more representative name. This is particularly recommended for text fields and binary fields, as it helps to identify these fields meaningfully later. An operator-provided name is mandatory for non-linkage fields. |
Format | This field only appears for some linkage fields. For example, dates can be supplied in many different formats, and the Format value tells the system how to parse them. Some example date formats are listed under Example dates. A full list of formats for day, month and year components is provided in Appendix B. Note that month formats are specified in uppercase. |
Additional fields for delimited files:
Field | Description |
---|---|
Index | The column number of the field in question. The first field in the file has an index of 1, the second field 2, etc. |
Additional fields for fixed width files:
Field | Description |
---|---|
Start Position | The column start position of the field in question. |
End Position | The column end position of the field in question – the position of the last character. |
Most linkage fields have a defined maximum length. During processing, LinXmart will report any record that could not be parsed because a value exceeded the maximum. When all fields (columns) have been added, click the Save button above the list to save the Import Format.
Not all fields in the supplied data file need to be added to the Import Format. Any that are not included as either linkage or non-linkage fields will be ignored.
Field Types
An important LinXmart field is the Source Unique ID. It is a field in the dataset that contains a unique value for each record in the file. All Import Formats require a Source Unique ID field. It is used by LinXmart to uniquely identify records, and is returned when data is extracted from the system.
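Because every record needs a distinct Source Unique ID, it can be worth checking a file for repeated values before it is supplied. The following Python sketch shows one way to do that; the `source_unique_id` key, the sample records, and the `duplicate_ids` helper are hypothetical, and the check is a pre-submission convenience rather than a description of what LinXmart does internally.

```python
from collections import Counter


def duplicate_ids(records, id_field="source_unique_id"):
    """Return the sorted ID values that occur on more than one record."""
    counts = Counter(record[id_field] for record in records)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up records; "1002" appears twice.
records = [
    {"source_unique_id": "1001", "surname": "Smith"},
    {"source_unique_id": "1002", "surname": "Jones"},
    {"source_unique_id": "1002", "surname": "Brown"},
]
print(duplicate_ids(records))
```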
If date of birth is supplied as a single field, this can be added as Date of Birth. If it is supplied as components (day, month and year of birth), these can be added separately. However, either the full date of birth or the separate components can be added, not both.
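The either/or rule for date of birth can be expressed as a small check over the linkage fields chosen for an Import Format. In this Python sketch the component field names are assumptions (the actual linkage field names are listed in Appendix A), and `mixes_dob_styles` is a hypothetical helper.

```python
FULL_DOB = "Date of Birth"
# Hypothetical names for the component linkage fields.
DOB_COMPONENTS = {"Day of Birth", "Month of Birth", "Year of Birth"}


def mixes_dob_styles(linkage_fields):
    """True if both the full date and any separate component are selected."""
    fields = set(linkage_fields)
    return FULL_DOB in fields and bool(fields & DOB_COMPONENTS)


print(mixes_dob_styles(["Source Unique ID", "Date of Birth"]))
print(mixes_dob_styles(["Source Unique ID", "Date of Birth", "Year of Birth"]))
```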
The Jurisdictional Linkage Key refers to any field in the dataset that is a unique person identifier (that is, has the same value across different records within the collection belonging to the same person).
In addition to the named personally identifying fields, there are some additional fields (Text Fields 1-15) that can be used to store any other identifiers for use in linkage.
Non-linkage fields can also be stored in LinXmart. These are not used by LinXmart, but are there for the operator’s reference. To add these, select the value (none) for the Linkage Field.
Finally, there are also a number of Binary Fields. These are used for storing encoded fields in privacy preserving linkage. For more information on privacy preserving linkage, see Section 9.
Example dates
Example date | Format code |
---|---|
11/08/1986 | dd/MM/yyyy |
19860811 | yyyyMMdd |
1181986 or 11101986 | ddMyyyy |
11-AUG-1986 | dd-MMM-yyyy |
11-08-86 | dd-MM-yy |
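To show how the format codes in the table line up with the example dates, the following Python sketch translates a few of them into `datetime.strptime` directives and parses the sample values. It is an illustration only: the token mapping is an assumption based on the table above, the helper names are hypothetical, and LinXmart's own parser (and the full code list in Appendix B) may behave differently. The variable-width `ddMyyyy` code is left out because its input can be ambiguous.

```python
import re
from datetime import datetime

# Assumed mapping from format-code tokens to strptime directives.
_TOKENS = {"yyyy": "%Y", "yy": "%y", "MMM": "%b", "MM": "%m", "dd": "%d"}
# Longer tokens come first so "yyyy" is not read as two "yy" tokens.
_TOKEN_RE = re.compile("yyyy|yy|MMM|MM|dd")


def to_strptime(format_code):
    """Translate a code such as 'dd/MM/yyyy' into '%d/%m/%Y'."""
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group()], format_code)


def parse_date(value, format_code):
    """Parse a date string using one of the supported format codes."""
    return datetime.strptime(value, to_strptime(format_code)).date()


EXAMPLES = [
    ("11/08/1986", "dd/MM/yyyy"),
    ("19860811", "yyyyMMdd"),
    ("11-AUG-1986", "dd-MMM-yyyy"),  # month names depend on an English locale
    ("11-08-86", "dd-MM-yy"),
]
parsed = [parse_date(value, code) for value, code in EXAMPLES]
print(parsed)
```

All four examples resolve to the same calendar date, 11 August 1986. Note that Python expands the two-digit year `86` to 1986 under its own pivot rule, which may not match the rule LinXmart applies.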
Editing an Import Format
To edit an import format, you must first select to Edit an existing event type that uses the Import Format. From here, the Import Format can be edited by clicking Edit next to the Import Format listed. The Row Format is the only Import Format metadata field that cannot be changed. Any Import Format Column (i.e. field) entries can be changed or deleted, and new fields can be added.