Next Generation Linkage Management System
Harris, J.
SA NT Datalink is a consortium of government departments, universities and other parties committed to providing high-quality data linkage to support research. This paper discusses in detail the backroom technologies that provide linked data to researchers. Data linkage is commonly understood as a process that uses computer data matching technology to compare similar records within and across multiple datasets. The Next Generation Linkage Management System has been developed using open source technologies to manage the disparate source data files coming into the organisation, their cleansing and standardisation, and the subsequent analysis of the data that determines blocking parameters and linkage weights. The open source linkage engine FEBRL (Freely Extensible Biomedical Record Linkage) is used to link the datasets using probabilistic methods. To store the linked records, SA NT Datalink has employed a graph database, which allows the rich comparison vectors to be kept and reused. The data structures within a graph database are more closely aligned with the native formats of linked data. The graph database also provides a repository from which data can be retrieved very quickly because, unlike a relational database, it needs no indexes or joins, which are computationally expensive. The benefits of using both deterministic and probabilistic linkage are discussed, along with the analysis required on a dataset to assist in selecting the best linkage strategy. Graph databases are based on graph theory and are used by some of the largest organisations on the web to deliver a very fast service to their customers. SA NT Datalink has implemented a number of quality tools to reduce the number of false positives and false negatives. What the Next Generation Linkage Management System does not provide is also touched upon.
SA NT Datalink has developed a loosely coupled, open source system for managing, linking and extracting linked data, which will form the cornerstone of its offerings to researchers for the coming decade.
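The linkage steps named in the abstract (blocking, comparison vectors, linkage weights, and keeping the vectors on graph edges) can be illustrated with a minimal sketch. This is plain Python, not FEBRL's API and not the SA NT Datalink implementation; the records, field names, m- and u-probabilities, threshold, and the dictionary standing in for a graph database are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"id": "a1", "surname": "smith", "given": "john", "yob": 1970, "postcode": "5000"},
    {"id": "b7", "surname": "smith", "given": "jon", "yob": 1970, "postcode": "5000"},
    {"id": "c3", "surname": "jones", "given": "mary", "yob": 1985, "postcode": "0800"},
    {"id": "d9", "surname": "smith", "given": "anne", "yob": 1992, "postcode": "5006"},
]

# Assumed (m, u) probabilities per compared field:
# m = P(field agrees | same person), u = P(field agrees | different people).
M_U = {"given": (0.90, 0.05), "yob": (0.95, 0.02), "postcode": (0.85, 0.01)}


def block_key(rec):
    """Blocking parameter: only records sharing a surname are compared."""
    return rec["surname"]


def compare(a, b):
    """Comparison vector: exact agreement (True/False) on each field."""
    return {field: a[field] == b[field] for field in M_U}


def weight(vector):
    """Sum of log2 agreement/disagreement weights (Fellegi-Sunter style)."""
    total = 0.0
    for field, agrees in vector.items():
        m, u = M_U[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def link(recs, threshold=5.0):
    """Compare records within each block and return graph-like edges.

    Each edge keeps its full comparison vector, so a different threshold
    can be applied later without re-running the comparison step.
    """
    blocks = defaultdict(list)
    for rec in recs:
        blocks[block_key(rec)].append(rec)

    edges = {}
    for members in blocks.values():
        for a, b in combinations(members, 2):
            vec = compare(a, b)
            w = weight(vec)
            edges[(a["id"], b["id"])] = {
                "vector": vec,
                "weight": w,
                "match": w >= threshold,
            }
    return edges


if __name__ == "__main__":
    for pair, edge in link(records).items():
        print(pair, round(edge["weight"], 2), edge["match"])
```

Because the comparison vector stays on each edge, the same stored edges can be re-scored with different weights or thresholds, which is the kind of reuse the abstract attributes to the graph store. A deterministic linkage would instead accept a pair only when a chosen set of fields agrees exactly.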
Cite as: Harris, J. (2013). Next Generation Linkage Management System. In Proc. Health Informatics and Knowledge Management 2013 (HIKM 2013), Adelaide, Australia. CRPIT, 142. Gray, K. and Koronios, A. Eds., ACS. 7-14.