?:abstract
|
-
BACKGROUND The Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), defined by the non-profit organization Observational Health Data Sciences and Informatics (OHDSI), has been gaining attention for its use in analyzing patient-level clinical data obtained from various medical institutions. When such data are analyzed in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy. OBJECTIVE This study proposes and evaluates a de-identification strategy that comprises several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. METHODS The proposed strategy, which combines several rules with the k-anonymity, l-diversity, and t-closeness privacy models, was evaluated using an actual CDM database. RESULTS The CDM database, which was constructed according to the rules established by OHDSI, exhibited a low re-identification risk: the highest re-identifiable record rate in the dataset (11.3%) was exhibited by the DRUG_EXPOSURE table, with a re-identification success rate of 0.03%. However, because every table includes at least one 'highest risk' value of 100%, suitable anonymization techniques are required; moreover, the CDM database preserves the 'source values' (raw data), combinations of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy that de-identifies the source values to significantly reduce not only the highest risk under the k-anonymity, l-diversity, and t-closeness privacy models but also the overall possibility of re-identification.
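To illustrate how a table can show a low overall re-identification rate and still report a 'highest risk' of 100%, the following is a minimal sketch in Python. It assumes the common prosecutor-model definitions, which the abstract does not spell out: records sharing the same quasi-identifier values form an equivalence class, and each record's risk is 1 divided by its class size. The function name risk_summary, the threshold parameter, and the toy records are hypothetical and are not taken from the study.

```python
from collections import Counter


def risk_summary(records, quasi_identifiers, threshold=0.2):
    """Summarize re-identification risk under an assumed prosecutor model.

    Each record's risk is 1 / (size of its equivalence class), where a class
    is the set of records sharing the same quasi-identifier values.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    risks = [1.0 / class_sizes[k] for k in keys]
    return {
        # k-anonymity level: size of the smallest equivalence class
        "k": min(class_sizes.values()),
        # risk of the most exposed record (1.0 means a unique record)
        "highest_risk": max(risks),
        # share of records whose risk exceeds the chosen threshold
        "records_at_risk": sum(r > threshold for r in risks) / len(risks),
        # average risk over all records
        "average_risk": sum(risks) / len(risks),
    }


# Toy, made-up records (not from the CDM database in the study).
records = [
    {"year_of_birth": 1980, "gender": "F", "drug": "A"},
    {"year_of_birth": 1980, "gender": "F", "drug": "B"},
    {"year_of_birth": 1975, "gender": "M", "drug": "A"},
]

summary = risk_summary(records, ["year_of_birth", "gender"])
print(summary)
```

In this toy table a single record that is unique on its quasi-identifiers is enough to give k = 1 and a highest risk of 100%, regardless of how well the remaining records are protected.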
CONCLUSIONS Our proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers.
|