irkaljl5 | Knowledge4COVID-19

Property	Value
?:abstract	BACKGROUND: There has been growing interest in data synthesis for enabling the sharing of data for secondary analysis; however, there is a need for a comprehensive privacy risk model for fully synthetic data: if the generative models have been overfit, then it is possible to identify individuals from synthetic data and learn something new about them. OBJECTIVE: The purpose of this study is to develop and apply a methodology for evaluating the identity disclosure risks of fully synthetic data. METHODS: A full risk model is presented, which evaluates both identity disclosure and the ability of an adversary to learn something new if there is a match between a synthetic record and a real person. We term this 'meaningful identity disclosure risk.' The model is applied to samples from the Washington State Hospital discharge database (2007) and the Canadian COVID-19 cases database. Both of these datasets were synthesized using a sequential decision tree process commonly used to synthesize health and social science data. RESULTS: The meaningful identity disclosure risk for both of these synthesized samples was below the commonly used 0.09 risk threshold (0.0198 and 0.0086, respectively), and 4 times and 5 times lower than the risk values for the original datasets, respectively. CONCLUSIONS: We have presented a comprehensive identity disclosure risk model for fully synthetic data. The results for this synthesis method on 2 datasets demonstrate that synthesis can reduce meaningful identity disclosure risks considerably. The risk model can be applied in the future to evaluate the privacy of fully synthetic data.
is ?:annotates of	<https://research.tib.eu/covid-19/entity/irkaljl5_hasAnnotation_C0027627>
?:creator	<https://research.tib.eu/covid-19/entity/El_Emam%2C_Khaled%3B_Mosquera%2C_Lucy%3B_Bass%2C_Jason>
?:journal	J_Med_Internet_Res
?:license	unk
?:publication_isRelatedTo_Disease	<https://research.tib.eu/covid-19/entity/COVID-19>
?:source	WHO
?:title	Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation
?:type	<https://research.tib.eu/covid-19/vocab/Publication>
?:who_covidence_id	#945539
?:year	2020

Metadata

Anon_0

<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>	<http://purl.org/net/provenance/ns#DataItem>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>	<http://www.w3.org/2004/03/trix/rdfg-1/Graph>
<http://xmlns.com/foaf/0.1/primaryTopic>	<https://research.tib.eu/covid-19/entity/irkaljl5>
<http://xmlns.com/foaf/0.1/topic>	Anon_0
<http://www.ontologydesignpatterns.org/cp/owl/informationrealization.owl#realizes>	<https://research.tib.eu/covid-19/data/entity/irkaljl5>
<http://purl.org/net/provenance/ns#createdBy>	Anon_1
