Fellow Fox Rothschild LLP Partner (and former hospital system General Counsel) Salvatore J. Russo generously contributed this post.

Some twenty-three years ago, the first well-publicized incident of re-identification of de-identified personal health data was brought to the attention of the American public. It involved the then-governor of Massachusetts, William Weld. Dr. Latanya Sweeney, then a graduate student at MIT, combined de-identified health data with the publicly available Cambridge voter registration list and re-identified the Governor’s health records, including his diagnoses and prescriptions.
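The linkage technique described above can be sketched in a few lines of code: neither dataset alone links a name to a diagnosis, but joining the two on shared quasi-identifiers (here, ZIP code, birth date, and sex) does. The records, names, and field names below are entirely hypothetical, chosen only to illustrate the mechanics.

```python
# Hypothetical illustration of a linkage (re-identification) attack.
# The "de-identified" health records carry no names, but they retain
# quasi-identifiers that also appear in a public voter roll.

health_records = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "M", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1972-03-14", "sex": "F", "diagnosis": "asthma"},
]

# Public voter roll: names alongside the same quasi-identifiers.
voter_roll = [
    {"name": "A. Smith", "zip": "02138", "birth_date": "1945-07-31", "sex": "M"},
    {"name": "B. Jones", "zip": "02144", "birth_date": "1980-01-02", "sex": "F"},
]

def link(health, voters):
    """Join the two datasets on (zip, birth_date, sex)."""
    keys = ("zip", "birth_date", "sex")
    index = {tuple(v[k] for k in keys): v["name"] for v in voters}
    return [
        {"name": index[tuple(h[k] for k in keys)], **h}
        for h in health
        if tuple(h[k] for k in keys) in index
    ]

for match in link(health_records, voter_roll):
    print(match["name"], "->", match["diagnosis"])
```

The point of the sketch is that removing names is not enough: any combination of retained fields that is rare in the population can serve as the join key.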

In 2008, after Netflix publicly released movie rating records, two researchers from the University of Texas, Arvind Narayanan and Vitaly Shmatikov, matched the released data against the Internet Movie Database and successfully re-identified users. In 2018, using publicly available Amazon review data, a group from MIT re-identified individuals in the Netflix dataset.

Finally, it was reported in the December 2018 issue of the Journal of the American Medical Association that researchers from the United States and China collaborated on a project to re-identify individuals from a national de-identified physical activity dataset. Using a machine-learning algorithm to pair daily patterns in the physical activity data with corresponding demographic data, they were largely successful in de-anonymizing the information.

HIPAA seeks, among other things, to protect the privacy of health information through de-identification. The HIPAA gold standard for de-identifying protected health information can be achieved by one of two means. Under the “Safe Harbor” method, de-identification results from stripping 18 types of identifiers from protected health information. Alternatively, under the “Expert Determination” method, it can be accomplished by an expert’s determination that the risk of identification is very small. This approach must be reconsidered. Moreover, HIPAA governs only “covered entities,” not the vast array of business enterprises that possess private health information.
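The Safe Harbor route (stripping the enumerated identifier categories) is mechanical enough to sketch in code. The record and the abbreviated identifier list below are illustrative only, a small subset of the 18 regulatory categories rather than the full enumeration, and nothing here is legal advice.

```python
# Illustrative sketch of HIPAA "Safe Harbor"-style de-identification:
# drop fields in the removed-identifier categories and generalize the
# fields that may be retained only in coarsened form. This lists only
# a few of the 18 categories, for illustration.

DIRECT_IDENTIFIERS = {
    "name", "street_address", "phone", "email", "ssn",
    "medical_record_number", "account_number",
}

def safe_harbor(record):
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # these categories are removed entirely
        if field == "zip":
            out[field] = value[:3] + "00"  # only the first 3 digits may remain
        elif field == "birth_date":
            out["birth_year"] = value[:4]  # dates are reduced to the year
        else:
            out[field] = value
    return out

patient = {
    "name": "A. Smith",
    "ssn": "000-00-0000",
    "zip": "02138",
    "birth_date": "1945-07-31",
    "diagnosis": "hypertension",
}
print(safe_harbor(patient))
```

Note what survives even after this stripping: a ZIP prefix, a birth year, and the clinical data itself. As the incidents recounted above show, such residual quasi-identifiers are exactly what linkage attacks exploit.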

The development of big data and advances in artificial intelligence are the true game changers in any discussion of de-identification and privacy. These two forces create a major concern for safeguarding private health information when sophisticated companies with large repositories of big data partner with health care systems with the goal of improving medical care.

The dilemma that regulators must confront going forward, particularly in the context of personal health data, is how to strike a balance between providing adequate privacy protections and avoiding unnecessary barriers to the medical advancements that result from the workings of AI and big data.

Society must engage in a cost-benefit-risk policy analysis to inform our conversation. Risk tolerance is the pivotal judgment: what cost is society willing to pay to protect privacy?

In view of the rapid advances in AI and the continued amassing of personal health data, absolute protection of personal health privacy may be elusive while we seek the medical benefits obtained from the intersection of AI with big data. Perhaps certainty and absolute guarantees should not be the goal. We must, however, strive for a standard we can live with, one that reasonably protects our personal health information from unconsented disclosure.