Manually removing identifying details from large medical record databases is prohibitively expensive, time-consuming and prone to error, so an automatic way to do it would be of great benefit.
This is now a reality thanks to Gari Clifford, a research scientist at MIT’s Laboratory for Computational Physiology, and his team, who have developed a computer program that deletes details that could identify patients from medical records while leaving important medical information intact for research purposes.
When tested on a database of over 1,800 nursing notes (a total of 296,400 words), the software successfully deleted more than 94 per cent of confidential information while wrongly deleting only 0.2 per cent of the useful content.
This result, according to the authors, ‘is significantly better than one expert working alone, at least as good as two trained medical professionals checking each other’s work and many, many times faster than either.’
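For readers who want to see how two such percentages can be worked out, here is a minimal sketch in Python. It assumes that scoring is done word by word against a hand-labelled reference, which the article does not specify. The function name deidentification_scores and the ten-word toy example are hypothetical illustrations, not the MIT team's code or data.

```python
def deidentification_scores(is_confidential, was_deleted):
    """Score a de-identification run, one flag per word.

    is_confidential[i] -- True if word i is identifying (hand-labelled reference)
    was_deleted[i]     -- True if the software deleted word i

    Returns (share of confidential words deleted,
             share of useful words wrongly deleted).
    """
    if len(is_confidential) != len(was_deleted):
        raise ValueError("both lists must have one entry per word")

    pairs = list(zip(is_confidential, was_deleted))
    confidential_total = sum(1 for conf, _ in pairs if conf)
    useful_total = len(pairs) - confidential_total

    confidential_deleted = sum(1 for conf, deleted in pairs if conf and deleted)
    useful_deleted = sum(1 for conf, deleted in pairs if not conf and deleted)

    # Guard against division by zero on degenerate inputs.
    share_removed = confidential_deleted / confidential_total if confidential_total else 0.0
    share_lost = useful_deleted / useful_total if useful_total else 0.0
    return share_removed, share_lost


# Hypothetical toy note of ten words: three are identifying, seven are useful.
is_confidential = [True, True, False, False, False, False, True, False, False, False]
was_deleted = [True, True, False, True, False, False, False, False, False, False]

share_removed, share_lost = deidentification_scores(is_confidential, was_deleted)
print(f"confidential words deleted: {share_removed:.1%}")
print(f"useful words wrongly deleted: {share_lost:.1%}")
```

In this toy case two of the three identifying words are removed and one of the seven useful words is lost. The figures reported for the MIT software correspond to the first share exceeding 94 per cent and the second being only 0.2 per cent.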
The MIT team is freely providing the labelled de-identified data together with the software to allow others to improve their systems, and to allow the software to be adapted to other data types that may exhibit different qualities.