December 2014, Volume 203, Number 6

Medical Physics and Informatics

Review

Beyond the DICOM Header: Additional Issues in Deidentification

Affiliation: Department of Radiology, University of Washington, Harborview Medical Center, 325 9th Ave, Box 359728, Seattle, WA 98104.

Citation: American Journal of Roentgenology. 2014;203: W658-W664. 10.2214/AJR.13.11789

ABSTRACT

OBJECTIVE. As the use of medical images in applications other than direct patient care increases, the need for deidentified images grows. Federal regulations govern the requirements for deidentification, and software developers offer several methods for deidentification.

CONCLUSION. However, protected health information (PHI) can be included in images in numerous places other than the DICOM header. To comply with deidentification requirements, such information must be obscured, or the images containing it must be deleted.

Keywords: deidentification, DICOM, HIPAA

Advances in accessibility, greater opportunities for publication, and increased governmental and public oversight have combined to raise the demand for deidentified medical images, particularly radiologic images. This article summarizes the issues, technology, and pitfalls involved in deidentifying DICOM medical images.

Privacy Requirements

A full review of the Health Insurance Portability and Accountability Act (HIPAA) [1] and the Health Information Technology for Economic and Clinical Health (HITECH) Act [2] is beyond the scope of this article. However, it is important to briefly distinguish privacy from security. According to the U.S. Department of Health and Human Services [3]: “[The] HIPAA Privacy Rule provides federal protections for individually identifiable health information … The Security Rule specifies a series of administrative, physical, and technical safeguards … to assure the confidentiality, integrity, and availability of electronic protected health information.”

Deidentification is a matter of privacy rather than security, because once health information has been properly deidentified, it is no longer protected (section 164.502(d) of the Privacy Rule) [4]. The Privacy Rule allows two methods of deidentification: the Expert Determination method and the Safe Harbor method. The Expert Determination method requires an expert to apply statistical methods and to determine and document that the risk of identifying an individual subject has been reduced to a “very small” level. The Safe Harbor method satisfies the Privacy Rule through removal of an enumerated list of data elements (Appendix 1).

Application of the Privacy Rule to Radiologic Images

Radiologic images are exchanged electronically often and for numerous reasons, from education and research to commercial and public uses. Because the overwhelming majority of our imaging data follow the DICOM standard, a method of deidentifying DICOM images would satisfy practically all of radiologists' needs; the only exceptions are a few non-DICOM images and nonimaging information such as reports. Because the relevant data elements are explicit and predictable, statistical methods are unnecessary, and the Safe Harbor method is the appropriate choice. Of the complete list of Safe Harbor data elements, only a relatively small subset occurs in DICOM headers [3] (Appendix 1). These elements form the focus of our efforts.
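The following Python sketch illustrates that overlap. It maps several Safe Harbor categories to DICOM header tags in which such data commonly appear and reports which of those tags are populated in a given header. The mapping is an illustrative assumption, not an exhaustive or authoritative list; the header is modeled as a plain dictionary keyed by (group, element) pairs rather than through any particular DICOM library, and the names SAFE_HARBOR_TAGS and find_safe_harbor_tags are hypothetical.

```python
from typing import Dict, List, Tuple

Tag = Tuple[int, int]  # (group, element)

# Illustrative, incomplete mapping of Safe Harbor categories (Appendix 1)
# to DICOM header tags in which such data commonly appear.
SAFE_HARBOR_TAGS: Dict[str, List[Tag]] = {
    "(A) names": [(0x0010, 0x0010),             # Patient's Name
                  (0x0010, 0x1060)],            # Patient's Mother's Birth Name
    "(B) geographic subdivisions": [(0x0010, 0x1040)],  # Patient's Address
    "(C) dates and ages": [(0x0010, 0x0030),    # Patient's Birth Date
                           (0x0008, 0x0020),    # Study Date
                           (0x0008, 0x0021),    # Series Date
                           (0x0008, 0x0022),    # Acquisition Date
                           (0x0010, 0x1010)],   # Patient's Age (only ages over 89)
    "(D) telephone numbers": [(0x0010, 0x2154)],  # Patient's Telephone Numbers
    "(H) medical record numbers": [(0x0010, 0x0020),   # Patient ID
                                   (0x0010, 0x1000)],  # Other Patient IDs
    "(R) other unique identifying numbers": [(0x0008, 0x0050)],  # Accession Number
}


def find_safe_harbor_tags(header: Dict[Tag, str]) -> Dict[str, List[Tag]]:
    """Return, per category, the mapped tags present with a non-empty value."""
    found: Dict[str, List[Tag]] = {}
    for category, tags in SAFE_HARBOR_TAGS.items():
        present = [tag for tag in tags if header.get(tag)]
        if present:
            found[category] = present
    return found


header = {
    (0x0010, 0x0010): "DOE^JANE",
    (0x0010, 0x0020): "12345678",
    (0x0008, 0x0020): "20131105",
    (0x0008, 0x0060): "MR",  # Modality: not a Safe Harbor element
}
for category, tags in find_safe_harbor_tags(header).items():
    print(category, ["(%04X,%04X)" % tag for tag in tags])
```

A header-driven check of this kind only finds PHI in the header; it says nothing about identifiers embedded in the pixel data.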

DICOM

The DICOM standard has been remarkably successful in allowing communication among virtually all radiologic imaging devices. The standard is bewilderingly complex [5]: it contains 20 parts, and committees are developing additional parts for future inclusion. Currently, there are a total of 3307 unique DICOM tags [6], each of which has a numeric identifier and a text name; for example, (0008,0012) corresponds to “Instance Creation Date.” Only some parts of the standard apply to any given device, and manufacturers have considerable leeway in which of the standard's practices they adopt while still marketing their devices as “DICOM compliant.” As a result, each device needs a DICOM conformance statement that outlines how that particular device meets the applicable parts of the standard. With respect to DICOM tags, an examination from any given machine will have a common collection of “core” tags, such as (0010,0010) “PatientsName” and (0008,0060) “Modality,” but may or may not use less common tags such as (0008,103e) “SeriesDescription” or (0018,1023) “DigitalImageFormatAcquired.”
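Because each tag is a pair of 16-bit hexadecimal numbers (group, element), software commonly packs it into a single 32-bit integer. The sketch below, with hypothetical helper names, converts between that integer and the parenthesized notation used in this article. It also flags private tags: the standard reserves odd group numbers for manufacturer-defined (private) data elements, whose meaning cannot be determined from the standard alone. The odd-group test is a simplification and ignores the few odd groups the standard excludes from private use.

```python
import re

_TAG_RE = re.compile(r"^\(([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4})\)$")


def parse_tag(text: str) -> int:
    """Parse '(gggg,eeee)' into a 32-bit integer (group << 16 | element)."""
    match = _TAG_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a DICOM tag: {text!r}")
    group, element = (int(part, 16) for part in match.groups())
    return (group << 16) | element


def format_tag(tag: int) -> str:
    """Format a 32-bit tag back into '(GGGG,EEEE)' notation."""
    return f"({tag >> 16:04X},{tag & 0xFFFF:04X})"


def is_private(tag: int) -> bool:
    """Private (manufacturer-defined) tags have an odd group number (simplified)."""
    return (tag >> 16) % 2 == 1


for text in ["(0008,0012)", "(0008,103e)", "(0029,1010)"]:
    tag = parse_tag(text)
    print(format_tag(tag), hex(tag), "private" if is_private(tag) else "standard")
```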

Review of Current Literature

Several recent publications address deidentification of radiologic images. Freymann et al. [7] provide an excellent technical description of their efforts to develop an open-source deidentification application for multicenter research studies. Clark et al. [8] outline the National Lung Screening Trial, which collected and deidentified 48,000 chest CT scans. Their deidentification process is not completely described, but it includes alteration of the medical center identification number and patient identification number and deletion of the patient's name, date of birth, sex, accession number, and date of examination. An individual patient's examinations were identified only as initial screening and first and second follow-up studies. Bland et al. [9] describe a method for transferring deidentified nonimaging medical research data in a manner that preserves its relationship to the imaging data. Onken et al. [10] describe a system in which a unique anonymous token is issued in place of protected health information (PHI), allowing the reassociation of PHI with images if a properly authenticated user has such a need.
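The token idea described by Onken et al. can be illustrated with a simple lookup table. The sketch below is not their system: it is a minimal, in-memory pseudonymization vault (the class name TokenVault and the "ANON-" prefix are hypothetical) that issues a random token for each PHI value and lets an authorized caller reverse the mapping. Authentication and secure storage of the table, which any real deployment would require, are reduced to a boolean flag here.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping PHI values to random tokens."""

    def __init__(self) -> None:
        self._token_for = {}  # PHI value -> token
        self._phi_for = {}    # token -> PHI value

    def tokenize(self, phi: str) -> str:
        # Reuse an existing token so all of a patient's images stay linked.
        if phi not in self._token_for:
            token = "ANON-" + secrets.token_hex(8)
            self._token_for[phi] = token
            self._phi_for[token] = phi
        return self._token_for[phi]

    def reidentify(self, token: str, authorized: bool) -> str:
        # Stand-in for real authentication: only authorized users may reverse.
        if not authorized:
            raise PermissionError("reidentification requires authorization")
        return self._phi_for[token]


vault = TokenVault()
token = vault.tokenize("MRN 12345678")
print(token, vault.reidentify(token, authorized=True))
```

Issuing random tokens, rather than deriving them from the PHI, means the token reveals nothing without the table; the table itself then becomes the asset that must be protected.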

Deidentification Capabilities of Selected Applications

There are several noncommercial deidentification applications available online. For example, DicomBrowser (version 1.5.2, Washington University Neuroinformatics Research Group) [11] allows a user to change individual DICOM tags at the image, series, examination, or patient level. The website also offers sample deidentification scripts: one removes so many tags that the resulting studies may no longer be useful, and another is intended as a starting point from which the user can build a complete deidentification algorithm.
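DicomBrowser's scripts use their own syntax, which is not reproduced here. As a language-neutral illustration of the same idea, the sketch below expresses a deidentification script as a list of hypothetical (tag, action, value) rules applied to a header modeled as a dictionary. The action names "delete", "replace", and "keep" are assumptions chosen for readability.

```python
from typing import Dict, List, Optional, Tuple

Tag = Tuple[int, int]
# Each rule: (tag, action, replacement). Actions are hypothetical:
# "delete" removes the tag, "replace" substitutes a value, "keep" leaves it.
Rule = Tuple[Tag, str, Optional[str]]


def apply_script(header: Dict[Tag, str], rules: List[Rule]) -> Dict[Tag, str]:
    """Return a new header with the rules applied; the input is not modified."""
    result = dict(header)
    for tag, action, value in rules:
        if action == "delete":
            result.pop(tag, None)
        elif action == "replace":
            if tag in result:  # only replace tags that are actually present
                result[tag] = value
        elif action != "keep":
            raise ValueError(f"unknown action: {action}")
    return result


script: List[Rule] = [
    ((0x0010, 0x0010), "replace", "ANONYMOUS"),   # Patient's Name
    ((0x0010, 0x0020), "replace", "STUDY-0001"),  # Patient ID
    ((0x0010, 0x0030), "delete", None),           # Patient's Birth Date
    ((0x0008, 0x0060), "keep", None),             # Modality
]
header = {(0x0010, 0x0010): "DOE^JANE", (0x0010, 0x0020): "12345678",
          (0x0010, 0x0030): "19700101", (0x0008, 0x0060): "MR"}
print(apply_script(header, script))
```

Replacing identifiers with study codes, rather than deleting them, keeps the examinations linkable and usable; deleting everything is simpler but, as the sample scripts show, can leave studies that are no longer useful.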

Many PACS vendors also offer deidentification as a feature. Centricity 3.0 (GE Healthcare) can deidentify images at the time of export, but the deidentified images are exported as JPEG, Portable Network Graphics (PNG), or TIFF files rather than as DICOM. OsiriX 5.6 (Pixmeo) offers flexible deidentification, although the user must participate in the decision making. The default deidentification removes the patient's name, medical record number (MRN), age, sex, and weight; the user may add any other DICOM tags to the deidentification algorithm and save the resulting group of tags as a custom script. To remain compliant, the user therefore needs some understanding of DICOM definitions and regulatory requirements. Cleome Workstation 10.0 (ClearCanvas) deidentifies any examination using a fixed algorithm. In addition to deleting certain tags, this application opens a dialog box in which the user can enter replacement values for several others (Fig. 1). As a convenience, the algorithm “randomizes” the date of birth to another day in the same calendar year, which approximately preserves the patient's age at the time of an examination. As an example, Table 1 lists the DICOM headers for a knee MRI of the author along with the same header data after deidentification by two PACS deidentification algorithms.
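The date-of-birth randomization described for Cleome Workstation can be sketched as follows. This is not ClearCanvas's code; it is a hypothetical Python function that replaces a DICOM date (YYYYMMDD) with a random day in the same calendar year. Because only the month and day change, the age computed at examination shifts by at most one year. The random day may occasionally coincide with the true one.

```python
import calendar
import random
from datetime import date


def _parse_da(da: str) -> date:
    """Parse a DICOM DA value (YYYYMMDD) into a date."""
    return date(int(da[:4]), int(da[4:6]), int(da[6:8]))


def randomize_birth_date(da: str, rng: random.Random) -> str:
    """Replace a DICOM DA value with a random day in the same calendar year."""
    year = int(da[:4])
    days_in_year = 366 if calendar.isleap(year) else 365
    offset = rng.randrange(days_in_year)  # 0 .. days_in_year - 1
    new = date.fromordinal(date(year, 1, 1).toordinal() + offset)
    return f"{new.year:04d}{new.month:02d}{new.day:02d}"


def age_in_years(birth: str, exam: str) -> int:
    """Completed years between a birth date and an examination date."""
    b, e = _parse_da(birth), _parse_da(exam)
    return e.year - b.year - ((e.month, e.day) < (b.month, b.day))


rng = random.Random(0)  # seeded only to make the example repeatable
original = "19650614"
fake = randomize_birth_date(original, rng)
print(fake, age_in_years(original, "20131105"), age_in_years(fake, "20131105"))
```

Only the retained year carries real information, which Safe Harbor item (C) permits except when it indicates an age over 89.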

Fig. 1 —Cleome Workstation (ClearCanvas) “Anonymize Study” dialog box. (Reprinted with permission from ClearCanvas)

TABLE 1: Selected DICOM Header Data From the Author's Knee MRI

Additional Deidentification

Once the DICOM headers have been cleared of PHI, the images themselves require attention, because many imaging studies contain identifying information outside the DICOM header. Such PHI must be found and removed manually, a process that is time consuming and can be incomplete. The following types of embedded PHI were encountered while deidentifying hundreds of examinations from a variety of sources.
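Header software cannot remove PHI that lives in pixel data, but header attributes can help decide which images to inspect by hand. The sketch below is a heuristic of our own, not a standard procedure: it flags an image when Burned In Annotation (0028,0301) is "YES", when the modality is one the following sections associate with embedded PHI (the set of codes is an assumption), when the SOP Class UID is Secondary Capture Image Storage, or when Image Type (0008,0008) marks the image as derived or secondary. Headers are modeled as dictionaries keyed by attribute keyword, with Image Type given as a list of values; the function name is hypothetical.

```python
from typing import Any, Dict, List

# Heuristic signals (assumptions) that an image may carry PHI burned into
# its pixel data rather than only in the header.
SUSPECT_MODALITIES = {"US", "NM", "PT", "DOC", "OT"}
SECONDARY_CAPTURE_SOP_CLASS = "1.2.840.10008.5.1.4.1.1.7"


def burned_in_phi_reasons(header: Dict[str, Any]) -> List[str]:
    """Return the reasons an image should be inspected by hand (empty if none)."""
    reasons = []
    if str(header.get("BurnedInAnnotation", "")).upper() == "YES":
        reasons.append("BurnedInAnnotation is YES")
    if header.get("Modality") in SUSPECT_MODALITIES:
        reasons.append(f"modality {header['Modality']}")
    if header.get("SOPClassUID") == SECONDARY_CAPTURE_SOP_CLASS:
        reasons.append("secondary capture (screen capture or scan)")
    image_type = [str(v).upper() for v in header.get("ImageType", [])]
    if "SECONDARY" in image_type or "DERIVED" in image_type:
        reasons.append("derived or secondary image")
    return reasons


# Hypothetical headers: a dose-report screen capture and a routine axial CT.
dose_report = {"Modality": "CT", "SOPClassUID": SECONDARY_CAPTURE_SOP_CLASS,
               "ImageType": ["DERIVED", "SECONDARY"]}
axial_ct = {"Modality": "CT", "ImageType": ["ORIGINAL", "PRIMARY", "AXIAL"],
            "BurnedInAnnotation": "NO"}
print(burned_in_phi_reasons(dose_report))
print(burned_in_phi_reasons(axial_ct))
```

A missing Burned In Annotation attribute is deliberately not treated as a signal, because many images omit it; an empty result means only that no signal was found, not that the image is free of PHI.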

CT

Most CT images display patient identifiers only through the DICOM overlay, so when the header information is deidentified, the images become deidentified. However, images saved via screen capture (e.g., dose reports, bolus tracking, and contrast agent information) contain patient information embedded in the image itself (Fig. 2).

Fig. 2 —Typical CT dose report image.

Scanned Documents

Many centers scan requests, screening forms, consent forms, technologist worksheets, and other papers into the PACS so that this information is conveniently available to the interpreting radiologist. The text in these images is stored only as pixels and is not interpreted as text by deidentification software, so the PHI it contains cannot be identified in an automated fashion.

Ultrasound

All ultrasound machines store the patient's name and MRN as intrinsic parts of the image, so these identifiers are displayed whenever the image is displayed (Fig. 3). The DICOM headers can be deidentified, but the PHI in the image remains. In addition to the patient's name and MRN, some formats display the birth date, scan date and time, or facility name.

Fig. 3 —Typical ultrasound image with patient name and medical record number shown on top line.

Nuclear Imaging

Like ultrasound, most nuclear imaging systems include the patient's name and other identifiers in clinical images. Raw data (such as the projectional images used in SPECT) may not have PHI included, but postprocessed images do. Heavily formatted postprocessed examinations, such as cardiac imaging and PET, may have PHI in several locations on an image (Fig. 4).

Fig. 4A —Gallbladder emptying study.

A, Raw data from gamma camera do not contain protected health information.

Fig. 4B —Gallbladder emptying study.

B, However, postprocessed data contain patient's name, identification number, and date of birth as part of image.

Fig. 4C —Gallbladder emptying study.

C, Gated SPECT wall motion analysis has patient's name in two places (arrows), as well as patient's identification number.

Portable Fluoroscopy

Most C-arm units include identifying information in their screen-captured images (Fig. 5).

Fig. 5 —Intraoperative fluoroscopic spot image contains patient's name and handwritten check-in number.

Independent Postprocessing Workstations

Some workstations include DICOM overlays in all postprocessed images. These images will be encountered when dealing with cardiac CT, PET/CT, or PET-MRI fusion images or many 3D imaging series (Fig. 6).

Fig. 6A —Advanced imaging workstations typically include protected health information in images.

A, Postprocessed coronary CT angiogram with patient's name and medical record number.

Fig. 6B —Advanced imaging workstations typically include protected health information in images.

B, Postprocessed PET/CT fusion image containing patient's name, medical record number, and date of birth.

Computer-Aided Detection

The output images from some mammography computer-aided detection systems record patient identifiers in the image (Fig. 7).

Fig. 7 —Computer-aided detection software for mammography includes patient's name, identification number, and date of birth on image.

Solutions

When faced with PHI in nonstandard locations, the user must balance the need for the images against the cost of manually deidentifying them. Scanned documents should probably be deleted. PHI embedded in images can be obscured in a number of ways. If the images are to be used in a format other than DICOM, any photo-editing software can overlay a black box on the PHI to be removed. If the user intends to keep the DICOM format, Photoshop (Adobe Systems) can both import and export DICOM images, although some DICOM tags may be lost unpredictably. The OsiriX region of interest feature lets a user cover any desired data on all images of a series within the PACS itself [12]. Most of these methods place the obscuring box on a separate editing layer; therefore, be sure to flatten the layers in any image you save, or a subsequent user could reedit the image, remove the obscuring layer, and reveal the PHI.
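Editing the pixel values themselves, rather than adding a layer over them, leaves nothing for a later user to peel away. The minimal sketch below assumes the image has already been decoded into rows of grayscale values (a plain list of lists stands in for a real pixel array); the function name and the rectangle coordinates, for example the top strip of an ultrasound image such as that in Fig. 3, are assumptions.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values


def mask_region(pixels: Image, top: int, left: int, height: int, width: int,
                fill: int = 0) -> Image:
    """Return a copy with the rectangle overwritten by `fill`.

    The original values are replaced in the pixel data itself, so no hidden
    layer remains that could later be removed to reveal PHI. Rectangles that
    extend past the image edge are clipped.
    """
    out = [list(row) for row in pixels]
    for r in range(max(top, 0), min(top + height, len(out))):
        row = out[r]
        for c in range(max(left, 0), min(left + width, len(row))):
            row[c] = fill
    return out


# Hypothetical 4 x 6 image whose top row holds burned-in text (value 9).
image = [[9, 9, 9, 9, 9, 9],
         [1, 2, 3, 4, 5, 6],
         [1, 2, 3, 4, 5, 6],
         [1, 2, 3, 4, 5, 6]]
masked = mask_region(image, top=0, left=0, height=1, width=6)
print(masked[0], masked[1])
```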

Additional Considerations

Adherence to the Safe Harbor requirements ensures compliance with HIPAA, but deidentification raises other issues as well. HIPAA does not require removal of the brand and model of an imaging device (tags (0008,0070) and (0008,1090)), yet this information may narrow down the possible sites of origin of a scan, and the device's serial number (tag (0018,1000)), if present, could conclusively pin down a particular scanner. Other, more clinical tags, such as Referring Physician Name (tag (0008,0090)), could be used to deduce a patient's identity by someone intent on doing so.
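A cautious workflow can review these tags in addition to the Safe Harbor elements. The sketch below lists the four tags named above, reports which remain populated, and blanks them; the names QUASI_IDENTIFIERS, residual_quasi_identifiers, and strip_quasi_identifiers are hypothetical, and removing these tags is a choice beyond what HIPAA requires.

```python
from typing import Dict, List, Tuple

Tag = Tuple[int, int]

# Tags that Safe Harbor may not require removing but that can help
# reidentify a scan (the tags discussed in the text).
QUASI_IDENTIFIERS: Dict[Tag, str] = {
    (0x0008, 0x0070): "Manufacturer",
    (0x0008, 0x1090): "ManufacturerModelName",
    (0x0018, 0x1000): "DeviceSerialNumber",
    (0x0008, 0x0090): "ReferringPhysicianName",
}


def residual_quasi_identifiers(header: Dict[Tag, str]) -> List[str]:
    """Names of quasi-identifying tags still present with a non-empty value."""
    return [name for tag, name in QUASI_IDENTIFIERS.items() if header.get(tag)]


def strip_quasi_identifiers(header: Dict[Tag, str]) -> Dict[Tag, str]:
    """Return a copy with those tags blanked rather than deleted."""
    return {tag: ("" if tag in QUASI_IDENTIFIERS else value)
            for tag, value in header.items()}


header = {(0x0008, 0x0070): "VendorX", (0x0018, 0x1000): "SN123",
          (0x0008, 0x0060): "MR"}
print(residual_quasi_identifiers(header))
print(residual_quasi_identifiers(strip_quasi_identifiers(header)))
```

Blanking rather than deleting keeps attributes that some DICOM modules require to be present (type 2 attributes, which may be empty) while still discarding their values.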

Finally, some diseases are so rare or distinctive that they could be considered identifying information. In such cases, even stripping every numeric and textual element from the images would not succeed in deidentifying them.

Conclusion

Deidentifying radiology images requires care and diligence. Automated software solutions can perform the vast majority of the deidentification, but the user must be aware of the many ways in which PHI can be included in images themselves and take appropriate action to obscure such PHI, or delete the images themselves from the deidentified examination. In addition, although HIPAA compliance represents the national standard for deidentification, users should be aware that data elements not required to be removed under HIPAA could ultimately be used to deduce a patient's identity.

Acknowledgment

The author would like to thank Joel A. Gross, M.D. for his editorial assistance.

WEB

This is a web exclusive article.

References
1. Health Insurance Portability and Accountability Act, Pub. L. 104-191, 110 Stat. 1936 (1996).
2. Health Information Technology for Economic and Clinical Health Act, Pub. L. 111-5, 123 Stat. 226 (2009).
3. U.S. Department of Health and Human Services. Understanding health information privacy. U.S. Department of Health and Human Services website. www.hhs.gov/ocr/privacy/hipaa/understanding/index.html. Accessed July 12, 2013.
4. U.S. Department of Health and Human Services. Guidance regarding methods for de-identification of protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. U.S. Department of Health and Human Services website. www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html. Accessed July 12, 2013.
5. Medical Imaging & Technology Alliance. The DICOM standard. National Electrical Manufacturers Association website. medical.nema.org/standard.html. Published January 1, 2009. Updated August 11, 2011. Accessed July 12, 2013.
6. [No authors listed]. DICOM tag list. www.dicomtags.com/. Accessed July 12, 2013.
7. Freymann JB, Kirby JS, Perry JH, Clunie DA, Jaffe CC. Image data sharing for biomedical research: meeting HIPAA requirements for deidentification. J Digit Imaging 2012; 25:14–24.
8. Clark KW, Gierada DS, Marquez G, et al. Collecting 48,000 CT exams for the lung screening study of the National Lung Screening Trial. J Digit Imaging 2009; 22:667–680.
9. Bland PH, Laderach GE, Meyer CR. A web-based interface for communication of data between the clinical and research environments without revealing identifying information. Acad Radiol 2007; 14:757–764.
10. Onken M, Riesmeier J, Engel M, Yabanci A, Zabel B, Despres S. Reversible anonymization of DICOM images using automatically generated policies. Stud Health Technol Inform 2009; 150:861–865.
11. Washington University Neuroinformatics Research Group. DicomBrowser, version 1.5.2. Washington University website. nrg.wustl.edu/software/dicom-browser/. Published February 23, 2012. Accessed December 19, 2013.
12. [No authors listed]. Fixing ID with Osirix. www.youtube.com/watch?v=pcKBSvw68n0. Published October 7, 2009. Accessed December 18, 2013.
APPENDIX 1: Data Elements Required for Removal by Safe Harbor Method
  • (A) Names.

  • (B) All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code if, according to the current publicly available data from the Bureau of the Census:

    • (1) The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and

    • (2) The initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people is changed to 000.

  • (C) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older.

  • (D) Telephone numbers.

  • (E) Fax numbers.

  • (F) Email addresses.

  • (G) Social Security numbers.

  • (H) Medical record numbers.

  • (I) Health plan beneficiary numbers.

  • (J) Account numbers.

  • (K) Certificate/license numbers.

  • (L) Vehicle identifiers and serial numbers, including license plate numbers.

  • (M) Device identifiers and serial numbers.

  • (N) Web Universal Resource Locators (URLs).

  • (O) Internet Protocol (IP) addresses.

  • (P) Biometric identifiers, including finger and voice prints.

  • (Q) Full-face photographs and any comparable images.

  • (R) Any other unique identifying number, characteristic, or code, except as permitted by paragraph (c) of this section.
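Items (B) and (C) are the only entries in this list that involve a computation rather than outright removal. The sketch below encodes them as hypothetical helper functions. The zip3_population argument stands in for current Census Bureau data, which must be supplied by the user; the figures in the example are made up. The birth-year check is conservative: it suppresses the year whenever the year difference exceeds 89, even if the patient's birthday has not yet passed.

```python
from typing import Mapping


def generalize_zip(zip_code: str, zip3_population: Mapping[str, int]) -> str:
    """Item (B): keep only the first three ZIP digits, and only if the area
    sharing them has more than 20,000 people; otherwise use '000'."""
    zip3 = zip_code[:3]
    # Prefixes missing from the table are treated as small (conservative).
    return zip3 if zip3_population.get(zip3, 0) > 20000 else "000"


def generalize_age(age_years: int) -> str:
    """Item (C): ages over 89 are aggregated into a single category."""
    return "90 or older" if age_years > 89 else str(age_years)


def generalize_birth_date(birth: str, exam_year: int) -> str:
    """Item (C): keep only the year of a DICOM date (YYYYMMDD), and drop the
    year too when it would indicate an age over 89."""
    year = int(birth[:4])
    return "" if exam_year - year > 89 else str(year)


populations = {"981": 500000, "036": 15000}  # made-up figures for illustration
print(generalize_zip("98104", populations), generalize_zip("03601", populations))
print(generalize_age(45), generalize_age(93))
print(generalize_birth_date("19650614", 2013), repr(generalize_birth_date("19200101", 2013)))
```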

Address correspondence to J. D. Robinson.
