Medical Physics and Informatics
Review
Beyond the DICOM Header: Additional Issues in Deidentification
OBJECTIVE. As the use of medical images in applications other than direct patient care increases, the need for deidentified images grows. Federal regulations govern the requirements for deidentification, and software developers offer several methods for deidentification.
CONCLUSION. However, there are numerous ways for protected health information to be included in images other than in DICOM headers. Either such information must be obscured or the images containing the information must be deleted to comply with deidentification requirements.
Keywords: deidentification, DICOM, HIPAA
Advances in accessibility, increases in opportunity for publication, and increased governmental and public oversight combine to increase the demand for deidentified medical images, particularly radiologic images. This article summarizes the issues, technology, and pitfalls involved in the deidentification of DICOM medical images.
A full review of the HIPAA [1] and Health Information Technology for Economic and Clinical Health Act [2] is beyond the scope of this article. However, it is important to briefly differentiate between privacy and security. According to the U.S. Department of Health and Human Services [3]: “[The] HIPAA Privacy Rule provides federal protections for individually identifiable health information … The Security Rule specifies a series of administrative, physical, and technical safeguards … to assure the confidentiality, integrity, and availability of electronic protected health information.”
The issue of deidentification relates to privacy, not security, because once properly deidentified, the remaining health information is no longer protected (section 164.502(d) of the Privacy Rule) [4]. The Privacy Rule allows two methods of deidentification, the Expert Determination method and the Safe Harbor method. The Expert Determination method requires an expert to determine and document that sufficient statistical methods have been used to reduce the risk of identification of an individual subject to a “very small” level. The Safe Harbor method satisfies the Privacy Rule by the removal of an enumerated list of data elements (Appendix 1).
The electronic exchange of radiologic images occurs frequently and for numerous reasons, from education and research to commercial and public uses. Because the overwhelming majority of our imaging data follow the DICOM standard, a method of deidentifying DICOM images would satisfy practically all of the needs of radiologists, excepting only a few non-DICOM images and nonimaging information, such as reports. Because the data elements are explicit and predictable, we do not need to use statistical methods to deidentify our data, but should use the Safe Harbor method. A review of the complete list of the Safe Harbor deidentification data elements yields a relatively small subset that occurs in DICOM headers [3] (Appendix 1). These elements form the focus of our efforts.
The DICOM standard has been remarkably successful in allowing communication among virtually all radiologic imaging devices. The standard is bewilderingly complex [5], containing 20 parts, and committees are working on the development of other parts to be included in the future. Currently, there are a total of 3307 unique DICOM tags [6], each of which has a numeric identifier and a text name; for example, (0008,0012) corresponds to “Instance Creation Date.” Only some parts of the standard apply to any given device, and manufacturers have considerable leeway in choosing which practices described in the standard to adopt while still being allowed to market their devices as “DICOM compliant.” As a result, each device needs a DICOM conformance statement that outlines how that particular device meets the applicable parts of the standard. As this relates to DICOM tags, an examination from any given machine will have a common collection of “core” tags, such as (0010,0010) “PatientsName” and (0008,0060) “Modality,” but may or may not use less common tags such as (0008,103e) “SeriesDescription” or (0018,1023) “DigitalImageFormatAcquired.”
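The tag notation described above can be illustrated with a minimal Python sketch. It assumes a header is held as a plain dictionary keyed by (group, element) pairs; the helper names `parse_tag` and `format_tag` and the sample patient name are hypothetical and do not come from any DICOM toolkit.

```python
import re

# Matches the "(gggg,eeee)" hexadecimal notation, e.g. "(0008,103e)".
_TAG_PATTERN = re.compile(r"^\(([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4})\)$")


def parse_tag(text):
    """Convert "(gggg,eeee)" into a (group, element) pair of integers."""
    match = _TAG_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a DICOM tag: {text!r}")
    return int(match.group(1), 16), int(match.group(2), 16)


def format_tag(tag):
    """Render a (group, element) pair back into "(gggg,eeee)" form."""
    group, element = tag
    return f"({group:04x},{element:04x})"


# Illustrative header: a plain dict stands in for a parsed DICOM file.
header = {
    parse_tag("(0010,0010)"): "DOE^JANE",  # PatientsName (made-up value)
    parse_tag("(0008,0060)"): "MR",        # Modality
}

# Less common tags may be absent, so look them up defensively.
series_description = header.get(parse_tag("(0008,103e)"))
```

Because a given device may omit the less common tags, the lookup uses `dict.get`, which returns `None` when a tag is missing rather than raising an error.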
Several recent publications address deidentification of radiologic images. Freymann et al. [7] provide an excellent technical description of their efforts to develop an open-source deidentification application for multicenter research studies. Clark et al. [8] outline the National Lung Screening Trial, which collected and deidentified 48,000 chest CT scans. Their deidentification process is not completely described, but it includes alteration of the medical center identification number and patient identification number and deletion of the patient's name, date of birth, sex, accession number, and date of examination. An individual patient's examinations were identified only as initial screening and first and second follow-up studies. Bland et al. [9] describe a method for transferring deidentified nonimaging medical research data in a manner that preserves its relationship to the imaging data. Onken et al. [10] describe a system in which a unique anonymous token is issued in place of protected health information (PHI), allowing the reassociation of PHI with images if a properly authenticated user has such a need.
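The token idea can be sketched in a few lines of Python. This is not the system of Onken et al.; it is a minimal illustration that assumes an in-memory store and reduces authentication to a boolean flag. The class name `TokenVault` and its methods are hypothetical.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory store mapping anonymous tokens to PHI."""

    def __init__(self):
        self._phi_by_token = {}

    def issue_token(self, phi):
        # A random token carries no information about the patient.
        token = secrets.token_hex(16)
        self._phi_by_token[token] = dict(phi)
        return token

    def reidentify(self, token, user_is_authorized):
        # Authentication is reduced to a boolean flag for illustration only.
        if not user_is_authorized:
            raise PermissionError("reidentification requires an authorized user")
        return dict(self._phi_by_token[token])
```

The token would replace the PHI in the exported images, while the vault itself remains protected information and would need the safeguards of the Security Rule.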
There are several noncommercial deidentification applications available online. For example, DicomBrowser (version 1.5.2, Washington University Neuroinformatics Research Group) [11] allows a user to change individual DICOM tags at the image, series, examination, or patient level. The website also offers sample deidentification scripts, one of which may remove so many tags that the resulting studies may not be useful and another that is intended as a base from which the user can complete a deidentification algorithm.
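A deidentification script of this kind amounts to a list of per-tag rules. The Python sketch below shows the idea on a header held as a plain dictionary; it does not use DicomBrowser's script syntax, and the rule set, the replacement values, and the names `REMOVE`, `RULES`, and `apply_rules` are assumptions made for illustration. A real script would need to cover every applicable Safe Harbor element.

```python
REMOVE = object()  # sentinel meaning "delete this tag"

# Hypothetical, deliberately incomplete rule set keyed by (group, element).
RULES = {
    (0x0010, 0x0010): "ANONYMOUS",  # patient's name: replace
    (0x0010, 0x0020): "ID0001",     # patient ID: replace
    (0x0010, 0x0030): REMOVE,       # patient's birth date: delete
    (0x0008, 0x0050): REMOVE,       # accession number: delete
}


def apply_rules(header, rules):
    """Return a copy of header with each rule applied; the input is untouched."""
    result = dict(header)
    for tag, action in rules.items():
        if tag not in result:
            continue  # devices differ in which tags they populate
        if action is REMOVE:
            del result[tag]
        else:
            result[tag] = action
    return result
```

Tags that have no rule pass through unchanged, which is why a script that is too short leaves PHI behind and one that is too aggressive can strip tags needed to display the study.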
Many PACS vendors also offer deidentification as a feature. Centricity 3.0 (GE Healthcare) provides the ability to deidentify images at time of export. The deidentified images are exported as JPEG, portable network graphic, or TIFF (but not DICOM) image files. OsiriX 5.6 (Pixmeo) offers flexible deidentification, although the user must participate in the decision making. The default deidentification removes the patient's name, medical record number (MRN), age, sex, and weight. However, the user may select any DICOM tags to add to the deidentification algorithm and save the resulting group of tags as a custom script. This requires the user to have some understanding of DICOM definitions and regulatory requirements to be compliant. Cleome Workstation 10.0 (ClearCanvas) provides the capability to deidentify any examination using a fixed algorithm. In addition to deleting certain tags, this application opens a dialog box allowing the user to enter replacement values for several others (Fig. 1). As a convenience, the algorithm “randomizes” the date of birth to another day in the same calendar year, which helps preserve information pertaining to the age of a patient at the time of an examination. As an example, Table 1 lists the DICOM headers for a knee MRI of the author along with the same header data after deidentification performed by two PACS deidentification algorithms.
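The date-of-birth randomization described above can be sketched as follows. This is an illustration of the general technique, not the vendor's algorithm; it assumes the date is stored in the DICOM `YYYYMMDD` form, and the function name `randomize_birth_date` is hypothetical.

```python
import calendar
import random
from datetime import date, timedelta


def randomize_birth_date(dicom_date, rng=random):
    """Return a different day in the same calendar year, as YYYYMMDD."""
    year = int(dicom_date[:4])
    original = date(year, int(dicom_date[4:6]), int(dicom_date[6:8]))
    days_in_year = 366 if calendar.isleap(year) else 365
    original_offset = (original - date(year, 1, 1)).days
    # Draw from the other days of the year so the result always differs.
    offset = rng.randrange(days_in_year - 1)
    if offset >= original_offset:
        offset += 1
    return (date(year, 1, 1) + timedelta(days=offset)).strftime("%Y%m%d")
```

Because the year is preserved, the patient's age at the time of the examination is off by at most one year.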
Fig. 1. Cleome Workstation (ClearCanvas) “Anonymize Study” dialog box. (Reprinted with permission from ClearCanvas)
Once the DICOM headers have been cleared of PHI, the images themselves require attention, because many imaging studies contain identifying information in locations other than the DICOM header. This nonstandard documentation must be found and deleted manually, which can be time consuming and incomplete. The following types of nonstandard documentation have been encountered over the course of deidentification of hundreds of examinations from a variety of sources.
Most images are identified only by the DICOM overlay, and when the header information is deidentified, the images become deidentified. Images that are saved via screen capture (e.g., dose reports, bolus tracking, and contrast agent information) will have intrinsic patient information (Fig. 2).
Fig. 2. Typical CT dose report image.
Many centers scan requests, screening forms, consent forms, technologist worksheets, and other papers into the PACS to keep information localized for convenient retrieval by the interpreting radiologist. These images are not interpreted as text by any software and so cannot be identified in an automated fashion.
All ultrasound machines store the patient's name and MRN as intrinsic parts of the image, and these identifiers are displayed whenever the image is displayed (Fig. 3). The DICOM headers can be deidentified, but the PHI in the image remains. In addition to the patient's name and MRN, some formats display the birth date, scan date and time, or facility name.
Fig. 3. Typical ultrasound image with patient name and medical record number shown on top line.
Like ultrasound, most nuclear imaging systems include the patient's name and other identifiers in clinical images. Raw data (such as the projectional images used in SPECT) may not have PHI included, but postprocessed images do. Heavily formatted postprocessed examinations, such as cardiac imaging and PET, may have PHI in several locations on an image (Fig. 4).
Fig. 4A. Gallbladder emptying study. Raw data from gamma camera do not contain protected health information.
Fig. 4B. Gallbladder emptying study. However, postprocessed data contain patient's name, identification number, and date of birth as part of image.
Fig. 4C. Gallbladder emptying study. Gated SPECT wall motion analysis has patient's name in two places (arrows), as well as patient's identification number.
Most C-arm units include identifying information in their screen-captured images (Fig. 5).
Fig. 5. Intraoperative fluoroscopic spot image contains patient's name and handwritten check-in number.
Some workstations include DICOM overlays in all postprocessed images. These images will be encountered when dealing with cardiac CT, PET/CT, or PET-MRI fusion images or many 3D imaging series (Fig. 6).
Fig. 6A. Advanced imaging workstations typically include protected health information in images. Postprocessed coronary CT angiogram with patient's name and medical record number.
Fig. 6B. Advanced imaging workstations typically include protected health information in images. Postprocessed PET/CT fusion image containing patient's name, medical record number, and date of birth.
The output images from some mammography computer-aided detection systems record patient identifiers in the image (Fig. 7).
Fig. 7. Computer-aided detection software for mammography includes patient's name, identification number, and date of birth on image.
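Although PHI within the pixels cannot be found reliably from the header alone, header values can help decide which images to inspect first. The Python sketch below is a heuristic triage only, under several assumptions: the header is a plain dictionary keyed by (group, element), the modality codes US, NM, and PT stand for the ultrasound, nuclear medicine, and PET images discussed above, and the Burned In Annotation tag (0028,0301) and Image Type tag (0008,0008) are populated, which many devices do not guarantee. The function name `needs_manual_review` is hypothetical.

```python
# Modality codes for image types described in the text as carrying PHI in pixels.
SUSPECT_MODALITIES = {"US", "NM", "PT"}

MODALITY = (0x0008, 0x0060)
IMAGE_TYPE = (0x0008, 0x0008)
BURNED_IN_ANNOTATION = (0x0028, 0x0301)


def needs_manual_review(header):
    """Return the reasons (possibly none) to inspect an image by eye."""
    reasons = []
    if header.get(MODALITY) in SUSPECT_MODALITIES:
        reasons.append("modality often embeds identifiers")
    if str(header.get(BURNED_IN_ANNOTATION, "")).upper() == "YES":
        reasons.append("header declares burned-in annotation")
    # Multivalued DICOM strings are backslash-delimited.
    image_type = str(header.get(IMAGE_TYPE, "")).upper().split("\\")
    if "SECONDARY" in image_type:
        reasons.append("secondary image, possibly a screen capture")
    return reasons
```

An empty result does not mean the image is free of PHI; scanned documents and screen captures with sparse headers would still need to be reviewed manually.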
When faced with nonstandard format PHI, the user must balance the need for the images against the cost of manually deidentifying the images. Scanned documents should probably be deleted. PHI embedded in images can be obscured in a number of ways. If the images are to be used in a format other than DICOM, any photoediting software can overlay a black box on the PHI to be removed. If the user's intent is to maintain the DICOM format, Photoshop (Adobe Systems) is able to both import and export DICOM images, although there may be unpredictable loss of some DICOM tags. The OsiriX region of interest feature provides a user with the ability to cover any desired data from all images of a series within the PACS itself [12]. Most of these methods use editing layers for the obscuring box; thus, be sure to flatten the layers of any image you save, or a subsequent user could reedit the image, remove the obscuring layer, and reveal the PHI.
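The difference between an editing layer and a permanent redaction can be shown with a short Python sketch. It assumes the image is available as a two-dimensional list of pixel values; the function name `black_out` and its parameters are hypothetical, and reading or writing actual DICOM pixel data is outside the scope of the example.

```python
def black_out(pixels, top, left, height, width, fill=0):
    """Return a copy of a 2D pixel grid with one rectangle overwritten."""
    redacted = [row[:] for row in pixels]
    # Clamp the rectangle to the image so out-of-range boxes are harmless.
    for r in range(max(top, 0), min(top + height, len(redacted))):
        row = redacted[r]
        for c in range(max(left, 0), min(left + width, len(row))):
            row[c] = fill
    return redacted
```

Because the pixel values themselves are replaced, the saved copy contains no trace of the covered text, which is the effect that flattening the layers is meant to achieve.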
Although adherence to the Safe Harbor requirements will ensure compliance with HIPAA, there are other issues around deidentification. Although it is not required to be removed by HIPAA, knowledge of the brand and model of an imaging device (tags (0008,0070) and (0008,1090)) may narrow down the possible sites of origin of a scan, and, if used, the device's serial number (tag (0018,1000)) could conclusively pin down a particular scanner. Other more clinical tags such as Referring Physician Name (tag (0008,0090)) could be used to deduce a patient's identity, if desired.
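A simple check for the tags named above can be run after deidentification. The Python sketch below assumes the header is a plain dictionary keyed by (group, element); the function name `residual_quasi_identifiers` and the descriptive labels are hypothetical.

```python
# Tags named in the text that HIPAA does not require removing but that
# may help narrow down the origin of a scan or the identity of a patient.
QUASI_IDENTIFIERS = {
    (0x0008, 0x0070): "manufacturer",
    (0x0008, 0x1090): "model name",
    (0x0018, 0x1000): "device serial number",
    (0x0008, 0x0090): "referring physician name",
}


def residual_quasi_identifiers(header):
    """List the quasi-identifying tags that still hold a nonempty value."""
    return [name for tag, name in QUASI_IDENTIFIERS.items() if header.get(tag)]
```

Whether to remove the reported tags is a judgment call that depends on how the images will be shared.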
Finally, some disease is so unique that it could be considered identifying information. In that case, even stripping every numeric and textual element from the image would not succeed in deidentifying it.
Deidentifying radiology images requires care and diligence. Automated software solutions can perform the vast majority of the deidentification, but the user must be aware of the many ways in which PHI can be included in images themselves and take appropriate action to obscure such PHI, or delete the images themselves from the deidentified examination. In addition, although HIPAA compliance represents the national standard for deidentification, users should be aware that data elements not required to be removed under HIPAA could ultimately be used to deduce a patient's identity.
WEB
This is a web exclusive article.
APPENDIX 1: Data Elements Required for Removal by Safe Harbor Method
(A) Names.
(B) All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code if, according to the current publicly available data from the Bureau of the Census:
(1) The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and
(2) The initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people is changed to 000.
(C) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older.
(D) Telephone numbers.
(E) Fax numbers.
(F) Email addresses.
(G) Social Security numbers.
(H) Medical record numbers.
(I) Health plan beneficiary numbers.
(J) Account numbers.
(K) Certificate/license numbers.
(L) Vehicle identifiers and serial numbers, including license plate numbers.
(M) Device identifiers and serial numbers.
(N) Web Universal Resource Locators (URLs).
(O) Internet Protocol (IP) addresses.
(P) Biometric identifiers, including finger and voice prints.
(Q) Full-face photographs and any comparable images.
(R) Any other unique identifying number, characteristic, or code, except as permitted by paragraph (C) of this section.
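Items (B) and (C) above are the only elements in the list that are generalized rather than simply removed. A minimal Python sketch of those two rules follows. It assumes dates are stored in the DICOM `YYYYMMDD` form and that the set of low-population three-digit ZIP prefixes has been compiled separately from Census data; the function names and the `low_population_prefixes` parameter are hypothetical.

```python
def generalize_date(dicom_date):
    """Keep only the year of a YYYYMMDD date, per item (C).

    Not handled here: for patients older than 89, the year itself
    must also be withheld when it would indicate the age.
    """
    return dicom_date[:4]


def generalize_age(age_years):
    """Aggregate all ages over 89 into a single category, per item (C)."""
    return "90+" if age_years > 89 else str(age_years)


def generalize_zip(zip_code, low_population_prefixes):
    """Keep the first three ZIP digits, or 000 for small units, per item (B)."""
    prefix = zip_code[:3]
    return "000" if prefix in low_population_prefixes else prefix
```

For example, `generalize_age(93)` yields the category `"90+"`, while `generalize_age(89)` is returned unchanged as `"89"`.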










