DICOM De-Identification Pipeline for CRO

The University of Alabama at Birmingham

Software Development, DICOM

The Problem

Clinical imaging data, often stored as DICOM files, is the fuel needed to train new machine learning models. But before clinical images can be used, any identifying information must be removed. This process is called DICOM de-identification. It is a challenging problem, and each institution has its own custom requirements.

The Outcome

We designed, developed, and have since maintained a custom DICOM de-identification pipeline for the institution.

The Solution

To give its researchers easier access to clinical images, UAB’s Department of Radiology wanted to streamline its DICOM de-identification process. The existing workflow involved manual steps that could be automated, which would speed it up and reduce human error.

UAB hired Innolitics to provide a DICOM de-identification solution.

Project Requirements

We worked with the Department of Radiology’s vice chair of clinical research to understand UAB’s requirements. They needed a solution that would:

  • Allow researchers to specify DICOM UID mappings with simple Excel files.
  • Provide a database that allows DICOM files to be re-identified.
  • Export de-identified files to a research PACS or the filesystem.
  • Communicate with the Philips iSite PACS.
  • Throttle requests to the clinical PACS.
  • Support multiple simultaneous research projects.
  • Support scheduling de-identification tasks during off-hours.
  • Not require outside network access.
  • Be straightforward for IT to install (we used Docker images).
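
Two of these requirements, throttling requests to the clinical PACS and restricting work to off-hours, lend themselves to a short illustration. The following is a minimal sketch, not the deployed code: the names `in_off_hours` and `Throttle`, the 19:00 to 06:00 window, and the idea of a fixed minimum interval between requests are all assumptions made for the example.

```python
import time
from datetime import datetime, time as dtime


def in_off_hours(now, start=dtime(19, 0), end=dtime(6, 0)):
    """Return True if `now` falls inside the off-hours window.

    The window may wrap past midnight (for example 19:00 to 06:00).
    The default times are illustrative assumptions.
    """
    t = now.time()
    if start <= end:
        # Window lies within a single day.
        return start <= t < end
    # Window wraps around midnight.
    return t >= start or t < end


class Throttle:
    """Enforce a minimum interval between successive requests (hypothetical helper)."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until at least `min_interval_s` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A scheduler built on this sketch would check `in_off_hours(datetime.now())` before starting a job and call `throttle.wait()` before each request to the source PACS.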

Customized Solution

We examined the freely available DICOM de-identification tools, in particular RSNA’s Clinical Trial Processor. No existing tool met all of UAB’s needs, so we built a tool using Python and the pydicom library.

Figure 1: DICOM De-Identification Data Flow Diagram

The tool is data-driven:

  1. The researcher configures the project with an Excel sheet.
  2. Images are requested from the source PACS.
  3. A DICOM receiver accepts and saves the images.
  4. The software de-identifies the files.
  5. The files are sent to the destination—a research PACS or a file share.
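
Step 4, together with the requirement that files can later be re-identified, can be sketched as a UID remapping backed by a small database. This is an illustration under stated assumptions rather than the tool's actual code: it uses SQLite from the Python standard library, represents a DICOM header as a plain dict instead of a pydicom dataset, and the table name `uid_map` and all function names are hypothetical. New UIDs use the `2.25.` root followed by a UUID written as a decimal integer, a form the DICOM standard permits.

```python
import sqlite3
import uuid

# Attributes whose UIDs are remapped in this sketch (assumed subset).
UID_KEYS = ("StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID")
# Direct identifiers blanked in this sketch; a real profile covers many more.
BLANK_KEYS = ("PatientName", "PatientBirthDate")


def open_mapping_db(path=":memory:"):
    """Open (or create) the database that stores original-to-new UID pairs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uid_map ("
        "original TEXT PRIMARY KEY, replacement TEXT UNIQUE NOT NULL)"
    )
    return conn


def replacement_uid(conn, original):
    """Return the stored replacement for `original`, creating one if needed."""
    row = conn.execute(
        "SELECT replacement FROM uid_map WHERE original = ?", (original,)
    ).fetchone()
    if row:
        return row[0]
    new_uid = "2.25." + str(uuid.uuid4().int)  # UUID-derived DICOM UID
    conn.execute("INSERT INTO uid_map VALUES (?, ?)", (original, new_uid))
    conn.commit()
    return new_uid


def deidentify(conn, header):
    """Return a de-identified copy of `header`; the input is left unchanged."""
    out = dict(header)
    for key in UID_KEYS:
        if key in out:
            out[key] = replacement_uid(conn, out[key])
    for key in BLANK_KEYS:
        if key in out:
            out[key] = ""
    return out


def reidentify_uid(conn, replacement):
    """Look up the original UID for a replacement UID, or None if unknown."""
    row = conn.execute(
        "SELECT original FROM uid_map WHERE replacement = ?", (replacement,)
    ).fetchone()
    return row[0] if row else None
```

Because the mapping is looked up before a new UID is generated, every image in the same study receives the same replacement study UID, which keeps the de-identified files grouped correctly at the destination.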

The system provides other useful outputs on a per-project basis:

  • A CSV file maps the image, series, study, and patient UIDs of the source images to those of the de-identified destination images.
  • Email notifications inform the user when a job is complete and where to find the files.
  • Log files document any errors in the process.
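
The per-project UID mapping could be produced with Python's `csv` module. The column names and the `write_uid_mapping` helper below are assumptions chosen for illustration; the actual file layout is not described here.

```python
import csv
import io

# Hypothetical column layout: one source/de-identified pair per level.
FIELDS = [
    "source_patient", "deid_patient",
    "source_study_uid", "deid_study_uid",
    "source_series_uid", "deid_series_uid",
    "source_image_uid", "deid_image_uid",
]


def write_uid_mapping(rows, stream):
    """Write one CSV row per image, preceded by a header row.

    `rows` is an iterable of dicts keyed by FIELDS; `stream` is any
    text file object opened for writing.
    """
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

When writing to disk, the `csv` documentation recommends opening the file with `newline=""`, for example `open(path, "w", newline="")`, so that line endings are not translated twice.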

Deployment Support

After developing and testing the tool, we worked directly with UAB’s IT department to install it. We continue to provide support and occasionally implement new feature requests.

The tool has been used successfully on several research projects and has not affected the clinical PACS.