Medical Imaging AI/ML Engineering Guide: Best Practices and Examples

 August 06, 2025

Medical imaging isn’t easy for computer vision engineers—coordinate systems, image orientations, and subtle data-quality issues cause otherwise strong AI models to fail validation. Even experienced ML engineers trip over common but avoidable mistakes like incorrectly flipping images, mixing anatomical directions, or misaligning voxel coordinates. This guide is for engineers moving into medical imaging who want practical, direct advice on what to do (and what not to do).

Inside, you’ll find clear examples and best practices to:

  • Normalize images consistently to canonical coordinates (RAS/LPS)
  • Check and correct anatomical orientations upfront
  • Validate reconstruction kernels early to avoid domain shift surprises
  • Properly handle indexing and augmentation to maintain anatomical accuracy

My goal is to give you simple checks and actionable insights—without unnecessary theory—to quickly avoid the pitfalls that stall projects and delay FDA submissions. For regulatory context, you might also read my articles on FDA submission strategies for general AI models, using command-line interfaces for faster clearances, and converting user needs into clear engineering requirements.

Follow these guidelines to ship medical-imaging AI models that work reliably, pass validation the first time, and let you move faster on what matters most: building better products.

Photometric inversion for MONOCHROME1 images 🔗

Basic: The Photometric Interpretation (0028,0004) tag specifies the intended interpretation of the image pixel data. For MONOCHROME1, the minimum pixel value is displayed as white, the inverse of MONOCHROME2, the more commonly used format, where the minimum is shown as black and the maximum as white.

Problem: If MONOCHROME1 images are not corrected through photometric inversion, structures like bone, air, or soft tissue appear visually reversed, and models may learn incorrect intensity features, resulting in poor generalization and reduced accuracy. Furthermore, mixing uncorrected MONOCHROME1 and MONOCHROME2 images in the same dataset leads to inconsistency, non-reproducible results, and processing errors in tasks like thresholding or segmentation. Correcting MONOCHROME1 images is essential for maintaining consistency and ensuring reliable model training.

Recommended approach: It is important to scan the dataset and identify whether each image uses MONOCHROME1 or MONOCHROME2 photometric interpretation. If an image is labeled as MONOCHROME1, its pixel intensities should be inverted to match the standard MONOCHROME2 format. Additionally, the DICOM header should be updated to reflect this change by setting the Photometric Interpretation (0028,0004) tag to MONOCHROME2, ensuring both the pixel data and metadata are consistent.

Example: Chest X-rays

The left image shows a MONOCHROME1 X-ray, where the intensities are inverted—bone appears dark and air appears white. The right image shows the corrected MONOCHROME2 version, where bone appears bright and air appears dark, demonstrating the importance of photometric correction.

Sample Code Example:

from pydicom import dcmread
import numpy as np
from pydicom.uid import generate_uid

# Read DICOM file
ds = dcmread("input.dcm")

# Check and correct MONOCHROME1
if ds.PhotometricInterpretation == "MONOCHROME1":
    img = ds.pixel_array.astype(np.float32)
    inverted_img = np.max(img) - img
    inverted_img = inverted_img.astype(ds.pixel_array.dtype)

    # Update DICOM fields
    ds.PixelData = inverted_img.tobytes()
    ds.PhotometricInterpretation = "MONOCHROME2"
    ds.SOPInstanceUID = generate_uid()

    # Save corrected file
    ds.save_as("corrected_mono2.dcm")
    print("Inversion applied and saved.")
else:
    print("No inversion needed.")

Reconcile prone/supine orientation before label propagation 🔗

Basic: CT and MRI scans can be acquired with the patient in different positions, most commonly supine (lying face up) or prone (lying face down). While the internal anatomy remains unchanged, the orientation in the image space differs, which can affect how labels align.

Problem: Prone and supine scans of the same patient can differ dramatically in spatial orientation, even if taken on the same day. Without reconciling this difference, label propagation (transferring segmentation masks or contours from one scan to another) or training AI models across such scans can lead to severe spatial inconsistencies.

Key challenges include:

  • Orientation & Label Consistency Medical images encode orientation using metadata (e.g., DICOM Patient Position or Image Orientation tags like “Head First Prone” vs. “Head First Supine”). If this information is incorrect or ignored, scans may be flipped or rotated by 180°, misaligning anatomical structures.

    Crucially, this affects label consistency. Segmentation masks or contours must align anatomically. Misinterpreting orientation can lead to mirrored, misplaced, or anatomically invalid labels, compromising both AI model training and evaluation.

  • Anatomical Deformation: Postural changes cause anatomical deformation. For instance:

    • Organs may shift under gravity in prone vs. supine (e.g., in breast or abdominal imaging).
    • The colon might distend or collapse differently in CT colonography. Even after correcting for orientation, such deformations may require deformable registration to align corresponding structures.

Recommended approach:

  • Normalize to a Standard Orientation (e.g., LPS or RAS): Use libraries like SimpleITK to convert all scans to a consistent anatomical orientation using the direction cosines stored in the image headers.
  • Perform Rigid or Deformable Registration (optional): After orientation normalization, apply rigid registration to account for translations and rotations. In cases where anatomy has shifted (e.g., due to posture or gravity), apply deformable registration to align internal structures more precisely. This depends upon the application and is not always required.
  • Propagate Labels Using the Composite Transform: Once images are aligned, propagate segmentation labels using nearest-neighbor interpolation to preserve label integrity.

Example: CT images for breast cancer radiation

The left image shows the patient in the supine position (lying face up), while the right image shows the prone position (lying face down).

Sample Code:

import SimpleITK as sitk

def normalize_orientation(image, orientation='LPS'):
    """Convert image to a standard anatomical orientation."""
    return sitk.DICOMOrient(image, orientation)

def rigid_register(fixed, moving):
    """Perform rigid registration using mutual information."""
    initial_transform = sitk.CenteredTransformInitializer(
        fixed, moving,
        sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY
    )

    registration = sitk.ImageRegistrationMethod()
    registration.SetMetricAsMattesMutualInformation(50)
    registration.SetOptimizerAsRegularStepGradientDescent(
        learningRate=1.0,
        minStep=1e-4,
        numberOfIterations=100
    )
    registration.SetInterpolator(sitk.sitkLinear)
    registration.SetInitialTransform(initial_transform, inPlace=False)

    final_transform = registration.Execute(fixed, moving)
    return final_transform

def deformable_register(fixed, moving, initial_transform=None):
    """Perform deformable B-spline registration."""
    tx = sitk.BSplineTransformInitializer(fixed, [8, 8, 8])

    registration = sitk.ImageRegistrationMethod()
    registration.SetInitialTransformAsBSpline(tx, inPlace=True, scaleFactors=[1, 2, 5])
    registration.SetMetricAsMattesMutualInformation(50)
    registration.SetOptimizerAsGradientDescent(1.0, 100)
    registration.SetInterpolator(sitk.sitkLinear)

    transform = registration.Execute(fixed, moving)
    return transform

def propagate_label(label_image, reference_image, transform):
    """Warp label image using nearest-neighbor interpolation."""
    return sitk.Resample(
        label_image,
        reference_image,
        transform,
        sitk.sitkNearestNeighbor,
        0,
        label_image.GetPixelID()
    )
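
Here is a minimal sketch of how the helpers above could be chained; the file paths are placeholders, and the deformable step can be inserted when posture-related shifts matter.

import SimpleITK as sitk

# Registration requires floating-point pixel types, so cast the intensity images
fixed = sitk.Cast(normalize_orientation(sitk.ReadImage("supine_ct.nii.gz")), sitk.sitkFloat32)
moving = sitk.Cast(normalize_orientation(sitk.ReadImage("prone_ct.nii.gz")), sitk.sitkFloat32)
moving_label = normalize_orientation(sitk.ReadImage("prone_label.nii.gz"))

# Rigid alignment; add a deformable refinement when anatomy has shifted between positions
rigid = rigid_register(fixed, moving)

propagated = propagate_label(moving_label, fixed, rigid)
sitk.WriteImage(propagated, "propagated_label.nii.gz")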

Checking the Reconstruction Kernel 🔗

Basic: The reconstruction kernel (often called the convolution kernel or filter) determines how the frequency content of the projection data is modified prior to back projection during image reconstruction in a CT scanner. The selection of reconstruction kernel should be based on the specific clinical application. Each kernel trades off spatial resolution against noise: a smoother kernel generates images with lower noise but reduced spatial resolution, while a sharper kernel generates images with higher spatial resolution but increased noise. For example, smooth kernels are usually used in brain exams or liver tumor assessment to reduce image noise and enhance low-contrast detectability, whereas sharper kernels are usually used in exams assessing bony structures, where better spatial resolution is clinically required.

Problem:

  • Reconstruction kernels significantly alter image characteristics such as sharpness, noise, and texture. This directly impacts how anatomical structures appear in CT images, which can mislead AI models if not standardized.
  • AI models are sensitive to variations in input data, and differences in kernels create a domain shift between training and inference data. If a model is trained on soft kernels but tested on sharp ones, its performance can drop due to mismatched textures and edge clarity.
  • Vendor-specific naming and implementation of kernels add complexity, making it essential to normalize or filter datasets based on kernel type to ensure consistency across studies or institutions.
  • Ignoring reconstruction kernels can lead to degraded model performance, bias, and poor generalizability.

Recommended Approach:

  1. Extract the reconstruction kernel information from the DICOM metadata using the tag 0018,1210. This value indicates the type of convolution kernel applied during image reconstruction and must be reviewed before using the scan for model training or inference.
  2. Normalize the kernel names to account for variations across scanner vendors.
  3. Incorporate kernel checking as a preprocessing step before any model development or analysis. This ensures that kernel-based variability does not silently affect the model’s performance, particularly in sensitive tasks like segmentation, radiomics, or classification.
  4. Apply harmonization techniques if the dataset includes scans with different kernels. This could involve using smoothing filters, histogram matching, radiomic feature normalization (e.g., ComBat), or advanced domain adaptation strategies to minimize the impact of kernel differences on model generalization.

Example: Reconstruction kernels used on lung CT

(Left: hard kernel; right: soft kernel.)

Sample Code:

import os

import pandas as pd
import pydicom
import SimpleITK as sitk
from neuroHarmonize import harmonizationLearn
from skimage.exposure import match_histograms

def normalize_kernel(kernel_name):
    """Map vendor-specific kernel names to broad categories (illustrative mapping; extend for your vendors)."""
    name = kernel_name.upper()
    if any(token in name for token in ("B70", "B80", "BONE", "LUNG", "SHARP", "EDGE")):
        return "SHARP"
    if any(token in name for token in ("B30", "B31", "SOFT", "STANDARD", "SMOOTH")):
        return "SOFT_TISSUE"
    return "UNKNOWN"

# Check reconstruction kernel from a DICOM file
def check_kernel(dicom_path, accepted_kernels={"SOFT_TISSUE"}):
    try:
        ds = pydicom.dcmread(dicom_path, stop_before_pixels=True)
        # (0018,1210); the standard keyword is ConvolutionKernel, with a fallback just in case
        kernel = str(ds.get("ConvolutionKernel", "") or ds.get("ReconstructionKernel", "")).strip()
        if not kernel:
            print(f"{dicom_path}: No kernel info found.")
            return None
        category = normalize_kernel(kernel)
        print(f"{os.path.basename(dicom_path)} | Kernel: {kernel} | Category: {category}")
        if category in accepted_kernels:
            return "Accepted"
        else:
            return "Rejected"
    except Exception as e:
        print(f"Error processing {dicom_path}: {e}")
        return None

# Apply Smoothing to Sharp Kernels
def apply_smoothing(image, variance=1.0):
    """Apply Gaussian smoothing to reduce edge artifacts from sharp kernels (DiscreteGaussian takes a variance)."""
    return sitk.DiscreteGaussian(image, variance)

# Histogram Matching to Reference Scan
def match_histogram_to_reference(moving_image, reference_image):
    """Match histogram of moving_image to reference_image using scikit-image."""
    moving_np = sitk.GetArrayFromImage(moving_image)
    ref_np = sitk.GetArrayFromImage(reference_image)
    matched = match_histograms(moving_np, ref_np)
    matched_image = sitk.GetImageFromArray(matched.astype(moving_np.dtype))
    matched_image.CopyInformation(moving_image)  # preserve spacing, origin, and direction
    return matched_image

# Radiomics Feature Harmonization with ComBat
def harmonize_features_combat(features_df, batch_labels):
    """
    Apply ComBat harmonization to radiomics features.
    features_df: DataFrame with rows=samples, cols=features
    batch_labels: List or array with batch (kernel) labels
    """
    covars = pd.DataFrame({"SITE": batch_labels})  # neuroHarmonize expects a covariate frame with a SITE column
    model, data_transformed = harmonizationLearn(features_df.values, covars)
    df_harmonized = pd.DataFrame(data_transformed, columns=features_df.columns)
    return df_harmonized
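
A minimal usage sketch for the kernel check, assuming a flat folder of .dcm files (the path and accepted category are placeholders):

import glob

for path in sorted(glob.glob("ct_series/*.dcm")):  # hypothetical folder
    status = check_kernel(path, accepted_kernels={"SOFT_TISSUE"})
    if status == "Rejected":
        print(f"{path} uses a non-accepted kernel; consider smoothing or excluding it.")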

Proper resampling for datasets with mixed thin- and thick-slice volumes 🔗

Basic: In medical imaging, particularly with modalities like CT and MRI, spatial resolution refers to the ability to distinguish small structures that are close together; it defines how much anatomical detail can be visualized in an image. Thin slices typically offer higher spatial resolution, enhancing the visibility of fine anatomical details and small lesions. In contrast, thicker slices generally provide a better signal-to-noise ratio (SNR) and reduce image noise, but at the cost of reduced spatial resolution. The choice between thin and thick slices involves a trade-off between resolution and noise, guided by the specific clinical objectives and the anatomical region being examined.

Problem:

When working with datasets that include both thin- and thick-slice volumes, inconsistency in voxel dimensions creates challenges for model training, radiomics, and visualization. This is where resampling becomes essential. Resampling involves changing voxel dimensions and the overall volume size to bring all datasets to a common resolution. Without it, variations in spatial resolution, often caused by differences in scanners, protocols, or reconstruction settings, may lead to bias, domain shifts, and unreliable results. Moreover, high-resolution images demand more computation, making it impractical to process large datasets efficiently. Resampling helps reduce computational complexity by increasing voxel spacing, standardizes data for consistent analysis, and improves visualization by simplifying overly detailed volumes. It also serves as a form of data compression and addresses computational limitations in real-time or resource-constrained environments.

Recommended Approach:

The most widely used technique is interpolation-based resampling. This method estimates pixel or voxel values based on neighboring data points. Common interpolation techniques include:

  • Nearest neighbor: Assigns the value of the nearest pixel/voxel. This is used for labels.
  • Linear interpolation: Averages the values of neighboring pixels/voxels. Provides a smoother result than nearest neighbor.
  • Cubic interpolation: Uses a cubic polynomial to estimate values, offering a higher degree of smoothness and accuracy.

Additionally, resizing can be used to standardize input dimensions for deep learning models, and reformatting is helpful for generating orthogonal views (e.g., sagittal or coronal from axial stacks) through interpolation.

Example:

Sample code:

import SimpleITK as sitk

def resample_volume(image, new_spacing=(1.0, 1.0, 1.0), interpolator=sitk.sitkLinear):
    """
    Resample a 3D medical image (CT or MRI) to a specified resolution.

    Args:
        image (sitk.Image): Input image volume.
        new_spacing (tuple): Desired spacing (x, y, z) in mm.
        interpolator (SimpleITK interpolator):
            Use sitkLinear for images, sitkNearestNeighbor for labels/masks.

    Returns:
        sitk.Image: Resampled image.
    """
    original_spacing = image.GetSpacing()
    original_size = image.GetSize()

    new_size = [
        int(round(original_size[i] * (original_spacing[i] / new_spacing[i])))
        for i in range(3)
    ]

    resampler = sitk.ResampleImageFilter()
    resampler.SetOutputSpacing(new_spacing)
    resampler.SetSize(new_size)
    resampler.SetInterpolator(interpolator)
    resampler.SetOutputDirection(image.GetDirection())
    resampler.SetOutputOrigin(image.GetOrigin())
    resampler.SetDefaultPixelValue(0)  # background fill value for voxels outside the original image

    return resampler.Execute(image)
import torchio as tio

def resample_volume_torchio(input_path, output_path, new_spacing=(1.0, 1.0, 1.0)):
    """
    Resample a 3D medical image using TorchIO to a specified spacing.
    Use tio.LabelMap instead of tio.ScalarImage if resampling segmentation
    masks; TorchIO will automatically use nearest-neighbor interpolation for
    labels.
    Args:
        input_path (str): Path to the input image.
        output_path (str): Path to save the resampled image.
        new_spacing (tuple): Desired spacing (x, y, z) in mm.

    Returns:
        None
    """
    subject = tio.Subject(image=tio.ScalarImage(input_path))
    resampler = tio.Resample(new_spacing)
    resampled = resampler(subject)
    resampled.image.save(output_path)
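
A short usage sketch for the SimpleITK helper above, resampling an image with linear interpolation and its mask with nearest neighbor (file paths are placeholders):

import SimpleITK as sitk

image = sitk.ReadImage("ct_volume.nii.gz")
mask = sitk.ReadImage("ct_mask.nii.gz")

resampled_image = resample_volume(image, new_spacing=(1.0, 1.0, 1.0), interpolator=sitk.sitkLinear)
resampled_mask = resample_volume(mask, new_spacing=(1.0, 1.0, 1.0), interpolator=sitk.sitkNearestNeighbor)

sitk.WriteImage(resampled_image, "ct_volume_1mm.nii.gz")
sitk.WriteImage(resampled_mask, "ct_mask_1mm.nii.gz")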

Orientation normalization before directly indexing voxel arrays or haphazardly transposing tensors 🔗

Basic: Orientation normalization is the process of transforming a medical image volume (e.g., CT, MRI) so that it aligns with a standard anatomical frame of reference, such as LPS (Left-Posterior-Superior) or RAS (Right-Anterior-Superior). In medical image analysis, data is typically stored as 3D volumes, where voxel ordering is defined by orientation metadata. However, directly indexing voxel arrays (e.g., NumPy arrays) or arbitrarily transposing tensor dimensions without first accounting for orientation can lead to serious misinterpretations.

Problem: Indexing voxel arrays or transposing tensors before orientation normalization can result in misaligned reference frames between volumes. This leads to incorrect anatomical interpretations, flawed spatial operations (e.g., cropping, segmentation), and ultimately invalid model outputs. Such inconsistencies are subtle and often go unnoticed during development, but they can significantly compromise model accuracy, reproducibility, and clinical reliability, especially in tasks requiring anatomical precision.

Recommended Approach:

Normalize orientation before converting medical images to NumPy arrays or tensors. This ensures consistent anatomical alignment (e.g., LPS) across subjects. Perform any image preprocessing, augmentation, or spatial operation after orientation normalization.

What NOT to do:

Don’t catch transpositis—a condition afflicting some medical imaging codebases where images get transposed multiple times until they “look right.”

Example:

Sample Code:

import SimpleITK as sitk
import numpy as np
import torch

def load_and_prepare_image(image_path, target_orientation='LPS'):
    """
    Load a medical image, normalize its orientation, and convert it to a NumPy array and PyTorch tensor.

    Args:
        image_path (str): Path to the input medical image (e.g., NIfTI or DICOM).
        target_orientation (str): Standard orientation code ('LPS' or 'RAS').

    Returns:
        numpy.ndarray: Image data as a NumPy array.
        torch.Tensor: Image data as a PyTorch tensor.
    """
    # Step 1: Load image
    image = sitk.ReadImage(image_path)

    # Step 2: Normalize orientation
    image_oriented = sitk.DICOMOrient(image, target_orientation)

    # Step 3: Convert to NumPy array
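    # Note: GetArrayFromImage returns the array with axes ordered (z, y, x).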
    image_array = sitk.GetArrayFromImage(image_oriented)  

    # Step 4: Convert to PyTorch tensor (optional)
    image_tensor = torch.from_numpy(image_array).float()  

    return image_array, image_tensor
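
As a quick sanity check, if your volumes are NIfTI files you can also inspect the axis codes of the affine with nibabel; this is a minimal sketch, assuming a hypothetical input path:

import nibabel as nib

img = nib.load("subject01.nii.gz")  # hypothetical path
print(nib.aff2axcodes(img.affine))  # e.g., ('R', 'A', 'S') for a RAS-oriented volume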

HU Clipping (and Window / Leveling) 🔗

Basic: Hounsfield units (HU) are a dimensionless unit universally used in computed tomography (CT) scanning to express CT numbers in a standardized and convenient form. Hounsfield units are obtained from a linear transformation of the measured attenuation coefficients, which is based on the arbitrarily-assigned radiodensities of air (-1000 HU) and pure water (0 HU). HU clipping restricts the intensity range to enhance image quality and highlight specific tissues or structures.

Problem: CT images contain a wide range of Hounsfield Unit (HU) values that correspond to different tissue densities, such as air, fat, soft tissue, and bone. Displaying the full HU range in a single image can reduce contrast and obscure subtle differences within specific tissues, making it harder to interpret. CT voxels typically contain 16 bits of precision, which corresponds to around 65,000 distinct values; the human eye and consumer-grade monitors, on the other hand, can only perceive 30 to 256 levels of gray. Therefore, the original 16-bit image needs to be scaled so that humans can appropriately perceive contrast values to make appropriate clinical decisions. HU clipping addresses this by narrowing the displayed intensity range, which enhances visibility and improves tissue contrast. This is particularly beneficial for image analysis tasks like segmentation, where distinguishing between anatomical structures is essential for both human observers and machine learning algorithms. Additionally, HU clipping helps standardize the appearance of images acquired from different scanners or patients, facilitating more consistent comparisons across datasets.

Recommended Approach: To perform HU clipping effectively, one must first determine the target tissue or clinical task, such as lung, brain, or bone analysis. Based on this, an appropriate HU window can be selected to isolate the relevant intensity range. Common examples include: –1000 to 400 for lung parenchyma, –150 to 250 for soft tissue, 0 to 80 for brain structures, and 300 to 2000 for bone. Once selected, the CT intensity values are clipped to this range using tools like NumPy or SimpleITK.
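
The clipping itself is a one-liner in NumPy; a minimal sketch assuming hu_array is a NumPy volume already converted to Hounsfield units:

import numpy as np

hu_min, hu_max = -150, 250  # soft-tissue window from the ranges above
clipped = np.clip(hu_array, hu_min, hu_max)          # hu_array: CT volume in HU (assumed)
normalized = (clipped - hu_min) / (hu_max - hu_min)  # optional rescale to [0, 1] for model input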

Example: A transverse slice of a chest CT with no intensity windowing, a "lung" intensity window, and a "bone" intensity window.

Sample code:

import torchio as tio

def get_torchio_hu_window_transform(hu_min: int, hu_max: int):
    """
    Returns a TorchIO transform for HU clipping and normalization to [0, 1].

    Parameters:
    ----------
    hu_min : int
        Lower HU threshold.
    hu_max : int
        Upper HU threshold.

    Returns:
    -------
    torchio.Compose
        A TorchIO Compose transform.
    """
    return tio.Compose([
        tio.ToCanonical(),
        tio.RescaleIntensity(
            out_min_max=(0, 1),
            in_min_max=(hu_min, hu_max)
        )
    ])
from monai.transforms import Compose, LoadImaged, EnsureChannelFirstd, ScaleIntensityRanged

def get_monai_hu_window_transform(hu_min: int, hu_max: int):
    """
    Returns a MONAI Compose transform for HU clipping and normalization to [0, 1].

    Parameters:
    ----------
    hu_min : int
        Lower bound of HU window.
    hu_max : int
        Upper bound of HU window.

    Returns:
    -------
    monai.transforms.Compose
        A composed MONAI transform.
    """
    return Compose([
        LoadImaged(keys=["image", "label"]),
        EnsureChannelFirstd(keys=["image", "label"]),
        ScaleIntensityRanged(
            keys="image",
            a_min=hu_min,
            a_max=hu_max,
            b_min=0.0,
            b_max=1.0,
            clip=True
        )
    ])

Labelling different contrast phases 🔗

Basic: Contrast phases in CT imaging represent different time points after intravenous contrast injection, each highlighting specific tissue enhancement patterns. The contrast agent travels through the vascular system and organs, and scans are timed to capture key phases like non-contrast, portal venous, and delayed.

To identify contrast phases in CT scans using DICOM metadata, several tags are commonly referenced. The Series Description (0008,103E) and Protocol Name (0018,1030) often contain free-text labels like "arterial," "venous," or "delayed," which can hint at the intended phase. The Contrast/Bolus Agent (0018,0010) specifies the type of contrast used, while the Contrast/Bolus Start Time (0018,1042) and Series Time (0008,0031) or Acquisition Time (0008,0032) can be compared to estimate the delay between injection and scan.

Problem: Phase labels are often manually assigned and prone to error, especially in large datasets. Variations in contrast injection protocols, patient physiology, and scanner settings further complicate accurate labeling. Manual correction is time-consuming and inconsistent.

Recommended Approach:

  • Image-Based Classification: Train a lightweight CNN or transformer-based model to classify contrast phases directly from CT images, reducing reliance on potentially incorrect metadata.
  • Use Heuristics for QA: Apply simple rules based on Hounsfield Units (HU) in key regions (e.g., liver, aorta) to flag likely mislabels.
  • Time-Based Sorting (if multi-phase scans are present): Use DICOM metadata such as AcquisitionTime to infer the chronological order of phases (e.g., non-contrast < portal venous < delayed).
  • Robust Labeling Pipeline: Store phase predictions separately (e.g., JSON/CSV) rather than overwriting source data. Allow human review for uncertain cases.

Example:

Sample Code:

This function estimates the CT contrast phase (Non-contrast, Arterial, Venous, Delayed) based on the average intensity of an image series.

import numpy as np

def detect_contrast_phase(image_series):
    """
    Estimate contrast phase using mean HU value of the entire image series.

    Parameters:
    -----------
    image_series : np.ndarray
        A NumPy array representing a CT volume or series of slices (in HU).

    Returns:
    --------
    str
        Estimated phase: 'NC' (Non-contrast), 'ART' (Arterial), 
                         'VEN' (Venous), or 'DEL' (Delayed).
    """
    mean_hu = np.mean(image_series)

    if mean_hu < 50:
        return "NC"
    elif 50 <= mean_hu < 90:
        return "ART"
    elif 90 <= mean_hu < 120:
        return "VEN"
    else:
        return "DEL"

Using ECG/respiratory gating information in cardiac CT 🔗

Basic: Cardiac CT imaging captures the anatomy and function of the heart, which is a moving organ. To reduce motion artifacts caused by the cardiac cycle and breathing, CT systems use ECG gating and, less commonly, respiratory gating. Cardiac gating or ECG-gated angiography in CT is an acquisition technique that triggers a scan during a specific portion of the cardiac cycle. Often, this technique is used to obtain high-quality scans void of pulsation artifact. These gating signals are often recorded in the DICOM metadata or as auxiliary files.

In cardiac CT imaging, DICOM tags related to ECG and respiratory gating provide crucial metadata for understanding when in the physiological cycle each image was acquired. Key tags include Cardiac Reconstruction Phase (0018,9152), which indicates the percentage of the R–R interval (e.g., 75% for end-diastole), and Trigger Time (0018,1060), which specifies the time offset from the ECG R-wave. Additional tags such as Heart Rate (0018,1088), Gating Type (0018,1031), and Trigger Source (0018,1061) describe how and whether ECG or respiratory gating was applied. These tags are essential for selecting motion-minimized frames, ensuring phase consistency in datasets, and enabling accurate downstream analysis in both clinical and AI workflows.

Problem: Ignoring ECG or respiratory gating information in cardiac CT can result in using frames affected by motion, leading to degraded image quality and poor diagnostic utility. In AI pipelines, training or inference on ungated or incorrectly phased images can reduce model performance, especially for tasks like coronary calcium scoring, left ventricular segmentation, or functional analysis. Furthermore, not accounting for gating can lead to inconsistencies when comparing multi-phase datasets or across subjects, where different cardiac phases may have been captured. This undermines reproducibility, accuracy, and generalizability of downstream analysis.

Recommended Approach:

  • Identify and extract image series closest to a specific cardiac phase (e.g., 75% R–R interval for end-diastole).
  • For AI pipelines, train only on motion-minimized gated phases, or include phase as a metadata variable.
  • If multiple phases are available (e.g., 10-frame cine CT), select or average phases after quality assessment.
  • Consider frame rejection strategies using variance or artifact detection if explicit gating info is missing.

Example:

Images comparing retrospective ECG-gated CT angiography at each phase of the R-R interval in a 53-year-old man with acute aortic dissection

Sample Code:

This Python function scans a folder of ECG-gated cardiac CT DICOM files and selects the slice closest to a specified cardiac phase (e.g., 75% of the R–R interval).

import pydicom
import numpy as np
from pathlib import Path

def analyze_cardiac_gating(dicom_folder, target_phase=75):
    """
    Analyze ECG-gated cardiac CT series and select images closest to target cardiac phase.

    Args:
        dicom_folder: Path to folder containing DICOM files
        target_phase: Desired cardiac phase (% of R-R interval, typically 70-75% for end-diastole)

    Returns:
        Dictionary containing gating information and selected slices
    """
    dicom_files = list(Path(dicom_folder).glob('*.dcm'))
    if not dicom_files:
        raise FileNotFoundError("No DICOM files found in the specified folder")

    # Collect gating info from all slices
    gating_data = []
    for file in dicom_files:
        ds = pydicom.dcmread(file)

        try:
            phase = float(ds.CardiacReconstructionPhase)  # (0018,9152)
            trigger_time = float(ds.TriggerTime) if 'TriggerTime' in ds else None  # (0018,1060)
            heart_rate = float(ds.HeartRate) if 'HeartRate' in ds else None  # (0018,1088)

            gating_data.append({
                'file': file,
                'phase': phase,
                'trigger_time': trigger_time,
                'heart_rate': heart_rate,
                'instance_number': int(ds.InstanceNumber)
            })
        except AttributeError as e:
            print(f"Missing gating tags in {file.name}: {e}")
            continue

    if not gating_data:
        raise ValueError("No valid ECG-gated DICOM files found")

    # Find slice closest to target phase
    phases = np.array([x['phase'] for x in gating_data])
    phase_diffs = np.abs(phases - target_phase)
    best_idx = np.argmin(phase_diffs)
    best_slice = gating_data[best_idx]

 

    return {
        'best_slice': best_slice,
        'all_slices': sorted(gating_data, key=lambda x: x['instance_number'])
    }
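
A brief usage sketch (the folder path is a placeholder):

result = analyze_cardiac_gating("cardiac_ct_series", target_phase=75)
best = result['best_slice']
print(f"Selected {best['file'].name} at {best['phase']}% of the R-R interval")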

Handling vendor-specific header variations that break loaders 🔗

Basic: Medical imaging vendors often use different conventions and private DICOM tags for storing metadata, even when the imaging modality (e.g., CT, MRI) is the same. While the DICOM standard defines many core attributes, manufacturers frequently insert vendor-specific extensions, nonstandard names, or omit expected fields altogether. This creates variation across scanners from GE, Siemens, Philips, Canon, etc.

Problem: Hardcoding data loaders or preprocessing pipelines to expect fixed tag names or structures can result in failures when encountering data from a different vendor. For example, one scanner may store series description under standard tags, while another may embed important phase or orientation info in private tags or use different encodings. As a result, such rigid pipelines may miss key metadata, misinterpret orientations, or even crash during parsing. This creates poor generalizability, especially when scaling to multi-center datasets or deploying models across institutions.

Recommended Approach:

  1. Defensive parsing: Build loaders that anticipate variations
  2. Vendor detection: Identify manufacturer (0008,0070) early in processing
  3. Fallback strategies: Implement multiple ways to find critical metadata
  4. Tag aliases: Maintain mappings of equivalent tags across vendors
  5. Validation layers: Verify data integrity after loading

Sample code:

import os

import pydicom

def extract_series_description(ds):
    """Extracts series description with vendor-specific fallback."""
    try:
        return ds.SeriesDescription  # Standard tag (0008,103E)
    except AttributeError:
        # Try vendor-specific or private tags (example private tag; adjust for your data)
        if (0x0009, 0x1010) in ds:
            return ds[(0x0009, 0x1010)].value
        return 'Unknown'

def detect_vendor(ds):
    """Identify scanner vendor from tag (0008,0070)"""
    return getattr(ds, 'Manufacturer', 'Unknown')

def load_dicom_headers(folder_path):
    """Parse headers with fallbacks and validation"""
    files = [f for f in os.listdir(folder_path) if f.endswith('.dcm')]
    for file in files:
        filepath = os.path.join(folder_path, file)
        try:
            ds = pydicom.dcmread(filepath, stop_before_pixels=True)
            vendor = detect_vendor(ds)
            desc = extract_series_description(ds)
            print(f"{file} → Vendor: {vendor} | Series: {desc}")
        except Exception as e:
            print(f"Error reading {file}: {e}")

# Example usage
load_dicom_headers("path/to/dicom_folder")
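
Points 4 and 5 above (tag aliases and validation layers) could be sketched as follows; the alias table and the list of required fields are illustrative, not an exhaustive mapping:

# Illustrative alias table: the standard keyword first, then fallbacks seen in some exports
TAG_ALIASES = {
    "SeriesDescription": ["SeriesDescription", "ProtocolName"],
    "ReconstructionKernel": ["ConvolutionKernel"],
}

def get_with_aliases(ds, logical_name, default=None):
    """Return the first alias present in the dataset for a logical field name."""
    for keyword in TAG_ALIASES.get(logical_name, [logical_name]):
        value = getattr(ds, keyword, None)
        if value not in (None, ""):
            return value
    return default

def validate_required_fields(ds, required=("PixelSpacing", "ImageOrientationPatient", "ImagePositionPatient")):
    """Simple validation layer: report which required tags are missing after loading."""
    return [name for name in required if getattr(ds, name, None) is None]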

Avoiding Spatial Errors: Evaluating Metrics in Scanner vs. Patient Space 🔗

Basic: In medical image analysis, particularly with CT or MRI data, images are typically stored in one of two coordinate systems: scanner (or voxel) space and patient (or world) space.

Scanner Space (or Image Space): This refers to the coordinate system defined by the scanner itself during image acquisition. It's the physical space within the scanner where the data is collected.

Patient Space: This refers to the anatomical coordinate system of the patient, often defined by anatomical landmarks or a reference coordinate system.

Problem: Computing spatial metrics or alignment-based evaluations in scanner space without accounting for the physical voxel spacing, image origin, and orientation can introduce significant errors. For example, comparing a segmentation mask in patient space (with correct spatial alignment) to one in scanner space (just voxel-aligned) may falsely report poor overlap or boundary accuracy. This can mislead model performance evaluation, especially in multi-scanner or multi-center datasets where spacing and orientation vary.

Recommended Approach: Convert both predicted and ground-truth masks to the same space before comparison, ideally to patient space using DICOM/SITK orientation metadata.

Sample code:

import SimpleITK as sitk
import numpy as np
from typing import Dict

class SpaceAwareEvaluator:
    """
    Handles spatial metrics computation while maintaining awareness of coordinate spaces.
    Converts all inputs to patient space before evaluation.
    """
    
    def __init__(self, reference_image: sitk.Image):
        """
        Initialize with a reference image that defines the target patient space.
        
        Args:
            reference_image: sitk.Image with proper patient space metadata
        """
        self.reference_space = reference_image
        self.metrics = {}
        
    def _ensure_patient_space(self, image: sitk.Image) -> sitk.Image:
        """Convert image to patient space if not already"""
        if not self._is_in_patient_space(image):
            return self._transform_to_patient_space(image)
        return image
    
    def _is_in_patient_space(self, image: sitk.Image) -> bool:
        """Check if image shares patient space with reference"""
        return (
            np.allclose(image.GetOrigin(), self.reference_space.GetOrigin()) and
            np.allclose(image.GetSpacing(), self.reference_space.GetSpacing()) and
            np.allclose(image.GetDirection(), self.reference_space.GetDirection())
        )
    
    def _transform_to_patient_space(self, image: sitk.Image) -> sitk.Image:
        """Resample image to reference patient space"""
        return sitk.Resample(
            image,
            self.reference_space,
            sitk.Transform(),
            sitk.sitkNearestNeighbor,  # For masks/labels
            0,  # Default background value
            image.GetPixelID()
        )
    
    def compute_metrics(
        self,
        prediction: sitk.Image,
        ground_truth: sitk.Image,
        metric_names: list = ['dice', 'hausdorff', 'surface_distance']
    ) -> Dict[str, float]:
        """
        Compute spatial metrics after ensuring both images are in patient space.
        
        Args:
            prediction: Segmentation mask (scanner or patient space)
            ground_truth: Reference mask (scanner or patient space)
            metric_names: List of metrics to compute
            
        Returns:
            Dictionary of metric names and values
        """
        # Convert both images to patient space
        pred_patient = self._ensure_patient_space(prediction)
        gt_patient = self._ensure_patient_space(ground_truth)
        
        # Compute requested metrics
        results = {}
        if 'dice' in metric_names:
            overlap_filter = sitk.LabelOverlapMeasuresImageFilter()
            overlap_filter.Execute(gt_patient, pred_patient)
            results['dice'] = overlap_filter.GetDiceCoefficient()
        
        if 'hausdorff' in metric_names or 'surface_distance' in metric_names:
            hausdorff_filter = sitk.HausdorffDistanceImageFilter()
            hausdorff_filter.Execute(gt_patient, pred_patient)
            results['hausdorff'] = hausdorff_filter.GetHausdorffDistance()
            
            if 'surface_distance' in metric_names:
                results['surface_mean'] = hausdorff_filter.GetAverageHausdorffDistance()
        
        return results

# Usage Example
if __name__ == "__main__":
    # Load images (example paths)
    reference_ct = sitk.ReadImage("path/to/reference_ct.dcm")
    scanner_space_mask = sitk.ReadImage("path/to/scanner_space_mask.nii.gz")
    patient_space_mask = sitk.ReadImage("path/to/patient_space_mask.nii.gz")
    
    # Initialize evaluator with reference CT (defines desired patient space)
    evaluator = SpaceAwareEvaluator(reference_ct)
    
    # Compute metrics - handles space conversion automatically
    metrics = evaluator.compute_metrics(
        prediction=scanner_space_mask,
        ground_truth=patient_space_mask,
        metric_names=['dice', 'hausdorff']
    )
    
    print(f"Dice Coefficient: {metrics['dice']:.3f}")
    print(f"Hausdorff Distance: {metrics['hausdorff']:.2f} mm")
    
   

Using random-seed control to reproduce experiments 🔗

Basic: Random seeds are used to control the pseudo-random number generators (PRNGs) in machine learning experiments. This includes operations such as data splitting, weight initialization, data augmentation, and shuffling.

Problem:

Omitting seed control causes:

  1. Non-reproducible results: Different runs yield varying metrics despite identical code/hyperparameters
  2. Unreliable comparisons: Cannot fairly evaluate model improvements
  3. Debugging challenges: Inconsistent behavior masks implementation errors
  4. Scientific integrity risks: Published results become non-verifiable

Recommended Approach:

Set random seeds explicitly for all randomness sources: Python, NumPy, and deep learning frameworks like PyTorch or TensorFlow and store it in experiment metadata or logging tools.

Sample Code:

import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # For deterministic behavior
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)
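
If you use a PyTorch DataLoader with multiple workers, augmentation randomness also lives in the worker processes; a common pattern from PyTorch's reproducibility guidance is to seed each worker and pass a seeded generator. The dataset and batch size below are placeholders.

import random
import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # Derive per-worker seeds from the main process seed
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(42)

loader = DataLoader(train_dataset, batch_size=2, shuffle=True,  # train_dataset: your Dataset (placeholder)
                    num_workers=4, worker_init_fn=seed_worker, generator=g)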

Not relying on default nnU-Net post-processing under class imbalance 🔗

Basic: nnU-Net includes automatic post-processing, such as connected component analysis, to clean up predictions (e.g., keeping only the largest connected component). While these defaults are effective for many tasks, they are not universally optimal, especially in multi-class or class-imbalanced settings.

Problem: When using default post-processing, nnU-Net may remove small but clinically significant structures (e.g., small tumors or vessels) from underrepresented classes. This disproportionately affects rare classes, leading to reduced recall and distorted performance metrics. In extreme cases, entire classes may be missed if their components are smaller than the largest background-connected region.

Recommended Approach: Develop and integrate custom post-processing logic that uses class-specific thresholds for preserving smaller, significant structures.

Example Image:

(Panels: original image, ground truth, nnU-Net prediction.)

Sample Code:

This can be done by overriding the remove_all_but_largest_component() logic in nnU-Net’s postprocessing pipeline.


import numpy as np

import nnunet.postprocessing.connected_components as cc

# Override the default connected-component analysis (CCA)
def new_remove_all_but_largest_component(binary_image: np.ndarray, *args, **kwargs):
    min_sizes = {1: 50, 2: 20}  # Class-specific minimum component sizes (in voxels)
    class_idx = kwargs.get('class_idx', 1)  # Class index passed through from the post-processing call
    return custom_cca(binary_image, class_idx, min_sizes)  # custom_cca: user-defined helper, sketched below

# Replace nnU-Net's function (check the exact function name in your nnU-Net version)
cc.remove_all_but_largest_component = new_remove_all_but_largest_component
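
For illustration, here is a minimal sketch of what such a custom_cca helper could look like, using scipy's connected-component labelling; the class indices and minimum sizes are hypothetical and should be tuned per task.

import numpy as np
from scipy import ndimage

def custom_cca(binary_image, class_idx, min_sizes):
    """Keep all connected components of this class that reach the class-specific size threshold."""
    min_size = min_sizes.get(class_idx, 0)
    labeled, num_components = ndimage.label(binary_image)
    cleaned = np.zeros_like(binary_image)
    for component in range(1, num_components + 1):
        mask = labeled == component
        if mask.sum() >= min_size:  # keep small-but-significant structures above the threshold
            cleaned[mask] = 1
    return cleaned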

Not neglecting anisotropic z-spacing when selecting 3-D kernels and strides 🔗

Basic: In 3D medical imaging (e.g., CT or MRI), voxel dimensions are not always uniform in all directions. This is particularly true in the z-axis (slice direction), which often has a larger spacing than the x and y dimensions. This non-uniformity is called anisotropic spacing. When applying 3D convolutions or downsampling operations, ignoring this anisotropy can lead to inappropriate kernel sizes and strides, potentially degrading model performance.

Problem: When 3D kernels or strides are selected without accounting for anisotropic z-spacing, it can lead to many problems:

  • Difficulty in learning meaningful features: Standard 3x3x3 isotropic kernels might struggle to learn effectively from anisotropic voxels due to varying information density along each dimension.
  • Missing important information: If strides in the z-direction are chosen to be large, important information between slices can be skipped, especially in tasks like detection where subtle details are crucial.

Recommended Approach:

  • Use anisotropic kernels for initial layers, e.g., (1×3×3) instead of (3×3×3), to limit operations in the z-direction when spacing is large, at least in initial layers.
  • Resample the image to isotropic resolution to compensate for anisotropic spacing; however, this can introduce redundant data, leading to unnecessary computational cost.

Sample Code:


import torch.nn as nn

class Anisotropic3DCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Early layer: Avoid downsampling in Z
        self.conv1 = nn.Conv3d(1, 16, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1))
        self.bn1 = nn.BatchNorm3d(16)
        self.relu1 = nn.ReLU()

        # Later layer: Isotropic kernel (Z is more meaningful now)
        self.conv2 = nn.Conv3d(16, 32, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=1)
        self.bn2 = nn.BatchNorm3d(32)
        self.relu2 = nn.ReLU()

    def forward(self, x):
        x = self.relu1(self.bn1(self.conv1(x)))
        x = self.relu2(self.bn2(self.conv2(x)))
        return x
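
A quick shape check, assuming the input tensor is ordered (batch, channel, depth/z, height, width) with relatively few slices along z:

import torch

model = Anisotropic3DCNN()
x = torch.randn(1, 1, 24, 128, 128)  # e.g., 24 thick slices over a 128x128 in-plane grid
out = model(x)
print(out.shape)  # z is preserved by conv1 (stride 1) and halved only in conv2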

Not committing checkpoints larger than 1 GB directly to Git 🔗

Basic: Large model checkpoints, especially those from deep learning frameworks like PyTorch or TensorFlow, can exceed several gigabytes in size.

Problem: Developers sometimes try to commit these files directly to Git, unaware that Git is not optimized for large binary files.

Recommended Approach:

Git LFS (Large File Storage)

  • Stores large files externally (on GitHub/GitLab servers) while keeping pointers in Git
  • Best for checkpoints that change frequently but are needed by all team members

Sample Code:


# Install Git LFS
git lfs install

# Track checkpoint files
git lfs track "*.ckpt" "*.h5" "*.pth"

# Commit as usual (LFS handles the rest)
git add .gitattributes
git add model.ckpt
git commit -m "Add model checkpoint"

Revision History 🔗

Date Changes
2025-07-15 Initial Version
