Datasets: Imaging, Labels - PANORAMA

Data Splits 🗃️

Data is sampled into three splits, with the following use cases:

Public Training and Development Dataset (2238 cases):
Available to all participants and researchers, to train and develop AI models. All data is fully anonymized and made available under a non-commercial CC BY-NC 4.0 license. Includes 194 cases from the Medical Segmentation Decathlon (MSD) dataset and 80 cases from the National Institutes of Health. For all updates and fixes regarding this dataset, please join the challenge and check out our dedicated forum post on this topic.

Annotations were released and are maintained via: github.com/DIAGNijmegen/panorama_labels

⚠️ Please be aware that the annotations repository is a live repository that can be updated should any bugs in the labels be identified by challenge organizers and/or participants. Feel free to submit a pull request if you believe any given label should be updated. These requests will be reviewed by the organization team and incorporated into the repository if approved.

Imaging data was released via:
- Batch 1 out of 4: https://zenodo.org/records/13715870
- Batch 2 out of 4: https://zenodo.org/records/13742336
- Batch 3 out of 4: https://zenodo.org/records/11034011
- Batch 4 out of 4: https://zenodo.org/records/10999754
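
The four batches can also be enumerated programmatically. The following is a minimal sketch, not an official download tool: it extracts the record ID from each Zenodo URL listed above and builds the matching Zenodo REST API URL. It assumes the API response is a JSON object with a `files` list whose entries carry a `key` (file name) and a `links.self` download URL; check that layout against the Zenodo documentation before relying on it. The helper names (`record_id`, `api_url`, `list_files`, `fetch_record`) are hypothetical.

```python
import json
import re
import urllib.request

# Record URLs copied from the batch list above.
BATCH_URLS = [
    "https://zenodo.org/records/13715870",  # Batch 1 out of 4
    "https://zenodo.org/records/13742336",  # Batch 2 out of 4
    "https://zenodo.org/records/11034011",  # Batch 3 out of 4
    "https://zenodo.org/records/10999754",  # Batch 4 out of 4
]


def record_id(url: str) -> str:
    """Return the numeric record ID at the end of a Zenodo record URL."""
    match = re.fullmatch(r"https://zenodo\.org/records/(\d+)/?", url)
    if match is None:
        raise ValueError(f"Not a Zenodo record URL: {url}")
    return match.group(1)


def api_url(url: str) -> str:
    """Build the REST API URL for a record page URL."""
    return f"https://zenodo.org/api/records/{record_id(url)}"


def list_files(record_json: dict) -> list[tuple[str, str]]:
    """Return (file name, download URL) pairs from a record's JSON.

    Assumption: entries look like {"key": ..., "links": {"self": ...}}.
    """
    return [(f["key"], f["links"]["self"]) for f in record_json.get("files", [])]


def fetch_record(url: str, timeout: float = 30.0) -> dict:
    """Download and decode the record metadata (requires network access)."""
    with urllib.request.urlopen(api_url(url), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the API URLs; call fetch_record() to query Zenodo.
    for batch_url in BATCH_URLS:
        print(api_url(batch_url))
```

Calling `list_files(fetch_record(BATCH_URLS[0]))` would then give the files of the first batch, provided the assumed response layout holds.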

Hidden Validation and Tuning Cohort (86 cases):
Used for a live, public leaderboard that enables model selection and tuning during the Development Phase.

Hidden Testing Cohort (>400 cases):
Used to determine the top 5 AI algorithms at the end of the Development Phase, and to benchmark AI and radiologists and test all hypotheses at the end of the Testing Phase. Includes internal testing data (unseen cases from seen centers) and external testing data (unseen cases from an unseen center). A subset of 400 cases from this cohort is used to facilitate the PANORAMA: Reader Study.
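
For bookkeeping, the split sizes stated above can be written down as a small data structure. This is only an illustrative sketch: the `Split` class and the helper functions are hypothetical names, and the testing cohort is stored as a lower bound because its size is only given as ">400".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    name: str
    cases: int
    exact: bool = True  # False when the text only gives a lower bound


SPLITS = [
    Split("Public Training and Development Dataset", 2238),
    Split("Hidden Validation and Tuning Cohort", 86),
    Split("Hidden Testing Cohort", 400, exact=False),  # stated as ">400"
]

# Public cases taken from existing public data sets, as stated above.
MSD_CASES = 194
NIH_CASES = 80


def public_cases_not_from_msd_or_nih() -> int:
    """Public training cases that come from neither MSD nor NIH."""
    return SPLITS[0].cases - MSD_CASES - NIH_CASES


def total_lower_bound() -> int:
    """Sum of the stated sizes; the true total is larger because the
    testing cohort has more than 400 cases."""
    return sum(split.cases for split in SPLITS)


if __name__ == "__main__":
    print(public_cases_not_from_msd_or_nih())
    print(total_lower_bound())
```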

Imaging and Clinical Data

The PANORAMA study data set consists of ~3000 cases from three Dutch centers (Radboud University Medical Center, University Medical Center Groningen, Ziekenhuis Groep Twente), one center in Sweden (Karolinska Institutet), and one center in Norway (Haukeland University Hospital). Institutional review boards of all centers have waived the need for informed patient consent with respect to the retrospective scientific use of anonymized clinical data in this challenge. All images are contrast-enhanced CT (CECT) scans in the portal-venous phase, as this is the most commonly acquired phase in routine abdominal imaging. All exams are from patients undergoing CECT without a history of pancreatic cancer treatment and without any prior positive PDAC histopathology findings. All cases also contain the clinical parameters age, sex, study date, and scanner. Values may be missing if they were not recorded in the original DICOM headers.
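
Because any of the four clinical parameters may be missing, code that reads them should treat every field as optional. The sketch below shows one way to do this, under loudly stated assumptions: the clinical data is assumed to be available as CSV text with the hypothetical columns `case_id`, `age`, `sex`, `study_date`, and `scanner`, and age is assumed to be a plain integer. The actual file layout of the released data may differ, and the sample rows are synthetic.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClinicalRecord:
    case_id: str
    age: Optional[int]         # None if not recorded
    sex: Optional[str]         # None if not recorded
    study_date: Optional[str]  # kept as text; the date format is not assumed
    scanner: Optional[str]     # None if not recorded


def _clean(value: Optional[str]) -> Optional[str]:
    """Map empty or whitespace-only fields to None."""
    value = (value or "").strip()
    return value or None


def parse_records(csv_text: str) -> list[ClinicalRecord]:
    """Parse CSV text with the hypothetical column names described above."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        age = _clean(row.get("age"))
        records.append(
            ClinicalRecord(
                case_id=row["case_id"],
                age=int(age) if age is not None else None,
                sex=_clean(row.get("sex")),
                study_date=_clean(row.get("study_date")),
                scanner=_clean(row.get("scanner")),
            )
        )
    return records


# Synthetic example rows, not real PANORAMA data.
SAMPLE = (
    "case_id,age,sex,study_date,scanner\n"
    "case_a,65,F,2015-03-02,ScannerA\n"
    "case_b,,M,,\n"
)

if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record)
```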

Reference Standard

The hidden testing cohort has the highest-quality reference standard for all cases, to optimally validate AI and radiologists. As histopathology analysis is the gold standard for PDAC diagnosis confirmation, all positive PDAC cases in the hidden testing cohort will have histopathology ground truth, either through surgical resection or biopsy assessment. For the negative cases, the ground-truth label will be established through histopathology assessment (for cases with non-PDAC pancreatic lesions such as cysts and intraductal mucinous neoplasms) and/or follow-up data. Only patients who do not develop PDAC within 36 months after their initial CT scan will be included as negative cases.
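
The 36-month criterion for negative cases can be expressed as a simple date check. This is a hedged sketch rather than the organizers' actual procedure: the function names are hypothetical, the window is counted in calendar months from the initial CT date, and treating a diagnosis exactly on the 36-month boundary as inside the window is an assumption.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day to the
    last valid day of the target month (e.g. Jan 31 + 1 month -> Feb 28/29)."""
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def develops_pdac_within_window(
    ct_date: date,
    pdac_date: Optional[date],
    window_months: int = 36,
) -> bool:
    """Return True if a PDAC diagnosis falls within the window after the CT.

    pdac_date is None when no PDAC diagnosis is known. The boundary day
    is counted as inside the window (assumption).
    """
    if pdac_date is None:
        return False
    return ct_date <= pdac_date <= add_months(ct_date, window_months)


if __name__ == "__main__":
    ct = date(2019, 6, 15)
    # A diagnosis 19 months after the CT disqualifies the case as a negative.
    print(develops_pdac_within_window(ct, date(2021, 1, 1)))
    # No known diagnosis: the case remains a candidate negative.
    print(develops_pdac_within_window(ct, None))
```

A case would only qualify as negative when this check returns `False`; whether the available follow-up actually covers the full window has to be verified separately.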

The public training cohort is large and representative, to optimally train clinically relevant AI algorithms. Positive cases are confirmed based on histopathology when available, or based on radiology reports and follow-up from clinical routine if no biopsy or resection was performed. Negative cases will be confirmed through clinical reports. For cases deriving from existing publicly available data sets, the corresponding ground truth will be used. For cases in the MSD data set, PDAC and non-PDAC cases will be differentiated as reported by Suman et al. 2021. All training cases will carry patient/image-level annotations. About half of the cases will also include expert-derived tumor delineations, while the remainder will include AI-derived delineations (based on a re-trained version of the method proposed by Alves et al. 2022). A subset of 100 scans with the same reference standard as the testing cohort will be used as the hidden validation and tuning cohort.