This page details the two sub-tasks that are part of this challenge: (1) vertebra localisation and identification, jointly termed vertebra labelling, and (2) vertebra segmentation. For each task, the format of the annotations and the metrics used for evaluation are described.

Task 1: Vertebra Labelling

Labelling of vertebrae has immediate diagnostic and modelling significance. For example, localised vertebrae are used as markers for detecting kyphosis or scoliosis, in surgical planning, and in follow-up analysis tasks such as vertebral segmentation or bio-mechanical modelling for load analysis.

Given a spine CT scan, the task is to label all the vertebrae within the field-of-view. Essentially, this is a landmark detection task. The output of this stage should be a list of the three-dimensional coordinate locations of the vertebrae according to the coordinate system described in the 'Data' page.


We use four metrics for evaluating the labelling performance of your algorithm, two at the dataset level and two at the scan level.

1. Identification rate (in %): As defined in [1]. Ratio of vertebrae 'identified' in the full test set. A vertebra is correctly 'identified' if the ground truth location closest to its predicted location is that of the same vertebra (e.g. predicted L1 to ground truth L1) and this distance is less than 20 mm.

2. Localisation distance (in mm): As defined in [1]. Distance of each predicted vertebral location from its ground truth vertebral location, averaged over all vertebrae in the test set.

3. Recall (in %, subject to slight modification): As defined in [2]. R = #hits/#actual, where #hits is the number of vertebrae satisfying the identification condition defined for the identification rate above and #actual is the number of vertebrae actually present in the image. It captures the ratio of correctly 'identified' vertebrae per scan.

4. Precision (in %, subject to slight modification): As defined in [2]. P = #hits/#predicted, where #predicted is the total number of vertebrae predicted to be in the image. For example, this penalises the case where the scan has five vertebrae L1-L5, while the algorithm predicts eight vertebrae T10-L5.

*Please refer to the challenge report [3] for why Recall and Precision were not used during ranking. 
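
The four labelling metrics can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the official evaluation script: the dictionary format, the function names `score_scan` and `labelling_metrics`, and the choice to average the localisation distance only over vertebrae present in both the ground truth and the prediction are all hypothetical. Each scan is also assumed to contain at least one ground truth vertebra. Values are returned as fractions; multiply by 100 for percentages.

```python
import math

ID_THRESHOLD_MM = 20.0  # identification threshold stated in metric 1


def score_scan(gt, pred, threshold=ID_THRESHOLD_MM):
    """Score one scan.

    gt, pred: dicts mapping a vertebra label (e.g. "L1") to an (x, y, z)
    location in mm (hypothetical format). Returns (hits, distances), where
    distances holds one value per label present in both dicts.
    """
    hits, distances = 0, []
    for label, point in pred.items():
        if label not in gt:
            continue  # spurious prediction: it only lowers precision
        d = math.dist(point, gt[label])
        distances.append(d)
        # Ground truth vertebra that lies closest to this prediction.
        nearest = min(gt, key=lambda k: math.dist(point, gt[k]))
        if nearest == label and d < threshold:
            hits += 1
    return hits, distances


def labelling_metrics(scans):
    """scans: list of (gt, pred) pairs. Returns all four metrics."""
    total_hits = total_actual = 0
    all_distances, recalls, precisions = [], [], []
    for gt, pred in scans:
        hits, distances = score_scan(gt, pred)
        total_hits += hits
        total_actual += len(gt)
        all_distances.extend(distances)
        recalls.append(hits / len(gt))                        # R = #hits/#actual
        precisions.append(hits / len(pred) if pred else 0.0)  # P = #hits/#predicted
    mean_distance = (
        sum(all_distances) / len(all_distances) if all_distances else float("nan")
    )
    return {
        "id_rate": total_hits / total_actual,   # dataset level
        "mean_distance_mm": mean_distance,      # dataset level
        "recall_per_scan": recalls,             # scan level
        "precision_per_scan": precisions,       # scan level
    }


# Toy scan: ground truth L1-L5, prediction T10-L5 with L5 misplaced by 30 mm.
gt = {f"L{i}": (0.0, 0.0, 30.0 * (i - 1)) for i in range(1, 6)}
pred = {
    "T10": (0.0, 0.0, -90.0), "T11": (0.0, 0.0, -60.0), "T12": (0.0, 0.0, -30.0),
    "L1": (0.0, 0.0, 2.0), "L2": (0.0, 0.0, 33.0), "L3": (0.0, 0.0, 61.0),
    "L4": (0.0, 0.0, 95.0), "L5": (0.0, 0.0, 150.0),
}
metrics = labelling_metrics([(gt, pred)])
print(metrics)
```

In this toy scan, four of the five ground truth vertebrae are identified, so recall is 4/5, while the three extra thoracic predictions reduce precision to 4/8.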

Task 2: Vertebra Segmentation

Spine segmentation is a crucial component in quantitative medical image analysis. It directly allows detection and assessment of vertebral fractures and indirectly supports modelling and monitoring of the spinal ageing process.

Given a spine CT scan, the task is to generate accurate voxel-level segmentation maps of the vertebrae present in the scan. Essentially, this is a multi-label segmentation task. The output of this task should be another 3D volume of the same size and orientation as the input scan with integer values between 1 and 2425.


We use two metrics that are ubiquitous in the medical image segmentation domain.

1. DICE Coefficient (in %): Measures segmentation overlap in the form of an F1 score at voxel level. Here, DICE is computed per-label as 2|A ∩ B|/(|A| + |B|), where 'A' is the set of ground truth foreground voxels of a certain label and 'B' is the predicted set for the same label.

2. Hausdorff Surface Distance (in mm): Measures the local maximum distance between the two surfaces constructed out of the ground truth and predicted segmentation map.

NOTE: The updated evaluation protocol is elaborated in the VerSe'19 challenge report [3].
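
As a rough illustration of the two segmentation metrics, the sketch below computes a per-label DICE and a symmetric Hausdorff distance in pure Python. Several choices here are assumptions rather than the challenge's evaluation code: volumes are nested lists indexed as `volume[i][j][k]`, surface voxels are taken to be foreground voxels with at least one 6-connected neighbour outside the label, the voxel `spacing` is supplied by the caller, a label absent from both maps scores a DICE of 1, and the Hausdorff function expects both sets to be non-empty. All function names are hypothetical. The brute-force distance search is only practical for small examples.

```python
import math

# 6-connected neighbourhood used to decide whether a voxel lies on the surface.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def voxels_of(volume, label):
    """Set of (i, j, k) indices at which a nested-list volume equals label."""
    return {
        (i, j, k)
        for i, plane in enumerate(volume)
        for j, row in enumerate(plane)
        for k, value in enumerate(row)
        if value == label
    }


def dice(a, b):
    """2|A ∩ B| / (|A| + |B|) for two voxel sets."""
    if not a and not b:
        return 1.0  # assumption: label absent from both maps counts as perfect
    return 2 * len(a & b) / (len(a) + len(b))


def surface(voxels):
    """Voxels that have at least one 6-neighbour outside the set."""
    return {
        (i, j, k)
        for (i, j, k) in voxels
        if any((i + di, j + dj, k + dk) not in voxels for di, dj, dk in NEIGHBOURS)
    }


def hausdorff_mm(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance (mm) between surfaces of two non-empty sets."""
    def to_mm(points):
        return [tuple(c * s for c, s in zip(p, spacing)) for p in points]

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    sa, sb = to_mm(surface(a)), to_mm(surface(b))
    return max(directed(sa, sb), directed(sb, sa))


# Toy 1 x 1 x 6 volumes: label 1 is shifted by one voxel, label 2 matches exactly.
gt = [[[1, 1, 1, 0, 2, 2]]]
pred = [[[0, 1, 1, 1, 2, 2]]]
for label in (1, 2):
    a, b = voxels_of(gt, label), voxels_of(pred, label)
    print(label, dice(a, b), hausdorff_mm(a, b, spacing=(1.0, 1.0, 2.0)))
```

With a spacing of 2 mm along the last axis, the one-voxel shift of label 1 yields a Hausdorff distance of 2 mm and a DICE of 2/3, while label 2 overlaps perfectly.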


  1. Glocker, B., et al.: Automatic localization and identification of vertebrae in arbitrary field-of-view CT scans. In: MICCAI. (2012)
  2. Sekuboyina, A., et al.: Btrfly Net: Vertebrae Labelling with Energy-based Adversarial Learning of Local Spine Prior. In: MICCAI. (2018)
  3. Sekuboyina, A., et al.: VerSe: A Vertebrae Labelling and Segmentation Benchmark. arXiv:2001.09193. (Jan 2020)