Whole Slide Image Set

The WSI set is structured into Training, Validation and Testing subsets. An .xlsx file is also provided, which lists, for each WSI (RoI), its label, the corresponding patient ID and the set it belongs to (training/validation/test). Moreover, each subset is divided into three main groups:

Group_BT currently contains sets of normal tissue images and of two histopathologically distinct subtypes of benign breast lesions: Type_N, Type_PB and Type_UDH, which include WSIs annotated as Normal (N), Pathological Benign (PB) and Usual Ductal Hyperplasia (UDH), respectively;

Group_AT includes the Type_FEA and Type_ADH subsets containing, respectively, the Flat Epithelial Atypia (FEA) and Atypical Ductal Hyperplasia (ADH) lesion subtypes;

Finally, Group_MT is divided in two subsets Type_DCIS and Type_IC including WSIs annotated as Ductal Carcinoma in Situ (DCIS) and Invasive Carcinoma (IC) lesion subtypes.
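
The group/subtype hierarchy described above can be captured in a small lookup table. The following is a minimal sketch; the names GROUPS, SUBTYPE_TO_GROUP and group_of are illustrative and are not part of any tooling shipped with the dataset.

```python
# Hypothetical helper: maps each lesion subtype to its group,
# following the hierarchy described above.
GROUPS = {
    "Group_BT": ["N", "PB", "UDH"],
    "Group_AT": ["FEA", "ADH"],
    "Group_MT": ["DCIS", "IC"],
}

# Invert the mapping once so that lookups by subtype are a single dict access.
SUBTYPE_TO_GROUP = {
    subtype: group for group, subtypes in GROUPS.items() for subtype in subtypes
}


def group_of(subtype: str) -> str:
    """Return the group name (e.g. 'Group_AT') for a subtype label (e.g. 'FEA')."""
    try:
        return SUBTYPE_TO_GROUP[subtype.upper()]
    except KeyError:
        raise ValueError(f"unknown subtype: {subtype!r}") from None
```

For instance, group_of("FEA") returns "Group_AT", which is convenient when a task needs the three-group labels rather than the seven subtypes.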

Table 1 shows the number of WSIs per group and subtype in the Training, Validation and Testing subsets.

            Group_BT                     Group_AT              Group_MT
            Type_N  Type_PB  Type_UDH    Type_FEA  Type_ADH    Type_DCIS  Type_IC
Training    27      120      56          24        28          40         100
Validation  10      11       9           6         8           9          12
Testing     7       16       9           11        12          12         20




Whole-slide images are stored in the .svs file format as multi-resolution pyramids (the highest-resolution image can easily exceed 100,000 by 100,000 pixels). For some WSIs, a file in the .qpdata format with the same filename as the WSI is provided for viewing the annotations inside the WSI.
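
To illustrate why a multi-resolution pyramid is useful, the sketch below estimates the uncompressed size of each level. It assumes 3 bytes per pixel (8-bit RGB) and a hypothetical pyramid with downsample factors 1, 4 and 16; the levels actually stored in a given .svs file may differ and should be read from the file itself.

```python
def level_dimensions(width, height, downsamples):
    """Pixel dimensions of each pyramid level for the given downsample factors."""
    return [(width // d, height // d) for d in downsamples]


def uncompressed_gib(width, height, bytes_per_pixel=3):
    """Uncompressed size in GiB, assuming 8-bit RGB (3 bytes per pixel)."""
    return width * height * bytes_per_pixel / 2**30


# Assumed example: a 100,000 x 100,000 base level with downsample factors 1, 4, 16.
for w, h in level_dimensions(100_000, 100_000, [1, 4, 16]):
    print(f"{w:>7} x {h:<7} ~ {uncompressed_gib(w, h):6.2f} GiB uncompressed")
```

Under these assumptions the full-resolution level alone would take about 28 GiB in memory, so it is usually preferable to read individual regions or a lower-resolution level rather than load the whole image.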

Libraries and open-source platforms that can open these file formats are listed in the Software page.

Regions of Interest Set

The RoI set follows the same organization as the WSI set. Table 2 shows the number of RoIs per group and subtype in the Training, Validation and Testing subsets of the RoI set.

            Group_BT                     Group_AT              Group_MT
            Type_N  Type_PB  Type_UDH    Type_FEA  Type_ADH    Type_DCIS  Type_IC
Training    357     714      389         624       387         665        521
Validation  46      43       46          49        41          40         47
Testing     81      79       82          83        79          85         81
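
The counts in Table 2 can be summed to check the size of each subset and its share of the whole RoI set. The sketch below hard-codes the table values; the variable names are illustrative only.

```python
SUBTYPES = ["N", "PB", "UDH", "FEA", "ADH", "DCIS", "IC"]

# RoI counts per subtype, copied from Table 2 (same order as SUBTYPES).
ROI_COUNTS = {
    "Training":   [357, 714, 389, 624, 387, 665, 521],
    "Validation": [46, 43, 46, 49, 41, 40, 47],
    "Testing":    [81, 79, 82, 83, 79, 85, 81],
}

# Total number of RoIs in each subset and overall.
split_totals = {split: sum(counts) for split, counts in ROI_COUNTS.items()}
grand_total = sum(split_totals.values())

for split, total in split_totals.items():
    print(f"{split:<10} {total:>5}  ({total / grand_total:.1%} of all RoIs)")
print(f"{'Total':<10} {grand_total:>5}")
```

This gives 3,657 training, 312 validation and 570 testing RoIs, 4,539 in total. The training counts are noticeably unbalanced across subtypes (for example 714 PB against 357 N), which may be worth accounting for when training a classifier.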


The Regions of Interest are provided in .png file format. The filename of a RoI includes the filename of the corresponding WSI as well as the RoI subtype (e.g. BRACS_010_PB_32.png is RoI number 32, extracted from the WSI named BRACS_010.svs and labeled as Pathological Benign). Each RoI is provided at 40× magnification, and its dimensions can exceed 4,000 by 4,000 pixels.
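
The naming convention above makes it possible to recover the source WSI, the subtype and the RoI number from a filename. The following is a minimal sketch: the pattern is inferred from the single example BRACS_010_PB_32.png, and the names ROI_NAME, RoiInfo and parse_roi_filename are hypothetical, so other filenames in the dataset may need a looser pattern.

```python
import re
from typing import NamedTuple

# Pattern inferred from the one documented example (BRACS_010_PB_32.png).
ROI_NAME = re.compile(r"^(?P<wsi>BRACS_\d+)_(?P<subtype>[A-Z]+)_(?P<index>\d+)\.png$")


class RoiInfo(NamedTuple):
    wsi_filename: str  # e.g. "BRACS_010.svs"
    subtype: str       # e.g. "PB"
    index: int         # RoI number within the WSI, e.g. 32


def parse_roi_filename(name: str) -> RoiInfo:
    """Split a RoI filename into its WSI filename, subtype label and RoI number."""
    match = ROI_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected RoI filename: {name!r}")
    return RoiInfo(match["wsi"] + ".svs", match["subtype"], int(match["index"]))
```

For example, parse_roi_filename("BRACS_010_PB_32.png") yields the WSI filename BRACS_010.svs, the subtype PB and the RoI number 32. Grouping RoIs by WSI filename in this way helps keep all RoIs from one slide in the same subset.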