Shape-based clustering and classification of breast microcalcifications in micro-CT images

Research output: Meeting abstract (Book)

Abstract

We present a novel classification approach for microcalcifications (MCs) extracted from core biopsy tissue samples digitized using micro-CT, a high-resolution 3D imaging modality. MCs are tiny spots of calcium that may occur in the female breast. Although they are common in healthy women, they are often an early sign of breast cancer. The shape of the MCs is one of the important factors used to discriminate between benign and malignant abnormalities. However, the resolution of a mammogram is too low for a clear shape-based analysis, and the image shows only a 2D projection of a 3D object. When a mammogram looks suspicious, a biopsy is performed and the extracted tissue is pathologically analysed for the presence of cancer cells. The MCs themselves are mostly not analysed, so ground truth exists for the tissue samples but not for the individual MCs. Moreover, many biopsies turn out to be negative, which raises the question of whether some biopsies could be avoided if the shape of the MCs were analysed in more detail. We investigated whether introducing a clustering step before classification can improve the classification results for the samples.
By clustering MCs according to their shape features, similar shapes are grouped together. Subsequently, the classifier is trained to assign each MC to the correct cluster, i.e. it learns to distinguish among different shapes of MCs, independently of the original class labels assigned during anatomopathological examination.
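The clustering step can be illustrated with a minimal sketch of plain K-means (Lloyd's algorithm) in Python. This is an illustration under assumptions, not the authors' implementation: the function name `kmeans`, the deterministic initialisation from the first `k` points, and the two-dimensional toy features are all hypothetical, and the Minkowski-weighted variant (MWK-means) is not shown.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm. Returns (centroids, assignments).

    points: list of equal-length numeric tuples (hypothetical shape features).
    Initialisation simply takes the first k points, which keeps the sketch
    deterministic; real use would normally pick random or k-means++ seeds.
    """
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

# Toy feature vectors standing in for two shape descriptors per MC.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centroids, assignments = kmeans(features, k=2)
```

The cluster indices returned here would then serve as the training targets for the classifier, in place of the labels inherited from the tissue sample.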
K-means and Minkowski Weighted K-means (MWK-means) were selected as clustering algorithms, and Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs) as classification algorithms. MCs that share common shape features are grouped together in clusters. Each cluster is characterised as benign or malignant with a certain probability, based on the entropy of the cluster. After clustering, the classifier is trained to assign MCs to the correct cluster. In this step, each MC receives a label and a probability of benignity or malignancy based on the entropy of its cluster.
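One possible reading of the cluster characterisation is sketched below. The abstract does not give the exact formula, so the following are assumptions: each MC inherits the label of its tissue sample, the malignancy probability of a cluster is the fraction of its members with a malignant inherited label, the cluster label is the majority class, and the entropy is the Shannon entropy (in bits) of that label distribution. The function name `characterise_clusters` is hypothetical.

```python
import math
from collections import Counter, defaultdict

def characterise_clusters(cluster_ids, inherited_labels):
    """Return {cluster_id: (label, p_malignant, entropy_bits)}.

    cluster_ids: cluster index of each MC.
    inherited_labels: "benign" or "malignant", taken from the MC's sample.
    """
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, inherited_labels):
        members[cid].append(label)

    result = {}
    for cid, labels in members.items():
        counts = Counter(labels)
        total = len(labels)
        p_malignant = counts["malignant"] / total
        # Shannon entropy of the label mix: 0 for a pure cluster, 1 bit for 50/50.
        entropy = sum(-(n / total) * math.log2(n / total) for n in counts.values())
        label = "malignant" if p_malignant >= 0.5 else "benign"
        result[cid] = (label, p_malignant, entropy)
    return result

clusters = characterise_clusters(
    [0, 0, 0, 0, 1, 1],
    ["malignant", "malignant", "malignant", "benign", "benign", "benign"],
)
```

Under these assumptions, a low-entropy cluster gives a confident label, while a cluster close to 1 bit carries little information about malignancy.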
For the characterisation of the sample, the aforementioned classification results of the MCs are used. More specifically, a variable threshold is set on the proportion of malignant MCs that a sample must contain for the whole sample to be characterised as malignant. At the sample level, the ultimate goal is to maximize the sensitivity (the fraction of malignant samples that are correctly classified) combined with the highest possible specificity or accuracy. Two approaches were proposed: a weighted approach that takes into consideration the probability related to the characterisation of the MC, and a non-weighted approach that takes into account only the binary classification results.
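The two sample-level decision rules might look roughly as follows. This is a sketch under assumptions: the abstract does not state how the probabilities enter the weighted rule, so the weighted score is taken here to be the mean malignancy probability over the MCs of a sample, and the non-weighted score is the fraction of MCs labelled malignant. The function name `sample_is_malignant` and the example values are hypothetical.

```python
def sample_is_malignant(mc_predictions, threshold=0.15, weighted=True):
    """Decide whether one tissue sample is called malignant.

    mc_predictions: list of (label, p_malignant) pairs, one per MC.
    threshold: required malignant proportion (0.15 corresponds to 15%).
    """
    if not mc_predictions:
        raise ValueError("sample contains no microcalcifications")
    n = len(mc_predictions)
    if weighted:
        # Assumed weighting: average the per-MC malignancy probabilities.
        score = sum(p for _, p in mc_predictions) / n
    else:
        # Non-weighted: count only the binary labels.
        score = sum(1 for label, _ in mc_predictions if label == "malignant") / n
    return score >= threshold

# Nine MCs from mostly benign clusters plus one from a malignant cluster.
sample = [("benign", 0.2)] * 9 + [("malignant", 0.6)]
weighted_call = sample_is_malignant(sample, weighted=True)
binary_call = sample_is_malignant(sample, weighted=False)
```

In this toy sample the binary rule sees 10% malignant MCs and stays below the threshold, whereas the weighted rule also counts the residual malignancy probability of the benign-labelled MCs and crosses it.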
The best result is achieved with the weighted approach, which delivers a sensitivity of 100%, a specificity of 42.6% and an accuracy of 72.7% for a threshold of 15% malignant MCs per sample. This is an improvement of 2%, 2.6% and 2.7%, respectively, compared to the state of the art.
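For reference, the three sample-level metrics can be computed from true and predicted labels as sketched below. The function name `sample_metrics` and the five example labels are hypothetical and are not data from the study.

```python
def sample_metrics(true_labels, predicted_labels):
    """Return (sensitivity, specificity, accuracy) with malignant as positive.

    Assumes at least one malignant and one benign sample in true_labels;
    otherwise one of the ratios is undefined.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == "malignant":
            if pred == "malignant":
                tp += 1
            else:
                fn += 1
        else:
            if pred == "benign":
                tn += 1
            else:
                fp += 1
    sensitivity = tp / (tp + fn)  # malignant samples correctly detected
    specificity = tn / (tn + fp)  # benign samples correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

truth = ["malignant", "malignant", "benign", "benign", "benign"]
preds = ["malignant", "malignant", "malignant", "benign", "benign"]
sensitivity, specificity, accuracy = sample_metrics(truth, preds)
```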
A sensitivity of 100% means that no malignant sample is missed. This improvement results from avoiding the bias introduced by a distorted ground truth, i.e. the fact that malignant samples may contain unsuspicious calcifications which are nevertheless treated as malignant during the learning process. With this approach, some unnecessary anatomopathological investigations might be avoided. Furthermore, if 3D high-resolution imaging becomes applicable in vivo in the future, the number of unnecessary biopsies could be decreased, reducing unnecessary expenses as well as physical and mental discomfort for the patient.
The 3D visualisation of the actual microcalcifications can help biologists in genome analysis, as they will have a clear image of the clinical appearance of the MCs, so that appearance and shape can be correlated with molecular and genome data. In the future, we plan to correlate the genome data with the properties of the extracted MCs and to try to predict the genes that are correlated with different types and shapes of microcalcifications. Furthermore, the machine learning approach described here can be used to classify other data whose ground truth is unknown, such as genes with unknown expression.
Original language: English
Title: 9th Benelux Bioinformatics conference
Subtitle: Bioinformatics: Integrating data, teams and disciplines
Status: Published - Dec 2014
Event: 9th Benelux Bioinformatics Conference. Bioinformatics: Integrating data, teams and disciplines - Luxembourg, Luxembourg
Duration: 8 Dec 2014 to 9 Dec 2014

