Machine learning techniques have been at the forefront of addressing medical challenges such as diagnosis, prognosis prediction, and precision medicine. In this field, data is sometimes abundant but comes from different sources or lacks assigned labels. Manually labeling this data to assemble a curated dataset for supervised classification can be costly. Semi-supervised classification offers a wide range of methods for leveraging unlabeled data when learning prediction models. However, these classifiers are commonly deep or ensemble learning structures that often result in black boxes. The need for interpretable models in medical settings led us to propose the self-labeling grey-box classifier, which outperforms other semi-supervised classifiers on benchmark datasets while providing interpretability. In this chapter, we illustrate applications of the self-labeling grey-box to the omics and clinical datasets from The Cancer Genome Atlas. We show that the self-labeling grey-box accurately predicts the stage of rare cancers by leveraging unlabeled instances from more common cancer types. We discuss the insights obtained, the features influencing prediction, and a global representation of the learned knowledge through decision trees or rule lists, which can aid clinicians and researchers.
|Title of host publication|Machine Learning, Big Data, and IoT for Medical Informatics|
|Number of pages|19|
|Publication status|Accepted/In press - 2021|