Abstract
Researching the optimization of patient care across all stages of the care continuum is crucial. Rectal cancer remains a very deadly disease, and invasive surgery often causes major discomfort and reduces the patient’s quality of life (QOL). It is therefore important to develop an algorithm capable of predicting the tumor regression grade (TRG) before the start of any therapy. The TRG is typically determined by microscopic analysis of a biopsy obtained during surgery. The TRG used for rectal cancer patients at our hospital is the Dworak TRG. Because this is a semi-quantitative grading system, it carries some subjectivity and variability. Consequently, we reclassified Dworak grades 0, 1, and 2 as bad responders and grades 3 and 4 as good responders, giving us a binary TRG classification problem.
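As a small illustration of the reclassification described above, the grade-to-class mapping can be sketched as follows. The function name and the string labels are hypothetical and chosen only for this example; they are not taken from the thesis code.

```python
def dworak_to_binary(grade: int) -> str:
    """Map a Dworak grade (0-4) to the binary responder class.

    Grades 0, 1, and 2 are treated as bad responders and grades 3 and 4
    as good responders, following the reclassification in the abstract.
    """
    if grade not in (0, 1, 2, 3, 4):
        raise ValueError(f"Dworak grade must be an integer in 0..4, got {grade!r}")
    return "good" if grade >= 3 else "bad"


# Hypothetical list of grades, for illustration only.
example_grades = [0, 1, 2, 3, 4]
example_labels = [dworak_to_binary(g) for g in example_grades]
```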
We introduced a novel customized Random Forest (RF) algorithm to predict the binary TRG using radiomics extracted from the planning CTs. A total of 111 radiomic features were extracted using the open-source package PyRadiomics. Our proposed algorithm, called the Evolutionary Random Subspace Forest (ERSF), builds upon the algorithms developed by Ho and Breiman. We used subspaces for variable selection and pruned trees iteratively. Additionally, instead of traditional classification trees, we opted for linear discriminant analysis (LDA) trees. Our ERSF gave a proportional accuracy of 76.0% on the training set and 65.4% on the validation set.
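The ERSF itself is not specified in this abstract, so the following is only a rough sketch of one ingredient it builds on: a random subspace ensemble in the spirit of Ho, with LDA classifiers as base learners. It uses scikit-learn on synthetic data; the ensemble size, the subspace size of 10 features, and the data are assumptions made for illustration. The evolutionary pruning and the LDA tree structure of the ERSF are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a radiomics table: 111 features, binary label.
X, y = make_classification(
    n_samples=200, n_features=111, n_informative=10, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Random subspace method: every base learner sees all training samples
# (bootstrap=False, max_samples=1.0) but only a random subset of features.
model = BaggingClassifier(
    LinearDiscriminantAnalysis(),
    n_estimators=50,
    max_samples=1.0,
    max_features=10,          # assumed subspace size
    bootstrap=False,
    bootstrap_features=False,
    random_state=0,
)
model.fit(X_train, y_train)

# Plain accuracy on each split; the thesis reports "proportional accuracy",
# whose exact definition is not given in the abstract.
train_acc = model.score(X_train, y_train)
val_acc = model.score(X_val, y_val)
```

Comparing `train_acc` with `val_acc` gives the same kind of training versus validation gap that the abstract reports for the ERSF.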
Given the subjective nature of the Dworak grading, its impact on the prediction accuracy of the ERSF cannot be overlooked. Moreover, we believe that understanding the relationship between the forest and expert opinions is extremely important. Thus, we revisited the surgical notes and identified patients whose grade (bad or good responder) was ambiguous, labeling them as “grey-zone patients”. Our analysis revealed that the algorithm had greater difficulty predicting outcomes for grey-zone patients, achieving only 63.5% proportional accuracy. In contrast, for non-grey-zone patients, the proportional accuracy was notably higher at 92.5%. These findings underscore the influence of TRG subjectivity on the algorithm’s misclassifications.
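The grey-zone comparison above amounts to scoring the same predictions separately per subgroup. A minimal sketch is given below; it uses plain accuracy as a stand-in, since the abstract does not define "proportional accuracy", and the function name and toy labels are hypothetical.

```python
def accuracy_by_group(y_true, y_pred, is_grey_zone):
    """Return plain accuracy separately for grey-zone and non-grey-zone patients."""
    counts = {"grey-zone": [0, 0], "non-grey-zone": [0, 0]}  # [correct, total]
    for truth, pred, grey in zip(y_true, y_pred, is_grey_zone):
        key = "grey-zone" if grey else "non-grey-zone"
        counts[key][0] += int(truth == pred)
        counts[key][1] += 1
    return {
        key: (correct / total if total else float("nan"))
        for key, (correct, total) in counts.items()
    }


# Toy labels for illustration only (1 = good responder, 0 = bad responder).
toy_true = [1, 0, 1, 1, 0]
toy_pred = [1, 1, 1, 1, 0]
toy_grey = [True, True, False, False, False]
toy_result = accuracy_by_group(toy_true, toy_pred, toy_grey)
```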
Additionally, we extracted Fourier features from the same CT images and fed them into the ERSF. The results were mediocre, with a proportional accuracy of 82.4% on the training set and 59.1% on the validation set. Given the retrospective nature of our data, these results were somewhat expected. The data set contains variations in pixel spacing settings, which could make prediction harder. To address this issue, we experimented with several ways of making the pixel settings uniform. Some of these gave encouraging results, with the most successful configuration reaching a proportional accuracy of 90.6% on the training set and 72.7% on the validation set using Fourier data.
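To make the pixel spacing issue concrete: the same anatomy sampled at different spacings yields images of different sizes, so the bins of a discrete Fourier transform correspond to different physical frequencies. The sketch below shows one possible way to make the spacing uniform, by resampling every slice to a common spacing before taking low-frequency Fourier magnitudes as features. It is not the procedure used in the thesis; the 1.0 mm target spacing, the 8 x 8 low-frequency block, and the function names are all assumptions.

```python
import numpy as np
from scipy.ndimage import zoom


def resample_to_spacing(image, spacing, target_spacing=1.0):
    """Resample a 2D slice so that each pixel covers target_spacing mm."""
    factors = (spacing[0] / target_spacing, spacing[1] / target_spacing)
    return zoom(image, zoom=factors, order=1)  # order=1: linear interpolation


def fourier_features(image, spacing, target_spacing=1.0, k=8):
    """Return the k x k block of lowest-frequency log-magnitudes as a vector."""
    resampled = resample_to_spacing(image, spacing, target_spacing)
    spectrum = np.fft.fftshift(np.fft.fft2(resampled))  # zero frequency at center
    magnitude = np.log1p(np.abs(spectrum))
    cy, cx = magnitude.shape[0] // 2, magnitude.shape[1] // 2
    half = k // 2
    return magnitude[cy - half:cy + half, cx - half:cx + half].ravel()


# Two synthetic slices covering the same 32 mm field of view at different spacings.
rng = np.random.default_rng(0)
slice_a = rng.normal(size=(64, 64))  # 0.5 mm pixels
slice_b = rng.normal(size=(32, 32))  # 1.0 mm pixels
feats_a = fourier_features(slice_a, spacing=(0.5, 0.5))
feats_b = fourier_features(slice_b, spacing=(1.0, 1.0))
```

After resampling, both slices have the same grid, so entry i of `feats_a` and entry i of `feats_b` refer to the same physical frequency.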
Finally, we extracted prior information from the Fourier data and constructed a new ERSF using both radiomics and the Fourier prior information in the LDA. The best configuration gave a proportional accuracy of 73.8% on the training set and 72.7% on the validation set, surpassing the initial radiomics validation accuracy of 65.4%.
Original language | English |
---|---|
Awarding Institution | |
Supervisors/Advisors | |
Award date | 20 Jan 2025 |
Publication status | Published - 2025 |