The paper presents an original filter approach for effective feature selection in microarray data, which are characterized by a large number of input variables and few samples. The approach is based on a new information-theoretic selection criterion, the double input symmetrical relevance (DISR), which relies on a measure of variable complementarity. This measure evaluates the additional information that a set of variables provides about the output beyond the sum of the individual variable contributions. We show that a variable selection approach based on DISR can be formulated as a quadratic optimization problem: the dispersion sum problem (DSP). To solve this problem, we use a strategy based on backward elimination and sequential replacement (BESR). The combination of BESR and the DISR criterion is compared, in theoretical and experimental terms, with recently proposed information-theoretic criteria. Experimental results on a synthetic dataset and on a set of eleven microarray classification tasks show that the proposed technique is competitive with existing filter selection methods.
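The notion of variable complementarity described above can be illustrated on toy discrete data. The following is a minimal sketch, not the paper's implementation. The function names are hypothetical, the estimates are plain empirical (plug-in) entropies, and the normalization in `symmetrical_relevance` (joint mutual information divided by joint entropy) is an assumption about how a symmetrical relevance score might be formed, since the abstract does not give the formula.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def complementarity(x1, x2, y):
    """I(X1,X2;Y) minus the sum of the individual I(Xi;Y).

    A positive value means the pair tells more about y than the two
    variables do separately; a negative value indicates redundancy.
    """
    joint = list(zip(x1, x2))
    individual = mutual_information(x1, y) + mutual_information(x2, y)
    return mutual_information(joint, y) - individual


def symmetrical_relevance(x1, x2, y):
    """ASSUMED normalization: I(X1,X2;Y) / H(X1,X2,Y), a value in [0, 1]."""
    joint = list(zip(x1, x2))
    return mutual_information(joint, y) / entropy(list(zip(x1, x2, y)))


# Toy case 1: y is the XOR of two binary inputs. Each input alone carries
# no information about y, but the pair determines y completely.
pairs = list(product([0, 1], repeat=2))
x1 = [a for a, _ in pairs]
x2 = [b for _, b in pairs]
y_xor = [a ^ b for a, b in pairs]
xor_gain = complementarity(x1, x2, y_xor)
xor_score = symmetrical_relevance(x1, x2, y_xor)

# Toy case 2: the second input is a copy of the first, so it adds nothing.
x_copy = list(x1)
y_copy = list(x1)
redundant_gain = complementarity(x1, x_copy, y_copy)

print(f"XOR pair:       complementarity = {xor_gain:+.2f} bits")
print(f"Redundant pair: complementarity = {redundant_gain:+.2f} bits")
```

In the XOR case the pair is worth one full bit more than the sum of its parts, while the duplicated variable yields a negative value. With only a few samples, as in microarray data, plug-in estimates of this kind can be strongly biased, so the sketch is meant to convey the quantity being measured rather than a practical estimator.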
|Number of pages||14|
|Journal||IEEE Journal of Selected Topics in Signal Processing|
|Publication status||Published - 2008|
- feature extraction
- filtering theory
- quadratic programming
- signal classification