Project Details
Description
Analytical methods produce ever larger data sets. The project studies chemometric methods for extracting information from them. The four main themes are:
i) Detection of aberrant objects (outliers). Outliers can distort conclusions and lead to non-robust models. The question is how to detect outliers, or how to develop robust methods that are not influenced by their presence.
ii) Selection or elimination of variables. For interpreting the information, for economic reasons, and sometimes for the quality of the model, it is often necessary to select a limited number of representative variables from the large number of measured variables. For instance, not all wavelengths in a spectrum contain interesting information. Selecting the important variables can lead to simpler instruments, more easily interpretable models, and elimination of noise.
iii) Most methods for data analysis are based on correlation or distances. Models based on density, i.e. finding the areas in the data in which many objects (or very few objects) appear, are possible but have been investigated less, although they are well suited to finding representative objects or objects with special properties (such as outliers).
iv) Many algorithms for data analysis become difficult to handle when very large data sets are used. In some cases, algorithms that are not efficient for a small number of objects become efficient when applied to big data sets. There is a need for algorithms that can cope with huge data sets.
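As an illustration of theme i), the sketch below flags outliers in a single measured variable using the median and the median absolute deviation (MAD), two statistics that are themselves barely affected by outliers. It is not taken from the project: the function names, the example data, and the cut-off of 3.5 are assumptions chosen for illustration (the constant 0.6745 scales the MAD so that it is comparable to a standard deviation for normally distributed data).

```python
import statistics


def robust_z_scores(values):
    """Return modified z-scores based on the median and the MAD."""
    med = statistics.median(values)
    # Median absolute deviation: a robust estimate of spread.
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    # 0.6745 makes the MAD consistent with the standard deviation
    # under a normal distribution.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return the indices whose modified z-score exceeds the threshold.

    The default threshold of 3.5 is an illustrative assumption.
    """
    scores = robust_z_scores(values)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


# Hypothetical measurements with one aberrant value at index 5.
data = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2]
outliers = flag_outliers(data)
print(outliers)
```

Because the median and the MAD are used instead of the mean and the standard deviation, the aberrant value does not inflate the scale estimate that is used to judge it, which is the basic idea behind robust methods.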
Acronym | FWOAL214 |
---|---|
Status | Finished |
Effective start/end date | 1/01/02 → 31/12/05 |
Keywords
- Chemometrics
Flemish discipline codes in use since 2023
- Pharmaceutical sciences
- Chemical sciences