Abstract
Noise, arising from physical sources such as thermal fluctuations or from environmental causes such as small temperature changes and building vibrations, is inevitable in chromatograms. Excessive noise hampers subsequent data analysis, with derivative-based peak-detection algorithms suffering the most, thereby hindering the accurate detection and quantification of analytes. Commonly applied smoothing methods such as Savitzky-Golay filtering rely on operator input for the choice of appropriate parameter values, which makes them inconvenient to automate; they can furthermore suffer from significant information loss and inaccuracies in some cases.
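For context, Savitzky-Golay filtering fits a low-degree polynomial over a sliding window by least squares; the window length and polynomial degree are exactly the parameters an operator must choose. A minimal numpy sketch of the classic filter (function names here are illustrative, not code from the work presented):

```python
import numpy as np

def savgol_coeffs(window, polyorder):
    """Convolution weights for a Savitzky-Golay filter.

    A degree-`polyorder` polynomial is fitted by least squares over an
    odd-length window; evaluating that fit at the window centre is a fixed
    linear combination of the samples, given by row 0 of the pseudoinverse.
    """
    half = window // 2
    x = np.arange(-half, half + 1)
    A = np.vander(x, polyorder + 1, increasing=True)
    return np.linalg.pinv(A)[0]

def savgol_smooth(y, window, polyorder):
    # Edge-pad so the output has the same length as the input.
    c = savgol_coeffs(window, polyorder)
    half = window // 2
    padded = np.pad(np.asarray(y, float), half, mode="edge")
    return np.convolve(padded, c[::-1], mode="valid")
```

Because the fit is exact only for signals that are locally polynomial up to the chosen degree, a window that is too wide for a narrow peak clips its height, which is the kind of information loss noted above. In practice one would typically call `scipy.signal.savgol_filter` rather than hand-rolling the weights.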
In this work, two machine-learning-related methodologies, natural cubic splines and k-nearest neighbours, are applied as smoothing algorithms [1, 2]. Both methods smooth the noisy chromatogram by modelling the experimental chromatogram, retaining as much of the desired information as possible while removing the noise. The natural cubic splines method divides the chromatogram into equally sized segments delimited by knots and approximates each segment with a third-degree polynomial; matching derivatives at the interior knots and setting the second derivative to zero at the curve endpoints ensures smoothness throughout the modelled curve. K-nearest neighbours models the underlying chromatogram by averaging the noisy curve using the k nearest neighbours of each datapoint. In this work, the algorithms are applied to segmented chromatograms, whereby a chromatogram is divided into peak and inter-peak areas. Optimal model parameters (number of knots / number of neighbours) are determined for each segment using the Durbin-Watson criterion [3, 4]. Finally, smoothness between consecutive segments is enforced using polynomial regression linking the last datapoints of one segment with the first datapoints of the following segment.
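The pipeline above can be sketched in a few lines, assuming an evenly sampled signal (so the k nearest neighbours of a point are simply its k closest indices). All names, window sizes, and candidate ranges below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def knn_smooth(y, k):
    # k-nearest-neighbour regression on an evenly sampled signal:
    # each point is replaced by the mean of its k nearest datapoints.
    y = np.asarray(y, float)
    n = len(y)
    idx = np.arange(n)
    out = np.empty(n)
    for i in range(n):
        nearest = np.argsort(np.abs(idx - i), kind="stable")[:k]
        out[i] = y[nearest].mean()
    return out

def durbin_watson(residuals):
    # ~2 for uncorrelated residuals; values well below 2 indicate signal
    # left in the residuals (under-smoothing), well above 2 over-smoothing.
    r = np.asarray(residuals, float)
    return np.sum(np.diff(r) ** 2) / np.sum(r ** 2)

def best_k(y, candidates=range(2, 20)):
    # Per-segment parameter choice: pick the k whose residuals look
    # most like white noise according to the Durbin-Watson statistic.
    return min(candidates,
               key=lambda k: abs(durbin_watson(y - knn_smooth(y, k)) - 2.0))

def stitch(left, right, m=5, degree=3):
    # Enforce smoothness across a segment boundary: fit one polynomial
    # through the last m points of `left` and the first m of `right`,
    # and replace those points with the fitted values.
    y = np.concatenate([left[-m:], right[:m]])
    x = np.arange(2 * m)
    fit = np.polyval(np.polyfit(x, y, degree), x)
    return np.concatenate([left[:-m], fit[:m]]), np.concatenate([fit[m:], right[m:]])
```

For the spline variant, the same Durbin-Watson selection would instead scan over the number of knots, e.g. with `scipy.interpolate.LSQUnivariateSpline`; the segment-wise selection is what lets peak regions keep more detail than flat inter-peak regions.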
It is shown that both methods perform significantly better than Savitzky-Golay filtering on both simulated and experimental chromatography data. Both approaches are simple, fast, and fully automated smoothing methods, as they require no operator input.
[1] C. de Boor, A Practical Guide to Splines, Springer, 2001.
[2] N. S. Altman, "An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression," The American Statistician 46(3) (1992) 175–185. doi:10.1080/00031305.1992.10475879
[3] G. Vivó-Truyols, J. R. Torres-Lapasió, A. M. van Nederkassel, Y. Vander Heyden, D. L. Massart, "Automatic program for peak detection and deconvolution of multi-overlapped chromatographic signals. Part I: Peak detection," J. Chromatogr. A 1096(1–2) (2005) 133–145. doi:10.1016/j.chroma.2005.03.092
[4] J. Durbin, G. S. Watson, "Testing for Serial Correlation in Least Squares Regression. II," Biometrika 38(1/2) (1951) 159–177. doi:10.2307/2332325
| Original language | English |
|---|---|
| Publication status | Published - May 2022 |
| Event | 17th International Symposium on Hyphenated Techniques in Chromatography and Separation Technology, Ghent, Belgium, 18 May 2022 → 20 May 2022 (Conference number: 17), https://htc-17.com/ |
Conference
| Conference | 17th International Symposium on Hyphenated Techniques in Chromatography and Separation Technology |
|---|---|
| Abbreviated title | HTC-17 |
| Country/Territory | Belgium |
| City | Ghent |
| Period | 18/05/22 → 20/05/22 |
| Internet address | https://htc-17.com/ |
Keywords
- Machine learning
- HPLC
- smoothing
- denoising