Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Standard

Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets. / Rasmussen, Maria H.; Duan, Chenru; Kulik, Heather J.; Jensen, Jan H.

In: Journal of Cheminformatics, Vol. 15, No. 1, 121, 2023.


Harvard

Rasmussen, MH, Duan, C, Kulik, HJ & Jensen, JH 2023, 'Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets', Journal of Cheminformatics, vol. 15, no. 1, 121. https://doi.org/10.1186/s13321-023-00790-0

APA

Rasmussen, M. H., Duan, C., Kulik, H. J., & Jensen, J. H. (2023). Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets. Journal of Cheminformatics, 15(1), [121]. https://doi.org/10.1186/s13321-023-00790-0

Vancouver

Rasmussen MH, Duan C, Kulik HJ, Jensen JH. Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets. Journal of Cheminformatics. 2023;15(1):121. https://doi.org/10.1186/s13321-023-00790-0

Author

Rasmussen, Maria H. ; Duan, Chenru ; Kulik, Heather J. ; Jensen, Jan H. / Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets. In: Journal of Cheminformatics. 2023 ; Vol. 15, No. 1.

Bibtex

@article{1248341e8d754fd1b026db3e9d2251a2,
title = "Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets",
abstract = "With the increasingly important role of machine learning (ML) models in chemical research, the need to attach a level of confidence to model predictions naturally arises. Several methods for obtaining uncertainty estimates have been proposed in recent years, but consensus on their evaluation has yet to be established, and different studies on uncertainties generally use different metrics to evaluate them. We compare three of the most popular validation metrics (Spearman{\textquoteright}s rank correlation coefficient, the negative log likelihood (NLL) and the miscalibration area) to the error-based calibration introduced by Levi et al. (Sensors 2022, 22, 5540). Importantly, metrics such as the negative log likelihood (NLL) and Spearman{\textquoteright}s rank correlation coefficient bear little information in themselves. We therefore introduce reference values obtained through errors simulated directly from the uncertainty distribution. The different metrics target different properties and we show how to interpret them, but we generally find the best overall validation to be done based on the error-based calibration plot introduced by Levi et al. Finally, we illustrate the sensitivity of ranking-based methods (e.g. Spearman{\textquoteright}s rank correlation coefficient) towards test set design by using the same toy model with different test sets and obtaining vastly different metrics (0.05 vs. 0.65).",
author = "Rasmussen, {Maria H.} and Chenru Duan and Kulik, {Heather J.} and Jensen, {Jan H.}",
note = "Funding Information: Open access funding provided by Copenhagen University. This work was supported by the Novo Nordisk Foundation (MHR and JHJ) and by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing, Office of Basic Energy Sciences, via the Scientific Discovery through Advanced Computing (SciDAC) program (CD and HJK). Publisher Copyright: {\textcopyright} 2023, The Author(s).",
year = "2023",
doi = "10.1186/s13321-023-00790-0",
language = "English",
volume = "15",
journal = "Journal of Cheminformatics",
issn = "1758-2946",
publisher = "Springer",
number = "1",
}

RIS

TY - JOUR

T1 - Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets

AU - Rasmussen, Maria H.

AU - Duan, Chenru

AU - Kulik, Heather J.

AU - Jensen, Jan H.

N1 - Funding Information: Open access funding provided by Copenhagen University. This work was supported by the Novo Nordisk Foundation (MHR and JHJ) and by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing, Office of Basic Energy Sciences, via the Scientific Discovery through Advanced Computing (SciDAC) program (CD and HJK). Publisher Copyright: © 2023, The Author(s).

PY - 2023

Y1 - 2023

N2 - With the increasingly important role of machine learning (ML) models in chemical research, the need to attach a level of confidence to model predictions naturally arises. Several methods for obtaining uncertainty estimates have been proposed in recent years, but consensus on their evaluation has yet to be established, and different studies on uncertainties generally use different metrics to evaluate them. We compare three of the most popular validation metrics (Spearman’s rank correlation coefficient, the negative log likelihood (NLL) and the miscalibration area) to the error-based calibration introduced by Levi et al. (Sensors 2022, 22, 5540). Importantly, metrics such as the negative log likelihood (NLL) and Spearman’s rank correlation coefficient bear little information in themselves. We therefore introduce reference values obtained through errors simulated directly from the uncertainty distribution. The different metrics target different properties and we show how to interpret them, but we generally find the best overall validation to be done based on the error-based calibration plot introduced by Levi et al. Finally, we illustrate the sensitivity of ranking-based methods (e.g. Spearman’s rank correlation coefficient) towards test set design by using the same toy model with different test sets and obtaining vastly different metrics (0.05 vs. 0.65).

AB - With the increasingly important role of machine learning (ML) models in chemical research, the need to attach a level of confidence to model predictions naturally arises. Several methods for obtaining uncertainty estimates have been proposed in recent years, but consensus on their evaluation has yet to be established, and different studies on uncertainties generally use different metrics to evaluate them. We compare three of the most popular validation metrics (Spearman’s rank correlation coefficient, the negative log likelihood (NLL) and the miscalibration area) to the error-based calibration introduced by Levi et al. (Sensors 2022, 22, 5540). Importantly, metrics such as the negative log likelihood (NLL) and Spearman’s rank correlation coefficient bear little information in themselves. We therefore introduce reference values obtained through errors simulated directly from the uncertainty distribution. The different metrics target different properties and we show how to interpret them, but we generally find the best overall validation to be done based on the error-based calibration plot introduced by Levi et al. Finally, we illustrate the sensitivity of ranking-based methods (e.g. Spearman’s rank correlation coefficient) towards test set design by using the same toy model with different test sets and obtaining vastly different metrics (0.05 vs. 0.65).

U2 - 10.1186/s13321-023-00790-0

DO - 10.1186/s13321-023-00790-0

M3 - Journal article

C2 - 38111020

AN - SCOPUS:85179971165

VL - 15

JO - Journal of Cheminformatics

JF - Journal of Cheminformatics

SN - 1758-2946

IS - 1

M1 - 121

ER -

ID: 377816307
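The abstract's idea of reference values for metrics like the NLL can be sketched in a few lines: given a model's predicted uncertainties, draw synthetic errors directly from those uncertainty distributions, compute the metric on the synthetic errors many times, and compare the observed metric to that reference distribution. This is a minimal illustration under the assumption of Gaussian per-sample uncertainties; the `sigmas` array and the 2x-inflated "observed" errors are hypothetical toy data, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def nll(errors, sigmas):
    # Mean Gaussian negative log likelihood of errors given predicted uncertainties
    return np.mean(0.5 * np.log(2 * np.pi * sigmas**2) + errors**2 / (2 * sigmas**2))

# Hypothetical per-sample predicted uncertainties from some ML model
sigmas = rng.uniform(0.1, 1.0, size=500)

# Reference distribution: NLL of errors simulated directly from the
# predicted uncertainty distribution (perfectly calibrated by construction)
ref_nlls = [nll(rng.normal(0.0, sigmas), sigmas) for _ in range(1000)]

# Toy "observed" errors from a miscalibrated model (true spread is 2x the prediction)
observed_errors = rng.normal(0.0, 2 * sigmas)
observed_nll = nll(observed_errors, sigmas)

# An observed NLL far outside the reference distribution flags miscalibration;
# the raw NLL value alone would be hard to interpret.
print(observed_nll, np.mean(ref_nlls), np.quantile(ref_nlls, 0.975))
```

The same comparison could be made for any scalar metric; the point is that the reference distribution, not the metric's absolute value, carries the calibration information.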