Towards Automated Structure Analysis of Nanoparticles

Publication: Book/anthology/thesis/report > Ph.D. thesis > Research

Current crystallographic methodologies, despite being critical for structure characterisation in materials chemistry, struggle with nanomaterials due to their limited long-range order. In this dissertation, I therefore explore alternative methodologies, namely total scattering (TS) with pair distribution function (PDF) analysis and small-angle X-ray scattering (SAXS), leveraging the Debye scattering equation to calculate the scattering pattern from a structural model (Chapters 1 and 2). It is demonstrated how scattering data are conventionally modelled by least-squares refinement of a simulated pattern against the experimental data, a process widely known as structure refinement. While structure refinement is effective in providing structural information from experimental scattering data, this conventional data modelling approach relies on a structural model as input. It therefore requires time-consuming literature reviews, followed by identification of the structural model in structural databases. Modelling methods have been developed that automate structural model extraction from scattering data by refining hundreds of thousands of structures. However, these methods are computationally demanding (Chapter 2.1).
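As an illustration of the two ingredients named above, the following is a minimal Python sketch, not the dissertation's implementation. It evaluates the Debye scattering equation, I(Q) = sum over i and j of f_i f_j sin(Q r_ij) / (Q r_ij), for a small cluster and then performs a toy least-squares refinement of one structural parameter. The simplifications are assumptions made for brevity: identical atoms with a constant form factor, no thermal motion, a hypothetical 8-atom simple-cubic cluster, and a grid search with an analytically fitted scale factor in place of a real refinement engine. All function names are hypothetical.

```python
import math


def debye_intensity(positions, q_values, form_factor=1.0):
    """Debye scattering equation for identical atoms (assumed constant form factor)."""
    n = len(positions)
    # Interatomic distances for each unique pair i < j.
    distances = [math.dist(positions[i], positions[j])
                 for i in range(n) for j in range(i + 1, n)]
    intensities = []
    for q in q_values:
        total = float(n)  # i == j terms: sin(x)/x -> 1 as x -> 0
        for r in distances:
            x = q * r
            total += 2.0 * (math.sin(x) / x if x > 1e-12 else 1.0)
        intensities.append(form_factor ** 2 * total)
    return intensities


def simple_cubic_cluster(lattice_constant, n_side=2):
    """Hypothetical toy structural model: n_side^3 atoms on a simple-cubic grid."""
    return [(lattice_constant * i, lattice_constant * j, lattice_constant * k)
            for i in range(n_side) for j in range(n_side) for k in range(n_side)]


def refine_lattice_constant(q_values, observed, candidates):
    """Toy refinement: minimise the sum of squared residuals over candidate
    lattice constants, fitting the linear scale factor in closed form."""
    best = None
    for a in candidates:
        calc = debye_intensity(simple_cubic_cluster(a), q_values)
        scale = (sum(o * c for o, c in zip(observed, calc))
                 / sum(c * c for c in calc))
        rss = sum((o - scale * c) ** 2 for o, c in zip(observed, calc))
        if best is None or rss < best[2]:
            best = (a, scale, rss)
    return best


# Synthetic "experimental" pattern: known lattice constant 2.5, scale 3.0.
q_grid = [0.5 + 0.1 * k for k in range(100)]
observed = [3.0 * i for i in debye_intensity(simple_cubic_cluster(2.5), q_grid)]

candidates = [2.0 + 0.05 * k for k in range(21)]
a_fit, scale_fit, rss_fit = refine_lattice_constant(q_grid, observed, candidates)
print(f"refined lattice constant: {a_fit:.2f}, scale: {scale_fit:.2f}")
```

The double sum over atom pairs makes each pattern evaluation scale quadratically with the number of atoms, which gives a sense of why refining very large numbers of candidate structures becomes computationally demanding.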

Instead, supervised machine learning (ML) can potentially accelerate data analysis by predicting the structural model required for conventional structure refinement (Chapter 3). Examples are shown where supervised ML instantaneously (<1 sec) identifies a polyoxometalate cluster from a structural database (Chapter 3.1) and where explainable supervised ML is used to accelerate motif extraction by many orders of magnitude (Chapter 3.2), in both examples from scattering data.

However, supervised ML faces three main challenges: 1) handling data with contributions from multiple chemical components, 2) handling data from structures not present in the training database, and 3) accounting for experimental data that contain signals not included in the simulated data (Chapter 4).

Challenge 1 is the most well-studied of the three for scattering data and is therefore only briefly covered in this dissertation. Instead, challenges 2 and 3 are addressed using generative ML models, a subfield of unsupervised ML. Initial results for a subset of chemical systems show promise for broader use in scattering data analysis. Specifically, challenge 2 is addressed by using generative ML models to solve the structures of mono-metallic nanoparticles with fewer than 200 atoms from PDF data.

Here, the generative ML model demonstrates considerable potential in interpolating between structures (Chapter 4.1). Challenge 3 is addressed by using generative ML models to learn the distributions of simulated and experimental data and to translate between the two (Chapter 4.2).

Although the current examples mainly analyse scattering data, the developed modelling methodology is intended to be general for scattering and spectroscopy data from techniques where a structural model can describe the experimental data through structure refinement.

The dissertation concludes with a key message: It is essential to make these tools accessible to the materials chemist!

Making the ML tools more accessible to the materials chemist requires greater transparency in the publication of research data, code, and software, which can facilitate other researchers’ application of trained ML models to their own experimental data. Equally important is the ability of these ML models to indicate the potential risk of failure when applied to a specific dataset. A clear understanding of the models' limitations helps to prevent misuse, misinterpretation, and inaccurate results. Moreover, there is a need for ML models that are interpretable and explainable, enhancing the understanding of prediction mechanisms and promoting scientific insight (Chapter 5).
Original language: English
Publisher: Department of Chemistry, Faculty of Science, University of Copenhagen
Number of pages: 266
Status: Published - 2023

ID: 377062674