RPCA-based techniques for pattern extraction, hotspot identification and signal correction using data from a dense network of low-cost NO2 sensors in London

Publication: Contribution to journal › Journal article › Research › Peer-reviewed

Standard

RPCA-based techniques for pattern extraction, hotspot identification and signal correction using data from a dense network of low-cost NO2 sensors in London. / Bogaert, Martin; Mouritzen, Christian; Johnson, Matthew S.; van Reeuwijk, Maarten.

In: Science of the Total Environment, Vol. 925, 171522, 2024.

Harvard

Bogaert, M, Mouritzen, C, Johnson, MS & van Reeuwijk, M 2024, 'RPCA-based techniques for pattern extraction, hotspot identification and signal correction using data from a dense network of low-cost NO2 sensors in London', Science of the Total Environment, vol. 925, 171522. https://doi.org/10.1016/j.scitotenv.2024.171522

APA

Bogaert, M., Mouritzen, C., Johnson, M. S., & van Reeuwijk, M. (2024). RPCA-based techniques for pattern extraction, hotspot identification and signal correction using data from a dense network of low-cost NO2 sensors in London. Science of the Total Environment, 925, Article 171522. https://doi.org/10.1016/j.scitotenv.2024.171522

Vancouver

Bogaert M, Mouritzen C, Johnson MS, van Reeuwijk M. RPCA-based techniques for pattern extraction, hotspot identification and signal correction using data from a dense network of low-cost NO2 sensors in London. Science of the Total Environment. 2024;925. 171522. https://doi.org/10.1016/j.scitotenv.2024.171522

Author

Bogaert, Martin ; Mouritzen, Christian ; Johnson, Matthew S. ; van Reeuwijk, Maarten. / RPCA-based techniques for pattern extraction, hotspot identification and signal correction using data from a dense network of low-cost NO2 sensors in London. In: Science of the Total Environment. 2024 ; Vol. 925.

Bibtex

@article{70d9636dd0e745d19d3e568b608031db,
title = "RPCA-based techniques for pattern extraction, hotspot identification and signal correction using data from a dense network of low-cost NO2 sensors in London",
abstract = "High-density low-cost air quality sensor networks are a promising technology to monitor air quality at high temporal and spatial resolution. However, the collected data is high-dimensional and it is not always clear how to best leverage this information, particularly given the lower data quality coming from the sensors. Here we report on the use of robust Principal Component Analysis (RPCA) using nitrogen dioxide data obtained from a recently deployed dense network of 225 air pollution monitoring nodes based on low-cost sensors in the Borough of Camden in London. RPCA addresses the brittleness of singular value decomposition towards outliers by using a decomposition of the data into low-rank and sparse contributions, with the latter containing outliers. The modal decomposition enabled by RPCA identifies major periodic patterns including spatial and temporal bias, dominant spatial variance, and north-south bias. The five most descriptive components capture 98\,\% of the data's variance, achieving a compression by a factor of 1500. We present a new technique that uses the sparse part of the data to identify hotspots. The data indicates that at the locations of the top 15\,\% most susceptible nodes in the network, the model identifies 23\,\% more hotspots than at all other locations combined. Moreover, the median hotspot event at these at-risk locations exceeds the mean NO2 concentration by 33~$\mu$g/m$^3$. We show the potential of RPCA for signal correction: it corrects random errors, yielding a reference signal with $R^2>0.8$. Moreover, RPCA successfully reconstructs missing data from a sensor with $R^2=0.72$ from the rest of the sensor network, an improvement upon PCA of around 50\,\%, allowing air quality estimations even if a sensor is temporarily out of use.",
keywords = "Air Quality Monitoring, Hotspot identification and signal correction, Low-Cost Sensor Networks, Robust Principal Component Analysis (RPCA), Spatial and Temporal Patterns in Air Pollution",
author = "Martin Bogaert and Christian Mouritzen and Johnson, {Matthew S.} and {van Reeuwijk}, Maarten",
note = "Funding Information: The authors would like to thank Airscape for providing the data. MvR acknowledges support from the Natural Environment Research Council (NERC) air quality Future Urban Ventilation Network (NE/V002082/1). MvR would like to thank Prof. Ben Barratt for valuable feedback on an early version of this paper. Publisher Copyright: {\textcopyright} 2024 The Authors",
year = "2024",
doi = "10.1016/j.scitotenv.2024.171522",
language = "English",
volume = "925",
journal = "Science of the Total Environment",
issn = "0048-9697",
publisher = "Elsevier",
pages = "171522",
}
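A minimal sketch of the low-rank plus sparse split the abstract describes, assuming the NO2 readings are arranged as a sensors × time-steps matrix M. It uses principal component pursuit (minimise ‖L‖* + λ‖S‖₁ subject to M = L + S) solved with the inexact augmented Lagrange multiplier method of Lin, Chen and Ma; this is a standard RPCA solver, not necessarily the one used in the paper. The function names, the default λ = 1/√max(n₁, n₂), and the hotspot rule in flag_hotspots (a positive sparse entry above a fixed threshold) are illustrative assumptions, not the authors' hotspot technique.

```python
# Sketch only: principal component pursuit via inexact ALM.
# Not necessarily the authors' solver; function names are hypothetical.
import numpy as np


def soft_threshold(X, tau):
    """Elementwise shrinkage: the proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Shrink singular values by tau: the proximal operator of tau * ||X||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def rpca_ialm(M, lam=None, tol=1e-7, max_iter=500):
    """Split M (sensors x time) into low-rank L and sparse S with M = L + S."""
    M = np.asarray(M, dtype=float)
    n1, n2 = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))  # usual PCP weight on the sparse term
    norm_fro = np.linalg.norm(M)
    norm_two = np.linalg.norm(M, 2)  # largest singular value
    # Dual variable and penalty initialised as in the inexact ALM scheme.
    Y = M / max(norm_two, np.abs(M).max() / lam)
    mu, rho = 1.25 / norm_two, 1.5
    mu_max = mu * 1e7
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        L = singular_value_threshold(M - S + Y / mu, 1.0 / mu)  # low-rank step
        S = soft_threshold(M - L + Y / mu, lam / mu)  # sparse (outlier) step
        residual = M - L - S
        Y += mu * residual  # dual ascent on the constraint M = L + S
        mu = min(mu * rho, mu_max)  # gradually tighten the penalty
        if np.linalg.norm(residual) <= tol * norm_fro:
            break
    return L, S


def flag_hotspots(S, threshold):
    """Hypothetical rule: flag positive sparse excursions above threshold."""
    return S > threshold
```

On synthetic data (a rank-3 matrix plus 5 % large spikes) the low-rank part is typically recovered closely and the positive spikes end up in S. For real sensor data the threshold would have to be chosen from the data, for example relative to each node's typical level.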

RIS

TY - JOUR

T1 - RPCA-based techniques for pattern extraction, hotspot identification and signal correction using data from a dense network of low-cost NO2 sensors in London

AU - Bogaert, Martin

AU - Mouritzen, Christian

AU - Johnson, Matthew S.

AU - van Reeuwijk, Maarten

N1 - Funding Information: The authors would like to thank Airscape for providing the data. MvR acknowledges support from the Natural Environment Research Council (NERC) air quality Future Urban Ventilation Network (NE/V002082/1). MvR would like to thank Prof. Ben Barratt for valuable feedback on an early version of this paper. Publisher Copyright: © 2024 The Authors

PY - 2024

Y1 - 2024

N2 - High-density low-cost air quality sensor networks are a promising technology to monitor air quality at high temporal and spatial resolution. However, the collected data is high-dimensional and it is not always clear how to best leverage this information, particularly given the lower data quality coming from the sensors. Here we report on the use of robust Principal Component Analysis (RPCA) using nitrogen dioxide data obtained from a recently deployed dense network of 225 air pollution monitoring nodes based on low-cost sensors in the Borough of Camden in London. RPCA addresses the brittleness of singular value decomposition towards outliers by using a decomposition of the data into low-rank and sparse contributions, with the latter containing outliers. The modal decomposition enabled by RPCA identifies major periodic patterns including spatial and temporal bias, dominant spatial variance, and north-south bias. The five most descriptive components capture 98 % of the data's variance, achieving a compression by a factor of 1500. We present a new technique that uses the sparse part of the data to identify hotspots. The data indicates that at the locations of the top 15 % most susceptible nodes in the network, the model identifies 23 % more hotspots than at all other locations combined. Moreover, the median hotspot event at these at-risk locations exceeds the mean NO2 concentration by 33 μg/m³. We show the potential of RPCA for signal correction: it corrects random errors, yielding a reference signal with R² > 0.8. Moreover, RPCA successfully reconstructs missing data from a sensor with R² = 0.72 from the rest of the sensor network, an improvement upon PCA of around 50 %, allowing air quality estimations even if a sensor is temporarily out of use.

AB - High-density low-cost air quality sensor networks are a promising technology to monitor air quality at high temporal and spatial resolution. However, the collected data is high-dimensional and it is not always clear how to best leverage this information, particularly given the lower data quality coming from the sensors. Here we report on the use of robust Principal Component Analysis (RPCA) using nitrogen dioxide data obtained from a recently deployed dense network of 225 air pollution monitoring nodes based on low-cost sensors in the Borough of Camden in London. RPCA addresses the brittleness of singular value decomposition towards outliers by using a decomposition of the data into low-rank and sparse contributions, with the latter containing outliers. The modal decomposition enabled by RPCA identifies major periodic patterns including spatial and temporal bias, dominant spatial variance, and north-south bias. The five most descriptive components capture 98 % of the data's variance, achieving a compression by a factor of 1500. We present a new technique that uses the sparse part of the data to identify hotspots. The data indicates that at the locations of the top 15 % most susceptible nodes in the network, the model identifies 23 % more hotspots than at all other locations combined. Moreover, the median hotspot event at these at-risk locations exceeds the mean NO2 concentration by 33 μg/m³. We show the potential of RPCA for signal correction: it corrects random errors, yielding a reference signal with R² > 0.8. Moreover, RPCA successfully reconstructs missing data from a sensor with R² = 0.72 from the rest of the sensor network, an improvement upon PCA of around 50 %, allowing air quality estimations even if a sensor is temporarily out of use.

KW - Air Quality Monitoring

KW - Hotspot identification and signal correction

KW - Low-Cost Sensor Networks

KW - Robust Principal Component Analysis (RPCA)

KW - Spatial and Temporal Patterns in Air Pollution

U2 - 10.1016/j.scitotenv.2024.171522

DO - 10.1016/j.scitotenv.2024.171522

M3 - Journal article

C2 - 38494021

AN - SCOPUS:85188111503

VL - 925

JO - Science of the Total Environment

JF - Science of the Total Environment

SN - 0048-9697

M1 - 171522

ER -

ID: 389079852
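The abstract reports reconstructing a missing sensor's record from the rest of the network (R² = 0.72) but does not say how. One common low-rank approach, sketched here as an assumption rather than the paper's method, is gappy POD: learn spatial modes (left singular vectors) from a period when every sensor reported, for instance the low-rank part returned by RPCA, then at each time step fit the mode coefficients using only the sensors that are online and read off the missing one. spatial_modes, reconstruct_sensor and r_squared are hypothetical helper names; r_squared is the usual coefficient of determination, 1 − SS_res/SS_tot.

```python
# Sketch only: gappy-POD-style reconstruction of one missing sensor.
# Helper names are hypothetical; this is not necessarily the paper's method.
import numpy as np


def spatial_modes(L, r):
    """Leading r left singular vectors (spatial modes) of a sensors x time matrix."""
    U, _, _ = np.linalg.svd(L, full_matrices=False)
    return U[:, :r]


def reconstruct_sensor(X, modes, missing):
    """Estimate row `missing` of X (sensors x time) from the other rows.

    For every time step, the mode coefficients are fitted by least squares on
    the observed sensors only; the missing row is then predicted from them.
    Row `missing` of X is never read, so it may hold NaNs or stale values.
    """
    observed = np.arange(X.shape[0]) != missing
    coeffs, *_ = np.linalg.lstsq(modes[observed], X[observed], rcond=None)
    return modes[missing] @ coeffs


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Fitting the coefficients per time step keeps the reconstruction usable while a node is offline, at the cost of assuming the spatial patterns learned during the training period still hold.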