Pilot 14 — dMRV Solution for E. coli estimation under the Gold Standard SDWS methodology
This report presents Virridy's evidence that the Lume sensor can replace periodic Compartment Bag Test (CBT) sampling as the primary water-quality monitoring method under the Safe Drinking Water Supply (SDWS) methodology — specifically for Parameter 18 (Microbial Drinking Water Quality). Where a CBT gives a single snapshot per site visit, a permanently installed Lume sensor provides continuous, autonomous E. coli estimation at every check-in interval, generating orders of magnitude more data at lower marginal cost with no enumerator, no incubation, and no manual data entry.
The core evidence for substitution is straightforward: the Lume sensor agrees with an accepted laboratory reference method at least as well as two accepted laboratory methods agree with each other. On 153 three-way split samples (Lume, Colilert, Membrane Filtration), the Lume↔Colilert agreement was κ=0.88 (“almost perfect”) while the Colilert↔MF agreement was only κ=0.40 (“fair”). In the field deployment, the CBT-trained Tobit model achieves 88% leave-one-out cross-validation (LOOCV) agreement within ±1 log10, 88% balanced accuracy (AUC=0.92) at the ≥10 CFU/100 mL contamination threshold, and 95% agreement within ±1 WHO risk tier, validated on 176 paired Lume–CBT field samples from 3 sensors across 52 sampling points in two countries. The entire model is 5 published coefficients (Appendix B) that any third party can verify with a calculator. The complete dataset, methodology, and live results are published at validation.thelume.ai/cbt.
The field dataset spans two independent water programs in two countries: Amazi Meza in Rwanda (school-based water treatment, ~600,000 students, Gold Standard GS12240) and DRIP FUNDI in Kenya (USAID drought resilience, ~120,000 people, 200 boreholes). The programs differ in climate (highland vs. arid), water sources (springs and rainwater vs. boreholes and kiosks), treatment technology (ceramic filtration vs. chlorination), and institutional setting (schools vs. community water points). The model achieves 86–89% per-sensor agreement across both contexts without country-specific tuning. This cross-sectional validation provides confidence that the sensor generalises across the range of water systems it will monitor in production.
The substitution does not eliminate manual sampling — it changes its role. CBTs shift from the primary monitoring method to a periodic cross-check that validates sensor accuracy on an ongoing basis. The integration protocol (§3.1) defines when cross-checks occur, how sensor and CBT data are paired, and how discrepancies are detected and resolved. This protocol is operational and has processed the full Phase 1 dataset.
This report addresses the four Forward Action Requirements (FARs) raised at Gold Standard Pilot 14 approval (3 September 2025), incorporating Phase 1 field validation data from Rwanda (Amazi Meza) and Kenya (DRIP FUNDI) alongside validation studies in the US and France (Boulder Creek CO, Yampa River CO, Seine River FR).
For this dMRV implementation, Virridy has elected to deploy only transparent linear regression models for E. coli estimation: specifically, a linear regression with a single mon2×temperature interaction term (CFU regression) and a logistic regression for binary risk classification. No AI, machine-learning, or gradient-boosted ensemble model is used in the deployed verification pipeline. The coefficients published in Appendix B constitute the entire model. While the academic literature explores ML approaches, and Virridy's earlier research and patents include adaptive-learning techniques, the verifier-facing pipeline uses a fully auditable closed-form regression that any third party can reproduce with a calculator.
| FAR | Requirement | Status | Primary Evidence |
|---|---|---|---|
| FAR 1 | Sensor Validation & Calibration Protocols | Resolved | Calibration protocol documented (§1.3); drift monitoring operational (§1.4); 6,711 sensor observations across 176 paired Lume–CBT field samples from 3 sensors in 2 countries — 88% LOOCV, per-sensor 86–89%; US lab baselines: n=209 Colilert (R²=0.881), n=303 MF (R²=0.872) |
| FAR 2 | AI/ML Implementation & Validation | Resolved | Resolved by design: deployed model is a CBT-trained Tobit regression (no AI/ML) — 5 published coefficients (Appendix B), training data provenance (§2.2), cross-validation (§2.3: 88% LOOCV, 87% balanced accuracy), retraining & version-control procedures (§2.4). Every element of the original requirement is documented; the model exceeds the transparency standard since the entire pipeline is reproducible by hand. |
| FAR 3 | Manual ↔ Digital Integration Protocol | Resolved | Protocol established and exercised: 176 paired Lume–CBT samples processed through automated pairing, exclusion, and discrepancy detection pipeline at validation.thelume.ai/cbt; protocol documented in §3.1 |
| FAR 4 | SDWS 23 & 27 Exploration | Resolved | Exploration complete: flow-state classifier validated on 1,599 bench data points across two test setups (Closed Pipe Flow 95.3% / Bucket 90.7%, κ≥0.85); both parameters recommended for inclusion; full analysis at validation.thelume.ai/pipedflow/. Field deployment deferred to separate Phase 2 project. |
FAR 1 is resolved. The original requirement called for detailed protocols covering calibration check frequency, drift thresholds, and sensor replacement procedures. All three are documented (§1.3, §1.4) and operational. The field validation dataset comprises 6,711 individual sensor observations across 176 paired Lume–CBT samples from 3 sensors deployed in Rwanda and Kenya (May–June 2026). Each paired sample draws on an average of 38 sensor readings within the ±20-minute observation window, all at the calibrated operating point. The Implementation Plan estimated 250–350 paired samples as the minimum to reach target performance; that performance level (88% LOOCV agreement, per-sensor ≥86%) was achieved with 176 pairs. The validation objective is met.
FAR 2 is resolved. The requirement called for full model documentation; the deployed Tobit regression is fully specified by 5 published coefficients with complete training-data provenance, cross-validation results, and version-control procedures (§2.1–2.4).
FAR 3 is resolved. The requirement called for a clear protocol integrating manual water quality sampling with digital Lume sensor data; the operational pipeline at validation.thelume.ai/cbt defines when manual sampling occurs, automates pairing, and specifies discrepancy detection and resolution (§3.1–3.3). The protocol has processed 176 paired observations from 3 sensors in 2 countries.
FAR 4 is resolved. The requirement asked Virridy to “explore the applicability” of SDWS 23 and 27 and provide a rationale for inclusion or exclusion. The exploration is complete: a flow-state classifier validated on 1,599 bench data points across two test setups achieves 93% combined accuracy (κ=0.85), and both parameters are recommended for inclusion (§4.1–4.8). Full evidence is at validation.thelume.ai/pipedflow/. Field deployment and site-level calibration will be conducted under the separate Phase 2 project.
This report covers Phase 1 mobile validation, which is complete with 176 paired Lume–CBT field samples from Rwanda and Kenya. All four Forward Action Requirements raised at pilot approval are resolved. Phase 2 permanent installation at Amazi Meza institutional sites will be conducted as a separate project and validation effort with its own work plan, timeline, and reporting.
The approved Implementation Plan describes two phases. This report covers Phase 1.
Model architecture change: The approved Implementation Plan described the deployed E. coli estimation model as a gradient-boosted decision tree ensemble. Virridy has instead deployed a transparent right-censored linear regression (Tobit model) with 5 published coefficients. This architectural change was made in April 2026 to maximise auditability and reproducibility for the dMRV verification pipeline. Gold Standard was informally notified during the pilot process; formal notification is pending. No other scope, schedule, or methodology deviations have been submitted.
| Cohort | Sensors deployed | Sites | Active since | Data points (cumulative) |
|---|---|---|---|---|
| US validation (Boulder Creek) | 3 | BC-CU, BC-55, BC-Can | April 2026 | ~50,000+ continuous readings |
| US bench / lab (multi-sensor) | 10+ | Lab fixture, Yampa, Seine | 2022 → present | n=512 paired lab samples (combined Colilert + MF) |
| Rwanda Amazi Meza + Kenya DRIP FUNDI (Phase 1) | 3 (50045, 50053, 50065) | EP Nyakabungo, EP Nyakabuye, EP Rwishwima, Kicukiro, Kamonyi (RW); Isiolo, Turkana (KE) | May 2026 | 176 paired Lume–CBT points |
| Phase 2 (permanent installation) | Separate project and validation effort; see §Phase 2. | — | — | — |
The project developer must provide detailed protocols for sensor validation and calibration, including frequency of calibration checks, acceptable drift thresholds, and procedures for replacing or recalibrating sensors that fall outside tolerance.
The Lume sensor's measurement performance against laboratory reference methods has been characterised across multiple independent studies. The two relevant gold-standard methods are Colilert (IDEXX defined-substrate technology) and membrane filtration (MF, US EPA Method 1604). Full source: thelume.ai/research.
| Reference method | Paired n | R² | Binary accuracy at 10 CFU/100 mL | Cohen's κ | Source |
|---|---|---|---|---|---|
| Colilert (IDEXX) | 209 | 0.881 | 0.92 (balanced 0.92) | 0.84 | Knopp et al. (2026); thelume.ai/research |
| Membrane Filtration (EPA 1604) | 303 | 0.872 | — | — | MF-trained model, method-agnostic validation |
| Three-way (Lume / Colilert / MF) | 153 | — | — | Lume↔Colilert 0.88; Colilert↔MF 0.40 | Method-comparison subset |
Notable: on the same n=153 split-sample set, the Lume↔Colilert agreement (κ=0.88, almost perfect) is substantially stronger than the Colilert↔MF agreement (κ=0.40, fair). This indicates that the Lume's agreement with a laboratory reference method is on the order of, or better than, the inherent agreement between two accepted laboratory methods.
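Cohen's κ corrects the raw agreement rate for the agreement two methods would reach by chance, given how often each calls a sample contaminated. A minimal Python sketch of the statistic for two methods scored on the same split samples follows. The function names are illustrative, the verbal labels assume the Landis & Koch (1977) bands that the “fair” and “almost perfect” descriptors above appear to follow, and the example labels are made up rather than study data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two methods scoring the same samples."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of samples where both methods give the same call
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two methods' marginal frequencies per category
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_chance == 1.0:
        # Both methods used one identical category throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def landis_koch_label(kappa):
    """Verbal benchmark bands from Landis & Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Illustrative binary calls at >=10 CFU/100 mL (1 = contaminated); not study data
method_a = [1, 1, 0, 0]
method_b = [1, 0, 0, 0]
k = cohens_kappa(method_a, method_b)
print(round(k, 2), landis_koch_label(k))  # 0.5 moderate
```

Because κ subtracts chance agreement, two methods can share a high raw agreement rate on mostly clean samples and still score only “fair”.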
WHO-defined drinking-water risk bands (Low <1, Intermediate 1–10, High 10–100, Very High >100 CFU/100 mL):
| Risk band split | Threshold | Overall accuracy | Balanced accuracy | Cohen's κ |
|---|---|---|---|---|
| Safe vs. any contamination | 1 CFU/100 mL | 0.91 | 0.91 | 0.82 |
| WHO Low/Intermediate vs. High+ | 10 CFU/100 mL | 0.92 | 0.92 | 0.84 |
| 3-category (<10, 10–100, >100) | multi | 0.91 | 0.85 | 0.60 |
| Recreational binary (Seine R., held-out) | 900 CFU/100 mL | 0.968 | 0.94 | — |
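The risk bands listed above, together with the “within 1 WHO risk band” pass criterion used for field grab samples in the calibration checks, can be expressed as a small lookup. This is a minimal sketch. Because the band list gives shared endpoints (1–10, 10–100), the assignment of exactly 10 and exactly 100 is an assumption, chosen here to match the ≥10 CFU binary threshold and the “>100” definition of Very High.

```python
# Band labels in the order listed above
RISK_BANDS = ("Low", "Intermediate", "High", "Very High")


def who_risk_band(cfu_per_100ml):
    """Index into RISK_BANDS for a concentration in CFU (or MPN) per 100 mL.

    Boundary handling is an assumption: exactly 10 falls in High (matching the
    >=10 CFU binary threshold) and exactly 100 stays in High (Very High is >100).
    """
    if cfu_per_100ml < 0:
        raise ValueError("concentration cannot be negative")
    if cfu_per_100ml < 1:
        return 0
    if cfu_per_100ml < 10:
        return 1
    if cfu_per_100ml <= 100:
        return 2
    return 3


def within_one_band(lume_cfu, reference_cfu):
    """Pass criterion 'within 1 WHO risk band' for a sensor vs. grab-sample pair."""
    return abs(who_risk_band(lume_cfu) - who_risk_band(reference_cfu)) <= 1


print(RISK_BANDS[who_risk_band(10)])  # High
print(within_one_band(0.0, 50.0))     # False: Low vs. High is two bands apart
```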
The Lume sensor is calibrated at the operating point led_power = 512, sipm_bias ∈ [2960, 3040] (target 3000); these are the parameters under which the deployed CFU regression and turbidity (NTU) regression were trained. When a sensor's readings fall outside this window, the Lume backend automatically falls back to the (LED, bias) combination nearest the target, and readings from that fallback combination are flagged as “provisional” on operational dashboards.
| Calibration check | Frequency | Tolerance / Pass criterion | Action on failure |
|---|---|---|---|
| Operating-point combo (LED 512, bias ~3000) | Continuous (every reading) | bias ∈ [2960, 3040] | Fall back to nearest combo; dashboard flags “Provisional”; replace sensor if fallback persists > 7 days |
| Turbidity (ToF) zero-baseline check | Continuous (per-sensor 10th-%ile in-water) | Sensor-relative anomaly: NTU = max(0, 2.05 × (sps − baseline)) | Re-baseline automatically from rolling in-water minimum |
| Field paired CBT or Colilert grab-sample | Per institutional visit during Phase 1; at least quarterly during Phase 2 | Within 1 WHO risk band of Lume estimate | Investigate; flag period; re-train if systematic |
| Sensor swap / retirement | On detection of persistent fallback, low battery (<3.8 V steady), or repeated air-exposed flag | — | Replace in field; data continuity preserved via Blues check-in chain of custody |
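The first two automated checks in the table can be sketched in a few lines. This is an illustration, not the Lume backend. Several details are assumptions: the rule for “nearest combo” (here: prefer any in-window combination, otherwise minimise LED distance, then bias distance), the percentile method for the per-sensor 10th-percentile baseline (nearest-rank), and the reading of `sps` as the ToF signal-per-SPAD value. All names are hypothetical.

```python
import math
from dataclasses import dataclass

TARGET_LED = 512
TARGET_BIAS = 3000
BIAS_WINDOW = (2960, 3040)   # inclusive pass window for sipm_bias
NTU_PER_SPS = 2.05           # slope from the turbidity check in the table


@dataclass(frozen=True)
class Combo:
    led_power: int
    sipm_bias: int


def in_operating_window(combo):
    return combo.led_power == TARGET_LED and BIAS_WINDOW[0] <= combo.sipm_bias <= BIAS_WINDOW[1]


def select_combo(available):
    """Pick the (LED, bias) combination to use; returns (combo, provisional).

    Hypothetical nearest-combo rule: prefer any in-window combination; otherwise
    minimise LED distance first, then bias distance, from the target.
    """
    if not available:
        raise ValueError("no (LED, bias) combinations reported")
    calibrated = [c for c in available if in_operating_window(c)]
    pool = calibrated or list(available)
    best = min(pool, key=lambda c: (abs(c.led_power - TARGET_LED),
                                    abs(c.sipm_bias - TARGET_BIAS)))
    return best, not calibrated


def tenth_percentile(values):
    """Nearest-rank 10th percentile (the percentile method is an assumption)."""
    if not values:
        raise ValueError("need at least one in-water reading for a baseline")
    ordered = sorted(values)
    rank = max(1, math.ceil(0.10 * len(ordered)))
    return ordered[rank - 1]


def ntu_estimate(sps, in_water_sps_history):
    """NTU = max(0, 2.05 * (sps - baseline)), with a per-sensor in-water baseline."""
    baseline = tenth_percentile(in_water_sps_history)
    return max(0.0, NTU_PER_SPS * (sps - baseline))


combo, provisional = select_combo([Combo(512, 2900), Combo(256, 3000)])
print(combo, provisional)  # Combo(led_power=512, sipm_bias=2900) True
```

Using a sensor-relative baseline means each sensor's turbidity reading is an anomaly above its own clean-water floor, which is why the check re-baselines automatically rather than requiring a manual zero.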
The Lume Fleet Health dashboard (internal operations tool, requires login) tracks each sensor's:
The field validation dataset comprises 176 paired Lume–CBT samples from 3 sensors deployed across Rwanda (Amazi Meza) and Kenya (DRIP), with each CBT grab sample paired to sensor readings within ±20 minutes. Each paired sample is backed by an average of 38 individual sensor readings within the observation window (6,711 total sensor observations, all at the calibrated operating point led=512, bias ∈ [2960, 3040]). The CBT-trained Tobit regression achieves:
| Metric | Value |
|---|---|
| Total sensor observations | 6,711 (across 176 paired CBT samples) |
| LOOCV agreement (±1 log10) | 88% (155/176) |
| Balanced accuracy (≥10 CFU) | 87% (sensitivity 88%, specificity 86%) |
| Per-sensor: 50045 (Rwanda) | 89% (17/19) |
| Per-sensor: 50053 (Kenya) | 86% (42/49) |
| Per-sensor: 50065 (both) | 89% (96/108) |
Live, continuously updated results: validation.thelume.ai/cbt
The FAR 1 requirement asked for “detailed protocols for sensor validation and calibration, including frequency of calibration checks, acceptable drift thresholds, and procedures for replacing or recalibrating sensors that fall outside tolerance.” Each element is addressed:
Full documentation of the AI/ML model used for E. coli estimation must be provided, including training data sources, model architecture, validation results, accuracy metrics, and procedures for model retraining and version control.
The original FAR was written under the assumption that an AI/ML model would be deployed. Virridy has elected not to deploy an AI/ML model for verification. Instead, the deployed pipeline uses a right-censored linear regression (Tobit model) with multiplicative temperature correction, fully specified by 5 published coefficients and reproducible by hand. There is no opaque model state, no black-box inference, no online learning, and no need for AI-specific governance such as adversarial testing or fairness auditing. The move from gradient-boosted decision trees (described in the approved Implementation Plan) to transparent linear regression was an intentional architectural choice for verifiability and auditability.
| Attribute | Value |
|---|---|
| Model family | Right-censored linear regression (Tobit model) with multiplicative temperature correction. No AI, no ML, no decision trees, no ensemble methods, no neural networks in the deployed pipeline. |
| Output (primary) | E. coli concentration (CFU/100 mL) via log10(CFU+1) prediction |
| Output (secondary) | Categorical risk class (WHO Low / Intermediate / High / Very High) |
| Input features | Temperature-corrected baseline-normalized fluorescence (mon2c_n), baseline-normalized turbidity proxy (tof_n), per-sensor fixed effects (2 sensors beyond reference) |
| Pre-processing | mon2c = mon2 × exp(−ρ·(T−20)) with pooled ρ = 0.0139 (R² = 0.946 from 101 clean-water samples); per-sensor baseline subtraction for both mon2c and ToF |
| Coefficients (5 total) | [1.386, 0.865, 0.393, −0.771, −0.619] — intercept, z(mon2c_n), z(tof_n), FE·50053, FE·50065 |
| σ̂ (Tobit) | 0.667 log10 |
| Right-censoring point | CBT detection limit at 100 CFU/100 mL (log10(101) ≈ 2.004) |
| Training data | 176 paired Lume–CBT field samples, 3 sensors, Rwanda + Kenya, May–June 2026 |
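The model card quantities combine into a single estimate as sketched below. The coefficients, ρ, the reference sensor (50045, with no fixed effect) and the censoring point come from the card. The per-sensor baselines and the z-score means and standard deviations are not listed in this section, so they appear as hypothetical parameters. Reporting values above the CBT limit as 100 CFU/100 mL is an assumption about presentation, and using the linear predictor directly as the point estimate is one of several choices for a Tobit model.

```python
import math

# Published coefficients from the model card (log10(CFU+1) scale)
INTERCEPT = 1.386
B_MON2C = 0.865          # per unit z(mon2c_n)
B_TOF = 0.393            # per unit z(tof_n)
SENSOR_FE = {"50045": 0.0, "50053": -0.771, "50065": -0.619}  # 50045 = reference
RHO = 0.0139             # pooled temperature coefficient, 1/degC
CENSOR_LOG10 = math.log10(101)  # CBT upper limit, 100 CFU/100 mL


def temperature_correct(mon2, temp_c):
    """mon2c = mon2 * exp(-rho * (T - 20))."""
    return mon2 * math.exp(-RHO * (temp_c - 20.0))


def predict_log10_cfu(mon2, temp_c, tof, sensor_id, baseline, scaling):
    """Linear predictor of log10(CFU+1).

    `baseline` holds hypothetical per-sensor baselines {"mon2c": ..., "tof": ...};
    `scaling` holds hypothetical z-score parameters {"mon2c_n": (mean, sd),
    "tof_n": (mean, sd)}. Neither set of values is given in this section.
    """
    mon2c_n = temperature_correct(mon2, temp_c) - baseline["mon2c"]
    tof_n = tof - baseline["tof"]
    z_mon = (mon2c_n - scaling["mon2c_n"][0]) / scaling["mon2c_n"][1]
    z_tof = (tof_n - scaling["tof_n"][0]) / scaling["tof_n"][1]
    return INTERCEPT + B_MON2C * z_mon + B_TOF * z_tof + SENSOR_FE[sensor_id]


def to_cfu(log10_cfu_plus_1):
    """Back-transform, reporting values above the CBT limit as 100 (i.e. '>=100')."""
    capped = min(log10_cfu_plus_1, CENSOR_LOG10)
    return max(0.0, 10 ** capped - 1.0)


neutral = {"mon2c_n": (0.0, 1.0), "tof_n": (0.0, 1.0)}
y = predict_log10_cfu(mon2=0.0, temp_c=20.0, tof=0.0, sensor_id="50053",
                      baseline={"mon2c": 0.0, "tof": 0.0}, scaling=neutral)
print(round(y, 3))  # 0.615
```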
| Dataset | n | Reference method | Locations | Use |
|---|---|---|---|---|
| Lume v1.2 multi-site validation | ~512 paired (combined Colilert + MF) | Colilert / MF | US (Colorado: Boulder Creek, Yampa); France (Seine); historical Kenya, Malawi | Primary regression training + cross-validation |
| Bedell et al. (2022) Water Research | Published | Culture-based | Kenya groundwater (37 sites, Sorensen et al. 2018 cohort) | Foundational TLF↔E. coli relationship; 83% reported accuracy |
| Knopp et al. (2026) EarthArXiv | Published | Colilert + MF | Multi-site (US + France) | Lume v1.2 sensor design + multi-site validation results |
| Demaree et al. (2026) ES&T Water | Published | Colilert | Upper Yampa River, CO | Sensor-informed predictive models |
| Nowicki et al. (2020) | Published | Culture | Malawi | TLF reproducibility (14% RPD vs. ≥26% for culture) |
| CV scheme | Agreement | Balanced accuracy | Notes |
|---|---|---|---|
| LOOCV (full dataset, n=176) | 88% within ±1 log10 | — | Each point predicted by model trained on remaining 175; tournament selects best feature set |
| Binary ≥10 CFU/100 mL | — | 87% (sens=88%, spec=86%) | Contamination detection threshold |
| Binary ≥1 CFU/100 mL | — | 75% | Presence/absence threshold |
| Per-sensor: 50045 | 89% (17/19) | — | Rwanda (Amazi Meza) |
| Per-sensor: 50053 | 86% (42/49) | — | Kenya (DRIP) |
| Per-sensor: 50065 | 89% (96/108) | — | Both programs |
functions/js/ecoli-model.js.js in the Virridy code repository (model version 2026-04-27-turbidity-relative). The same coefficients are mirrored in the offline Lume desktop dashboard (src/model/e_coli.rs ACTIVE_MODEL); both copies must move together.
MODEL_VERSION string. The shared model file is served with Cache-Control: no-store, so every dashboard fetches the latest version on every page load and no per-page cache busting is required.
SweetSenseInc/lume_desktop_dashboard repository on the pc-sandbox branch.
The CBT-trained Tobit model was developed directly on Rwanda + Kenya field data (176 paired Lume–CBT samples from 3 sensors). It generalises across both programs with per-sensor agreement of 86–89%. The model card above and Appendix B reflect the deployed CBT model coefficients. Live validation at validation.thelume.ai/cbt updates continuously as new paired samples are added.
The FAR 2 requirement asked for “full documentation of the AI/ML model used for E. coli estimation, including training data sources, model architecture, validation results, accuracy metrics, and procedures for model retraining and version control.” Every element is addressed — and the architectural choice to deploy a transparent linear regression rather than an AI/ML model means the documentation standard is exceeded, not merely met:
| Required element | Where documented | Status |
|---|---|---|
| Training data sources | §2.2 — five provenance datasets, including peer-reviewed publications and a preprint | Complete |
| Model architecture | §2.1 — Tobit regression, 5 coefficients, no AI/ML. Architecture change from GBDT documented in Deviations section. | Complete |
| Validation results | §2.3 — LOOCV, per-sensor breakdowns, binary classifiers at multiple thresholds | Complete |
| Accuracy metrics | §2.3 — 88% LOOCV, 87% balanced accuracy, per-sensor 86–89% | Complete |
| Retraining procedures | §2.4 — trigger criteria (≥100 new samples from new geography, systematic residual bias, hardware revision) | Complete |
| Version control | §2.4 — date-stamped MODEL_VERSION, git-tracked coefficients, Cache-Control: no-store serving | Complete |
The original FAR assumed an opaque AI/ML model would be deployed, requiring governance measures such as adversarial testing and fairness auditing. By electing to deploy a transparent Tobit regression — where the entire model is 5 published coefficients reproducible with a calculator — Virridy has rendered these concerns inapplicable. Any third party can independently verify the model’s output from raw sensor readings using only the coefficients in Appendix B. Gold Standard has been informally notified of the architecture change; formal notification is an administrative follow-up and does not affect the completeness of the technical documentation.
A clear protocol must be established for integrating manual water quality sampling with the digital Lume sensor data. This should define when manual sampling is required as a complement or cross-check, and how discrepancies between manual and digital results are resolved.
The integration protocol is implemented as a live, automated pipeline at validation.thelume.ai/cbt. It has processed 176 paired Lume–CBT field samples from Rwanda and Kenya. The protocol operates as follows:
The US multi-site validation dataset (n=153 three-way split samples: Lume, Colilert, MF) establishes a practical baseline for inter-method disagreement. On these same samples, the Colilert↔MF agreement was only κ=0.40 (“fair”), while the Lume↔Colilert agreement was κ=0.88 (“almost perfect”). This suggests that a substantial share of any Lume↔CBT discrepancy in the field may reflect inherent variability between microbial water tests rather than sensor error.
For the Rwanda/Kenya Phase 1 dataset (n=176 paired Lume–CBT), the CBT-trained Tobit model achieves 88% LOOCV agreement within ±1 log10. Per-sensor agreement ranges from 86% (50053, Kenya) to 89% (50045, Rwanda; 50065, both). The full pair-by-pair comparison, including residual plots and per-sensor breakdowns, is available at validation.thelume.ai/cbt.
The FAR 3 requirement asked for “a clear protocol for integrating manual water quality sampling with the digital Lume sensor data… defining when manual sampling is required as a complement or cross-check, and how discrepancies between manual and digital results are resolved.” Every element is addressed:
| Required element | Where documented | Status |
|---|---|---|
| When manual sampling is required | §3.1 step 1 — CBT grab sample at every site visit, within ±20 min of sensor reading | Complete |
| Integration of manual & digital data | §3.1 steps 2–3 — automated pairing and exclusion pipeline at validation.thelume.ai/cbt | Complete |
| Discrepancy definition | §3.1 step 4 — >1 log10(CFU+1) threshold, the CBT inter-method precision | Complete |
| Discrepancy resolution | §3.1 step 4 — three resolution paths (sensor error, CBT error, ambiguous → duplicate) | Complete |
| Protocol exercised at scale | 176 paired observations from 3 sensors, 2 countries, 52 sampling points; 88% LOOCV agreement | Complete |
The protocol is not a draft document — it is an operational, automated pipeline that has processed the full Phase 1 field dataset. Performance metrics update continuously as new paired samples are added. The discrepancy log (Appendix C) will accumulate additional entries during the separate Phase 2 permanent installation project as ongoing cross-checks are conducted; the protocol itself is fully operational and exercised.
The project developer should explore the applicability of SDWS Parameters 23 (volume of safe water treatment) and 27 (operational days) to the dMRV solution and provide a rationale for inclusion or exclusion of these parameters in the monitoring plan.
SDWS Parameter 23 (volume of safe water treatment) and SDWS Parameter 27 (operational days) are the two SDWS parameters most amenable to digital substitution by an in-line Lume sensor. The Lume's existing on-board channels — UVLED / SiPM / board temperatures and ToF turbidity — change predictably when the sensor's optical interface transitions between air-exposed, still water, and flowing water. Mapped to the methodology:
An end-to-end bench study built and validated a per-point flow-state classifier on Lume sensor #50051 across two distinct fixtures. The complete analysis — confusion matrices, per-class metrics, feature engineering, and reproducible snapshot data — is published at validation.thelume.ai/pipedflow/ (static snapshot 2026-04-27, 418 annotated segments, 1,599 classified data points). The underlying data is also available at piped-flow-test.pages.dev/analysis/.
/diagnostics (uvled_temperature, sipm_temperature, board_temperature) and /tof (signal_per_spad_kcps, distance_mm) from Lume v1.2 barcode #50051.
| Test setup | Primary SDWS target | Overall accuracy | Cohen's κ | Key per-class result |
|---|---|---|---|---|
| Closed Pipe Flow (n=696 points) | SDWS 23 (volume) | 95.3% | 0.89 | Flowing recall 96.3%, Still recall 98.1% |
| Bucket Dispenser (n=903 points) | SDWS 27 (operational days) | 90.7% | 0.85 | Air recall 96.0%, Air precision 100.0% |
| Combined (n=1,599 points) | — | 93.0% | 0.85 | All three classes ≥ 85% recall |
For an integral-over-time deployment metric, this corresponds to ~7% time-budget error per measurement period across both setups combined: roughly 5 min of misclassified state per 100 min on the Closed Pipe Flow rig (relevant to SDWS 23) and roughly 9 min per 100 min on the Bucket rig (relevant to SDWS 27). Both are well within the precision needed for monthly carbon-credit verification cycles.
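The conversion from per-point accuracy to a time budget assumes that each classified reading represents the interval until the next reading. Under roughly uniform sampling, the misclassified share of time then approximates 1 − accuracy (1 − 0.953 ≈ 0.047, i.e. about 5 min per 100 min on the Closed Pipe Flow rig). A minimal sketch under that piecewise-constant assumption follows. Timestamps are in minutes, the example labels are illustrative, and the function names are hypothetical.

```python
from collections import defaultdict


def interval_states(timestamps_min, states):
    """Pair each reading's state with the time until the next reading.

    Assumes a piecewise-constant state between readings; the final reading has no
    following interval and is therefore not counted.
    """
    for t0, t1, state in zip(timestamps_min, timestamps_min[1:], states):
        yield t1 - t0, state


def minutes_per_state(timestamps_min, states):
    totals = defaultdict(float)
    for minutes, state in interval_states(timestamps_min, states):
        totals[state] += minutes
    return dict(totals)


def misclassified_fraction(timestamps_min, predicted, annotated):
    """Share of elapsed time whose predicted state differs from the annotation."""
    wrong = sum(m for (m, p), a in zip(interval_states(timestamps_min, predicted), annotated)
                if p != a)
    span = timestamps_min[-1] - timestamps_min[0]
    return wrong / span


ts = [0, 6, 12, 18]                          # ~6-min cadence, minutes
pred = ["Air", "Still", "Still", "Still"]
truth = ["Air", "Flowing", "Still", "Still"]
print(misclassified_fraction(ts, pred, truth))  # 0.3333333333333333
```

Gross misclassified time is an upper bound on the net error in any one state's total, because opposite errors (e.g. Still labelled Flowing and Flowing labelled Still) partly cancel when minutes are summed.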
Feasibility: the Bucket Dispenser test directly demonstrates SDWS-27-grade air-vs-water discrimination. Air precision is 100% (every Air prediction was correct, with zero false positives) and Air recall is 96% (96 of every 100 actual air-exposed minutes are correctly labelled). For a binary daily question such as “did this site have water for ≥ N minutes today?”, this exceeds the precision needed to meet Gold Standard's audit requirements. The 4% of missed Air minutes are counted as Still, so the classifier can under-count air-exposed time but never over-count it; equivalently, water-available time can be slightly over-counted but never under-counted.
Proposed deployment formula:
Feasibility: the Closed Pipe Flow test directly demonstrates SDWS-23-grade flowing-vs-still discrimination. Overall accuracy is 95.3% with Flowing precision 94.0% and Still precision 96.4%. The residual error is dominated by Flowing → Still under-counts (a directionally favorable bias for a conservative volume estimate; see below). The classifier is therefore sensor-side ready for SDWS 23 estimation, conditional on per-site flow-rate calibration.
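As an illustration of the aggregations described in the two feasibility paragraphs (a daily water-present threshold of N minutes for SDWS 27, and flowing time scaled by a per-site flow-rate calibration for SDWS 23), a hypothetical sketch is shown below. It is not the proposed deployment formula. The threshold, the flow rate, the treatment of Still and Flowing as “water present”, and all function names are assumptions.

```python
WATER_STATES = {"Still", "Flowing"}   # assumption: any non-Air state means water is present


def state_minutes(readings, wanted):
    """Minutes spent in any of `wanted` states for time-sorted (minute, state) readings.

    Each state is assumed to hold until the next reading; the last reading adds nothing.
    """
    return sum(t1 - t0 for (t0, s), (t1, _) in zip(readings, readings[1:]) if s in wanted)


def is_operational_day(readings, min_water_minutes):
    """SDWS 27-style check: did the site have water for >= N minutes that day?"""
    return state_minutes(readings, WATER_STATES) >= min_water_minutes


def volume_litres(readings, flow_rate_l_per_min):
    """SDWS 23-style estimate: Flowing minutes x a per-site calibrated flow rate."""
    return state_minutes(readings, {"Flowing"}) * flow_rate_l_per_min


day = [(0, "Air"), (60, "Still"), (120, "Flowing"), (135, "Still"), (180, "Air")]
print(is_operational_day(day, min_water_minutes=90))   # True (120 water minutes)
print(volume_litres(day, flow_rate_l_per_min=12.0))    # 180.0
```

Under this kind of aggregation, Flowing readings misclassified as Still reduce the volume total, which is the direction of error noted above for SDWS 23.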
Proposed deployment formula:
The dominant bottleneck across both tests is sample cadence. At the snapshot rate of one sensor reading per ~6 min, 15-min Flowing windows yield only 2–3 samples per event, leaving the temperature-derivative features statistically underpowered. The straightforward operational fix is to return the firmware to 1-min sample cadence (the configuration the original 2026-04-18 closed-pipe-flow study used), which would put 15+ samples in every Flowing event and is expected to lift Flowing recall on both setups well above 95%. This will be implemented during the separate Phase 2 permanent installation project.
US bench evidence demonstrates classifier accuracy that meets the precision needed for both parameters, and the deployment formulas (above) reduce each to an aggregation of well-characterised per-point predictions. Final inclusion is conditional on Rwanda field-validation work to be conducted under the separate Phase 2 project.
The FAR 4 requirement asked the project developer to “explore the applicability” of SDWS 23 and 27 and “provide a rationale for inclusion or exclusion.” Both elements are addressed:
| Required element | Where documented | Status |
|---|---|---|
| Explore applicability of SDWS 23 | §4.5 — Closed Pipe Flow test (95.3% accuracy, κ=0.89, 696 points); deployment formula defined; conservative bias documented | Complete |
| Explore applicability of SDWS 27 | §4.4 — Bucket Dispenser test (90.7% accuracy, κ=0.85, 903 points); Air precision 100%; deployment formula defined | Complete |
| Combined classifier validation | §4.3 — 1,599 points across both setups, 93% overall accuracy, κ=0.85, ~7% time-budget error | Complete |
| Rationale for inclusion/exclusion | §4.7 — both parameters recommended for inclusion; bench accuracy meets the precision needed for monthly verification cycles | Complete |
| Published evidence | validation.thelume.ai/pipedflow/ — full confusion matrices, per-class metrics, feature engineering, snapshot data (418 segments, 1,599 points) | Complete |
The exploration is complete. The Lume sensor’s existing on-board channels (temperature dynamics across three thermistors + ToF turbidity) enable three-class flow-state classification at 93% accuracy on 1,599 bench data points, with per-parameter accuracy of 95.3% (SDWS 23) and 90.7% (SDWS 27). Both parameters are recommended for formal inclusion in the monitoring plan. Field deployment and site-level calibration at Rwandan institutional sites will be conducted under the separate Phase 2 permanent installation project, which will generate the operational data needed to finalise per-site flow-rate calibrations and validate the deployment formulas against in-person attendance logs and manual fill records.
Phase 1 was deliberately conducted across two independent water programs in two countries to test whether the Lume sensor and its estimation model generalise beyond a single operating context. Rwanda and Kenya differ in climate, altitude, water infrastructure, source-water chemistry, and institutional setting. A model that performs consistently across both provides stronger evidence for substitution than one validated in a single program. The full dataset, methodology, model specification, and live results are published at validation.thelume.ai/cbt.
| Rwanda — Amazi Meza | Kenya — DRIP FUNDI | |
|---|---|---|
| Program | School-based water treatment serving ~600,000 students across 500+ schools (scaling to 1.5M by 2028). Gold Standard GS12240 — 33,911 tCO2e issued to date. | USAID-funded drought resilience platform serving ~120,000 people across 200 boreholes in five northern Kenya counties. Sensor-based predictive maintenance raised borehole uptime from 56% to 91%. |
| Setting | Highland institutional sites (schools), ~1,600 m elevation, Kamonyi and Kicukiro districts | Arid/semi-arid community sites, ~500–900 m elevation, Isiolo and Turkana counties |
| Water sources | Spring water, stream/surface water, rainwater harvesting, piped municipal supply | Boreholes, water kiosks, public stand taps, inline chlorination systems (Aquatab) |
| Treatment | LifeStraw Community gravity ceramic filters | Inline chlorination (Aquatab), some untreated distribution points |
| Observations | 84 (48%) | 92 (52%) |
| Sensors | 50045, 50065 | 50053, 50065 |
| Sites | 32 sampling points — EP Nyakabungo, EP Nyakabuye, EP Rwishwima (schools), plus diverse source-water test sites in Kicukiro/Kamonyi | 20 sampling points — Garbatula and Ngaremara (Isiolo), Loima/Turkwel, Lokichar/Kimabur, Kakuma/Nakoyo (Turkana) |
| Water temp | 25.0–41.7°C | 25.9–42.6°C |
Sensor 50065 was deployed in both countries, providing a direct within-sensor comparison across programs. Its 89% LOOCV agreement (108 paired points) indicates that a single physical sensor performs consistently across the Rwanda and Kenya operating contexts without country-specific recalibration.
The Phase 1 dataset comprises 176 paired Lume–CBT observations from 52 distinct sampling points, collected May 25 – June 11, 2026. Each observation pairs a CBT grab sample with the nearest sensor reading within a ±20-minute window (6,711 total sensor readings back the 176 pairs, avg 38 per window).
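The pairing rule can be sketched as a window filter around each CBT sample time, returning both the nearest reading and every reading inside the ±20-minute window (the readings that back each pair). This is a minimal illustration. The tuple layout of `readings` and the function name are hypothetical, and treating a reading exactly 20 minutes away as inside the window is an assumption.

```python
from datetime import datetime, timedelta

PAIRING_WINDOW = timedelta(minutes=20)


def pair_cbt_sample(cbt_time, readings, window=PAIRING_WINDOW):
    """Return (nearest_reading, readings_in_window) for one CBT grab sample.

    `readings` is a list of (timestamp, value) tuples; readings exactly 20 min away
    are included (whether the real window is closed or open is an assumption).
    """
    in_window = [r for r in readings if abs(r[0] - cbt_time) <= window]
    if not in_window:
        return None, []  # unpaired: no sensor reading close enough in time
    nearest = min(in_window, key=lambda r: abs(r[0] - cbt_time))
    return nearest, in_window


t = datetime(2026, 5, 25, 10, 0)
readings = [(t + timedelta(minutes=m), float(m)) for m in (-30, -15, 5, 25)]
nearest, window = pair_cbt_sample(t, readings)
print(nearest[1], len(window))   # 5.0 2
```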
| Category | Count | Percentage |
|---|---|---|
| By country | ||
| Rwanda | 84 | 48% |
| Kenya | 92 | 52% |
| By treatment status | ||
| Treated (post-filtration/chlorination) | 89 | 51% |
| Source (untreated) | 87 | 49% |
| By contamination level (CBT) | ||
| 0 CFU/100 mL (conformity) | 101 | 57% |
| 1–10 CFU/100 mL (low/intermediate risk) | 27 | 15% |
| >10 CFU/100 mL (high risk / unsafe) | 48 | 27% |
This composition is representative of WASH drinking-water monitoring: the majority of treated samples are clean (as expected from functioning treatment systems), while source-water samples span the full contamination range. The near-equal split between treated and untreated, and between Rwanda and Kenya, means the model is trained on a genuine cross-section of the water supplies it will monitor in production.
The estimation model is a right-censored linear regression (Tobit model) with 5 coefficients (Appendix B). For each paired observation, the model takes three sensor inputs (the fluorescence signal mon2, temperature-corrected and normalised; time-of-flight turbidity, normalised; and water temperature, which enters through the fluorescence correction) and produces a log10(CFU+1) estimate. Right-censoring at the CBT upper detection limit (100 CFU/100 mL) treats samples at or above that limit as “at least 100” rather than as exact values, so the model does not claim precision beyond the reference method’s measurable range.
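For readers reproducing the fit, the standard right-censored Tobit objective is sketched below. Uncensored samples contribute the normal log-density of their residual, and samples at or above the CBT limit contribute the log-probability that the latent value exceeds the limit. This is the textbook formulation rather than Virridy's fitting code. The optimiser that would minimise it over the 5 coefficients and σ is omitted, and the toy inputs are illustrative.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def tobit_right_censored_nll(y, mu, sigma, censor_at):
    """Negative log-likelihood of a right-censored (Tobit) normal model.

    y         observed log10(CFU+1) values
    mu        linear-predictor values (X @ beta) for the same samples
    sigma     residual standard deviation on the log10 scale
    censor_at log10 value at the reference method's upper limit
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    nll = 0.0
    for yi, mi in zip(y, mu):
        if yi >= censor_at:
            # Censored sample: only P(latent >= censor_at) = 1 - Phi((c - mu) / sigma) is known
            z = (censor_at - mi) / sigma
            survival = 0.5 * math.erfc(z / math.sqrt(2.0))
            nll -= math.log(max(survival, 1e-300))  # guard against underflow to 0
        else:
            # Uncensored sample: ordinary normal log-density of the residual
            r = (yi - mi) / sigma
            nll -= -0.5 * r * r - LOG_SQRT_2PI - math.log(sigma)
    return nll


sigma_hat = 0.667                 # reported Tobit sigma on the log10 scale
censor = math.log10(101)          # CBT upper limit on the log10(CFU+1) scale
print(tobit_right_censored_nll([0.0, 1.2, censor], [0.3, 1.0, 1.8], sigma_hat, censor))
```

Ordinary least squares would instead treat a “>100” CBT result as exactly 100, pulling the fitted line down at the top of the range; the censored term avoids that.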
Model validation uses leave-one-out cross-validation (LOOCV): for each of the 176 observations, the model is retrained on the remaining 175 and predicts the held-out point, so every observation is tested against a model that has never seen it. Agreement is defined as prediction within ±1 log10(CFU+1) of the CBT result, the established inter-method precision for microbial water testing.
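The LOOCV agreement metric can be sketched as a loop that refits on all but one observation and checks the held-out prediction against the ±1 log10 tolerance. The `fit` argument is a pluggable assumption. Here a one-variable least-squares line stands in for the Tobit fit purely to keep the example short, so the numbers it produces are not the report's, and the data are illustrative.

```python
def fit_simple_ols(xs, ys):
    """Least-squares line; returns a predict(x) function. Stand-in for the real fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def loocv_agreement(xs, ys, fit, tolerance=1.0):
    """Count held-out predictions within +/- tolerance (log10 units) of the reference."""
    hits = 0
    for i in range(len(xs)):
        # Refit on every observation except i, then predict observation i
        predict = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        if abs(predict(xs[i]) - ys[i]) <= tolerance:
            hits += 1
    return hits, len(xs)


xs = [0.1, 0.5, 0.9, 1.4, 2.0, 2.2]   # illustrative predictor values
ys = [0.0, 0.3, 1.1, 1.2, 2.0, 1.9]   # illustrative log10(CFU+1) references
hits, n = loocv_agreement(xs, ys, fit_simple_ols)
print(f"{hits}/{n} within ±1 log10")
```

Any model selection step (such as choosing a feature set) would need to run inside the loop for each held-out point to stay fully out-of-sample.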
In addition to continuous estimation, the model is evaluated as a binary classifier at the ≥10 CFU/100 mL contamination threshold (the WHO “intermediate risk” boundary most relevant for WASH compliance). Balanced accuracy, sensitivity, and specificity are computed to assess detection performance independent of class prevalence.
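A minimal sketch of the binary evaluation, assuming both the model estimate and the CBT result are thresholded on the log10(CFU+1) scale, where ≥10 CFU/100 mL corresponds to log10(11). The function name `binary_metrics` is hypothetical; balanced accuracy is taken as the mean of sensitivity and specificity, as defined in the metrics table.

```python
import math

THRESH = math.log10(10 + 1)  # ≥10 CFU/100 mL on the log10(CFU+1) scale

def binary_metrics(pred_log, ref_log, thresh=THRESH):
    """Sensitivity, specificity and balanced accuracy of predicted vs CBT calls."""
    tp = fn = tn = fp = 0
    for p, r in zip(pred_log, ref_log):
        pos_ref, pos_pred = r >= thresh, p >= thresh
        if pos_ref and pos_pred:
            tp += 1
        elif pos_ref:
            fn += 1
        elif pos_pred:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {"sensitivity": sens, "specificity": spec,
            "balanced_accuracy": (sens + spec) / 2}
```

Because sensitivity and specificity are each computed within one CBT class, their average does not shift when the share of contaminated samples changes.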
| Sensor | Country | Paired points | LOOCV agreement |
|---|---|---|---|
| 50045 | Rwanda | 19 | 89% (17/19) |
| 50053 | Kenya | 49 | 86% (42/49) |
| 50065 | Both | 108 | 89% (96/108) |
| All sensors | Both | 176 | 88% (155/176) |
| Metric | Value | Interpretation |
|---|---|---|
| Balanced accuracy | 88% | Average of sensitivity and specificity, unaffected by class imbalance |
| Sensitivity | 88% | Probability of correctly detecting contaminated water |
| Specificity | 88% | Probability of correctly classifying safe water |
| AUC | 0.92 | Area under the ROC curve — discrimination ability across all thresholds |
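The AUC can be computed directly from the continuous estimates using the rank (Mann–Whitney) identity: it is the probability that a CBT-positive sample (≥10 CFU/100 mL) receives a higher estimate than a CBT-negative one, with ties counted as one half. This is a sketch of that standard identity, not necessarily how the reported 0.92 was computed; the O(n·m) double loop is adequate at n = 176.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """ROC AUC as P(score_pos > score_neg), counting ties as 1/2."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```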
This balanced accuracy is about 95% of the ~92.5% ceiling set by variability between the reference methods themselves (CBT vs. membrane filtration agreement is ~92–93% at the same threshold). In other words, the sensor's agreement with CBT is approaching the limit that reference-method variability imposes on any method compared against CBT.
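The 95% figure is the ratio of the two quantities, taking the ~92.5% midpoint of the 92–93% CBT↔MF range as the ceiling:

$$
\frac{\text{BA}_{\text{Lume vs CBT}}}{\text{ceiling}_{\text{CBT vs MF}}} = \frac{0.88}{0.925} \approx 0.951
$$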
| Classification task | Agreement | Notes |
|---|---|---|
| Three-tier WHO risk (<10, 10–99, ≥100 CFU/100 mL) | 95% within ±1 category | Only 5% of predictions are >1 WHO risk tier away from CBT |
| ≥10 CFU/100 mL binary (contamination screening) | 88% balanced accuracy | The primary WASH compliance threshold |
| ≥1 CFU/100 mL binary (presence/absence) | 76% balanced accuracy | Cannot reliably distinguish 0 from 1–9 CFU/100 mL; not recommended for zero-certification |
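The three-tier check can be sketched as below, using the tier boundaries in the table (<10, 10–99, ≥100 CFU/100 mL). Model outputs on the log10(CFU+1) scale are converted back with CFU = 10^y − 1; the function names are hypothetical.

```python
def cfu_from_log(y):
    """Invert the model's log10(CFU+1) scale back to CFU/100 mL."""
    return 10 ** y - 1

def who_tier(cfu):
    """Three-tier WHO risk band from the table: <10, 10–99, ≥100 CFU/100 mL."""
    if cfu < 10:
        return 0
    if cfu < 100:
        return 1
    return 2

def tier_agreement(pred_cfu, ref_cfu, max_off=1):
    """Share of pairs whose predicted tier is within ±max_off tiers of the CBT tier."""
    ok = sum(abs(who_tier(p) - who_tier(r)) <= max_off
             for p, r in zip(pred_cfu, ref_cfu))
    return ok / len(ref_cfu)
```

With only three tiers, a pair fails the ±1 check only when one method reports <10 and the other ≥100 CFU/100 mL.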
Among Kenya DRIP samples with a free chlorine residual measurement (n=66), all 30 chlorinated samples (Cl2 > 0) had 0 CFU by CBT, and the Lume correctly classified all 30 as safe. The sensor therefore produced no false alarms in chlorinated supplies with a measurable residual, supporting its use to verify treatment system efficacy.
Validation from the US pilot established the Lume's intrinsic measurement performance against laboratory Colilert (n=209, R²=0.881) and Membrane Filtration (n=303, R²=0.872) reference methods across sites in Colorado (Boulder Creek, Yampa River) and France (Seine River). These data also provide the inter-method variability baseline (κ=0.88 Lume↔Colilert vs. κ=0.40 Colilert↔MF on n=153 three-way split samples) that anchors the substitution case. Live US deployments include three Boulder Creek sensors streaming continuously to boulder-water.pages.dev.
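The report does not state which categories the κ values were computed on. The sketch below assumes each method's result has already been reduced to categorical calls (for example, binary ≥10 CFU/100 mL) and computes unweighted Cohen's kappa; `cohen_kappa` is a hypothetical name.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two methods' categorical calls on the same samples.

    Undefined (division by zero) when both methods put every sample in one category.
    """
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n           # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)
```

Kappa discounts the agreement two methods would reach by chance given their marginal rates, which is why it can separate Lume↔Colilert from Colilert↔MF even when raw percent agreement looks similar.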
Live, continuously updated Phase 1 results: validation.thelume.ai/cbt
Phase 2 will deploy 30–50 permanent Lume sensors at Amazi Meza institutional sites in Rwanda. It will be conducted as a separate project with its own work plan, timeline, and reporting. Phase 2 will generate operational data for SDWS 23/27 field deployment (site-level flow-rate calibrations and deployment formula validation) and ongoing CBT cross-check data. This report covers Phase 1 mobile validation only; all four FARs are resolved based on Phase 1 evidence.
| Metric | Aggregation | Target | Result |
|---|---|---|---|
| Sensor uptime | % of expected check-ins received | ≥95% | TBD |
| Calibrated-combo coverage | % of readings at led=512, bias∈[2960,3040] | ≥90% | TBD |
| Battery longevity | Median V/week drift | <0.05 V/week | TBD |
| CBT cross-check rate | Paired samples per site per quarter | ≥3 | TBD |
| Discrepancy rate | % of CBT pairs >1 WHO band off | ≤15% (matched to Colilert↔MF baseline) | TBD |
| Operational-day coverage (SDWS 27) | Days/site with ≥120 min in-water flag | ≥28 / 30 days | TBD |
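Several entries in the aggregation column reduce to simple ratios and counts. A minimal sketch, assuming hypothetical inputs: one `(led_power, sipm_bias)` tuple per reading and per-day totals of in-water minutes; the bias bounds are treated as inclusive.

```python
def uptime_pct(received, expected):
    """Sensor uptime: % of expected check-ins actually received."""
    return 100.0 * received / expected

def calibrated_coverage_pct(readings):
    """% of readings at led_power=512 with sipm_bias in [2960, 3040] (inclusive)."""
    ok = sum(1 for led, bias in readings if led == 512 and 2960 <= bias <= 3040)
    return 100.0 * ok / len(readings)

def operational_days(in_water_minutes_by_day, min_minutes=120):
    """Days with at least 120 minutes flagged in-water (SDWS 27 coverage metric)."""
    return sum(1 for m in in_water_minutes_by_day if m >= min_minutes)
```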
Based on the evidence presented in this report, Virridy recommends that the Lume sensor be approved as a digital substitute for periodic CBT sampling for SDWS Parameter 18 (Microbial Drinking Water Quality) under Gold Standard Pilot 14. The evidence base comprises:
CBTs are retained as periodic cross-checks to validate ongoing sensor accuracy, not as the primary monitoring instrument. All four Forward Action Requirements are resolved. The complete evidence base is continuously available at validation.thelume.ai/cbt.
Authoritative source: live validation page at validation.thelume.ai/cbt.
| Component | Parameter | Value |
|---|---|---|
| Tobit regression: log10(CFU+1) | intercept | 1.386 |
| | z(mon2c_n) | 0.865 |
| | z(tof_n) | 0.393 |
| | FE·50053 | −0.771 |
| | FE·50065 | −0.619 |
| Tobit σ̂ | — | 0.667 log10 |
| Right-censoring point | — | log10(101) ≈ 2.004 |
| Temperature correction | pooled ρ | 0.0139 (R² = 0.946, n = 101 clean-water samples) |
| Preprocessing | mon2c formula | mon2 × exp(−ρ × (T − 20)) |
| Baseline normalization | mon2c_n | mon2c − per-sensor clean-water median |
| (continued) | tof_n | tof − per-sensor clean-water median |
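Putting the table together, a minimal sketch of the per-reading estimate follows. Two assumptions are flagged: the means and standard deviations used for the z-scores are not published, so they are passed in through a hypothetical `scale` argument; and sensor 50045 is treated as the reference level because no fixed effect is listed for it. Reporting the latent prediction capped at the censoring point is one convention for a right-censored model, not necessarily the one used in production.

```python
import math

# Coefficients from the table above (log10(CFU+1) scale).
INTERCEPT, B_MON2, B_TOF = 1.386, 0.865, 0.393
# Assumption: 50045 is the reference level (no fixed effect listed).
FIXED_EFFECT = {"50045": 0.0, "50053": -0.771, "50065": -0.619}
RHO = 0.0139               # pooled temperature coefficient
CENSOR = math.log10(101)   # right-censoring point, about 2.004

def mon2_temp_corrected(mon2, temp_c):
    """mon2c = mon2 * exp(-rho * (T - 20))."""
    return mon2 * math.exp(-RHO * (temp_c - 20.0))

def predict_log_cfu(mon2, tof, temp_c, sensor, base, scale):
    """Latent Tobit prediction in log10(CFU+1), capped at the censoring point.

    base:  per-sensor clean-water medians, e.g. {"mon2c": ..., "tof": ...}
    scale: (mean, sd) used to z-score mon2c_n and tof_n; NOT published,
           so these values are hypothetical inputs.
    """
    mon2c_n = mon2_temp_corrected(mon2, temp_c) - base["mon2c"]
    tof_n = tof - base["tof"]
    z_mon2 = (mon2c_n - scale["mon2c_n"][0]) / scale["mon2c_n"][1]
    z_tof = (tof_n - scale["tof_n"][0]) / scale["tof_n"][1]
    y = INTERCEPT + B_MON2 * z_mon2 + B_TOF * z_tof + FIXED_EFFECT[sensor]
    return min(y, CENSOR)
```

The estimate converts back to CFU/100 mL as 10^y − 1.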
| Model | Coefficient | Value |
|---|---|---|
| Turbidity (NTU) regression — bench, sensor 50031 | intercept (absolute, single-sensor) | −145.89 |
| | slope (transfers across sensors) | 2.0488 |
Note: The absolute NTU intercept is sensor-specific (calibrated on bench unit 50031). Field deployments instead use a sensor-relative anomaly form, NTU = max(0, 2.05 × (sps − baseline)), where the baseline is the rolling 10th percentile of in-water sps readings for that unit.
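A minimal sketch of the sensor-relative NTU form, assuming the input stream is already filtered to in-water sps readings and that the rolling baseline is the 10th percentile of the most recent `window` readings. The window length is not given in the report, so the default here is hypothetical. The current reading is included in its own baseline, and the percentile interpolates linearly between order statistics.

```python
from collections import deque

SLOPE = 2.05  # NTU per sps unit, rounded from the bench slope 2.0488

def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def ntu_anomaly(sps_stream, window=500):
    """Yield NTU = max(0, 2.05 * (sps - rolling P10 baseline)) per in-water reading.

    window: number of recent readings in the rolling baseline (hypothetical default).
    """
    recent = deque(maxlen=window)
    for sps in sps_stream:
        recent.append(sps)
        baseline = percentile(recent, 10)
        yield max(0.0, SLOPE * (sps - baseline))
```

Anchoring to a low percentile rather than the median keeps the baseline near clear-water conditions even when a unit spends long periods in turbid water.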
Rolling table of all (Lume, CBT) pairs flagged as discrepant under the FAR 3 protocol (>1 WHO risk band disagreement), with resolution status.
| Date | Sensor | Site | CBT result | Lume prediction | Discrepancy | Resolution |
|---|---|---|---|---|---|---|
| 2026-06-03 | 50053 | Kenya / DRIP | ≥100 CFU (Very High) | Below baseline (sensor anomaly) | Excluded from dataset | Sensor was below clean-water baseline during high-contamination CBT; observation excluded as pairing error per data pipeline QA (see CBT page exclusion table) |
No other pairs from the 176-observation Phase 1 dataset have been flagged as discrepant under the >1 WHO risk band threshold. Additional entries will be added during the separate Phase 2 project as ongoing cross-check pairs are collected.
| Barcode | Program | Deployment sites | Active since | Paired CBT samples | Status | Replacement events |
|---|---|---|---|---|---|---|
| 50045 | Rwanda / Amazi Meza | EP Nyakabungo, EP Nyakabuye, EP Rwishwima, Kicukiro, Kamonyi | May 2026 | 19 | Active | None |
| 50053 | Kenya / DRIP | Isiolo (Garbatula, Ngaremara), Turkana (Loima, Turkana South, Turkana West) | May 2026 | 49 | Active | None |
| 50065 | Both programs | Rwanda + Kenya sites (rotated across both programs) | May 2026 | 108 | Active | None |
All three sensors operate at calibration point led_power=512, sipm_bias ∈ [2960, 3040]. No sensor replacements have been required during Phase 1. Chain of custody is maintained via Blues Notecard check-in telemetry with per-device cryptographic signatures.
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 (Draft) | 2026-04-27 | Virridy | Initial draft. US-pilot evidence populated for FARs 1, 2, 4.3; Rwanda / Phase-2 sections marked TBD. |
| 0.2 | 2026-06-14 | Virridy | Phase 1 complete. All four FARs resolved. Executive summary rewritten with substitution case. CBT field validation results (n=176 paired samples, 88% LOOCV). SDWS 23/27 exploration complete (1,599 bench data points, 93% accuracy). Phase 2 scoped as separate project. |