Gold Standard Virridy v0.2 — Phase 1 Complete

Pilot Report & FAR Resolution Evidence

Pilot 14 — dMRV Solution for E. coli estimation under the Gold Standard SDWS methodology

Pilot Approval: 3 September 2025 | This Report: 14 June 2026 | Report Version: 0.2

Executive Summary

This report presents Virridy's evidence that the Lume sensor can replace periodic Compartment Bag Test (CBT) sampling as the primary water-quality monitoring method under the Safe Drinking Water Supply (SDWS) methodology — specifically for Parameter 18 (Microbial Drinking Water Quality). Where a CBT gives a single snapshot per site visit, a permanently installed Lume sensor provides continuous, autonomous E. coli estimation at every check-in interval, generating orders of magnitude more data at lower marginal cost with no enumerator, no incubation, and no manual data entry.

The core evidence for substitution is straightforward: the Lume sensor agrees with CBT at least as well as two accepted laboratory methods agree with each other. On 153 three-way split samples (Lume, Colilert, Membrane Filtration), the Lume↔Colilert agreement was κ=0.88 (“almost perfect”) while the Colilert↔MF agreement was only κ=0.40 (“fair”). In the field deployment, the CBT-trained Tobit model achieves 88% leave-one-out cross-validation agreement, 85% balanced accuracy (AUC=0.892) at the ≥10 CFU/100 mL contamination threshold, and 97% within-one-category WHO risk-tier agreement — validated on 216 paired Lume–CBT field samples from 3 sensors across 52 sampling points in two countries. The entire model is 8 published coefficients (Appendix B) that any third party can verify with a calculator. The complete dataset, methodology, and live results are published at validation.thelume.ai/cbt.

The field dataset spans two independent water programs in two countries: Amazi Meza in Rwanda (school-based water treatment, ~600,000 students, Gold Standard GS12240) and DRIP FUNDI in Kenya (USAID drought resilience, ~120,000 people, 200 boreholes). The programs differ in climate (highland vs. arid), water sources (springs and rainwater vs. boreholes and kiosks), treatment technology (ceramic filtration vs. chlorination), and institutional setting (schools vs. community water points). The model achieves 85–98% per-sensor agreement across both contexts without country-specific tuning. This cross-sectional validation provides confidence that the sensor generalises across the range of water systems it will monitor in production.

The substitution does not eliminate manual sampling — it changes its role. CBTs shift from the primary monitoring method to a periodic cross-check that validates sensor accuracy on an ongoing basis. The integration protocol (§3.1) defines when cross-checks occur, how sensor and CBT data are paired, and how discrepancies are detected and resolved. This protocol is operational and has processed the full Phase 1 dataset.

This report addresses the four Forward Action Requirements (FARs) raised at Gold Standard Pilot 14 approval (3 September 2025), incorporating Phase 1 field validation data from Rwanda (Amazi Meza) and Kenya (DRIP FUNDI), alongside baseline validation studies in the US and France (Boulder Creek CO, Yampa River CO, Seine River FR).

Model architecture — explicit choice for verifiability

For this dMRV implementation, Virridy has elected to deploy only transparent regression models for E. coli estimation: a right-censored Tobit regression with eight coefficients. These are an intercept; three measured predictors (baseline-subtracted raw fluorescence, water temperature, and a turbidity proxy); and a per-sensor 2-point calibration (offset + gain), made up of per-sensor intercept offsets and per-sensor fluorescence slopes for the two non-reference sensors. Binary risk classification is obtained by applying Youden-optimal thresholds to that same model's predictions. No AI, machine-learning, or gradient-boosted ensemble model is used in the deployed verification pipeline. The published coefficients in Appendix B are the entire model. While the academic literature explores ML approaches and Virridy's earlier research and patents include adaptive-learning techniques, the choice for verifier-facing operation is a fully auditable closed-form regression that any third party can reproduce with a calculator.

FAR resolution status (current snapshot)

FAR	Requirement	Status	Primary Evidence
FAR 1	Sensor Validation & Calibration Protocols	Resolved	Calibration protocol documented (§1.3); drift monitoring operational (§1.4); 6,855 sensor observations across 216 paired Lume–CBT field samples from 3 sensors in 2 countries — 88% LOOCV, per-sensor 85–98%; US lab baselines: n=209 Colilert (R²=0.881), n=303 MF (R²=0.872)
FAR 2	AI/ML Implementation & Validation	Resolved	Resolved by design: deployed model is a CBT-trained Tobit regression (no AI/ML) — 8 published coefficients (Appendix B), training data provenance (§2.2), cross-validation (§2.3: 88% LOOCV, 85% balanced accuracy), retraining & version-control procedures (§2.4). Every element of the original requirement is documented; the model exceeds the transparency standard since the entire pipeline is reproducible by hand.
FAR 3	Manual ↔ Digital Integration Protocol	Resolved	Protocol established and exercised: 216 paired Lume–CBT samples processed through automated pairing, exclusion, and discrepancy detection pipeline at validation.thelume.ai/cbt; protocol documented in §3.1
FAR 4	SDWS 23 & 27 Exploration	Resolved	Exploration complete: flow-state classifier validated on 1,599 bench data points across two test setups (Closed Pipe Flow 95.3% / Bucket 90.7%, κ≥0.85); both parameters recommended for inclusion; full analysis at validation.thelume.ai/pipedflow/. Field deployment deferred to separate Phase 2 project.

FAR 1 is resolved. The original requirement called for detailed protocols covering calibration check frequency, drift thresholds, and sensor replacement procedures. All three are documented (§1.3, §1.4) and operational. The field validation dataset comprises 6,855 individual sensor observations across 216 paired Lume–CBT samples from 3 sensors deployed in Rwanda and Kenya (May–June 2026). Each paired sample is matched to the lowest-fluorescence (fully-submerged) sensor reading within a ±10-minute window and draws on an average of ~31 sensor readings within that window, all at the calibrated operating point. The Implementation Plan estimated 250–350 paired samples as the minimum to reach target performance; that performance level — 88% LOOCV agreement, per-sensor 85–98% — was achieved with 216 pairs. The validation objective is met.

FAR 2 is resolved. The requirement called for full model documentation; the deployed Tobit regression is fully specified by 8 published coefficients with complete training-data provenance, cross-validation results, and version-control procedures (§2.1–2.4).

FAR 3 is resolved. The requirement called for a clear protocol integrating manual water quality sampling with digital Lume sensor data; the operational pipeline at validation.thelume.ai/cbt defines when manual sampling occurs, automates pairing, and specifies discrepancy detection and resolution (§3.1–3.3). The protocol has processed 216 paired observations from 3 sensors in 2 countries.

FAR 4 is resolved. The requirement asked Virridy to “explore the applicability” of SDWS 23 and 27 and provide a rationale for inclusion or exclusion. The exploration is complete: a flow-state classifier validated on 1,599 bench data points across two test setups achieves 93% combined accuracy (κ=0.85), and both parameters are recommended for inclusion (§4.1–4.8). Full evidence at validation.thelume.ai/pipedflow/. Field deployment and site-level calibration will be conducted under the separate Phase 2 project.

All four FARs resolved

This report covers Phase 1 mobile validation, which is complete with 216 paired Lume–CBT field samples from Rwanda and Kenya. All four Forward Action Requirements raised at pilot approval are resolved. Phase 2 permanent installation will be conducted as a separate project and validation effort with its own work plan, timeline, and reporting; Virridy is requesting that this long-term validation be sited on the DRIP FUNDI piped systems in Kenya (see the requests below).

What Virridy is requesting from Gold Standard

Four requests run through this report, in order of immediacy:

Accept the Phase 1 validation for the Amazi Meza (Rwanda) pilot. Confirm that the Phase 1 mobile validation resolves the four FARs and qualifies the Lume as the digital monitoring instrument for Amazi Meza (GS12240), the project for which Pilot 14 was approved.
Extend pilot approval to the Kenya DRIP FUNDI programme for mobile use. The Phase 1 dataset was collected in both countries (216 paired Lume–CBT samples spanning Rwanda and Kenya, with program-level agreement of 93% in Kenya). On that basis Virridy requests that the Pilot 14 scope be formally extended to recognise the Lume for mobile validation use in the Kenya DRIP FUNDI programme, not Rwanda alone.
Approve Phase 2 (permanent installation) validation on the DRIP Kenya piped systems. Because the DRIP systems are piped, they are the appropriate setting to validate the long-term permanent-installation configuration and the SDWS 23 (volume) and SDWS 27 (operational days) flow parameters, which require flow measurement on piped infrastructure. Virridy requests permission to conduct the Phase 2 long-term validation as a separate project, with permanent Lume installations at a minimum of 20 DRIP FUNDI sites in Kenya.
Advance the Lume toward full, instrument-level recognition. Stated up front so it is not a surprise later: Virridy intends to pursue recognition of the Lume as a Validated Sensor (Digital MRV) under SDWS v2.0 for use across SDWS projects within the validated water-type envelope, through the dMRV Programme’s formal methodology-revision procedure (dMRV Requirements §3.1–§3.2). The instrument-level case is set out in the Methodology Integration and Conclusion sections below.

Pilot Implementation Status

Pilot Approval Date: 3 September 2025
Report Cut-off: 14 June 2026
Time Since Approval: ~9 months
Methodology: Approved under SDWS v1.0; aligned to SDWS v2.0 (M400) for crediting. Parameter 18 (+ exploratory 23, 27)
Host Programme: Amazi Meza (Rwanda)
Pilot Sensor Model: Lume v1.2 (TLF + ToF + temperature)
Six-Month Status Update: Filed with Gold Standard 19 March 2026

What has been deployed

The approved Implementation Plan describes two phases. This report covers Phase 1.

Phase 1 — Mobile Lume Validation (this report): US baseline validation studies (complete); Rwanda + Kenya field validation (complete), 216 paired Lume–CBT samples collected from 3 sensors across 7+ sites in 2 countries.
Phase 2 — Permanent Installation (separate project): 30–50 permanent Lume sensors, which Virridy requests to deploy on the DRIP FUNDI piped systems in Kenya (piped infrastructure suited to SDWS 23/27 flow validation). Phase 2 will be conducted as a separate project and validation effort with its own work plan, timeline, and reporting.

Deviations from the approved plan

Model architecture change: The approved Implementation Plan described the deployed E. coli estimation model as a gradient-boosted decision tree ensemble. Virridy has instead deployed a transparent right-censored Tobit regression with 8 published coefficients. This architectural change was made in April 2026 to maximise auditability and reproducibility for the dMRV verification pipeline. Gold Standard was informally notified during the pilot process; formal notification is pending. No other scope, schedule, or methodology deviations have been submitted.

Sensor and site inventory snapshot

Cohort	Sensors deployed	Sites	Active since	Data points (cumulative)
US validation (Boulder Creek)	3	BC-CU, BC-55, BC-Can	April 2026	~50,000+ continuous readings
US bench / lab (multi-sensor)	10+	Lab fixture, Yampa, Seine	2022 → present	n=512 paired lab samples (combined Colilert + MF)
Rwanda Amazi Meza — Phase 1	3 (50045, 50053, 50065)	EP Nyakabungo, EP Nyakabuye, EP Rwishwima, Kicukiro, Kamonyi (RW); Isiolo, Turkana (KE)	May 2026	216 paired Lume–CBT points
Phase 2 — Permanent Installation	Separate project and validation effort. See §Phase 2.

FAR 1 — Sensor Validation & Calibration [Resolved]

Original requirement (Pilot 14 approval, Sep 2025)

The project developer must provide detailed protocols for sensor validation and calibration, including frequency of calibration checks, acceptable drift thresholds, and procedures for replacing or recalibrating sensors that fall outside tolerance.

1.1 Paired sensor–reference comparison evidence

The Lume sensor's measurement performance against laboratory reference methods has been characterised across multiple independent studies. The two relevant gold-standard methods are Colilert (IDEXX defined-substrate technology) and membrane filtration (MF, US EPA Method 1604). Full source: thelume.ai/research.

Reference method	Paired n	R²	Binary accuracy at 10 CFU/100 mL	Cohen's κ	Source
Colilert (IDEXX)	209	0.881	0.92 (balanced 0.92)	0.84	Knopp et al. (2026); thelume.ai/research
Membrane Filtration (EPA 1604)	303	0.872	—	—	MF-trained model, method-agnostic validation
Three-way (Lume / Colilert / MF)	153	—	—	Lume↔Colilert κ=0.88; Colilert↔MF κ=0.40	Method-comparison subset

Notable: the Lume↔Colilert agreement (κ=0.88, almost perfect) is substantially stronger than the Colilert↔MF agreement (κ=0.40, fair) on the same n=153 split-sample set. This indicates the Lume's reproducibility against either reference method is on the order of, or better than, the inherent reproducibility between two accepted laboratory methods.
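
For readers who want to recompute an agreement statistic of this kind, the following is a minimal sketch of Cohen's κ for two sets of categorical calls on the same samples, together with the Landis and Koch descriptive bands that the terms "fair" and "almost perfect" come from. The function names and the sample labels in the test are illustrative only; they are not the study data or Virridy's code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters (or methods) scoring the same samples."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of samples where both methods give the same class.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each method's marginal class frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    classes = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in classes) / (n * n)
    if expected == 1.0:
        return 1.0  # both methods used one identical class throughout
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Descriptive band for a kappa value (Landis and Koch, 1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

Applied to the reported values, `landis_koch_label(0.88)` gives "almost perfect" and `landis_koch_label(0.40)` gives "fair", matching the wording used in this section.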

1.2 Performance by WHO risk category

WHO-defined drinking-water risk bands (Low <1, Intermediate 1–10, High 10–100, Very High >100 CFU/100 mL):

Risk band split	Threshold	Overall accuracy	Balanced accuracy	Cohen's κ
Safe vs. any contamination	1 CFU/100 mL	0.91	0.91	0.82
WHO Low/Intermediate vs. High+	10 CFU/100 mL	0.92	0.92	0.84
3-category (<10, 10–100, >100)	multi	0.91	0.85	0.60
Recreational binary (Seine R., held-out)	900 CFU/100 mL	0.968	0.94	—

1.3 Calibration protocol

The Lume sensor is calibrated at the operating point led_power = 512, sipm_bias ∈ [2960, 3040] (target 3000) — these are the parameters under which the deployed CFU regression and turbidity (NTU) regression were trained. Sensors falling outside this window automatically fall back to the (LED, bias) combo nearest the target via the Lume backend; readings from the fallback combo are flagged as “provisional” on operational dashboards.

Calibration check	Frequency	Tolerance / Pass criterion	Action on failure
Operating-point combo (LED 512, bias ~3000)	Continuous (every reading)	bias ∈ [2960, 3040]	Fall back to nearest combo; dashboard flags “Provisional”; replace sensor if fallback persists > 7 days
Turbidity (ToF) zero-baseline check	Continuous (per-sensor 10th-%ile in-water)	Sensor-relative anomaly: `NTU = max(0, 2.05 × (sps − baseline))`	Re-baseline automatically from rolling in-water minimum
Field paired CBT or Colilert grab-sample	Per institutional visit during Phase 1; at least quarterly during Phase 2	Within 1 WHO risk band of Lume estimate	Investigate; flag period; re-train if systematic
Sensor swap / retirement	On detection of persistent fallback, low battery (<3.8 V steady), or repeated air-exposed flag	—	Replace in field; data continuity preserved via Blues check-in chain of custody
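
The operating-point check and the turbidity formula in the table above can be sketched as follows. This is an illustration under stated assumptions, not the Lume backend code: the reading dictionary keys, the function names, and the definition of "nearest combo" (closest LED power first, then closest bias) are hypothetical. Only the constants 512, 2960–3040, 3000, and 2.05 come from this report.

```python
LED_TARGET = 512               # calibrated LED power (from §1.3)
BIAS_TARGET = 3000             # target SiPM bias (from §1.3)
BIAS_MIN, BIAS_MAX = 2960, 3040
NTU_PER_SPS = 2.05             # gain in the report's NTU formula


def is_calibrated(led_power, sipm_bias):
    """True when a reading sits inside the calibrated operating window."""
    return led_power == LED_TARGET and BIAS_MIN <= sipm_bias <= BIAS_MAX


def select_combo(readings):
    """Return (reading, status) for the combo nearest the calibrated target.

    Assumption: "nearest" means smallest LED-power difference, then smallest
    bias difference. The report does not define the distance metric.
    """
    if not readings:
        raise ValueError("no readings supplied")

    def distance(r):
        return (abs(r["led_power"] - LED_TARGET), abs(r["sipm_bias"] - BIAS_TARGET))

    best = min(readings, key=distance)
    calibrated = is_calibrated(best["led_power"], best["sipm_bias"])
    return best, "calibrated" if calibrated else "provisional"


def ntu_from_sps(sps, baseline):
    """Sensor-relative turbidity: NTU = max(0, 2.05 * (sps - baseline))."""
    return max(0.0, NTU_PER_SPS * (sps - baseline))
```

Because any in-window reading has zero LED difference and a bias difference of at most 40, it always ranks ahead of out-of-window readings, so the "provisional" status is returned only when no calibrated reading is available.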

1.4 Drift monitoring

The Lume Fleet Health dashboard (internal operations tool, requires login) tracks each sensor's:

Battery drift (V/week) — trend tag fires above 0.05 V/week loss; alert above 0.15 V/week.
Data gap — flags any sensor with no Pumphaus telemetry for >6 hours despite Blues check-in.
GPS drift — flags >100 m from expected location (warn) and >500 m (alert).
Calibrated-combo coverage — flags when the firmware bias-sweep skips the calibrated window.
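
The thresholds listed above can be expressed as a small set of checks. This is a sketch only: the function names, argument names, and status strings are hypothetical and do not describe the Fleet Health dashboard's implementation. The numeric thresholds are the ones stated in this section.

```python
def battery_flag(loss_v_per_week):
    """Battery drift: trend tag above 0.05 V/week loss, alert above 0.15."""
    if loss_v_per_week > 0.15:
        return "alert"
    if loss_v_per_week > 0.05:
        return "trend"
    return "ok"


def data_gap_flag(hours_without_telemetry, blues_checked_in):
    """Flag a sensor that checks in but has sent no telemetry for more than 6 h."""
    if blues_checked_in and hours_without_telemetry > 6:
        return "flag"
    return "ok"


def gps_flag(offset_m):
    """GPS drift: warn beyond 100 m from the expected location, alert beyond 500 m."""
    if offset_m > 500:
        return "alert"
    if offset_m > 100:
        return "warn"
    return "ok"


def combo_coverage_flag(calibrated_window_swept):
    """Flag when the firmware bias sweep skipped the calibrated window."""
    return "ok" if calibrated_window_swept else "flag"


def fleet_health(loss_v_per_week, hours_without_telemetry, blues_checked_in,
                 gps_offset_m, calibrated_window_swept):
    """Collect all four drift checks for one sensor into a single report."""
    return {
        "battery": battery_flag(loss_v_per_week),
        "data_gap": data_gap_flag(hours_without_telemetry, blues_checked_in),
        "gps": gps_flag(gps_offset_m),
        "combo_coverage": combo_coverage_flag(calibrated_window_swept),
    }
```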

1.5 CBT field validation evidence (Phase 1, May–June 2026)

216 paired Lume–CBT field samples from 3 sensors deployed across Rwanda (Amazi Meza) and Kenya (DRIP). Each grab is matched to the lowest-fluorescence (mon2) reading within a ±10-minute window — the fully-submerged, physically-correct value, since briefly lifting the sensor out of the water to sample spikes its apparent fluorescence. Each paired sample is backed by an average of ~31 individual sensor readings within that window (6,855 total sensor observations, all at the calibrated operating point led=512, bias ∈ [2960, 3040]). The CBT-trained Tobit regression achieves:

Metric	Value
Total sensor observations	6,855 (across 216 paired CBT samples)
LOOCV agreement (±0.92 log₁₀)	88% (191/216)
Balanced accuracy (≥10 CFU) — deployed Tobit, thresholded	85% (sensitivity 76%, specificity 93%, AUC 0.892)
Balanced accuracy (≥10 CFU) — class-balanced logistic	83% (sensitivity 84%, specificity 82%, AUC 0.885)
Per-sensor: 50045 (Rwanda)	85% (41/48)
Per-sensor: 50053 (Kenya)	98% (51/52)
Per-sensor: 50065 (both)	85% (99/116)
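
As a quick consistency check, the per-sensor counts in the table can be summed to recover the overall LOOCV figure, and balanced accuracy can be recomputed as the mean of sensitivity and specificity. The counts and percentages below are copied from the table; the variable and function names are illustrative.

```python
# (agreeing pairs, total pairs) per sensor, as reported in the table above.
per_sensor = {"50045": (41, 48), "50053": (51, 52), "50065": (99, 116)}

agree = sum(a for a, _ in per_sensor.values())
total = sum(n for _, n in per_sensor.values())
overall_agreement = agree / total  # 191 / 216


def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy is the unweighted mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# Deployed Tobit at the >=10 CFU/100 mL threshold, using the rounded
# sensitivity and specificity printed in the table.
tobit_balanced = balanced_accuracy(0.76, 0.93)
```

The per-sensor counts sum to 191 of 216 pairs (about 88.4%), and the rounded sensitivity and specificity give a balanced accuracy of about 0.845, consistent with the reported 85% given that the inputs are themselves rounded.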

Live, continuously updated results: validation.thelume.ai/cbt

1.6 Resolution summary

The FAR 1 requirement asked for “detailed protocols for sensor validation and calibration, including frequency of calibration checks, acceptable drift thresholds, and procedures for replacing or recalibrating sensors that fall outside tolerance.” Each element is addressed:

Calibration check frequency: continuous per-reading operating-point verification (§1.3, row 1); quarterly paired CBT grab-samples planned for the separate Phase 2 validation effort (§1.3, row 3).
Drift thresholds: battery >0.05 V/week, data gap >6 h, GPS >100 m, calibrated-combo coverage (§1.4).
Replacement procedures: persistent fallback >7 days, low battery <3.8 V, or repeated air-exposed flag triggers field swap with chain-of-custody preserved via Blues check-in (§1.3, row 4).
Field validation: 6,855 sensor observations across 216 paired Lume–CBT samples from Rwanda and Kenya confirm the sensor meets accuracy requirements across deployment water types (88% LOOCV agreement, per-sensor 85–98%, balanced accuracy 85% at ≥10 CFU). The Implementation Plan estimated 250–350 paired samples as the minimum to achieve target performance; that performance was reached with 216 pairs. The validation objective is met.

FAR 2 — Model Documentation (Tobit Regression, no AI/ML) [Resolved]

Original requirement (Pilot 14 approval, Sep 2025)

Full documentation of the AI/ML model used for E. coli estimation must be provided, including training data sources, model architecture, validation results, accuracy metrics, and procedures for model retraining and version control.

Resolution by architectural choice

The original FAR was written under the assumption that an AI/ML model would be deployed. Virridy has elected not to deploy an AI/ML model for verification. Instead, the deployed pipeline uses a right-censored Tobit regression in which water temperature enters as an explicit predictor (not a pre-correction), fully specified by 8 published coefficients and reproducible by hand. There is no opaque model state, no black-box inference, no online learning, and no need for AI-specific governance such as adversarial testing or fairness auditing. The move from gradient-boosted decision trees (described in the approved Implementation Plan) to transparent regression was an intentional architectural choice for verifiability and auditability.

2.1 Model card

Attribute	Value
Model family	Right-censored Tobit regression (ridge-regularized, λ = 0.1). Water temperature is an explicit model predictor, not a pre-correction. No AI, no ML, no decision trees, no ensemble methods, no neural networks in the deployed pipeline.
Output (primary)	E. coli concentration (CFU/100 mL) via log₁₀(CFU+1) prediction
Output (secondary)	Categorical risk class (WHO Low / Intermediate / High / Very High)
Input features (7 predictors + intercept)	z-scored baseline-subtracted raw fluorescence (`mon2`), z-scored water temperature, z-scored turbidity proxy (`ToF`), and a per-sensor 2-point calibration (offset + gain): per-sensor fixed-effect intercepts (50045, 50053; reference 50065) and per-sensor fluorescence slopes (50045, 50053).
Pre-processing	Per-sensor clean-water baseline subtraction for `mon2`, then z-score standardization of `mon2`, temperature, and `ToF`; ridge regularization λ = 0.1. No temperature pre-correction (no ρ term).
Coefficients (8 total)	[0.828, 1.976, 0.173, 0.006, 0.453, −0.589, −0.191, −1.557] — intercept, mon2, temp, tof, FE·50045, FE·50053, slope·50045, slope·50053
σ̂ (Tobit)	0.606 log₁₀
Right-censoring point	CBT detection limit at 100 CFU/100 mL (log₁₀(101) ≈ 2.004)
Training data	216 paired Lume–CBT field samples, 3 sensors, Rwanda + Kenya, May–June 2026
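
To illustrate how the eight coefficients in the model card combine, the sketch below evaluates the linear predictor by hand. It rests on assumptions that should be checked against Appendix B: the inputs are taken to be already baseline-subtracted and z-scored (the standardization constants are not given in this section), the per-sensor slope terms are assumed to multiply the z-scored `mon2` value, and the boundary handling in the WHO banding is a choice made here. All function and key names are hypothetical.

```python
import math

# Coefficients from the model card (section 2.1), in the published order.
COEF = {
    "intercept": 0.828, "mon2": 1.976, "temp": 0.173, "tof": 0.006,
    "fe_50045": 0.453, "fe_50053": -0.589,
    "slope_50045": -0.191, "slope_50053": -1.557,
}
REFERENCE_SENSOR = "50065"
CENSOR_LOG10 = math.log10(101)  # CBT upper limit, about 2.004


def predict_log10_cfu(z_mon2, z_temp, z_tof, sensor_id):
    """Linear predictor for log10(CFU + 1) from z-scored inputs."""
    y = (COEF["intercept"] + COEF["mon2"] * z_mon2
         + COEF["temp"] * z_temp + COEF["tof"] * z_tof)
    if sensor_id in ("50045", "50053"):
        # Assumed form of the 2-point calibration: offset plus gain on mon2.
        y += COEF[f"fe_{sensor_id}"] + COEF[f"slope_{sensor_id}"] * z_mon2
    elif sensor_id != REFERENCE_SENSOR:
        raise ValueError(f"no calibration terms published for sensor {sensor_id}")
    return y


def to_cfu(log10_cfu_plus_1):
    """Invert the log10(CFU + 1) transform, clipped at zero."""
    return max(0.0, 10 ** log10_cfu_plus_1 - 1)


def who_risk(cfu):
    """WHO band per section 1.2; boundary handling is an assumption."""
    if cfu < 1:
        return "Low"
    if cfu < 10:
        return "Intermediate"
    if cfu <= 100:
        return "High"
    return "Very High"
```

For example, the reference sensor 50065 at the mean of all three standardized inputs (all z-scores zero) yields the intercept, 0.828, which corresponds to roughly 5.7 CFU/100 mL. Predictions above `CENSOR_LOG10` lie beyond the range the CBT reference can confirm.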

2.2 Training data provenance

Dataset	n	Reference method	Locations	Use
Lume v1.2 multi-site validation	~512 paired (combined Colilert + MF)	Colilert / MF	US (Colorado: Boulder Creek, Yampa); France (Seine); historical Kenya, Malawi	Primary regression training + cross-validation
Bedell et al. (2022) Water Research	Published	Culture-based	Kenya groundwater (37 sites, Sorensen et al. 2018 cohort)	Foundational TLF↔E. coli relationship; 83% reported accuracy
Knopp et al. (2026) EarthArXiv	Published	Colilert + MF	Multi-site (US + France)	Lume v1.2 sensor design + multi-site validation results
Nowicki et al. (2020)	Published	Culture	Malawi	TLF reproducibility (14% RPD vs. ≥26% for culture)

2.3 Cross-validation results (CBT Tobit model)

CV scheme	Agreement	Balanced accuracy	Notes
LOOCV (full dataset, n=216)	88% within ±0.92 log₁₀	—	Each point predicted by a model refit on the remaining 215 using the fixed 8-parameter specification (LOOCV R²=0.469, MAE=0.415 log₁₀)
Binary ≥10 CFU/100 mL — deployed Tobit, thresholded	—	85% (sens=76%, spec=93%, AUC=0.892)	Contamination detection threshold
Binary ≥10 CFU/100 mL — class-balanced logistic	—	83% (sens=84%, spec=82%, AUC=0.885)	Classifier optimized directly for the binary decision
Binary ≥1 CFU/100 mL — deployed Tobit, thresholded	—	71% (sens=50%, spec=93%, AUC=0.761)	Presence/absence threshold
Binary ≥1 CFU/100 mL — class-balanced logistic	—	73% (sens=62%, spec=84%, AUC=0.771)	Presence/absence threshold
Per-sensor: 50045	85% (41/48)	—	Rwanda (Amazi Meza)
Per-sensor: 50053	98% (51/52)	—	Kenya (DRIP)
Per-sensor: 50065	85% (99/116)	—	Both programs
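
The leave-one-out procedure behind the agreement figure can be sketched generically: each point is held out, a model is refit on the rest, and the held-out prediction counts as agreeing if it lands within ±0.92 log₁₀ of the reference. In the sketch below a one-predictor ordinary least-squares line stands in for the Tobit refit, which is a deliberate simplification and not the deployed estimator; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = b0 + b1 * x (stand-in for the Tobit refit)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b1 * mean_x, b1


def loocv_agreement(xs, ys, band=0.92):
    """Share of held-out points predicted within +/- band of the reference."""
    hits = 0
    for i in range(len(xs)):
        # Refit on every point except the one being predicted.
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        if abs(b0 + b1 * xs[i] - ys[i]) <= band:
            hits += 1
    return hits / len(xs)
```

Here `xs` would hold a predictor such as standardized fluorescence and `ys` the reference log₁₀(CFU+1) values; the reported 88% corresponds to 191 of 216 held-out points falling inside the band under the full 8-parameter specification.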

2.4 Procedures for retraining and version control

Single source of truth. The deployed regression coefficients live in functions/js/ecoli-model.js.js in the Virridy code repository (model version 2026-04-27-turbidity-relative). The same coefficients are mirrored in the offline Lume desktop dashboard (src/model/e_coli.rs ACTIVE_MODEL); both copies must move together.
Versioning. Each model release carries a date-stamped MODEL_VERSION string. The shared model file is served with Cache-Control: no-store so every dashboard fetches the latest on every page load — no per-page cache busting required.
Retraining trigger. Re-fit is performed when (a) ≥100 new paired samples are accumulated from a new geography or water type, (b) systematic residual bias is detected in any verification audit, or (c) a sensor hardware revision changes optical or thermal characteristics.
Audit trail. All training notebooks, paired-sample CSVs, and fitted coefficients are committed to the version-controlled SweetSenseInc/lume_desktop_dashboard repository on the pc-sandbox branch.
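
The three retraining triggers can be written as a simple decision rule. This is a sketch for illustration; the dataclass, field names, and reason strings are hypothetical, and how "systematic residual bias" is detected is left to the verification audit rather than modelled here.

```python
from dataclasses import dataclass

MIN_NEW_PAIRS = 100  # trigger (a): new paired samples from a new geography or water type


@dataclass
class RetrainInputs:
    new_pairs_from_new_context: int   # paired samples from a new geography or water type
    systematic_bias_detected: bool    # outcome of a verification audit
    hardware_revision_changed: bool   # optical or thermal characteristics changed


def retrain_reasons(inputs):
    """Return the list of triggers that currently call for a re-fit."""
    reasons = []
    if inputs.new_pairs_from_new_context >= MIN_NEW_PAIRS:
        reasons.append("a: new paired samples from new geography or water type")
    if inputs.systematic_bias_detected:
        reasons.append("b: systematic residual bias")
    if inputs.hardware_revision_changed:
        reasons.append("c: sensor hardware revision")
    return reasons
```

An empty list means the current model version stays in force; any non-empty list would lead to a re-fit and a new date-stamped `MODEL_VERSION`.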

2.5 Model adaptation status

The CBT-trained Tobit model was developed directly on Rwanda + Kenya field data (216 paired Lume–CBT samples from 3 sensors). It generalises across both programs with per-sensor agreement of 85–98%. The model card above and Appendix B reflect the deployed CBT model coefficients. Live validation at validation.thelume.ai/cbt updates continuously as new paired samples are added. The locked model version applies only within a verification window; between windows the model is re-fit and re-versioned under the §2.4 retraining triggers as the integrated CBT field dataset grows, so accuracy improves over time without compromising within-window reproducibility. The model and coefficients are therefore not permanently fixed: “one version in production” means one authoritative version at a time, not a frozen model.

2.6 Resolution summary

The FAR 2 requirement asked for “full documentation of the AI/ML model used for E. coli estimation, including training data sources, model architecture, validation results, accuracy metrics, and procedures for model retraining and version control.” Every element is addressed, and the architectural choice to deploy a transparent Tobit regression rather than an AI/ML model means the documentation standard is exceeded, not merely met:

Required element	Where documented	Status
Training data sources	§2.2 — four provenance datasets, peer-reviewed publications	Complete
Model architecture	§2.1 — Tobit regression, 8 coefficients, no AI/ML. Architecture change from GBDT documented in Deviations section.	Complete
Validation results	§2.3 — LOOCV, per-sensor breakdowns, binary classifiers at multiple thresholds	Complete
Accuracy metrics	§2.3 — 88% LOOCV, 85% balanced accuracy, per-sensor 85–98%	Complete
Retraining procedures	§2.4 — trigger criteria (≥100 new samples from new geography, systematic residual bias, hardware revision)	Complete
Version control	§2.4 — date-stamped `MODEL_VERSION`, git-tracked coefficients, `Cache-Control: no-store` serving	Complete

The original FAR assumed an opaque AI/ML model would be deployed, requiring governance measures such as adversarial testing and fairness auditing. By electing to deploy a transparent Tobit regression — where the entire model is 8 published coefficients reproducible with a calculator — Virridy has rendered these concerns inapplicable. Any third party can independently verify the model’s output from raw sensor readings using only the coefficients in Appendix B. Gold Standard has been informally notified of the architecture change; formal notification is an administrative follow-up and does not affect the completeness of the technical documentation.

2.7 Independent analysis against laboratory-enumerated samples

The pilot approval and FAR 2 condition the deviation on validating the Lume against laboratory-enumerated samples, with an independent analysis of its performance. Both elements are satisfied:

Enumeration-based reference methods. The Lume is validated against reference methods that enumerate E. coli: IDEXX Colilert (Most Probable Number) and membrane filtration (US EPA Method 1604, direct colony count) on the US dataset (n=209 Colilert, n=303 MF), and the Aquagenx Compartment Bag Test (a quantitative MPN method) in the Rwanda and Kenya field campaign. Colilert and CBT are both quantitative enumeration methods, not presence/absence tests; the validation therefore rests on enumerated reference counts throughout, exactly as the approval requires.
Independent third-party academic analysis. The Lume’s performance against these enumerated references has been analysed and reported in peer-reviewed publications carrying independent academic authorship from the University of Colorado Boulder and Colorado State University: Knopp et al. (2026, sensor design and multi-site Colilert/MF validation) and Bedell et al. (2022, Water Research, the foundational TLF–E. coli relationship). External peer review and academic co-authorship provide the independent analysis FAR 2 calls for, over and above Virridy’s own continuously published validation at validation.thelume.ai/cbt.

FAR 3 — Manual ↔ Digital Integration [Resolved]

Original requirement

A clear protocol must be established for integrating manual water quality sampling with the digital Lume sensor data. This should define when manual sampling is required as a complement or cross-check, and how discrepancies between manual and digital results are resolved.

3.1 Cross-check protocol

The integration protocol is implemented as a live, automated pipeline at validation.thelume.ai/cbt. It has processed 216 paired Lume–CBT field samples from Rwanda and Kenya. The protocol operates as follows:

1. Field sampling. A Compartment Bag Test (CBT) grab sample is collected at each site visit within ±10 minutes of a Lume sensor reading. The CBT sample is taken from the same water source as the Lume sensor. Results are recorded via the mWater “Lume 1.2 — 2026 Validation Data” datagrid with site ID, sample timestamp, sensor barcode(s), enumerator name, and timezone.
2. Automated pairing. The CBT page pairs each CBT sample with the lowest-fluorescence (mon2) reading within a ±10-minute window. The lowest reading is the fully-submerged, physically-correct value: briefly lifting the sensor out of the water to take the grab sample spikes its apparent fluorescence, so the minimum reading in the window is selected, and turbidity (ToF) and temperature are read at that same submerged moment. Where a sample was read by two sensors (barcode expansion), the pipeline generates one pairing per sensor. The window contains an average of ~31 readings per pair (6,855 total sensor observations).
3. Exclusion accounting. No statistical outlier screening is applied (no IQR fencing, no Cook's-distance removal). The lowest-fluorescence pairing rule removes out-of-water artifacts at the source, so every in-water matched point is retained. From 123 CBT samples expanded to 222 barcode-level candidate pairings, 6 were excluded (216 retained), with counts displayed transparently on the page:
- Out-of-water / missing telemetry (2): no valid submerged reading within the ±10-min window (sensor powered off, out of range, or never submerged during sampling).
- Documented data-quality exclusions (4): each excluded with written rationale (no statistical screening):
  - Cross-sensor fault (1): on one clean-water grab (CBT = 0), sensor 50045 read ~4× its sister sensor 50065 in the same water (an instrument fault).
  - Turbidity-compromised readings (2): two readings at ToF 82 and 108 kcps, where the optics are unreliable and the sample is QA-flagged for CBT confirmation.
  - First-day baseline transition (1): sensor 50065’s first day in Kenya, before a local clean-water baseline was established.
4. Discrepancy detection. A pair is flagged as discrepant if the Lume prediction disagrees with the CBT result by more than the ±0.92 log₁₀(CFU+1) agreement band (derived from combined CBT and fluorimeter uncertainty). Flagged pairs are reviewed by the Virridy water-quality lead. Resolution paths: (a) confirmed sensor error → retraining batch and field replacement if persistent; (b) confirmed CBT error → annotate and exclude; (c) ambiguous → duplicate manual sample on next visit.
5. Continuous verification. Model performance (LOOCV agreement, per-sensor breakdowns, binary classifier metrics, residual plots) is recomputed on every page load from the current dataset. Any verifier can independently audit the results at any time. The discrepancy log is maintained in Appendix C.
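
The pairing and discrepancy rules in the protocol above can be sketched as follows. This is an illustration under assumptions, not the pipeline at validation.thelume.ai/cbt: the reading dictionary keys and function names are hypothetical, submersion is not checked separately (the minimum-`mon2` rule is relied on instead), and the treatment of CBT results at the upper detection limit is not modelled.

```python
import math
from datetime import datetime, timedelta

PAIRING_WINDOW = timedelta(minutes=10)  # +/- 10 minutes around the CBT sample
AGREEMENT_BAND = 0.92                   # log10(CFU + 1) units


def pair_sample(sample_time, readings):
    """Return the lowest-fluorescence reading within the window, or None.

    Each reading is a dict with a 'time' (datetime) and a 'mon2' value;
    ToF and temperature would be taken from the same selected reading.
    """
    in_window = [r for r in readings if abs(r["time"] - sample_time) <= PAIRING_WINDOW]
    if not in_window:
        return None  # no telemetry in the window: candidate for exclusion
    return min(in_window, key=lambda r: r["mon2"])


def is_discrepant(predicted_log10, cbt_cfu):
    """True when prediction and CBT differ by more than the agreement band."""
    reference_log10 = math.log10(cbt_cfu + 1)
    return abs(predicted_log10 - reference_log10) > AGREEMENT_BAND
```

A pairing that returns `None` corresponds to the "missing telemetry" exclusion class, and a pair for which `is_discrepant` is true would be routed to the water-quality lead for one of the three resolution paths.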

3.2 QA evidence — reference-method variability baseline

The US multi-site validation dataset (n=153 three-way split samples: Lume, Colilert, MF) establishes the practical floor for inter-method disagreement. On these same samples, the Colilert↔MF agreement was only κ=0.40 (“fair”), while the Lume↔Colilert agreement was κ=0.88 (“almost perfect”). This means a substantial share of any Lume↔CBT discrepancy in the field reflects inherent variability between microbial water tests, not sensor error.

For the Rwanda/Kenya Phase 1 dataset (n=216 paired Lume–CBT), the CBT-trained Tobit model achieves 88% LOOCV agreement within ±0.92 log₁₀. Per-sensor agreement ranges from 85% (50045, Rwanda) to 98% (50053, Kenya). The full pair-by-pair comparison, including residual plots and per-sensor breakdowns, is available at validation.thelume.ai/cbt.

3.3 Resolution summary

The FAR 3 requirement asked for “a clear protocol for integrating manual water quality sampling with the digital Lume sensor data… defining when manual sampling is required as a complement or cross-check, and how discrepancies between manual and digital results are resolved.” Every element is addressed:

Required element	Where documented	Status
When manual sampling is required	§3.1 step 1 — CBT grab sample at every site visit, within ±10 min of sensor reading	Complete
Integration of manual & digital data	§3.1 steps 2–3 — automated pairing and exclusion pipeline at validation.thelume.ai/cbt	Complete
Discrepancy definition	§3.1 step 4 — >0.92 log₁₀(CFU+1) threshold, the combined CBT + fluorimeter agreement band	Complete
Discrepancy resolution	§3.1 step 4 — three resolution paths (sensor error, CBT error, ambiguous → duplicate)	Complete
Protocol exercised at scale	216 paired observations from 3 sensors, 2 countries, 7+ sites — 88% LOOCV agreement	Complete

The protocol is not a draft document — it is an operational, automated pipeline that has processed the full Phase 1 field dataset. Performance metrics update continuously as new paired samples are added. The discrepancy log (Appendix C) will accumulate additional entries during the separate Phase 2 permanent installation project as ongoing cross-checks are conducted; the protocol itself is fully operational and exercised.

FAR 4 — SDWS 23 & SDWS 27 Exploration Resolved

Original requirement

The project developer should explore the applicability of SDWS Parameters 23 (volume of safe water treatment) and 27 (operational days) to the dMRV solution and provide a rationale for inclusion or exclusion of these parameters in the monitoring plan.

4.1 Why these parameters are relevant

SDWS Parameter 23 (volume of safe water treatment) and SDWS Parameter 27 (operational days) are the two SDWS parameters most amenable to digital substitution by an in-line Lume sensor. The Lume's existing on-board channels — UVLED / SiPM / board temperatures and ToF turbidity — change predictably when the sensor's optical interface transitions between air-exposed, still water, and flowing water. Mapped to the methodology:

SDWS 27 (operational days) reduces to a daily binary classification: was the treatment system in active service, yes or no? The Lume's air-vs-water discrimination is the direct sensor for this — a day with sufficient sub-aquatic minutes is operational; a day spent dry is not.
SDWS 23 (water volume) reduces to a continuous time integral: volume = Σ (flowing seconds × calibrated flow rate). The Lume's flowing-vs-still discrimination is the direct sensor for this — flowing time is what gets multiplied by a per-site flow-rate calibration to recover litres dispensed.

4.2 Phase 1 bench evidence — two test setups

An end-to-end bench study built and validated a per-point flow-state classifier on Lume sensor #50051 across two distinct fixtures. The complete analysis — confusion matrices, per-class metrics, feature engineering, and reproducible snapshot data — is published at validation.thelume.ai/pipedflow/ (static snapshot 2026-04-27, 418 annotated segments, 1,599 classified data points). The underlying data is also available at piped-flow-test.pages.dev/analysis/.

Test 1 — Closed Pipe Flow (2026-04-13 → 04-16, 142 annotated segments, 696 data points): pump-driven flow through a closed pipe loop, alternating ~15 min flowing + ~45 min still per hour. Designed primarily to validate the flowing↔still discrimination that drives SDWS 23.
Test 2 — Filling/Draining Bucket (2026-04-17 → 04-27, 276 annotated segments, 903 data points): bucket dispenser cycling through fill, hold, and drain phases with intentional air exposure between fills. Designed primarily to validate the air↔water discrimination that drives SDWS 27.
Classifier: per-point KNN (k=3, distance-weighted, class-balanced) on a 7-dimensional feature space derived from sustained temperature changes across three on-board thermistors (UVLED, SiPM, board) plus the UVLED–board temperature differential. Leave-one-region-out cross-validation. Air predictions are gated by a turbidity threshold (signal_per_spad_kcps ≥ 80) so the classifier cannot call Air without optical evidence.
Sensor streams: /diagnostics (uvled_temperature, sipm_temperature, board_temperature) and /tof (signal_per_spad_kcps, distance_mm) from Lume v1.2 barcode #50051.
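As a rough illustration of the classifier described above, the following sketch implements a distance-weighted k=3 nearest-neighbour vote with the Air gate applied afterwards. It is not the published pipeline: the 2-dimensional toy features stand in for the 7-dimensional feature space, class balancing is omitted, and the fallback to the next-best class when the gate blocks Air is an assumption.

```python
import math
from collections import defaultdict

AIR_GATE_KCPS = 80.0  # gate from the text: Air requires signal_per_spad_kcps >= 80


def knn_rank(train, query, k=3):
    """Distance-weighted k-NN vote; returns class labels ranked best-first.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        # Inverse-distance weight; the epsilon avoids division by zero.
        votes[label] += 1.0 / (math.dist(features, query) + 1e-9)
    return sorted(votes, key=votes.get, reverse=True)


def classify(train, query, signal_per_spad_kcps, k=3):
    """Apply the Air gate: Air is only allowed with sufficient optical signal."""
    ranked = knn_rank(train, query, k)
    if ranked[0] == "Air" and signal_per_spad_kcps < AIR_GATE_KCPS:
        # Assumed fallback: next-best class, or Still if every neighbour was Air.
        non_air = [label for label in ranked if label != "Air"]
        return non_air[0] if non_air else "Still"
    return ranked[0]


# Toy 2-D training set (hypothetical values, for illustration only)
train = [
    ((0.0, 0.0), "Still"), ((0.1, 0.0), "Still"),
    ((1.0, 1.0), "Flowing"), ((1.1, 1.0), "Flowing"),
    ((-1.0, 2.0), "Air"), ((-1.1, 2.0), "Air"),
]
print(classify(train, (1.0, 0.9), 120.0))
print(classify(train, (-1.0, 1.9), 40.0))
```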

4.3 Headline performance against each SDWS parameter

Test setup	Primary SDWS target	Overall accuracy	Cohen's κ	Key per-class result
Closed Pipe Flow (n=696 points)	SDWS 23 (volume)	95.3%	0.89	Flowing recall 96.3%, Still recall 98.1%
Bucket Dispenser (n=903 points)	SDWS 27 (operational days)	90.7%	0.85	Air recall 96.0%, Air precision 100.0%
Combined (n=1,599 points)	—	93.0%	0.85	All three classes ≥ 85% recall

For an integral-over-time deployment metric, this corresponds to ~7% time-budget error per measurement period across both setups combined: roughly 5 min of misclassified state per 100 min on the Closed Pipe Flow rig (relevant to SDWS 23) and roughly 9 min per 100 min on the Bucket rig (relevant to SDWS 27). Both are well within the precision needed for monthly carbon-credit verification cycles.

4.4 SDWS 27 (operational days) — feasibility evidence and approach

Feasibility: the Bucket Dispenser test directly demonstrates SDWS-27-grade air-vs-water discrimination. Air precision is 100% (every Air prediction was correct — zero false-positives) and Air recall is 96% (96 of every 100 actual air-exposed minutes are correctly labelled). For a binary daily question — "did this site have water for ≥ N minutes today?" — this exceeds the precision needed to meet Gold Standard's audit requirements. The 4% of missed Air minutes are biased toward conservatism (counting borderline air-exposed periods as Still under-counts air-exposure days, never over-counts).

Proposed deployment formula:

Aggregate the Lume's per-point air-vs-water classification at daily resolution.
Define an operational day as one in which the sensor reports water for ≥ 120 minutes (configurable; baseline aligned with the DRIP FUNDI piped-system daily service-window schedule).
Calibrate the air/water threshold per sensor during the post-install 4–6 h equilibration window (per the install-validation pattern documented in the bench-annotations study).
Cross-check against DRIP FUNDI operational and service records at every quarterly site visit.
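The daily aggregation in the steps above might look like the sketch below. It assumes each reading stands for a fixed number of minutes of sensor state (`cadence_min`, a hypothetical parameter) and that both Still and Flowing count as in-water time; days with no readings at all simply do not appear in the result.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def operational_days(readings, cadence_min=5.0, min_water_min=120.0):
    """Map each calendar day to True (operational) or False.

    `readings` is an iterable of (datetime, label) with label in
    {"Air", "Still", "Flowing"}.
    """
    water_minutes = defaultdict(float)
    for timestamp, label in readings:
        # Still and Flowing both count as in-water time; Air contributes nothing.
        water_minutes[timestamp.date()] += cadence_min if label != "Air" else 0.0
    return {day: minutes >= min_water_min for day, minutes in water_minutes.items()}


# Hypothetical two-day record: 150 in-water minutes, then 50 in-water minutes.
start = datetime(2026, 6, 1)
day1 = [(start + timedelta(minutes=5 * i), "Still") for i in range(30)]
day2 = [
    (start + timedelta(days=1, minutes=5 * i), "Flowing" if i < 10 else "Air")
    for i in range(40)
]
result = operational_days(day1 + day2)
print(result)
```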

4.5 SDWS 23 (water volume) — feasibility evidence and approach

Feasibility: the Closed Pipe Flow test directly demonstrates SDWS-23-grade flowing-vs-still discrimination. Overall accuracy is 95.3% with Flowing precision 94.0% and Still precision 96.4%. The residual error is dominated by Flowing → Still under-counts (a directionally favorable bias for a conservative volume estimate; see below). The classifier is therefore sensor-side ready for SDWS 23 estimation, conditional on per-site flow-rate calibration.

Proposed deployment formula:

Use the per-point classifier to label each Lume reading as Flowing or Still, integrated across the day to recover total flowing time.
Calibrate the dispensed flow rate once at Phase-2 install per site via a manual fill test (graduated bucket, 60 s repeated 3×).
Daily volume = Σᵢ (flowing duration_i × calibrated flow rate_site), with classifier-side error budget of ≤ 5 min per 100 min observed and a separately-reported uncertainty contribution from the flow-rate calibration repeats.
The classifier's bias is conservative: 49 of the 219 Flowing points in the Bucket test were misclassified as Still, and 6 of 162 Flowing points in the Closed test were under-counted similarly. For SDWS-23 verification this is the favorable direction (lower-bound volume estimate), but for an unbiased report a per-class recall correction matrix can be applied at the integration step.
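A minimal sketch of the volume calculation above follows, under stated assumptions: the fill-test volumes and the reading cadence are hypothetical, and dividing flowing time by Flowing recall is a simplified stand-in for the per-class correction matrix (it corrects missed Flowing points only, not false Flowing labels).

```python
from statistics import mean, stdev


def calibrated_flow_rate(fill_litres, fill_seconds=60.0):
    """Mean flow rate (L/s) and sample standard deviation from repeated fill tests."""
    rates = [litres / fill_seconds for litres in fill_litres]
    return mean(rates), stdev(rates)


def daily_volume(flowing_readings, cadence_s, rate_l_per_s, flowing_recall=1.0):
    """Volume = flowing time x calibrated rate.

    With flowing_recall < 1, flowing time is scaled up to offset missed Flowing points.
    """
    flowing_seconds = flowing_readings * cadence_s
    return flowing_seconds / flowing_recall * rate_l_per_s


# Hypothetical fill test: three 60 s fills of a graduated bucket (litres).
rate, spread = calibrated_flow_rate([12.0, 12.6, 11.4])

# 40 readings labelled Flowing at an assumed 60 s cadence.
lower_bound = daily_volume(40, 60.0, rate)
# Recall from the Closed test in the text: 6 of 162 Flowing points missed.
corrected = daily_volume(40, 60.0, rate, flowing_recall=156 / 162)
print(round(lower_bound, 1), round(corrected, 1))
```

The uncorrected figure is the conservative lower-bound estimate; the spread across fill repeats would feed the separately reported calibration uncertainty.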

4.6 Sensor-cadence dependency

The dominant bottleneck across both tests is sample cadence. At the snapshot rate of one sensor reading per ~6 min, 15-min Flowing windows yield only 2–3 samples per event, leaving the temperature-derivative features statistically underpowered. The straightforward operational fix is to return the firmware to 1-min sample cadence (the configuration the original 2026-04-18 closed-pipe-flow study used), which would put 15+ samples in every Flowing event and is expected to lift Flowing recall on both setups well above 95%. This will be implemented during the separate Phase 2 permanent installation project.

4.7 Recommendation for the Monitoring Plan

SDWS 23 and SDWS 27 are recommended for inclusion in the dMRV monitoring plan.

US bench evidence demonstrates classifier accuracy that meets the precision needed for both parameters, and the deployment formulas (above) reduce each to an aggregation of well-characterised per-point predictions. Final inclusion is conditional on field-validation work to be conducted under the separate Phase 2 project, which Virridy requests to site on the DRIP Kenya piped systems.

4.8 Resolution summary

The FAR 4 requirement asked the project developer to “explore the applicability” of SDWS 23 and 27 and “provide a rationale for inclusion or exclusion.” Both elements are addressed:

Required element	Where documented	Status
Explore applicability of SDWS 23	§4.5 — Closed Pipe Flow test (95.3% accuracy, κ=0.89, 696 points); deployment formula defined; conservative bias documented	Complete
Explore applicability of SDWS 27	§4.4 — Bucket Dispenser test (90.7% accuracy, κ=0.85, 903 points); Air precision 100%; deployment formula defined	Complete
Combined classifier validation	§4.3 — 1,599 points across both setups, 93% overall accuracy, κ=0.85, ~7% time-budget error	Complete
Rationale for inclusion/exclusion	§4.7 — both parameters recommended for inclusion; bench accuracy meets the precision needed for monthly verification cycles	Complete
Published evidence	validation.thelume.ai/pipedflow/ — full confusion matrices, per-class metrics, feature engineering, snapshot data (418 segments, 1,599 points)	Complete

The exploration is complete. The Lume sensor’s existing on-board channels (temperature dynamics across three thermistors + ToF turbidity) enable three-class flow-state classification at 93% accuracy on 1,599 bench data points, with per-parameter accuracy of 95.3% (SDWS 23) and 90.7% (SDWS 27). Both parameters are recommended for formal inclusion in the monitoring plan. Field deployment and site-level calibration will be conducted under the separate Phase 2 permanent installation project, which Virridy requests to site on the DRIP Kenya piped systems, since piped infrastructure is the setting where SDWS 23 (volume) and SDWS 27 (operational days) flow measurement is most directly validated. That project will generate the operational data needed to finalise per-site flow-rate calibrations and validate the deployment formulas against system operational records and manual fill records.

Phase 1 Results — Field Validation Dataset

5.1 Why two countries, two programs

Phase 1 was deliberately conducted across two independent water programs in two countries to test whether the Lume sensor and its estimation model generalise beyond a single operating context. Rwanda and Kenya differ in climate, altitude, water infrastructure, source-water chemistry, and institutional setting. A model that performs consistently across both provides stronger evidence for substitution than one validated in a single program. The full dataset, methodology, model specification, and live results are published at validation.thelume.ai/cbt.

	Rwanda — Amazi Meza	Kenya — DRIP FUNDI
Program	School-based water treatment serving ~600,000 students across 500+ schools (scaling to 1.5M by 2028). Gold Standard GS12240 — 33,911 tCO₂e issued to date.	USAID-funded drought resilience platform serving ~120,000 people across 200 boreholes in five northern Kenya counties. Sensor-based predictive maintenance raised borehole uptime from 56% to 91%.
Setting	Highland institutional sites (schools), ~1,600 m elevation, Kamonyi and Kicukiro districts	Arid/semi-arid community sites, ~500–900 m elevation, Isiolo and Turkana counties
Water sources	Spring water, stream/surface water, rainwater harvesting, piped municipal supply	Boreholes, water kiosks, public stand taps, inline chlorination systems (Aquatab)
Treatment	LifeStraw Community gravity ceramic filters	Inline chlorination (Aquatab), some untreated distribution points
Observations	96 (44%)	120 (56%)
Sensors	50045, 50065	50053, 50065
Sites	32 sampling points — EP Nyakabungo, EP Nyakabuye, EP Rwishwima (schools), plus diverse source-water test sites in Kicukiro/Kamonyi	20 sampling points — Garbatula and Ngaremara (Isiolo), Loima/Turkwel, Lokichar/Kimabur, Kakuma/Nakoyo (Turkana)
Water temp	25.0–41.7°C	25.9–42.6°C

Sensor 50065 was deployed in both countries, providing a direct within-sensor comparison across programs. Its 85% LOOCV agreement (116 paired points) demonstrates that a single physical sensor generalises across the Rwanda and Kenya operating contexts without recalibration.

5.2 Dataset composition

The Phase 1 dataset comprises 216 paired Lume–CBT observations from 52 distinct sampling points, collected May 25 – June 11, 2026. Each observation pairs a CBT grab sample with the lowest-fluorescence (mon2) sensor reading within a ±10-minute window. The minimum is used because it is the fully submerged value: briefly lifting the sensor to take the sample spikes its apparent fluorescence. In total, 6,855 sensor readings back the 216 pairs, an average of ~32 per window.
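The pairing rule can be sketched as follows. The data layout (a list of timestamp and mon2 tuples) and the function name are assumptions for illustration, not the published pipeline.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # pairing window from the text


def pair_reading(cbt_time, sensor_readings, window=WINDOW):
    """Return the lowest-fluorescence reading within +/- `window` of the CBT sample.

    `sensor_readings` is a list of (datetime, mon2) tuples; returns None when
    the window contains no reading.
    """
    in_window = [r for r in sensor_readings if abs(r[0] - cbt_time) <= window]
    return min(in_window, key=lambda r: r[1]) if in_window else None


# Hypothetical readings around a CBT sample taken at 10:00.
t0 = datetime(2026, 6, 1, 10, 0)
readings = [
    (t0 - timedelta(minutes=12), 90.0),   # outside the window, ignored
    (t0 - timedelta(minutes=4), 140.0),
    (t0 + timedelta(minutes=1), 310.0),   # spike while the sensor is lifted
    (t0 + timedelta(minutes=6), 135.0),
]
paired = pair_reading(t0, readings)
print(paired)
```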

Category	Count	Percentage
By country
Rwanda	96	44%
Kenya	120	56%
By WHO/UNICEF CBT category
0 CFU/100 mL (conformity)	134	62%
1–9 CFU/100 mL (low risk)	31	14%
10–99 CFU/100 mL (intermediate risk)	15	7%
≥100 CFU/100 mL (very high, right-censored)	36	17%
By water type
Source (untreated)	109	50%
Treated (post-filtration / chlorination)	107	50%

This composition is representative of WASH drinking-water monitoring: most conforming (0 CFU) samples come from functioning treatment systems, while source waters span the full contamination range up to the right-censored ≥100 CFU band. Spanning both Rwanda and Kenya and the full WHO/UNICEF risk range, the model is trained on a genuine cross-section of the water supplies it will monitor in production.

5.3 Analysis methodology

The estimation model is a right-censored Tobit regression with 8 coefficients (Appendix B). For each paired observation, the model takes z-scored baseline-subtracted fluorescence (mon2), z-scored water temperature, z-scored turbidity (ToF), and a per-sensor 2-point calibration (offset + gain: a per-sensor intercept and a per-sensor fluorescence slope), and produces a log₁₀(CFU+1) estimate. Temperature enters as an explicit predictor rather than as a pre-correction. Right-censoring at the CBT upper detection limit ensures the model does not claim precision beyond the range the reference method can resolve.
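For readers unfamiliar with Tobit regression, the sketch below shows the standard right-censored log-likelihood that such a model maximises. It is a generic textbook form: the single-predictor layout and the toy data are hypothetical, and the Appendix B coefficients are not reproduced here.

```python
import math

CENSOR_LIMIT = math.log10(100 + 1)  # >=100 CFU/100 mL on the log10(CFU+1) scale


def norm_logpdf(z):
    """Log density of the standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_sf(z):
    """Upper-tail probability P(Z > z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tobit_loglik(beta, sigma, X, y, censored):
    """Log-likelihood of a right-censored Tobit model.

    X: rows of predictors with a leading 1.0 for the intercept.
    y: observed log10(CFU+1) values (the censoring limit where censored).
    censored: True where the reference result is at or above its upper limit.
    """
    total = 0.0
    for row, y_i, is_censored in zip(X, y, censored):
        mu = sum(b * x for b, x in zip(beta, row))
        z = (y_i - mu) / sigma
        if is_censored:
            total += math.log(norm_sf(z))  # P(latent value >= limit)
        else:
            total += norm_logpdf(z) - math.log(sigma)
    return total


# Toy data: intercept plus one z-scored predictor (hypothetical values).
X = [[1.0, -0.8], [1.0, 0.1], [1.0, 1.5]]
y = [0.0, 0.6, CENSOR_LIMIT]
censored = [False, False, True]
print(tobit_loglik([0.7, 0.9], 0.5, X, y, censored))
```

A censored observation contributes only the probability that the latent value lies at or above the limit, so the fit is never asked to match a specific value the reference method could not report.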

Model validation uses leave-one-out cross-validation (LOOCV): for each of the 216 observations, the model is retrained on the remaining 215 and predicts the held-out point, so every observation is tested against a model that has never seen it. Agreement is defined as prediction within ±0.92 log₁₀(CFU+1) of the CBT result (the δ₉₅ agreement band derived from combined CBT and fluorimeter uncertainty).
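The LOOCV agreement statistic can be sketched generically as below. The `fit` and `predict` callables are placeholders, and the mean-predictor used in the example is a deliberately trivial stand-in, not the Tobit regression.

```python
def loocv_agreement(xs, ys, fit, predict, band=0.92):
    """Fraction of held-out points predicted within +/- `band` of the reference."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)  # refit without observation i
        hits += int(abs(predict(model, xs[i]) - ys[i]) <= band)
    return hits / len(xs)


# Stand-in model (NOT the Tobit regression): always predict the training mean.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


# Hypothetical predictor values and log10(CFU+1) references.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.1, 0.2, 2.0]
agreement = loocv_agreement(xs, ys, fit_mean, predict_mean)
print(agreement)
```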

In addition to continuous estimation, the model is evaluated as a binary classifier at the ≥10 CFU/100 mL contamination threshold (the WHO “intermediate risk” boundary most relevant for WASH compliance). Balanced accuracy, sensitivity, and specificity are computed to assess detection performance independent of class prevalence.

5.4 Results

Per-sensor performance

Sensor	Country	Paired points	LOOCV agreement
50045	Rwanda	48	85% (41/48)
50053	Kenya	52	98% (51/52)
50065	Both	116	85% (99/116)
All sensors		216	88% (191/216)

Binary classifier (≥10 CFU/100 mL threshold)

Metric	Value	Interpretation
Balanced accuracy	85%	Average of sensitivity and specificity, unaffected by class imbalance
Sensitivity	76%	Probability of correctly detecting contaminated water
Specificity	93%	Probability of correctly classifying safe water
AUC	0.892	Area under the ROC curve — discrimination ability across all thresholds

This balanced accuracy reaches 90% of the ~92.5% ceiling imposed by the CBT reference method’s own inter-method variability (CBT vs. membrane filtration agreement is ~92–93% at the same threshold). The sensor is approaching the limit of what any single method can achieve against any other single method.
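For reference, the binary metrics in the table follow from confusion-matrix counts as sketched below. The counts in the example are hypothetical round numbers, not the Phase 1 confusion matrix; with the rounded values reported in the table, (0.76 + 0.93) / 2 = 0.845, which rounds to the stated 85%.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and balanced accuracy from confusion counts."""
    sensitivity = tp / (tp + fn)  # contaminated samples correctly flagged
    specificity = tn / (tn + fp)  # safe samples correctly passed
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2.0,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fn=10, tn=90, fp=10)
print(metrics)
```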

Across the full dataset the fluorimeter and CBT are statistically equivalent: a two one-sided test (TOST) passes at p < 0.001 (mean fluorimeter–CBT difference +0.13 log₁₀ on the 180 non-censored observations), well inside the ±0.92 log₁₀ agreement band.

WHO risk classification

Classification task	Agreement	Notes
Three-tier WHO risk (<10, 10–99, ≥100 CFU)	82% exact (177/216); 97% within ±1 category	210 of 216 within one WHO tier; only 3% are >1 tier from CBT
≥10 CFU binary (contamination screening)	85% balanced accuracy	The primary WASH compliance threshold (sens 76%, spec 93%, AUC 0.892)
≥1 CFU binary (presence/absence)	71% balanced accuracy	Cannot reliably distinguish 0 from 1–9 CFU; not recommended for zero-certification (sens 50%, spec 93%, AUC 0.761)
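The exact and within-one-category figures in the first row can be computed as in the sketch below, which maps CFU values to the three tiers used in the table. The function names and the example values are hypothetical.

```python
def who_tier(cfu):
    """Three-tier index used in the table: 0 (<10), 1 (10-99), 2 (>=100 CFU/100 mL)."""
    if cfu < 10:
        return 0
    return 1 if cfu < 100 else 2


def tier_agreement(predicted_cfu, reference_cfu):
    """Return (exact, within_one) agreement fractions between two CFU series."""
    gaps = [abs(who_tier(p) - who_tier(r)) for p, r in zip(predicted_cfu, reference_cfu)]
    n = len(gaps)
    exact = sum(g == 0 for g in gaps) / n
    within_one = sum(g <= 1 for g in gaps) / n
    return exact, within_one


# Hypothetical predictions and references (CFU/100 mL).
predicted = [0, 12, 150, 5]
reference = [3, 8, 120, 100]
exact, within_one = tier_agreement(predicted, reference)
print(exact, within_one)
```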

Chlorination detection

Among Kenya DRIP samples with free chlorine residual measured (n=66), 100% of chlorinated samples (Cl₂ > 0, n=30) had 0 CFU by CBT and 100% were classified as safe (<10 CFU) by the Lume. Across 4 matched source-vs-treated site-days, the Lume’s treated-water prediction was lower than its paired source prediction in 3 of the 4 (the exception was a low-contrast pair where both source and treated were already safe). This confirms the sensor can verify treatment system efficacy in chlorinated supplies.

Key findings

Cross-country consistency. Per-sensor LOOCV agreement ranges from 85% to 98% across three sensors deployed in Rwanda and Kenya. By program, agreement is 93% in Kenya (DRIP, 112/120) and 82% in Rwanda (Amazi Meza, 79/96). The lower Rwanda figure is not a sensor defect: Rwandan spring and surface source waters carry naturally fluorescent dissolved organic matter (DOM) that raises the tryptophan-like-fluorescence baseline independently of E. coli, producing false positives. This is an intrinsic limit of TLF sensing in high-DOM water, not a calibration failure. The model nonetheless generalises across two countries, two water programs, five water-source types, and a ~17°C temperature range (25–42°C).
Balanced detection. At the ≥10 CFU threshold the model reaches 76% sensitivity and 93% specificity (85% balanced accuracy), modestly favouring specificity so that safe water is rarely flagged as unsafe. For a monitoring substitution this is a conservative posture: false alarms are kept low while contamination is still detected in three cases out of four.
Approaching the reference-method ceiling. The Lume achieves 90% of the agreement rate between two accepted laboratory methods (CBT vs. MF). Further accuracy gains are limited by the inherent variability of the CBT reference method itself, not by the sensor.
Temporal density. A permanently installed Lume sensor generates ~288 readings per day versus a single CBT grab sample per site visit. For dMRV verification, this means water quality is monitored continuously between visits rather than assumed from periodic snapshots.
Conservative by design. The Tobit model’s right-censoring at the CBT upper detection limit (100 CFU) means it does not attempt to resolve contamination levels above what the reference method itself can measure. Residual errors are concentrated in the mid-range (1–10 CFU) where inter-method variability is inherently highest.

5.5 US baseline studies

US-side validation established the Lume's intrinsic measurement performance using laboratory Colilert (n=209, R²=0.881) and Membrane Filtration (n=303, R²=0.872) reference methods across multiple sites (Boulder Creek CO, Seine River FR, Yampa River CO). The US data also provides the inter-method variability baseline (κ=0.88 Lume↔Colilert vs. κ=0.40 Colilert↔MF on n=153 three-way split samples) that anchors the substitution case. Live US deployments include three Boulder Creek sensors streaming continuously to boulder-water.pages.dev.

Live, continuously updated Phase 1 results: validation.thelume.ai/cbt

Methodology Integration: Reducing Manual CBT Sampling under Gold Standard SDWS V2.0

The operational purpose of the Lume in this pilot is to serve as a validated Digital MRV instrument for the water-quality crediting parameter under the Gold Standard Emission Reductions from Safe Drinking Water Supply methodology (V2.0, GS4GG PAA M400), reducing the cost, latency, and statistical penalty of manual Compartment Bag Test (CBT) grab sampling while increasing temporal coverage. The methodology already contemplates this role explicitly; the Phase 1 validation results quantify how well the Lume fills it.

A note on methodology version (v1.0 to v2.0)

Pilot 14 was approved (3 September 2025) against SDWS v1.0, the version in force when the original Implementation Plan was submitted. SDWS v2.0 (GS4GG PAA M400) was published afterward and is the version that explicitly introduces the “Validated Sensors (Digital MRV)” instrument category and the continuous-monitoring provisions (§14.3, §14.4.3.1(c), §14.4.5) that govern exactly this use. This validation has therefore been deliberately built and presented to align with v2.0: the dMRV role the Lume performs is the one v2.0 now names and encourages. Nothing in the deviation depends on v1.0-only language; where this report cites v2.0 clauses, it is because v2.0 is both the current methodology and the one whose digital-sensor requirements this evidence is designed to meet.

1. The parameter the Lume serves (M_q,y, SDWS 21)

Emission reductions under SDWS V2.0 are credited in proportion to the water-quality modifier M_q,y (SDWS 21), defined as “the fraction of samples that pass microbial quality requirements” at the Point of Use. Crediting is directly proportional to M_q,y: every sample that fails the microbial standard reduces issued credits. The pass/fail standard is the project’s national microbiological standard or, where absent, the WHO low-risk Safe Drinking Water definition of <10 CFU E. coli/100 mL (Table 7.10, WHO 2022). This is exactly the operating point the Lume model is validated against (the ≥10 CFU/100 mL classifier).

2. The methodology explicitly permits validated sensors

The Lume does not require a methodology exception. The SDWS 21 monitoring table lists its eligible measuring instruments as “Accredited laboratory equipment OR Validated Field Testing Kits (CFU or MPN methods) OR Validated Sensors (Digital MRV)” (§14.3). Section 14.4.3.1(c) is more direct: “Validated sensor-based approaches for direct monitoring are permitted and encouraged if adherence to standardized validation and QA/QC protocols (e.g., regular calibration, data validation procedures) is demonstrated and validated by the VVB.”

The Phase 1 deliverables are built to satisfy exactly that validation bar:

Methodology requirement (§14.4.3.1c)	Phase 1 evidence
Sensor validated against an accepted reference	216 paired Lume–CBT observations; 88% leave-one-out agreement within the combined ±0.92 log₁₀ measurement uncertainty; TOST equivalence to CBT (p<0.001)
Performance at the crediting threshold	≥10 CFU/100 mL classification: 85% balanced accuracy, 93% specificity, AUC 0.892; balanced accuracy is 90% of the empirical CBT–MF inter-method ceiling (~92.5%)
Regular calibration	Per-sensor two-point calibration (intercept offset + fluorescence gain), documented in Appendix B; recalibrated against periodic CBT
Data-validation procedures	Automated turbidity QA flagging (ToF out-of-range → routed to CBT confirmation); four documented data-quality exclusions published with rationale
Transparent, auditable model	Eight published coefficients (no AI / ML / black-box); fully reproducible by the VVB from the published model specification

3. Why continuous sensing is materially better than annual grab sampling

The methodology requires M_q,y to be measured on a representative sample (minimum 30) satisfying the 90/10 precision rule (90% CI, 10% margin of error), tested annually. Critically, if the 90/10 rule is not met, the Lower Bound of the 90% confidence interval shall be used (§7.4.1), a direct and conservative reduction in issued credits. A manual annual CBT campaign on a small grab sample is exactly the regime that risks failing 90/10 and triggering this penalty.

Continuous Lume monitoring inverts this. Each installed unit produces roughly 288 in-situ readings per day (a ~5-minute cadence), so the annual sample size underpinning M_q,y is orders of magnitude larger than any feasible manual campaign. The 90/10 rule is satisfied with wide margin, the lower-bound penalty is avoided, and the developer captures the full creditable pass fraction rather than a discounted estimate. Three concrete advantages follow:

Credit integrity and yield: M_q,y is computed from a near-census of the water actually delivered across the year, not a single annual snapshot, eliminating the lower-bound CI penalty.
Near-real-time corrective action: SDWS 21 mandates a corrective-action plan when the failure rate exceeds 20% (Year 1), 15% (Year 2), or 10% (Year 3+). Continuous monitoring surfaces a rising failure fraction as it happens, rather than retrospectively at the annual test, protecting both public health and the credit stream.
Cost and logistics: reagent, incubation, transport, and trained-staff time for the bulk of routine monitoring are removed; the Lume runs unattended and transmits time-stamped, tamper-evident records.
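The effect of sample size on the 90/10 rule can be illustrated with a short sketch. It rests on several assumptions that are not taken from the methodology: a normal-approximation confidence interval, a relative margin of error (half-width divided by the estimate), and statistically independent samples, which consecutive sensor readings will not strictly be.

```python
import math

Z_90 = 1.645  # two-sided 90% quantile of the standard normal


def precision_check(passes, n, z=Z_90, max_relative_margin=0.10):
    """Return (pass fraction, 90% CI lower bound, whether the 90/10 rule is met)."""
    p = passes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    meets_rule = p > 0 and half_width / p <= max_relative_margin
    return p, p - half_width, meets_rule


# Hypothetical manual campaign: 24 of 30 grab samples pass.
manual = precision_check(24, 30)
# Hypothetical sensor record with the same pass fraction over 10,000 readings.
sensor = precision_check(8000, 10000)
print(manual, sensor)
```

In this example the 30-sample campaign misses the precision target and would be credited at the lower bound, while the larger record meets it with the same underlying pass fraction.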

4. Proposed hybrid architecture (replace the routine, retain the reference)

The Lume reduces manual CBT rather than eliminating it, in a structure that maps directly onto the methodology’s own dual-testing and validation requirements:

Continuous Lume at the Point of Use (and, for CWT/CWS, concurrently at the Point of Collection per the mandatory dual-testing requirement §14.4.2) computes M_q,y as the continuous fraction of readings below the 10 CFU/100 mL standard.
Periodic CBT cross-validation on a small audit set provides the “parallel testing” the methodology requires for any kit or sensor (§14.4.3.1b–c), recalibrates each sensor’s offset and gain, and anchors the VVB validation. This is where manual sampling is reduced: from a large representative annual campaign to a small periodic calibration and verification set.
Confirmatory CBT on alarm: when continuous readings cross the 10 CFU/100 mL threshold, or the running failure fraction approaches the corrective-action limits (20% / 15% / 10% for Years 1 / 2 / 3+), a confirmatory CBT is taken before the result is applied to M_q,y or a corrective action is triggered, so that a transient or false-positive reading does not erroneously reduce credits or prompt unnecessary intervention.
Turbidity QA routing: readings flagged by the onboard turbidity channel (the known optical limitation, see Findings) are automatically excluded from the sensor M_q,y and routed to CBT confirmation, so the residual weakness is handled transparently rather than silently credited.
Continuous-data governance follows §14.4.5: gaps under 15 days are interpolated from surrounding data; gaps over 15 days are excluded from crediting unless supported by audited logs.
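Two elements of the architecture above, turbidity exclusion and the corrective-action limits, can be sketched as follows. The tuple layout and function names are hypothetical; confirmatory CBT on alarm and the §14.4.5 gap rules are not modelled.

```python
THRESHOLD_CFU = 10.0                 # crediting threshold, CFU/100 mL
ACTION_LIMITS = {1: 0.20, 2: 0.15}   # failure-rate limits; Year 3+ uses 0.10


def pass_fraction(readings):
    """Fraction of QA-valid readings below the threshold.

    `readings` is a list of (estimated_cfu, turbidity_flagged) tuples. Flagged
    readings are excluded here and would be routed to CBT confirmation.
    """
    valid = [cfu for cfu, flagged in readings if not flagged]
    if not valid:
        return None
    return sum(cfu < THRESHOLD_CFU for cfu in valid) / len(valid)


def needs_corrective_action(fraction, crediting_year):
    """True when the failure rate exceeds the limit for the crediting year."""
    limit = ACTION_LIMITS.get(crediting_year, 0.10)
    return (1.0 - fraction) > limit


# Hypothetical readings: one is turbidity-flagged and therefore excluded.
readings = [(2.0, False), (15.0, False), (300.0, True), (0.0, False), (4.0, False)]
m_qy = pass_fraction(readings)
print(m_qy, needs_corrective_action(m_qy, crediting_year=1))
```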

5. What the validation honestly supports

M_q,y is, by construction, a confirmation measure: the fraction of samples that pass. The Lume’s measured strengths align precisely with this need. At the 10 CFU crediting threshold it has 93% specificity and a 93% negative predictive value: when the Lume reports a reading below the safe-water threshold, it agrees with CBT 93% of the time. Across matched chlorination comparisons, 100% of chlorinated samples (n=30) were correctly classified as safe. For the pass-fraction parameter the methodology actually credits, the Lume is a high-confidence instrument.

The honest limitation (detailed under Findings) is sensitivity to low-fluorescence contamination, most acute in high-DOM Rwandan spring water, where some contaminated samples do not fluoresce. This is why the architecture above retains periodic and on-alarm CBT confirmation and turbidity routing rather than proposing the Lume as a standalone contamination detector. The methodology itself requires this reference cross-check for every sensor and field kit; the Lume’s contribution is to collapse the volume and frequency of that manual reference work while raising temporal coverage from one snapshot per year to continuous.

6. Net effect on the SDWS monitoring workflow

SDWS 21 (M_q,y) monitoring	Manual CBT only	Lume Digital MRV + reduced CBT
Sample size behind M_q,y	≥30 grabs per stratum per year	Continuous (~288/day/site)
90/10 precision rule	At risk; lower-bound CI penalty if missed	Satisfied with margin; no penalty
Temporal resolution	One annual snapshot	Continuous, time-stamped
Failure / corrective-action detection	Retrospective (at annual test)	Near-real-time
Manual CBT effort	Full representative campaign	Small periodic calibration / verification set
Audit trail for VVB	Lab reports per campaign	Continuous tamper-evident sensor record + calibration CBT

Bottom line for the methodology. SDWS V2.0 already names “Validated Sensors (Digital MRV)” as an eligible instrument for the water-quality crediting parameter and encourages their use. The Phase 1 results show the Lume meets that bar at the exact <10 CFU/100 mL crediting threshold (85% balanced accuracy, 93% specificity, 90% of the CBT inter-method ceiling), with a transparent eight-coefficient model and a documented calibration and QA protocol. Deployed as a continuous monitor with periodic CBT calibration and confirmatory CBT on alarm, it replaces the routine annual CBT campaign for M_q,y, avoids the 90/10 lower-bound credit penalty, and adds near-real-time corrective-action capability, while retaining the reference cross-check the methodology requires. On this basis, Virridy requests that Gold Standard recognise this validation as qualifying the Lume as a Validated Sensor (Digital MRV) under SDWS V2.0 (§14.3, §14.4.3.1(c)) for use across SDWS projects operating within the validated water-type envelope — the source and treated water types, fluorescence, turbidity, and temperature ranges represented in the Phase 1 dataset — rather than for the Amazi Meza program alone. Projects whose conditions fall outside that envelope would extend coverage by contributing paired CBT data under the §2.4 retraining protocol, with each resulting model version validated by the VVB.

Phase 2 — Permanent Installation (Separate Project)

Separate project and validation effort

Phase 2 will place permanent Lume installations at a minimum of 20 DRIP FUNDI sites in Kenya (target 30–50 sensors) for long-term installation validation. Virridy requests that this Phase 2 validation be conducted on the DRIP FUNDI piped systems: because these systems are piped, they are the appropriate setting to validate the permanent-installation configuration and the SDWS 23 (volume) and SDWS 27 (operational days) flow parameters, which require flow measurement on piped infrastructure. It will be conducted as a separate project with its own work plan, timeline, and reporting, and will generate operational data for SDWS 23/27 field deployment (site-level flow-rate calibrations and deployment formula validation) and ongoing CBT cross-check data. This report covers Phase 1 mobile validation only; all four FARs are resolved based on Phase 1 evidence.

On the “statistically random and valid sample of installed systems” condition

The pilot approval scopes the deviation to monitoring “a statistically random and valid sample of installed water treatment systems.” That sampling design governs how installed systems are selected for crediting at the Verification (credit-issuance) stage, and is specified ahead of first verification. It is distinct from, and downstream of, the Phase 1 mobile instrument-accuracy validation presented in this report, which establishes that the Lume measures E. coli reliably against enumeration-based reference methods across a representative range of water conditions. Virridy has not yet entered verification or sought credit issuance; the random-sample-of-installed-systems design will accompany the first verification, not this validation round.

Planned Phase 2 reporting (pre-populated structure)

Metric	Aggregation	Target	Result
Sensor uptime	% of expected check-ins received	≥95%	TBD
Calibrated-combo coverage	% of readings at `led=512, bias∈[2960,3040]`	≥90%	TBD
Battery longevity	Median V/week drift	<0.05 V/week	TBD
CBT cross-check rate	Paired samples per site per quarter	≥3	TBD
Discrepancy rate	% of CBT pairs >1 WHO band off	≤15% (matched to Colilert↔MF baseline)	TBD
Operational-day coverage (SDWS 27)	Days/site with ≥120 min in-water flag	≥28 / 30 days	TBD
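
As an illustration of how two of the aggregations in the table above might be computed, the following is a minimal Python sketch, not the production pipeline. It assumes a 5-minute reading cadence (consistent with the ~288 readings per day reported for Phase 1); the function names and the `(timestamp, in_water)` input shape are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def uptime_percent(received: int, expected: int) -> float:
    """Sensor uptime: percentage of expected check-ins actually received."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 100.0 * received / expected


def operational_days(readings, minutes_per_reading: int = 5,
                     min_minutes: int = 120) -> int:
    """Count days with at least `min_minutes` of in-water time (SDWS 27 row).

    `readings` is an iterable of (timestamp, in_water) pairs. Each in-water
    reading is credited with `minutes_per_reading` minutes, which is an
    assumption about cadence rather than a documented rule.
    """
    minutes_by_day = defaultdict(int)
    for timestamp, in_water in readings:
        if in_water:
            minutes_by_day[timestamp.date()] += minutes_per_reading
    return sum(1 for m in minutes_by_day.values() if m >= min_minutes)
```

Under these assumptions a site would meet the operational-day target when `operational_days(...)` returns at least 28 for a 30-day window, and a sensor would meet the uptime target when `uptime_percent(...)` is at least 95.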

Findings, Limitations & Recommendations

Findings

The Lume sensor is a valid digital substitute for periodic CBT sampling under SDWS Parameter 18. The sensor agrees with CBT field results at 88% LOOCV, 85% balanced accuracy (AUC=0.892), and 97% within-one-category WHO risk-tier agreement — validated on 216 paired samples from 52 sampling points across Rwanda and Kenya (validation.thelume.ai/cbt). Against laboratory Colilert, agreement is κ=0.88 — stronger than the κ=0.40 agreement between Colilert and Membrane Filtration on the same samples. The sensor meets or exceeds the measurement agreement that the water-quality testing community already accepts between reference methods.
The model generalises across two countries, two programs, and multiple water-source types. Rwanda (Amazi Meza, school-based filtration, highland) and Kenya (DRIP FUNDI, community boreholes and chlorination, arid) represent a genuine cross-section of the operating contexts the sensor will encounter. Per-sensor agreement is 85–98% across both programs with no country-specific tuning, and program-level agreement is 93% in Kenya and 82% in Rwanda, where naturally fluorescent dissolved organic matter (DOM) in spring and surface source waters raises the TLF baseline and accounts for most residual error. Sensor 50065, deployed in both countries, achieves 85% LOOCV on 116 points spanning both contexts.
Continuous monitoring replaces point-in-time snapshots. A CBT provides one data point per site visit. A permanently installed Lume sensor generates ~288 readings per day — 6,855 sensor observations backed the 216 paired CBT comparisons in Phase 1. For dMRV verification, this means water quality is monitored continuously between site visits rather than assumed from periodic grab samples.
Chlorination efficacy is independently confirmed. Among Kenya DRIP samples with free chlorine residual measured, 100% of chlorinated samples (n=30) had 0 CFU by CBT and 100% were classified as safe by the Lume. The sensor can verify that treatment systems are functioning, not just that water quality meets a threshold.
The estimation model is fully transparent and independently verifiable. By explicit design choice, the deployed model is a right-censored Tobit regression with eight coefficients: an intercept; terms for fluorescence, water temperature, and turbidity; and a per-sensor 2-point calibration (an intercept offset and a fluorescence slope, i.e. offset + gain, for each of the two non-reference sensors). It uses no AI, no ML ensemble, and no neural network. The full coefficient set (8 values, Appendix B) is the entire model. Any third party can reproduce the sensor’s output from raw readings with a calculator.
SDWS 23 (water volume) and SDWS 27 (operational days) are feasible extensions. Flow-state classification was validated on 1,599 bench data points (93% accuracy, κ≥0.85); the results are published at validation.thelume.ai/pipedflow/. Both parameters are recommended for inclusion, with field deployment to follow under the separate Phase 2 project.

Conclusion

Based on the evidence presented in this report, Virridy recommends that the Lume sensor be approved as a digital substitute for periodic CBT sampling for SDWS Parameter 18 (Microbial Drinking Water Quality) under Gold Standard Pilot 14. The evidence base comprises:

216 paired Lume–CBT field samples from 52 sampling points across two countries and two independent water programs, achieving 88% LOOCV agreement, 85% balanced accuracy (AUC=0.892), and 97% within-one-category WHO risk-tier agreement.
Inter-method parity: Lume↔Colilert κ=0.88 vs. Colilert↔MF κ=0.40 on 153 three-way split samples. The sensor agrees with a laboratory method more closely than two laboratory methods agree with each other.
A transparent, auditable model — 8 published coefficients reproducible by hand, with no AI/ML dependencies.
An operational integration protocol at validation.thelume.ai/cbt that continuously pairs, validates, and flags discrepancies between sensor and manual data.
~288 readings per sensor per day vs. a single CBT grab sample per site visit, providing continuous temporal coverage for verification.

CBTs are retained as periodic cross-checks to validate ongoing sensor accuracy, not as the primary monitoring instrument. All four Forward Action Requirements are resolved. The complete evidence base is continuously available at validation.thelume.ai/cbt.

Beyond approval for the Amazi Meza program, Virridy requests that Gold Standard recognise this validation at the instrument level — qualifying the Lume as a Validated Sensor (Digital MRV) under SDWS V2.0 for use across SDWS projects that operate within the validated water-type envelope, rather than for Amazi Meza alone. Projects outside that envelope would extend coverage through additional paired CBT sampling and VVB-validated retraining under §2.4. This positions the Phase 1 validation as a reusable instrument qualification for the SDWS methodology, with Amazi Meza as the demonstration deployment. Virridy recognises that general (non-pilot) applicability is granted through the dMRV Programme’s formal methodology-revision procedure (dMRV Requirements §3.1–§3.2), and requests that this instrument-level qualification be taken up as the explicit next step along that pathway.

Limitations

Geographic. The CBT-trained Tobit model is validated on field data from two countries (Rwanda + Kenya), two programs, five water-source types, and 52 sampling points. This cross-section is broad but not exhaustive. Extension to new geographies or water types not yet represented (e.g., high-turbidity surface water, saline groundwater) will require additional paired sampling and potential model retraining per §2.4.
Sample size by category. The 3-category (safe / intermediate / high) classifier reaches 82% exact agreement (177/216) and 97% within one category (210/216); most of the residual disagreement is in the intermediate (10–99 CFU) band, which has only 15 samples and where lab reference methods themselves disagree.
SDWS 23/27. The evidence is bench-level proof-of-method only; no field accuracy figures are available yet.
Reference-method floor. Colilert↔MF agreement on the same samples is κ=0.40, so a portion of any Lume↔CBT discrepancy reflects the inherent variability between any two microbial water tests rather than sensor error.
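
The agreement statistics quoted in this section (exact agreement, within-one-category agreement, and Cohen's κ) can all be derived from a category-by-category confusion matrix. The sketch below shows one way to do that in Python. The 3×3 matrix in the usage note is purely illustrative and is not the Phase 1 confusion matrix; the function names are hypothetical.

```python
def exact_agreement(matrix) -> float:
    """Fraction of pairs where both methods assign the same category."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def within_one_agreement(matrix) -> float:
    """Fraction of pairs whose categories differ by at most one band."""
    total = sum(sum(row) for row in matrix)
    close = sum(matrix[i][j]
                for i in range(len(matrix))
                for j in range(len(matrix))
                if abs(i - j) <= 1)
    return close / total


def cohen_kappa(matrix) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(matrix[i]) for i in range(k)]
    col_sums = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    p_observed = sum(matrix[i][i] for i in range(k)) / total
    p_expected = sum(row_sums[i] * col_sums[i] for i in range(k)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)
```

For example, with rows as reference categories and columns as sensor categories (safe, intermediate, high), the illustrative matrix `[[40, 5, 0], [4, 6, 2], [1, 3, 19]]` gives 81.25% exact agreement and 98.75% within-one agreement. The percentages reported above follow the same arithmetic: 177/216 rounds to 82% and 210/216 rounds to 97%.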

Recommendations

Extend pilot recognition to the Kenya DRIP FUNDI programme for mobile use. The Phase 1 dataset already includes Kenya field validation (program-level agreement 93%); Virridy requests that the Pilot 14 scope be formally extended to recognise the Lume for mobile validation use in Kenya DRIP FUNDI, not Rwanda alone.
Initiate the separate Phase 2 permanent installation project, which Virridy requests be sited on the DRIP FUNDI piped systems in Kenya, to generate operational SDWS 23/27 deployment data (site-level flow-rate calibrations, deployment formula validation) and ongoing CBT cross-checks.
Lock the model version at the current Tobit coefficients for the duration of the verification window unless a documented retraining trigger fires; any mid-window model change requires an audit-trail entry.
Maintain the Colilert↔MF baseline (κ=0.40 / 1-WHO-band tolerance) as the formal reference point for the discrepancy threshold when interpreting any Lume↔CBT disagreement, as documented in §3.2.

Appendices

Appendix A — Live data references

thelume.ai/research — full validation evidence, confusion matrices, regression plots, training-set provenance.
thelume.ai/validation — live mWater field-validation datagrid (currently 4 entries; auto-refreshes on page load).
thelume.ai/inventory/fleet — internal Fleet Health dashboard (login required) with per-sensor calibration combo, drift, and uptime telemetry.
piped-flow-test.pages.dev/analysis/ — static snapshot (2026-04-27) of the SDWS 23 / 27 classifier results, with confusion matrices and per-class metrics for both test setups. Live tool at piped-flow-test.pages.dev.
lume-bench-annotations.pages.dev — long-running bench data (TLF, ToF, temperature) with operator-annotated air / still / twist-shake conditions.
boulder-water.pages.dev — public Boulder Creek dashboard (City of Boulder Utilities × CU Boulder × Virridy) showing live sensor data.

Appendix B — Model coefficients (verifier reference)

Authoritative source: live validation page at validation.thelume.ai/cbt.

CBT-trained Tobit model (deployed for dMRV verification)

Component	Parameter	Value
Tobit regression: log₁₀(CFU+1)	intercept	0.828
	mon2 (z-scored, baseline-subtracted fluorescence)	1.976
	temp (z-scored water temperature)	0.173
	tof (z-scored turbidity proxy)	0.006
	FE·50045 (per-sensor intercept, 2-point calibration offset)	0.453
	FE·50053 (per-sensor intercept, 2-point calibration offset)	−0.589
	slope·50045 (per-sensor mon2 slope, 2-point calibration gain)	−0.191
	slope·50053 (per-sensor mon2 slope, 2-point calibration gain)	−1.557
Tobit σ̂	—	0.606 log₁₀
Ridge regularization	λ	0.1
Right-censoring point	—	log₁₀(101) ≈ 2.004 (CBT ≥100 CFU band)
Preprocessing	standardization	per-sensor clean-water baseline subtraction for mon2, then z-score standardization of mon2, temp, and tof. Temperature is an explicit predictor (no ρ, no temperature pre-correction). Turbidity (tof) and temperature are read at the matched submerged (lowest-fluorescence) moment.
Fit quality	in-sample / LOOCV	R² = 0.519 / 0.469; MAE = 0.396 / 0.415 log₁₀
Reference sensor	—	50065 (per-sensor intercepts and slopes are relative to this reference; 50065 is the most representative unit, with the most paired points and deployment in both countries)
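
To show how a verifier might apply the coefficients above, here is a minimal Python sketch. It rests on three assumptions that this appendix does not spell out: the inputs are already baseline-subtracted and z-scored (the standardisation constants are not listed here, so the function takes standardised values directly); the per-sensor intercept and slope terms are added to the reference-sensor (50065) values; and the point estimate is capped at the right-censoring point. All function and variable names are hypothetical.

```python
import math

INTERCEPT = 0.828
BETA = {"mon2": 1.976, "temp": 0.173, "tof": 0.006}

# Per-sensor 2-point calibration, relative to reference sensor 50065.
SENSOR_OFFSET = {"50065": 0.0, "50045": 0.453, "50053": -0.589}
SENSOR_SLOPE = {"50065": 0.0, "50045": -0.191, "50053": -1.557}

CENSOR_LOG10 = math.log10(101)  # CBT >=100 CFU band, about 2.004


def predict_log10_cfu(sensor: str, mon2_z: float, temp_z: float,
                      tof_z: float) -> float:
    """Linear predictor for log10(CFU + 1) from standardised inputs."""
    if sensor not in SENSOR_OFFSET:
        raise KeyError(f"no calibration for sensor {sensor}")
    mon2_coeff = BETA["mon2"] + SENSOR_SLOPE[sensor]  # assumed additive gain
    return (INTERCEPT + SENSOR_OFFSET[sensor]
            + mon2_coeff * mon2_z
            + BETA["temp"] * temp_z
            + BETA["tof"] * tof_z)


def predict_cfu(sensor: str, mon2_z: float, temp_z: float,
                tof_z: float) -> float:
    """Back-transform to CFU/100 mL, capped at the censoring point."""
    y = min(predict_log10_cfu(sensor, mon2_z, temp_z, tof_z), CENSOR_LOG10)
    return max(0.0, 10 ** y - 1)
```

With all standardised inputs at zero, the reference sensor returns the intercept (0.828 on the log₁₀ scale, roughly 5.7 CFU/100 mL). For sensor 50053 with `mon2_z = 1` and the other inputs at zero, the predictor is 0.828 − 0.589 + (1.976 − 1.557) = 0.658 under the additive assumption.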

Turbidity (NTU) regression — bench calibration

Model	Coefficient	Value
Turbidity (NTU) regression — bench, sensor 50031	intercept (absolute, single-sensor)	−145.89
Turbidity (NTU) regression — bench, sensor 50031	slope (transfers across sensors)	2.0488

Note: The absolute NTU intercept is sensor-specific (calibrated on bench unit 50031). Field deployments use a sensor-relative anomaly form: NTU = max(0, 2.05 × (sps − per-sensor-baseline)), where the baseline is the rolling 10th-percentile of in-water sps for that unit.
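
The sensor-relative anomaly form in the note above can be sketched in Python as follows. The length of the rolling window is not specified in this appendix, so the caller supplies the window of recent in-water `sps` values. The percentile uses linear interpolation, which is an assumption; the helper names are hypothetical.

```python
def percentile(values, q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def ntu_anomaly(sps: float, in_water_window, slope: float = 2.05) -> float:
    """NTU = max(0, slope * (sps - baseline)).

    The baseline is the 10th percentile of the in-water sps values in the
    supplied window for the same sensor unit.
    """
    baseline = percentile(in_water_window, 10)
    return max(0.0, slope * (sps - baseline))
```

Readings at or below the baseline return 0 NTU, so the estimate never goes negative even when a unit's raw `sps` dips under its own clean-water level.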

Appendix C — Discrepancy log

Rolling table of all (Lume, CBT) pairs flagged as discrepant under the FAR 3 protocol (>1 WHO risk band disagreement), with resolution status.

Date	Sensor	Site	CBT result	Lume prediction	Discrepancy	Resolution
May–Jun 2026	50045	Rwanda / Amazi Meza	0 CFU (conformity)	~4× sister sensor 50065 in same water	Excluded from dataset (documented sensor fault)	On a clean-water grab, 50045 read roughly four times its sister sensor 50065 in identical water — an instrument fault, not a true contamination signal. Excluded with written rationale; this is the only manual override in the dataset (see CBT page exclusion table)

No other pairs from the 216-observation Phase 1 dataset have been flagged as discrepant under the >1 WHO risk band threshold, and no statistical outlier screening (IQR fencing or Cook's distance) was applied: the lowest-fluorescence pairing rule removes out-of-water artifacts at the source. Additional entries will be added during the separate Phase 2 project as ongoing cross-check pairs are collected.
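
For illustration, the >1 WHO risk band test used for this log could be expressed as in the Python sketch below. It assumes the three-category banding used elsewhere in this report (safe: <10 CFU/100 mL, intermediate: 10–99, high: ≥100); the function names are hypothetical and the sketch does not reproduce the full FAR 3 protocol.

```python
def who_band(cfu: float) -> int:
    """Map a CFU/100 mL value to a band index: 0 safe, 1 intermediate, 2 high."""
    if cfu < 0:
        raise ValueError("cfu must be non-negative")
    if cfu < 10:
        return 0
    if cfu < 100:
        return 1
    return 2


def is_discrepant(cbt_cfu: float, lume_cfu: float) -> bool:
    """True when the two results sit more than one band apart."""
    return abs(who_band(cbt_cfu) - who_band(lume_cfu)) > 1
```

Under this banding only a safe-versus-high disagreement is logged; a safe-versus-intermediate or intermediate-versus-high pair stays within the one-band tolerance.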

Appendix D — Sensor inventory & chain of custody

Barcode	Program	Deployment sites	Active since	Paired CBT samples	Status	Replacement events
50045	Rwanda / Amazi Meza	EP Nyakabungo, EP Nyakabuye, EP Rwishwima, Kicukiro, Kamonyi	May 2026	48	Active	None
50053	Kenya / DRIP	Isiolo (Garbatula, Ngaremara), Turkana (Loima, Turkana South, Turkana West)	May 2026	52	Active	None
50065	Both programs	Rwanda + Kenya sites (rotated across both programs)	May 2026	116	Active	None

All three sensors operate at calibration point led_power=512, sipm_bias ∈ [2960, 3040]. No sensor replacements have been required during Phase 1. Chain of custody is maintained via Blues Notecard check-in telemetry with per-device cryptographic signatures.

Appendix E — Document version history

Version	Date	Author	Changes
0.1 (Draft)	2026-04-27	Virridy	Initial draft. US-pilot evidence populated for FARs 1, 2, 4.3; Rwanda / Phase-2 sections marked TBD.
0.2	2026-06-14	Virridy	Phase 1 complete. All four FARs resolved. Executive summary rewritten with substitution case. CBT field validation results (n=216 paired samples, 88% LOOCV; eight-parameter Tobit, lowest-fluorescence pairing). SDWS 23/27 exploration complete (1,599 bench data points, 93% accuracy). Phase 2 scoped as separate project.
