Lume – Method Calibration

Four-Panel Method Comparison

How does the Lume compare against the two EPA-approved laboratory methods — Colilert (IDEXX) and membrane filtration (MF) — and against itself when retrained on a different reference? Each column analyzes paired samples across three frameworks: log-log regression (top), Bland-Altman agreement (middle), and categorical classification (bottom).

Four-panel comparison: MF vs Colilert, Lume vs Colilert, Lume vs MF (Colilert-trained), Lume vs MF (MF-trained)

Column 1 · MF vs. Colilert (n = 153)

The dedicated method comparison study pairs Colilert (n = 2 replicates) with membrane filtration (n = 3 replicates) across 161 sampling datetimes; 8 zero-valued pairs are excluded from the log-scale analysis, yielding 153 observations. The two EPA-approved methods show R² = 0.572 with a +0.35 log₁₀ bias: MF systematically reads ~2.2× higher than Colilert. The 95% limits of agreement span [−0.64, +1.34] log₁₀, so for any given pair MF can read anywhere from ~4.4× lower to ~22× higher than Colilert. Categorical accuracy is 0.66 (Cohen’s κ = 0.40), i.e. “fair” agreement. This inter-method disagreement sets the ceiling for what any sensor can be expected to achieve against either reference.
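The agreement statistics above can be sketched in a few lines of Python. This is a minimal illustration, assuming the conventional Bland-Altman definition (bias ± 1.96 standard deviations of the paired log₁₀ differences); the exact procedure behind the panel is not stated here, and the function names are illustrative only.

```python
import math
from statistics import mean, stdev

def bland_altman_log10(method_a, method_b):
    """Return (bias, lower LoA, upper LoA) of log10(b) - log10(a).

    Assumes the conventional bias +/- 1.96 * SD limits of agreement.
    """
    diffs = [
        math.log10(b) - math.log10(a)
        for a, b in zip(method_a, method_b)
        if a > 0 and b > 0  # zero-valued pairs cannot be log-transformed
    ]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width

def fold(log10_diff):
    """Convert a log10 difference into a multiplicative factor."""
    return 10 ** log10_diff
```

Applied to the values reported for column 1, fold(0.35) is about 2.2, fold(1.34) is about 22, and fold(-0.64) is about 0.23, i.e. roughly 4.4× lower.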

Column 2 · Lume vs. Colilert (n = 209, Colilert-trained)

The Colilert-trained Lume regression is evaluated against Colilert across all bench (n = 176) and field (n = 33) observations. The sensor achieves R² = 0.881, a bias of 0.00 log₁₀, and tight limits of agreement of [−0.42, +0.42], so Lume predictions stay within ~2.6× of the reference. Categorical accuracy is 0.89 with κ = 0.88, which is “almost perfect” agreement. Against its training reference, the Lume agrees with Colilert more closely than the two EPA methods agree with each other (column 1).
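For reference, Cohen's κ corrects raw categorical accuracy for the agreement expected by chance. The sketch below is a minimal Python version; the verbal labels quoted in this section ("fair", "almost perfect") match the widely used Landis and Koch bands, which is an inference rather than something stated here. The category names in the usage note are hypothetical, since the concentration classes used for the panel are not listed.

```python
from collections import Counter

def cohens_kappa(reference, predicted):
    """Cohen's kappa for two equally long sequences of category labels."""
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Chance agreement: product of the marginal proportions, summed over classes.
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / n ** 2
    return (observed - expected) / (1 - expected)

def landis_koch(kappa):
    """Verbal label for kappa using the Landis and Koch bands (assumed scale)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")
```

With hypothetical labels, cohens_kappa(["low", "low", "high", "high"], ["low", "high", "high", "high"]) gives 0.5: observed agreement is 0.75 and chance agreement is 0.5.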

Column 3 · Lume vs. MF (n = 173, Colilert-trained)

The same Colilert-trained Lume model is now evaluated against membrane filtration, a reference method it was never trained on. Performance drops to R² = 0.514 with LoA [−0.80, +0.83] and categorical accuracy 0.84 (κ = 0.65). Critically, the ~0.37 drop in R² from column 2 to column 3 is of the same order as the inter-method disagreement between Colilert and MF themselves (column 1, R² = 0.572, i.e. ~0.43 of the variance unexplained). Most of the apparent loss is therefore attributable to reference-method disagreement, not sensor limitations.

Column 4 · Lume vs. MF (n = 303, MF-trained)

To isolate the effect of reference-method choice, the Lume regression is refit using MF as the training target, over the full bucket dataset. Performance jumps back to R² = 0.872, essentially matching the Colilert-trained model against Colilert. Bias is 0.00 with LoA [−0.93, +0.93]; the wider LoA (±0.93 vs. ±0.42 in column 2) reflects the higher within-method variability of MF replicates (57.9% relative percent difference, RPD, vs. 43.5% for Colilert), not a sensor deficiency. Categorical accuracy is 0.81 (κ = 0.66).
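The replicate-variability figures quoted above are relative percent differences. A minimal Python sketch of the usual two-replicate definition follows. How RPD was computed for the three MF replicates is not stated here, so the mean over all replicate pairs used below is an assumption, as are the function names.

```python
from itertools import combinations
from statistics import mean

def relative_percent_difference(a, b):
    """RPD of two replicate counts: |a - b| relative to their mean, in percent."""
    midpoint = (a + b) / 2
    if midpoint == 0:
        return 0.0  # both replicates are zero, so there is no difference
    return abs(a - b) / midpoint * 100

def mean_pairwise_rpd(replicates):
    """Assumed extension to 3+ replicates: average RPD over all pairs."""
    return mean(relative_percent_difference(a, b)
                for a, b in combinations(replicates, 2))
```

For example, replicate counts of 100 and 150 give an RPD of 40%.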

Headline Finding

Sensor-to-reference agreement is bounded by reference-method reproducibility, not by Lume hardware. Whichever culture method is adopted as truth, the Lume fits it at R² ≈ 0.87–0.88. The gap between columns 2 and 3 (~0.37 in R²) is comparable in size to the disagreement between the two lab methods themselves (column 1). The Lume is method-agnostic: its ceiling is set by the reference it is trained against. It already achieves quantitative performance at or above the inter-method agreement ceiling between the two accepted laboratory techniques, while providing continuous temporal coverage that grab-sample laboratory methods cannot.

Field Calibration Data

Live data from the mWater Lume 1.2 – 2026 Validation Data datagrid. Each water sample collection event is paired with a reference enumeration; the Method column distinguishes which reference was used — Colilert (IDEXX defined-substrate MPN), membrane filtration (MF, CFU), or compartment bag tests (CBT, MPN). The Use in Calibration column flags rows that are unusable because no /diagnostics record (water temperature, required by the CFU regression) was streaming within ±20 min of the sample.

Observed vs. Predicted E. coli — Colilert

Each point is a paired grab sample where the reference assay was Colilert (IDEXX) — observed Colilert result (x-axis) vs. the sensor model prediction at the nearest reading within ±20 min of sample collection (y-axis). Points are color-coded by sensor barcode. The dashed line is the 1:1 reference.
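The pairing rule described above (nearest sensor reading within ±20 min of the grab sample) can be sketched as follows. This is an illustrative Python version; the function name and the (timestamp, value) layout are assumptions, not the dashboard's actual code.

```python
from datetime import datetime, timedelta

def nearest_reading(sample_time, readings, tolerance=timedelta(minutes=20)):
    """Return the (timestamp, value) reading closest to sample_time.

    readings is an iterable of (timestamp, value) tuples. Returns None when
    no reading falls within the tolerance window, i.e. the sample is unpaired.
    """
    in_window = [
        (abs(ts - sample_time), ts, value)
        for ts, value in readings
        if abs(ts - sample_time) <= tolerance
    ]
    if not in_window:
        return None
    _, ts, value = min(in_window, key=lambda item: item[0])
    return ts, value
```

A sample with no reading in the window comes back as None, which corresponds to a row that cannot be used for comparison.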

Observed vs. Predicted E. coli — CBT

Each point is a paired grab sample where the reference assay was a compartment bag test (CBT): the observed CBT result (x-axis) is plotted against the sensor model prediction at the nearest reading within ±20 min of sample collection (y-axis). The dashed line is the 1:1 reference. More pairs will accumulate as the Kigali field deployments scale up.

Turbidity Calibration — ToF → NTU

Bench calibration of the Lume's Time-of-Flight (ToF) photon-backscatter signal against a step-series of formazin turbidity standards. The run used sensor 50031 on 2025-12-15 in the standard production sweep configuration (led_power = 512, sipm_bias ∈ [2980, 3020]). Each calibration step held a known NTU value for ~15 minutes; the table below shows the median ToF reading per step.

Calibration data

Step start (MST)	Step end	NTU	n (ToF)	distance_mm	signal_rate_kcps	signal_per_spad_kcps
13:33	13:48	0.00	15	23	9,632	70
13:49	14:04	0.20	15	23	9,672	71
14:06	14:21	1.89	15	23	9,736	71
14:24	14:39	6.70	15	24	10,056	74
14:42	14:57	13.20	15	25	9,864	80
15:03	15:18	58.60	15	29	9,832	103
15:22	15:35	109.00	13	31	9,688	122

Fit and chart

A linear OLS fit of the median signal_per_spad_kcps against the seven calibration NTU values gives a clean monotonic relationship across the full 0–109 NTU range:

NTU = max(0, −145.89 + 2.0488 × signal_per_spad_kcps)
R² = 0.9901 · RMSE = 3.84 NTU · n = 7 cal points (140 individual ToF readings)

Setting NTU = 0 in the fit gives a clean-water baseline of 145.89 / 2.0488 ≈ 71 kcps per SPAD, which matches the three lowest cal steps (70–71 kcps) directly. Above that baseline, each additional kcps/SPAD adds roughly 2.05 NTU.
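The fit can be reproduced from the seven median values in the calibration table. The Python sketch below uses closed-form ordinary least squares with no external libraries; the function names are illustrative, and RMSE is taken over the seven cal points (dividing by n), which is the convention that reproduces the reported 3.84 NTU.

```python
import math

# Median signal_per_spad_kcps and NTU for each calibration step (from the table).
SIGNAL_PER_SPAD = [70, 71, 71, 74, 80, 103, 122]
NTU = [0.00, 0.20, 1.89, 6.70, 13.20, 58.60, 109.00]

def ols_fit(x, y):
    """Simple linear regression y = intercept + slope * x, with R² and RMSE."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot, math.sqrt(ss_res / n)

def ntu_from_signal(signal_per_spad_kcps, slope, intercept):
    """Apply the calibration, clamping negative predictions to 0 NTU."""
    return max(0.0, intercept + slope * signal_per_spad_kcps)

slope, intercept, r2, rmse = ols_fit(SIGNAL_PER_SPAD, NTU)
baseline_kcps = -intercept / slope  # signal level at which predicted NTU is 0
print(f"NTU = max(0, {intercept:.2f} + {slope:.4f} * signal_per_spad_kcps)")
print(f"R2 = {r2:.4f}, RMSE = {rmse:.2f} NTU, baseline = {baseline_kcps:.1f} kcps")
```

Because the clamp sits outside the regression, readings at or below the baseline (such as the 70 kcps step) map to 0 NTU rather than a small negative value.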

ToF signal-per-SPAD vs NTU calibration regression

Why this metric, not the others

signal_per_spad_kcps — per-detector-normalized photon backscatter rate. R² = 0.99. Physically the right proxy: scattering off suspended particles raises the per-SPAD return rate.
signal_rate_kcps (total) — R² = 0.02. Total counts are dominated by laser power and active SPAD count, not by scattering.
distance_mm — monotonic with NTU but coarse (range 23–31 mm) and at the floor of the sensor's dynamic range.
Log-log fits on either signal — worse than linear because three of seven cal points sit at or near the LOD baseline.
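The comparison between candidate metrics can be checked against the medians in the calibration table. The sketch below computes R² of a simple linear fit to NTU for each ToF column (for one predictor this equals the squared Pearson correlation). It is an illustrative Python calculation on the seven tabulated medians, not the original analysis code.

```python
# Per-step medians from the calibration table.
NTU = [0.00, 0.20, 1.89, 6.70, 13.20, 58.60, 109.00]
CANDIDATES = {
    "signal_per_spad_kcps": [70, 71, 71, 74, 80, 103, 122],
    "signal_rate_kcps": [9632, 9672, 9736, 10056, 9864, 9832, 9688],
    "distance_mm": [23, 23, 23, 24, 25, 29, 31],
}

def r_squared(x, y):
    """R² of a one-predictor linear fit (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

scores = {name: r_squared(values, NTU) for name, values in CANDIDATES.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} R2 = {score:.2f}")
```

On these medians, signal_per_spad_kcps comes out at about 0.99 and signal_rate_kcps at about 0.02, in line with the values quoted above.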

Important configuration note. The bench protocol for this calibration was originally set to sipm_bias=3300, led_power=64. However, the firmware ran a full bias sweep that passed through the production combo (led_power=512, sipm_bias≈3000) on every measurement event, and the ToF sensor takes one reading per sweep cycle independent of the SiPM. Filtering to the production combo by timestamp recovers exactly the same 140 ToF readings used in the fit above, so this calibration is directly applicable to the live deployments.

Lab Calibration Data

Controlled bucket test data pairing Lume sensor readings with Colilert (IDEXX, MPN) and membrane filtration (MF, CFU) reference counts across known E. coli concentrations. CBT calibration data is captured in the field-calibration table above.
