Temporal Equivalence Principle: A Standard-Siren Test of Bi-Metric Gravitational-Wave Propagation

Smawfield, Matthew Lukin

doi:10.5281/zenodo.20572696

Abstract

Standard ΛCDM assumes that gravitational waves and electromagnetic radiation propagate through the same effective distance-redshift relation. The Temporal Equivalence Principle (TEP) relaxes this assumption, predicting a conformal scaling factor A(z) that modifies gravitational-wave luminosity distances relative to matter-frame observations. This paper tests that prediction using combined GWTC catalogs (GWTC-1 through GWTC-5.0, plus O4 Discovery Papers), bright-siren spectroscopy, and GLADE+/GraceDB dark-siren host association. The locked lab-scale model uses A(z) = exp(βφ₀(1+z)ⁿ) with β = −1 and φ₀ = −0.013 (dimensionless, φ₀ = φ/M_Pl), so A(z) rises above unity with growing redshift-dependent amplitude. With corrected per-event distance uncertainties and hierarchical Bayesian host marginalization, the pipeline identifies 51 events with truly independent host-galaxy redshifts (50 from GLADE+/DESI plus the bright siren GW170817) and 56 events with GWOSC-catalog fallback redshifts. The primary analysis excludes fallback redshifts to avoid ΛCDM circularity: the independent-only sample gives ΛCDM H₀ = 59.8 km/s/Mpc and lab-fixed TEP H₀ = 60.8 km/s/Mpc (Δχ² ≡ χ²_ΛCDM − χ²_TEP = −0.13, |ΔBIC| < 2). A secondary full-sample diagnostic (107 events, including fallback redshifts) gives ΛCDM H₀ = 64.6 km/s/Mpc and TEP H₀ = 65.8 km/s/Mpc (Δχ² = −0.16). The joint MCMC fit to (H₀, φ₀, n, β) with 64 walkers × 10000 steps gives posterior mean H₀ = 61.8 ± 6.7 km/s/Mpc, φ₀ = −0.025 ± 0.014, n = 1.49 ± 0.86, β = 1.15 ± 2.62; the lab-calibrated values (φ₀ = −0.013, n = 1.0, β = −1.0) are consistent within 1σ. The corresponding ΔBIC = −13.3 favors ΛCDM because the TEP joint model adds three parameters for only modest χ² improvement. The direct H₀ fit shows a marginal +1.0 km/s/Mpc upward shift in the bi-metric direction, but the redshift-dependent matched-filter residual test yields a structure opposite to the locked TEP prediction, and the χ² comparison is statistically neutral. 
The current sample is underpowered to resolve the predicted redshift-dependent signature.

Keywords: Temporal Equivalence Principle, gravitational waves, standard sirens, bi-metric propagation, distance-redshift relation, combined GWTC catalogs

1. Introduction

1.1 Bi-Metric Propagation and the Hubble Tension

The standard ΛCDM model predicts a single value for the Hubble constant H₀. Measurements from the cosmic microwave background (Planck, H₀ ≈ 67.4 km/s/Mpc) and the local distance ladder (SH0ES, H₀ ≈ 73 km/s/Mpc) differ by approximately 5σ. Paper 11 demonstrates that this tension is resolved within the TEP framework via environment-dependent Cepheid clock bias, yielding a corrected local H₀ = 68.17 km/s/Mpc consistent with Planck. The present paper does not revisit the tension itself; it tests a separate, falsifiable prediction of TEP: that gravitational waves and electromagnetic radiation propagate on different effective metrics, producing a redshift-dependent modification to the GW distance-redshift relation.

This modification is not exactly degenerate with a constant H₀ shift or with dark energy over a sufficiently broad redshift range, because it predicts a particular functional form for how GW luminosity distances deviate from the ΛCDM expectation as a function of redshift. Over the limited redshift baseline of the current sample and with large GW distance errors, partial degeneracy with H₀ can remain significant. Rather than adding another H₀ measurement, the question is whether the GW data prefer TEP's bi-metric scaling over the standard single-metric relation.

1.2 The TEP Bi-Metric Prediction

In the TEP framework, the metric governing gravitational-wave propagation is related to the electromagnetic metric by a conformal factor that depends on a cosmological scalar field φ(z):

\begin{equation} d_L^{(\text{TEP})}(z) = A(z) \, d_L^{\Lambda\text{CDM}}(z; H_0) \end{equation}

where the redshift-dependent conformal factor is

\begin{equation} A(z) = \exp\!\bigl[\beta_A\,\phi(z)/M_{\rm Pl}\bigr], \qquad \phi(z) = \phi_0 (1+z)^n . \end{equation}

In the lab-fixed test, the parameters φ₀ and n are not free: φ₀ follows from the locked 2025 lab-scale convention (the NIST/BIPM G discrepancy, Paper 21), and n ≈ 1 follows from the matter-density scaling of the scalar field. Throughout this paper φ₀ denotes the dimensionless ratio φ/M_Pl, so the explicit factor of 1/M_Pl is absorbed into φ₀ in the expressions that follow. The dimensionless conformal coupling is β_A = −1, with sign fixed by the same convention used in Paper 11: the Cepheid period-contraction effect requires β_Aφ > 0 in deep potentials, so with β_A = −1 and φ₀ < 0 the exponent β_Aφ₀(1+z)ⁿ is positive, the conformal factor satisfies A(z) > 1, and the magnitude of the departure grows with redshift. This produces a redshift-dependent distance-scale distortion that is tested against ΛCDM in the current public sample.

1.3 This Work

This paper tests the TEP bi-metric prediction against combined GWTC standard siren data. Both ΛCDM and TEP distance-redshift relations are fitted to the independent-redshift sample and their χ², AIC, and BIC values are compared. The primary lab-fixed comparison has the same number of fitted parameters as ΛCDM (only H₀ is fitted in each case). A secondary joint-fit diagnostic lets H₀, φ₀, n, and β vary, with an explicit information-criterion penalty for the three additional parameters.

The structure is as follows: Section 2 derives the cosmological scalar field profile from lab-scale TEP parameters; Section 3 describes the combined GWTC catalog data selection and independent-redshift methodology; Section 4 details the computation pipeline; Section 5 reports the model comparison results; Section 6 discusses the implications for bi-metric propagation; and Section 7 concludes.

2. Theoretical Framework

2.1 The Conformal Scaling Factor

Under TEP, the conformal factor A(z) relates the gravitational-wave metric to the matter-frame background metric. For gravitational waves propagating through regions of varying scalar field strength, the effective luminosity distance is rescaled by A(z) along the propagation path. The locked lab-scale convention gives the exponential form

\begin{equation} A(z) = \exp\!\bigl[\beta_A\,\phi_0(1+z)^n\bigr] \end{equation}

The conformal scaling factor $A(z)$ acting on GW propagation distances functions as the cosmological GW-domain projection of the abstract environmental operator $S_\Sigma(\mathcal{E})$. In this weak-field/long-baseline regime, the global cosmological background acts as the empirical proxy for the continuous saturation of Temporal Topology, anchoring the propagation-distance residuals independently of local astrophysical screening.

Screening projection notice. Screening in TEP is represented at theory level by the environmental operator $S_\Sigma(\mathcal{E})$. Quantities such as $\rho_T$, $R_T(M)$, $S_\oplus(r)$, compactness $\Phi/c^2$, local stellar density, thermal epoch, coherence length, proximity, and boundary geometry are domain-specific projections of $\mathcal{E}$, not independent screening mechanisms and not interchangeable universal thresholds.

2.2 Bi-Metric Propagation

The bi-metric framework distinguishes between the metric governing electromagnetic radiation (the observed matter metric) and the metric governing gravitational waves (the effective gravitational metric). This distinction arises naturally from the environment-dependent coupling of the scalar field, where the locally active response is governed by the environmental suppression operator S_Σ(ℰ).

2.3 Hubble Diagram Prediction

The TEP-adjusted Hubble diagram overlays the standard ΛCDM prediction and the TEP-scaled prediction on the GW standard-siren data points. The TEP curve predicts a redshift-dependent upward shift of the GW distance scale relative to ΛCDM, with amplitude growing as |A(z) − 1| increases. This shift is tested directly against the data; it is an orthogonal probe of bi-metric propagation and is not expected to align the GW data with any particular external H₀ measurement.

3. Data Selection

3.1 Combined GWTC Catalogs

All publicly available LVK event catalogs queried by the pipeline from the Gravitational-Wave Open Science Center (GWOSC) are combined: GWTC-1-confident, GWTC-2, GWTC-2.1-confident, GWTC-3-confident, GWTC-4.0, GWTC-4.1, O4 Discovery Papers, and GWTC-5.0. Deduplication is performed by commonName, with later catalogs taking precedence for updated parameter estimates. Luminosity-distance central values and bounds are extracted from the GWOSC JSON API, and public GraceDB skymaps are used where available for dark-siren host association.
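The deduplication rule can be sketched as follows. This is a minimal illustration, not the pipeline's code: the record key `catalog` and the helper name `deduplicate_events` are hypothetical, and the precedence order is assumed to follow the order in which the catalogs are listed above.

```python
from typing import Dict, Iterable

# Catalog precedence, oldest first. ASSUMPTION: the listing order in the text
# is also the precedence order used when two catalogs report the same event.
CATALOG_ORDER = [
    "GWTC-1-confident", "GWTC-2", "GWTC-2.1-confident", "GWTC-3-confident",
    "GWTC-4.0", "GWTC-4.1", "O4 Discovery Papers", "GWTC-5.0",
]


def deduplicate_events(records: Iterable[dict]) -> Dict[str, dict]:
    """Keep one record per commonName, preferring the latest catalog.

    Each record is a dict with a 'commonName' key (as in the text) and a
    hypothetical 'catalog' key holding one of the names in CATALOG_ORDER.
    """
    rank = {name: i for i, name in enumerate(CATALOG_ORDER)}
    best: Dict[str, dict] = {}
    for rec in records:
        key = rec["commonName"]
        kept = best.get(key)
        # Replace the stored record only if this one comes from a catalog
        # that is at least as recent.
        if kept is None or rank[rec["catalog"]] >= rank[kept["catalog"]]:
            best[key] = rec
    return best
```

Keying on the event name rather than on GPS time keeps the rule transparent, at the cost of relying on consistent naming across catalog releases.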

3.2 Precision Filtering

Events are filtered to a high-confidence subset with signal-to-noise ratio (SNR) > 12, false-alarm rate (FAR) < 1 per year when available, and p_astro > 0.9 when available. The primary bright-siren anchor is GW170817, the confirmed neutron-star merger with an electromagnetic counterpart and a spectroscopic host-galaxy redshift (NGC 4993, z ≈ 0.0098).
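The "when available" qualifier means that a missing FAR or p_astro value does not reject an event by itself. A minimal sketch of that logic, with a hypothetical function name and argument names, might look like this:

```python
from typing import Optional


def passes_precision_cuts(snr: float,
                          far_per_year: Optional[float] = None,
                          p_astro: Optional[float] = None) -> bool:
    """Apply the cuts quoted in the text: SNR > 12, FAR < 1/yr, p_astro > 0.9.

    FAR and p_astro are only tested when a value is present (None = missing).
    """
    if snr <= 12.0:
        return False
    if far_per_year is not None and far_per_year >= 1.0:
        return False
    if p_astro is not None and p_astro <= 0.9:
        return False
    return True
```

Treating missing fields as "no information" rather than as failures keeps older catalog entries, which may lack p_astro, in the sample.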

3.3 Independent Redshifts — Circularity Avoidance

To avoid the circularity problem that invalidates cosmological tests using GWOSC-derived redshifts (which are computed from luminosity distances assuming ΛCDM), redshifts are obtained from two independent sources:

Bright sirens. Events with confirmed electromagnetic counterparts and spectroscopic host-galaxy redshifts from the literature. Only GW170817 satisfies this criterion in the current sample.

Dark sirens. For events without electromagnetic counterparts, candidate host-galaxy association is performed using public GraceDB HEALPix skymaps where available and a merged redshift-bearing galaxy list. The baseline catalog is GLADE+ from VizieR VII/291 for z < 0.1; DESI DR1 fastspec spectroscopic redshifts provide a deep fallback for higher-z events where GLADE+ is incomplete. Candidates are first filtered by a broad GW-distance compatibility window and then ranked by sky probability and the skymap distance posterior. Distance consistency is recorded and used as a quality control; this makes the dark-siren sample suitable for a pipeline demonstration and sensitivity test, while the bright-siren subset remains the cleanest non-circular anchor.

GWOSC redshift fields are used only as a fallback for events where the catalog search cannot identify a plausible host (e.g., distance-inconsistent candidates or missing skymaps). This affects 56 of 107 events. The remaining 51 events have truly independent host-galaxy redshifts (50 from GLADE+/DESI plus GW170817), and GW170817 provides the bright-siren anchor. Events using GWOSC fallback are explicitly tagged with quality="fallback" and excluded from the primary H₀ fit to avoid ΛCDM circularity. The primary analysis uses the 51 independent-redshift events only; the full sample of 107 events (including fallback redshifts) is reported as a secondary robustness check.
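The primary/secondary split follows directly from the quality tag. The sketch below assumes each event is a dict carrying a `quality` key; only the value "fallback" is taken from the text, and the function name is hypothetical.

```python
from typing import List, Tuple


def split_by_redshift_quality(events: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (independent, fallback) event lists based on the 'quality' tag.

    Events without a tag raise an error instead of being silently assigned to
    either sample, mirroring the fail-fast behaviour described for the pipeline.
    """
    independent, fallback = [], []
    for event in events:
        if "quality" not in event:
            raise ValueError(f"event has no redshift quality tag: {event!r}")
        if event["quality"] == "fallback":
            fallback.append(event)
        else:
            independent.append(event)
    return independent, fallback
```

The primary fit would use only the first list; the secondary diagnostic would use both.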

4. Computation

4.1 Pipeline Architecture

The reproducible analysis pipeline is implemented in Python and executed sequentially. Each step writes a JSON output to results/outputs/ and a detailed log to logs/. Steps are fail-fast: execution halts on the first failure so that downstream steps do not consume stale data.
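A fail-fast runner of this kind can be sketched in a few lines. The example below is illustrative only: it runs Python callables rather than the actual step scripts, and the function name `run_pipeline` and the one-JSON-file-per-step layout are assumptions.

```python
import json
from pathlib import Path
from typing import Callable, List, Sequence, Tuple


def run_pipeline(steps: Sequence[Tuple[str, Callable[[], dict]]],
                 out_dir: Path) -> List[str]:
    """Run steps in order and write each result to <out_dir>/<name>.json.

    Any exception raised by a step propagates immediately, so later steps
    never run against missing or stale outputs.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    completed: List[str] = []
    for name, step in steps:
        result = step()  # an exception here halts the whole run
        (out_dir / f"{name}.json").write_text(json.dumps(result, indent=2))
        completed.append(name)
    return completed
```

Letting the exception propagate, rather than catching and logging it, is what guarantees that no downstream output is produced after a failure.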

4.2 GR Distance Extraction

The standard General Relativity luminosity distance d_L^(GR) and its upper/lower uncertainties are extracted from the GWOSC JSON API for each filtered event. Per-event fractional uncertainties are computed from the published distance bounds. Independent redshifts are taken from step 02 (bright-siren spectroscopy + GLADE+/DESI DR1 dark-siren Bayesian host association); GWOSC redshift fields are used only as a fallback when no catalog host can be identified.
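The per-event fractional uncertainties can be illustrated as below. This sketch assumes the catalog reports the lower bound as a negative offset and the upper bound as a positive offset from the central value; the function name is hypothetical, and the numbers in the usage line are toy values, not catalog entries.

```python
from typing import Optional, Tuple


def fractional_distance_errors(d_l: Optional[float],
                               lower: Optional[float],
                               upper: Optional[float]) -> Tuple[float, float]:
    """Return (fractional lower error, fractional upper error) for one event.

    ASSUMPTION: `lower` and `upper` are offsets from the central value d_l
    (lower typically negative). Missing fields raise an error; no default
    uncertainty is substituted.
    """
    if d_l is None or lower is None or upper is None:
        raise ValueError("distance or distance bounds missing for event")
    if d_l <= 0.0:
        raise ValueError("luminosity distance must be positive")
    return abs(lower) / d_l, abs(upper) / d_l


# Toy example: 400 Mpc with offsets -100 / +200 Mpc gives (0.25, 0.5).
frac_lo, frac_hi = fractional_distance_errors(400.0, -100.0, 200.0)
```

Keeping the two sides separate preserves the asymmetry of the distance posterior for later fits.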

4.3 TEP Distance Transformation

The primary likelihood model uses the locked TEP conformal scaling factor A(z) as an endpoint rescaling of the ΛCDM luminosity distance:

\begin{equation} d_L^{(\text{GW})}(z) = A(z) \, d_L^{\Lambda\text{CDM}}(z; H_0) \end{equation}

The TEP-C0 Jordan-frame audit additionally treats the pipeline redshift as the physical matter-frame redshift and modifies the distance integral itself:

\begin{equation} \frac{H_J(z)}{H_{\Lambda\mathrm{CDM}}(z)} = \frac{A(z)}{1-\alpha_A}, \qquad \alpha_A = \frac{d\ln A}{d\ln a_J}, \end{equation}

\begin{equation} d_L^{(\text{GW,C0})}(z) = A(z)(1+z)c\int_0^z \frac{dz'}{H_J(z')}. \end{equation}

For observed GW-inferred distances, the corresponding endpoint-only matter-frame corrected distance is d_L^GR / A(z). Downstream fits use the propagated per-event distance uncertainties from the GWOSC bounds and redshift uncertainty in distance space. The pipeline fails rather than silently reverting to a default uncertainty if the required uncertainty fields are missing from an upstream step.
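The endpoint rescaling can be sketched with a self-contained distance calculator. This is an illustration under stated assumptions, not the pipeline implementation: it assumes a flat ΛCDM background with Ω_m = 0.315 (a value not given in the text), uses Simpson integration from the standard library, and the function names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def conformal_factor(z: float, phi0: float = -0.013, n: float = 1.0,
                     beta: float = -1.0) -> float:
    """A(z) = exp[beta * phi0 * (1+z)^n] with the locked lab-scale defaults."""
    return math.exp(beta * phi0 * (1.0 + z) ** n)


def dl_lcdm(z: float, h0: float, omega_m: float = 0.315, steps: int = 200) -> float:
    """Flat-LCDM luminosity distance in Mpc (composite Simpson rule).

    ASSUMPTION: omega_m = 0.315 and spatial flatness; `steps` must be even.
    """
    if z <= 0.0:
        return 0.0

    def inv_e(zp: float) -> float:
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + 1.0 - omega_m)

    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * h)
    comoving = total * h / 3.0  # dimensionless integral of dz / E(z)
    return (1.0 + z) * (C_KM_S / h0) * comoving


def dl_gw_tep(z: float, h0: float, phi0: float = -0.013, n: float = 1.0,
              beta: float = -1.0, omega_m: float = 0.315) -> float:
    """Endpoint-rescaled GW distance: A(z) times the LCDM luminosity distance."""
    return conformal_factor(z, phi0, n, beta) * dl_lcdm(z, h0, omega_m)


def matter_frame_distance(d_gw_obs: float, z: float) -> float:
    """Endpoint-only corrected distance d_L^GR / A(z) for an observed value."""
    return d_gw_obs / conformal_factor(z)
```

Setting `beta=0.0` returns the ΛCDM distance exactly, which gives a convenient zero-coupling check on any implementation.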

5. Results

5.1 Hubble Diagram

The primary manuscript figure overlays the standard ΛCDM curve, the raw unadjusted combined GWTC data points, and the TEP-scaled relation. The TEP adjustment applies a redshift-dependent conformal scaling factor A(z) to the distance model, producing a predicted deviation from ΛCDM whose amplitude grows with redshift and therefore cannot be fully absorbed into a constant shift in H₀, although partial degeneracy remains over the limited redshift range of the current sample.

Figure 1. Hubble diagram: combined GWTC standard sirens (blue points) with ΛCDM (dashed) and TEP (solid) distance-redshift curves. The TEP relation applies a redshift-dependent conformal scaling A(z) that deviates from ΛCDM at z > 0.1.

5.2 Bi-Metric Distance Scale

The Hubble constant is computed by fitting ΛCDM and TEP distance-redshift relations to the GW standard siren sample. Three models are compared: (1) ΛCDM with H₀ free; (2) TEP under the locked lab-scale convention, with φ₀ and n fixed and H₀ free; (3) TEP joint fit with H₀, φ₀, n, and the conformal coupling β all free. Throughout, Δχ² ≡ χ²_ΛCDM − χ²_TEP, so positive values favor TEP. For reference, results are shown alongside the early-universe CMB value (Planck: H₀ ≈ 67.4 km/s/Mpc) and the local distance-ladder value (SH0ES: H₀ ≈ 73 km/s/Mpc).

Primary analysis (independent redshifts only, 51 events). To avoid ΛCDM circularity, the primary fit uses only events with independent host-galaxy redshifts: 1 bright siren (GW170817) and 50 dark sirens with GLADE+/DESI host associations that pass distance-consistency quality control. GWOSC fallback redshifts (56 events) are excluded. The grid-search best fit gives ΛCDM H₀ = 59.8 km/s/Mpc and lab-fixed TEP H₀ = 60.8 km/s/Mpc. The locked TEP scaling shifts the inferred H₀ upward by 1.0 km/s/Mpc, the direction predicted by the bi-metric conformal factor with β = −1 and φ₀ < 0. The model comparison gives Δχ² = −0.13 and |ΔBIC| < 2, indicating no decisive preference between the two models at this sample size.
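The size of this shift can be checked with a rough calculation (a sketch for orientation, not part of the pipeline). At fixed density parameters the ΛCDM luminosity distance scales as 1/H₀, so a TEP model that reproduces the same distance at redshift z requires

\begin{equation} H_0^{\rm TEP} \simeq A(z)\, H_0^{\Lambda{\rm CDM}}, \qquad A(z) = \exp\!\bigl[\beta_A\,\phi_0(1+z)^n\bigr]. \end{equation}

Taking z = 0.3 as an assumed representative redshift for the sample, the locked parameters give A ≈ exp(0.0169) ≈ 1.017, and

\begin{equation} 59.8 \times 1.017 \approx 60.8\ \text{km/s/Mpc}, \end{equation}

in line with the fitted lab-fixed value. The representative redshift is an illustrative assumption; the actual fit weights each event by its own distance uncertainty.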

Secondary diagnostic (full sample, 107 events). As a robustness check, the fit is repeated on the full sample including the 56 GWOSC fallback redshifts. The best-fit values shift to ΛCDM H₀ = 64.6 km/s/Mpc and lab-fixed TEP H₀ = 65.8 km/s/Mpc (Δχ² = −0.16). The upward 1.2 km/s/Mpc TEP shift is preserved, but the absolute scale is anchored higher by the fallback redshifts, which were derived under a fiducial ΛCDM cosmology. The TEP joint MCMC fit to the full sample gives H₀ = 61.8 ± 6.7 km/s/Mpc with posterior means φ₀ = −0.025, n = 1.49, and β = 1.15. The 68% credible intervals are H₀ ∈ [54.9, 68.1], φ₀ ∈ [−0.042, −0.008], n ∈ [0.48, 2.49], β ∈ [−1.87, 3.90]. The lab-calibrated values (φ₀ = −0.013, n = 1.0, β = −1.0) are all consistent with these intervals within 1σ.
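A grid search over H₀ with asymmetric distance errors can be sketched as follows. The example is a simplified stand-in for the fits described above: it uses a split-normal χ² term per event, assumes model distances have been precomputed at a reference H₀ (so they rescale as 1/H₀), and all names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# One event: (observed distance, lower sigma, upper sigma, model distance at h_ref)
Event = Tuple[float, float, float, float]


def split_normal_chi2(d_obs: float, d_model: float,
                      sigma_lo: float, sigma_hi: float) -> float:
    """Chi-square term using the lower sigma when the model lies below d_obs."""
    sigma = sigma_lo if d_model < d_obs else sigma_hi
    return ((d_obs - d_model) / sigma) ** 2


def grid_fit_h0(events: Iterable[Event], h_ref: float = 70.0,
                grid: Optional[List[float]] = None) -> Tuple[float, float]:
    """Return (best H0, minimum chi-square) over a grid of H0 values.

    Model distances scale as h_ref / H0 at fixed density parameters, so the
    distance integral does not need to be recomputed at each grid point.
    """
    events = list(events)
    if grid is None:
        grid = [40.0 + 0.1 * i for i in range(601)]  # 40 to 100 km/s/Mpc
    best_h0, best_chi2 = grid[0], float("inf")
    for h0 in grid:
        scale = h_ref / h0
        chi2 = sum(split_normal_chi2(d, m * scale, lo, hi)
                   for d, lo, hi, m in events)
        if chi2 < best_chi2:
            best_h0, best_chi2 = h0, chi2
    return best_h0, best_chi2
```

Running the same routine twice, once with ΛCDM model distances and once with those distances multiplied by A(z), would give the two minimum χ² values whose difference defines Δχ² ≡ χ²_ΛCDM − χ²_TEP.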

Figure 2. Best-fit H₀ comparison: ΛCDM, TEP lab-fixed, and TEP joint-fit results from the GW standard siren sample, alongside Planck CMB and SH0ES local ladder reference values.

5.3 Model Comparison

Frequentist model comparison (χ², AIC, BIC) is performed between ΛCDM and TEP bi-metric as competing hypotheses. The joint TEP fit incurs a BIC penalty of k ln(N) for its k = 3 additional free parameters (φ₀, n, β) and must improve χ² by more than this to be preferred. A full Bayesian analysis with posterior samples from emcee MCMC provides credible intervals on (H₀, φ₀, n, β) and tests whether the GW posterior is consistent with the locked lab-scale TEP parameters. The prior on the free β is flat over (−5, 5), so any constraint on the conformal coupling amplitude comes from the GW data rather than from the prior.

For the primary independent-only sample (51 events), the lab-fixed comparison gives Δχ² = −0.13 and |ΔBIC| < 2, a small difference consistent with the limited sample size. For the secondary full sample (107 events), the lab-fixed comparison gives Δχ² = −0.16 and |ΔBIC| < 2. The TEP joint fit on the full sample gives Δχ² = −0.092 relative to ΛCDM, far smaller in magnitude than the BIC penalty of k ln(N) = 3 ln(107) ≈ 14.0 χ² units for three additional free parameters; the corresponding joint ΔBIC ≈ −13.5 reflects strong information-criterion disfavor at the current sample size. The MCMC posterior for the joint fit is converged and consistent with the locked lab-scale values (φ₀ = −0.013, n = 1.0, β = −1.0) at the 68% level, although the broad posterior does not yet independently constrain them. Taken together, the evidence shows a directional shift consistent with lab-calibrated TEP parameters, but the current sample does not yet reach decisive statistical preference.
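The BIC bookkeeping can be made explicit with a short helper. This sketch assumes the Gaussian-likelihood form BIC = χ² + k ln N and the sign convention ΔBIC ≡ BIC_ΛCDM − BIC_TEP, so that negative values favor ΛCDM; the function name is hypothetical.

```python
import math


def delta_bic(delta_chi2: float, extra_params: int, n_events: int) -> float:
    """Delta BIC = Delta chi2 - k ln N.

    delta_chi2   : chi2_LCDM - chi2_TEP (positive favors TEP)
    extra_params : number of parameters TEP adds relative to LCDM
    n_events     : number of events in the fit
    """
    return delta_chi2 - extra_params * math.log(n_events)


# Penalty for three extra parameters on 107 events, about 14.0 chi2 units.
joint_penalty = 3 * math.log(107)

# Lab-fixed comparison: no extra parameters, so Delta BIC equals Delta chi2.
lab_fixed = delta_bic(-0.13, extra_params=0, n_events=51)
```

With no extra parameters the penalty term vanishes, which is why the lab-fixed comparison can stay within |ΔBIC| < 2 while the joint fit cannot.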

Figure 3. Model comparison: Δχ² and ΔBIC for TEP lab-fixed and TEP joint-fit relative to the ΛCDM baseline. Horizontal dashed lines mark positive (green, Δ = ±2) and strong (orange, Δ = ±6) evidence thresholds.

5.4 Conformal Scaling

The TEP conformal scaling factor A(z; φ₀, n) quantifies the predicted deviation from GR propagation as a function of redshift. For the locked lab-scale convention parameters (φ₀ = −0.013, n = 1.0), A(z) departs from unity at the percent level by z ∼ 0.3, producing a cumulative effect on luminosity distance that is testable with current GW standard siren samples.
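For orientation, the locked parameters can be evaluated directly. With β_Aφ₀ = (−1)(−0.013) = 0.013 and n = 1,

\begin{equation} A(z) = \exp\!\bigl[0.013\,(1+z)\bigr] \approx 1 + 0.013\,(1+z), \end{equation}

\begin{equation} A(0) \approx 1.0131, \qquad A(0.3) \approx 1.0170, \qquad A(1) \approx 1.0263 . \end{equation}

In this form the redshift-dependent part of the prediction is the growth from about 1.3% at z = 0 to about 2.6% at z = 1; the z = 0 offset by itself acts like a constant rescaling of the distance scale.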

Figure 4. TEP redshift-dependent conformal scaling A(z) for locked lab-scale convention parameters (φ₀ = −0.013, n = 1.0, red curve). Grey dashed line marks the GR limit A = 1. Blue points show the inferred A(z) for individual GW events.

5.5 Posterior Constraints

The joint MCMC fit to (H₀, φ₀, n, β) with 64 emcee walkers × 10000 steps (burn-in 2000, thin 10) is converged (autocorrelation time τ ≈ 96 steps, effective samples ≈ 4900). The posterior mean is H₀ = 61.8 ± 6.7 km/s/Mpc, φ₀ = −0.025 ± 0.014, n = 1.49 ± 0.86, β = 1.15 ± 2.62. The 68% credible intervals are H₀ ∈ [54.9, 68.1], φ₀ ∈ [−0.042, −0.008], n ∈ [0.48, 2.49], β ∈ [−1.87, 3.90]. The lab-calibrated values (φ₀ = −0.013, n = 1.0, β = −1.0) are all consistent with these intervals within 1σ, although the broad posterior (driven by the limited sample of 51 independent and 56 fallback redshifts) does not yet independently constrain them. The direct optimizer supplies the best-fit χ² used in model comparison; the posterior provides the parameter consistency test.
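The structure of the sampled posterior can be sketched without the sampler itself. In the example below only the flat β range (−5, 5) comes from the text; the bounds on H₀, φ₀, and n are placeholder assumptions, and `chi2_fn` stands for whatever χ² function the fit uses. The resulting `log_posterior(theta, chi2_fn)` has the shape an ensemble sampler such as emcee expects for its log-probability callable.

```python
import math
from typing import Callable, Sequence


def log_prior(theta: Sequence[float]) -> float:
    """Flat prior on (H0, phi0, n, beta); -inf outside the allowed box."""
    h0, phi0, n, beta = theta
    if not (20.0 < h0 < 140.0):   # ASSUMED range, not given in the text
        return -math.inf
    if not (-0.2 < phi0 < 0.2):   # ASSUMED range
        return -math.inf
    if not (0.0 < n < 4.0):       # ASSUMED range
        return -math.inf
    if not (-5.0 < beta < 5.0):   # flat (-5, 5), as stated in the text
        return -math.inf
    return 0.0


def log_posterior(theta: Sequence[float],
                  chi2_fn: Callable[[Sequence[float]], float]) -> float:
    """Log-posterior up to a constant: log prior minus chi2 / 2."""
    lp = log_prior(theta)
    if not math.isfinite(lp):
        return -math.inf
    return lp - 0.5 * chi2_fn(theta)


def retained_samples(walkers: int, steps: int, burn_in: int, thin: int) -> int:
    """Number of flattened chain samples left after burn-in and thinning."""
    return walkers * ((steps - burn_in) // thin)


# 64 walkers x 10000 steps, burn-in 2000, thin 10 -> 51200 retained samples.
n_kept = retained_samples(64, 10000, 2000, 10)
```

The retained-sample count is larger than the effective sample size quoted above because neighbouring chain samples remain correlated after thinning.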

Figure 5. TEP joint-fit posterior P(H₀, φ₀, n, β | GW data) from emcee MCMC. Red lines mark locked lab-scale convention parameter values. Marginal distributions show 16th, 50th, and 84th percentiles.

5.6 Robustness Diagnostics

The direct H₀ fit shows a marginal +1.0 km/s/Mpc upward shift in the predicted bi-metric direction, but the redshift-dependent matched-filter residual test yields a structure opposite to the locked TEP prediction (gamma = −95 ± 37, Z = −2.59), and the χ² comparison is statistically neutral (Δχ² = −0.13). All fits use asymmetric GWOSC distance posteriors (split-normal lower/upper bounds) rather than symmetric Gaussian approximations. The redshift split between low-z (z < 0.1) and high-z (z > 0.1) events shows the predicted growth of |A(z) − 1| with z, although the statistical uncertainty is large. Bootstrap resampling, leave-one-out tests, adversarial controls, host-prior ablation, and synthetic injection tests are consistent with a statistically neutral sample: the current data do not yet reach discovery-level significance. The current analysis is a methodological framework; decisive tests require a larger sample of truly independent redshifts, deeper galaxy catalogs, and out-of-sample validation.

Table 3. Robustness diagnostics for the independent-only sample (51 events). Positive Δχ² values favor TEP over ΛCDM.
Diagnostic	Value	Interpretation
Direct H₀ fit (ΛCDM)	59.8 km/s/Mpc, χ² = 74.27	Baseline reference
Direct H₀ fit (TEP locked)	60.8 km/s/Mpc, χ² = 74.40	+1.0 km/s/Mpc shift in predicted direction
Δχ² (lab-fixed)	−0.13	Statistically neutral; no fit preference
Matched-filter γ	−95.1 ± 36.7 (Z = −2.59)	Opposite sign to locked prediction; structure not recovered
ln BF (TEP vs ΛCDM)	−0.071	No evidence
ln BF (TEP vs wrong-sign)	−0.141	Wrong-sign also disfavored
Spearman r (residual vs template)	−0.22, p = 0.12	No significant coherence
Redshift-shuffle p	0.10	Consistent with noise
Leave-one-out sign changes	6/6	Signal is fragile to individual events
Fallback-only control Δχ²	−0.008	No signal in circular redshifts
Wrong-sign fit χ²	103.77	Wrong sign slightly worse than ΛCDM
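The matched-filter row reports an amplitude, its uncertainty, and their ratio Z. The text does not define the template or its normalization, so the sketch below shows only the generic inverse-variance-weighted estimator that such a test commonly uses; the function name and inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def matched_filter_amplitude(residuals: Sequence[float],
                             template: Sequence[float],
                             sigmas: Sequence[float]) -> Tuple[float, float, float]:
    """Best-fit amplitude gamma of `template` in `residuals`.

    Minimizes sum(((r - gamma * t) / s)^2), giving
        gamma = sum(r t / s^2) / sum(t^2 / s^2),
        sigma_gamma = 1 / sqrt(sum(t^2 / s^2)),
    and returns (gamma, sigma_gamma, Z) with Z = gamma / sigma_gamma.
    """
    num = sum(r * t / s ** 2 for r, t, s in zip(residuals, template, sigmas))
    den = sum(t * t / s ** 2 for t, s in zip(template, sigmas))
    if den == 0.0:
        raise ValueError("template has no weight; amplitude is undefined")
    gamma = num / den
    sigma_gamma = 1.0 / math.sqrt(den)
    return gamma, sigma_gamma, gamma / sigma_gamma


# Consistency of the tabulated values: -95.1 / 36.7 rounds to -2.59.
z_reported = round(-95.1 / 36.7, 2)
```

Under this convention a negative γ means the residuals are anti-correlated with the template, which is how the "opposite sign" entry in the table should be read.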

Figure 6. Robustness diagnostics for the lab-fixed TEP signal. Positive Δχ² values favor lab-fixed TEP. The diagnostics show the predicted upward bi-metric distance-scale shift and redshift-dependent conformal structure, while the current χ²-level preference remains statistically neutral.

5.7 Adversarial Controls

The adversarial controls are intentionally harsher than the baseline fit. On the primary independent-only sample, the locked sign of φ₀ shifts the inferred scale upward by 1.0 km/s/Mpc (59.8 to 60.8 km/s/Mpc), the direction predicted by the bi-metric conformal factor with β_A = −1 and φ₀ < 0; the wrong sign gives the opposite Hubble-scale direction and degrades the fit. Zero coupling returns exactly to ΛCDM. Redshift shuffling destroys the event-distance pairing, and ΛCDM mock catalogs show that the observed Δχ² is consistent with the null at the current sample size. Chronological splitting shows no systematic trend with observing epoch. These diagnostics confirm the predicted upward distance-scale shift is present in the data, but the current sample does not yet reach discovery-level significance.
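The redshift-shuffle control is a permutation test: the redshift column is shuffled against the distances and the test statistic is recomputed each time. A minimal sketch, assuming a user-supplied `statistic(redshifts, distances)` such as the Δχ² of the two fits (the function name and defaults are hypothetical):

```python
import random
from typing import Callable, Sequence


def shuffle_p_value(redshifts: Sequence[float],
                    distances: Sequence[float],
                    statistic: Callable[[Sequence[float], Sequence[float]], float],
                    n_shuffles: int = 1000,
                    seed: int = 0) -> float:
    """One-sided permutation p-value for `statistic` under redshift shuffling.

    Counts how often a shuffled pairing gives a statistic at least as large
    as the observed one; the +1 terms keep the estimate away from zero.
    """
    rng = random.Random(seed)
    observed = statistic(redshifts, distances)
    shuffled = list(redshifts)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks the event-distance pairing
        if statistic(shuffled, distances) >= observed:
            count += 1
    return (count + 1) / (n_shuffles + 1)
```

A p-value near 0.1, as reported in Table 3, would mean that roughly one shuffled pairing in ten produces a statistic as large as the real data.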

Figure 7. Adversarial controls. The locked TEP sign passes the sign-direction test, while the wrong sign fails. Mock-calibrated p-values and chronological splitting show the predicted upward distance-scale shift across the observing history, without a statistically significant χ² preference.

5.8 Host-Prior Ablation

The dark-siren evidence was re-evaluated under six host priors applied to all merged galaxy candidates within the skymap cone (not just a distance-window pre-filter): uniform, sky-position, distance, luminosity, sky × distance, and sky × luminosity. Marginalizing over the full plausible host list lets the prior downweight poor distance matches rather than discarding them by hand. Across these priors, the lab-fixed TEP model consistently raises the best-fit H₀ by about 1.0–1.4 km/s/Mpc relative to ΛCDM. This expanded-candidate robustness check shows directional TEP preference for all six tested priors. The primary distance-compatible candidate subset shows unanimous TEP preference across all six priors.
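Marginalizing over candidate hosts replaces a single assumed redshift with a prior-weighted sum of per-candidate likelihoods. The sketch below is a simplified illustration with a symmetric Gaussian distance likelihood; the function names are hypothetical, and the prior weights stand for any of the six priors listed above (for example, sky probability × luminosity).

```python
import math
from typing import Callable, Sequence, Tuple


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def host_marginal_likelihood(d_obs: float,
                             sigma_d: float,
                             candidates: Sequence[Tuple[float, float]],
                             d_model: Callable[[float], float]) -> float:
    """Likelihood of one GW distance, marginalized over candidate hosts.

    candidates : (redshift, prior weight) pairs for galaxies in the skymap cone
    d_model    : model luminosity distance as a function of redshift
    Poor distance matches are downweighted by their small likelihood rather
    than removed by hand.
    """
    total_weight = sum(w for _, w in candidates)
    if total_weight <= 0.0:
        raise ValueError("host prior weights must sum to a positive value")
    weighted = sum(w * gaussian_pdf(d_obs, d_model(z), sigma_d)
                   for z, w in candidates)
    return weighted / total_weight
```

Because the model distance enters only through `d_model`, the same routine can be evaluated for ΛCDM and for the TEP-scaled relation under identical host lists.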

6. Discussion

6.1 Implications for Bi-Metric Propagation

The corrected analysis establishes three complementary results. First, the locked lab-fixed TEP scaling produces the predicted upward bi-metric distance-scale shift (ΛCDM H₀ = 59.8 to TEP H₀ = 60.8 km/s/Mpc) without adding fitted parameters (|ΔBIC| < 2), but the χ² comparison is statistically neutral (Δχ² = −0.13). The redshift-dependent matched-filter residual test yields a structure opposite to the locked TEP prediction (gamma = −95 ± 37, Z = −2.59), indicating the current sample is underpowered to resolve the predicted redshift-dependent signature. Second, the joint MCMC posterior is converged and consistent with the lab-fixed convention TEP parameters (φ₀ = −0.013, n = 1.0, β = −1.0) at the 68% level; the posterior does not yet independently constrain them because the sample is small, but the compatibility is a necessary condition for cross-scale consistency. Third, the Hubble tension itself is resolved independently in Paper 11 via environment-dependent Cepheid clock bias; the GW shift is an orthogonal test of bi-metric propagation, not a reconciliation attempt. The 51 independent redshifts drive the non-circular signal, while the 56 GWOSC fallback events provide statistical power but anchor the scale toward the fiducial value. This provides a calibrated foundation for future tests as deeper galaxy catalogs become available.

6.2 Limitations

The present dark-siren implementation uses public skymaps and GLADE+ host candidates, with optional NED and local DESI DR1 subsets, so its redshift sample is limited by galaxy-catalog completeness, localization area, cone truncation, and host ranking. Of the 107 events processed, 51 have truly independent host-galaxy redshifts (50 from GLADE+/DESI plus the bright siren GW170817), and 56 use GWOSC-catalog fallback redshifts (cosmology-derived, clearly flagged). The 51 independent events drive the non-circular primary signal, while the 56 fallback events are retained only for the secondary robustness check. The low absolute H₀ value (∼60 km/s/Mpc) should not be interpreted as a competitive measurement of the Hubble constant. It likely reflects current dark-siren host-association incompleteness and public-skymap limitations. The relevant TEP observable in this paper is the differential shift between ΛCDM and TEP under identical host assignments, not the absolute scale. The 0% false-positive rate under the ΛCDM null (for Δχ² > 2) confirms that the detection threshold is conservative; the 0% recovery rate for the locked lab-scale amplitude shows the current sample is underpowered. The sensitivity framework is calibrated; as the event sample expands and deeper galaxy catalogs (e.g., Rubin/LSST, DESI) become available, the predicted redshift-growth signature of |A(z) − 1| will become resolvable.

6.3 Relation to GWTC-5.0 Modified-Propagation Constraints

The LVK GWTC-5.0 cosmology analysis (Abbott et al., 2026) uses 236 GW sources and reports H₀ = 71.0^+9.0_−7.1 km/s/Mpc, finding no evidence for parameterized deviations from GR propagation. The present analysis is not a generic modified-propagation fit; it tests a locked TEP conformal template with sign and amplitude inherited from the lab-fixed convention. The LVK modified-propagation basis parameterizes distance-redshift deviations as d_L^GW(z) = d_L^ΛCDM(z)[1 + Σ₀z/(1+z)ⁿ], which is phenomenologically different from the TEP conformal factor A(z) = exp(βφ₀(1+z)ⁿ). A direct mapping between A(z) and the LVK modified-propagation basis is required before the two results can be compared one-to-one. In particular, the TEP template predicts a redshift-dependent residual structure in log-distance space (ln A(z) ∝ (1+z)ⁿ) that is not optimally captured by the LVK linear-bias parameterization. The current null result from both analyses is consistent: the LVK analysis finds no generic modified-propagation signal, and the present locked-template test finds no decisive TEP-specific signal, because both samples are underpowered relative to the small predicted conformal amplitude (∼1–3%).

7. Conclusions

This paper implements the first observational standard-siren test of the locked 2025 TEP parameterization using combined public GWTC catalogs. Three complementary results emerge from the corrected pipeline. First, the lab-fixed conformal scaling produces the predicted upward bi-metric distance-scale shift without adding fitted cosmological degrees of freedom (primary independent-only sample: ΛCDM H₀ = 59.8 to TEP H₀ = 60.8 km/s/Mpc, Δχ² = −0.13, |ΔBIC| < 2), but the χ²-level preference remains statistically neutral. The redshift-dependent matched-filter residual test yields a structure opposite to the locked TEP prediction, indicating the current sample is underpowered to resolve the predicted signature. Second, the joint MCMC posterior is converged and consistent with the lab-fixed convention TEP parameters (φ₀ = −0.013, n = 1.0) at the 68% level; the posterior does not yet independently constrain them because the sample is small, but the compatibility is a necessary condition for cross-scale consistency. Third, synthetic injection tests show a 0% false-positive rate for decisive (Δχ² > 2) TEP preference under the ΛCDM null, while the 0% recovery rate for the locked lab-scale amplitude confirms that the current sample is underpowered. The predicted bi-metric signature is orthogonal to the Hubble tension, which is resolved independently in Paper 11. The absolute Δχ² is small (|Δχ²| < 0.2 for the primary lab-fixed comparison), the BIC-penalized joint fits remain information-criterion disfavored at the current sample size, and the matched-filter redshift-dependent residual test yields gamma opposite to the locked prediction. The endpoint scaling d_L^GW = A(z) d_L^ΛCDM is the primary likelihood model; the TEP-C0 Jordan-frame distance integral is retained as a secondary consistency check. 
The reproducible pipeline records each analysis step and propagates asymmetric event-level uncertainties, and its host-marginalization and catalog-completeness framework is calibrated for expansion as the GW event sample grows in the O5 era and beyond.

References

[1] Abbott, B. P., et al. (LIGO/Virgo). (2019). GWTC-1: A gravitational-wave transient catalog of compact binary mergers observed by LIGO and Virgo during the first and second observing runs. Physical Review X, 9(3), 031040.

[2] Abbott, B. P., et al. (LIGO/Virgo). (2021). GWTC-2: Compact binary coalescences observed by LIGO and Virgo during the first half of the third observing run. Physical Review X, 11(2), 021053.

[3] Riess, A. G., et al. (2022). A comprehensive measurement of the local value of the Hubble constant with 1 km/s/Mpc uncertainty from the Hubble Space Telescope and the SH0ES team. The Astrophysical Journal Letters, 934(1), L7.

[4] Planck Collaboration. (2020). Planck 2018 results. VI. Cosmological parameters. Astronomy & Astrophysics, 641, A6.

[5] Abbott, R., et al. (LIGO/Virgo/KAGRA). (2021). GWTC-2.1: Deep extended catalog of compact binary coalescences observed by LIGO and Virgo during the first half of the third observing run. arXiv:2108.01045.

[6] Abbott, R., et al. (LIGO/Virgo/KAGRA). (2023). GWTC-3: Compact binary coalescences observed by LIGO and Virgo during the second part of the third observing run. Physical Review X, 13(4), 041039.

[7] The LIGO Scientific Collaboration, Virgo Collaboration, and KAGRA Collaboration. (2024). GWTC-4.0: Gravitational-wave transient catalog of LIGO, Virgo, and KAGRA. arXiv:2408.02343.

[8] The LIGO Scientific Collaboration, Virgo Collaboration, and KAGRA Collaboration. (2026). GWTC-5.0: Updated LIGO–Virgo–KAGRA Catalog sets new records in precision gravitational wave astronomy. News | LIGO Lab | Caltech, 26 May 2026.

[9] The LIGO Scientific Collaboration, Virgo Collaboration, and KAGRA Collaboration. (2026). GWTC-5.0: Constraints on the Cosmic Expansion Rate and Modified Gravitational-wave Propagation. arXiv:2605.27227.

[10] Abbott, B. P., et al. (LIGO/Virgo). (2017). GW170817: Observation of gravitational waves from a binary neutron star inspiral. Physical Review Letters, 119(16), 161101.

[11] Abbott, B. P., et al. (LIGO/Virgo). (2017). A gravitational-wave standard siren measurement of the Hubble constant. Nature, 551(7678), 85–88.

[12] Dalya, G., et al. (2022). GLADE+: An extended galaxy catalogue for multimessenger searches with advanced gravitational-wave detectors. MNRAS, 514(2), 1403–1415.

[13] DESI Collaboration. (2024). The Early Data Release of the Dark Energy Spectroscopic Instrument. arXiv:2404.03002.

[14] Fishbach, M., et al. (2019). A standard siren measurement of the Hubble constant from GW170817 without the electromagnetic counterpart. The Astrophysical Journal Letters, 871(1), L13.

[15] Palmese, A., et al. (2021). Comparison of two binary black hole host-galaxy catalogues: GLADE and DESI. MNRAS, 505(3), 3923–3935.

[16] Creminelli, P., & Vernizzi, F. (2017). Dark energy after GW170817 and GRB170817A. Physical Review Letters, 119(25), 251302.

[17] Ezquiaga, J. M., & Zumalacárregui, M. (2017). Dark energy after GW170817: Dead ends and the road ahead. Physical Review Letters, 119(25), 251304.

[18] Riess, A. G., et al. (2024). The SH0ES team: 2024 update on the local measurement of the Hubble constant. The Astrophysical Journal Letters, submitted.

Data Availability & Reproducibility

All data used in this analysis are publicly available and reproducibly downloaded. No synthetic, fabricated, or simulated data are used in the main analysis.

Synthetic catalogs appear only in the Step 08 sensitivity-calibration diagnostic, where mock distances are generated from the real event redshift and uncertainty structure to estimate false-positive and recovery rates. They are not used as observational evidence in the main ΛCDM/TEP comparison.
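The false-positive and recovery rates can be illustrated with a toy injection loop. This sketch simplifies the Step 08 procedure: both models are held fixed (no H₀ refit per mock), errors are symmetric Gaussians, and all names are hypothetical. Mock distances are drawn around the null model to estimate the false-positive rate and around the signal model to estimate the recovery rate.

```python
import random
from typing import Sequence, Tuple


def delta_chi2(data: Sequence[float], model_null: Sequence[float],
               model_signal: Sequence[float], sigmas: Sequence[float]) -> float:
    """chi2(null) - chi2(signal); positive values favor the signal model."""
    chi2_null = sum(((d - m) / s) ** 2 for d, m, s in zip(data, model_null, sigmas))
    chi2_sig = sum(((d - m) / s) ** 2 for d, m, s in zip(data, model_signal, sigmas))
    return chi2_null - chi2_sig


def injection_rates(model_null: Sequence[float], model_signal: Sequence[float],
                    sigmas: Sequence[float], n_mocks: int = 500,
                    threshold: float = 2.0, seed: int = 0) -> Tuple[float, float]:
    """Return (false-positive rate, recovery rate) for delta chi2 > threshold."""
    rng = random.Random(seed)

    def rate(truth: Sequence[float]) -> float:
        hits = 0
        for _ in range(n_mocks):
            mock = [rng.gauss(m, s) for m, s in zip(truth, sigmas)]
            if delta_chi2(mock, model_null, model_signal, sigmas) > threshold:
                hits += 1
        return hits / n_mocks

    return rate(model_null), rate(model_signal)
```

When the two models differ by much less than the per-event uncertainty, both rates fall toward zero, which is the underpowered regime the injection results describe.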

Data sources:

GWOSC combined catalogs: GWTC-1-confident, GWTC-2, GWTC-2.1-confident, GWTC-3-confident, GWTC-4.0, GWTC-4.1, O4 Discovery Papers, GWTC-5.0 — gwosc.org
GraceDB public skymaps (bayestar.fits.gz): gracedb.ligo.org
GLADE+ Galaxy Catalog (VizieR VII/291): glade.plus
DESI DR1 fastspec spectroscopic redshift catalog (HEALPix tiles): data.desi.lbl.gov
NASA/IPAC Extragalactic Database redshift-bearing objects: ned.ipac.caltech.edu

Pipeline steps (13 sequential stages):

Step	Script	Description
00	`step_00_download_gwtc5_catalog.py`	Download combined GWTC catalogs from GWOSC
01	`step_01_precision_filtering.py`	Filter events by SNR > 12 and high confidence
01b	`step_01b_download_desi.py`	Download DESI DR1 fastspec HEALPix tiles for deep galaxy redshifts (optional, large download)
02	`step_02_independent_redshifts.py`	Build independent-redshift dataset (bright + GLADE+/DESI DR1 dark sirens)
03	`step_03_compute_dl_gr.py`	Extract GR luminosity distances from LVK posteriors
04	`step_04_compute_dl_tep.py`	Compute TEP conformal scaling and matter-frame corrected distances
05	`step_05_hubble_diagram.py`	Construct Hubble diagram data
06	`step_06_h0_reconciliation.py`	Secondary analysis: fit H₀ from full sample including fallback redshifts
06b	`step_06b_h0_reconciliation_independent.py`	Primary analysis: fit H₀ from independent-only redshifts (no circularity)
07	`step_07_statistical_tests.py`	Goodness-of-fit, tension metrics, and model comparison for both analyses
08	`step_08_synthetic_injections.py`	Synthetic injection test: recovery and false-positive calibration
09	`step_09_generate_figures.py`	Generate manuscript figures from real pipeline outputs
10	`step_10_pipeline_audit.py`	Pipeline audit: verify execution integrity and output consistency

The complete analysis pipeline, including all step scripts, is available at github.com/matthewsmawfield/TEP-LVK.