BFD Working Group                                           L. Melegassi
Internet-Draft                                                  Catellix
Intended status: Standards Track                             22 May 2026
Expires: 23 November 2026

   Coherence-BFD: Sub-Second Coherence Detection Using Bidirectional
                     Forwarding Detection Patterns
                    draft-melegassi-coherence-bfd-00

Abstract

   This document specifies Coherence-BFD, a protocol that combines the
   asynchronous heartbeat, demand-mode, echo function, and
   detection-multiplier mechanisms of Bidirectional Forwarding
   Detection (BFD, [RFC5880]) with the multi-vantage path coherence
   detection of [I-D.melegassi-mvps-incremental-be].  The result is a
   sub-second coherence failure detector with theoretical and empirical
   detection latency of 55 ms (1091x faster than the 60-second tick
   baseline of the underlying BE-MVPS framework).

   Five execution variants are specified: V0 (baseline), V1
   (heartbeat-fast), V2 (demand), V3 (echo), and V4 (hybrid).
   Wall-clock benchmarks confirm V3 (Echo) as the latency-optimal
   variant at 55 ms median tau_detect with 39 680 B/s bandwidth.

   A new lower bound theorem (Theorem 9 of
   [I-D.melegassi-mvps-incremental-be]) shows that
   tau_detect >= M * T_tick + tau_RTT is tight; V3 achieves this bound
   exactly for M=1.

   The protocol is designed for deployment alongside conventional BFD
   sessions: a Coherence-BFD session monitors not the binary up/down
   state of a forwarding path but the coherence state across N vantage
   observers.

   NOTE ON DATA PROVENANCE.  All wall-clock detection-latency and
   bandwidth numbers reported in this document are obtained from
   synthetic simulations (scripts/benchmark_coherence_bfd.py) under
   controlled conditions, not from operational deployment data.

   HARDWARE CAVEAT (v5.0 unified proof, 2026-05-22).  The 55 ms median
   tau_detect figure of Section 5 is a SOFTWARE-HARNESS measurement on
   a single host, not a router-class measurement.  In particular, it
   does NOT account for the data-plane forwarding asymmetries,
   micro-bursts, or ASIC/NPU scheduling delays that real BFD hardware
   exhibits on production forwarders.
   Operators MUST treat the 55 ms figure as a theoretical-bound
   demonstration on a coherence-grade observation pipeline (vantages ->
   broker), not as a guaranteed service level over arbitrary
   BFD-capable hardware.  Validation against real BFD hardware
   (commercial routers, merchant-silicon ASICs, software forwarders
   such as VPP or DPDK) is identified as required future work before
   progression past Experimental status.  This caveat is the principal
   reason the document is targeted at Experimental.

   A REFERENCE IMPLEMENTATION of the wire format defined in Section 4
   (mandatory section + Experimental-range TLVs + HMAC-SHA256
   authentication) is provided in pure Python at .  It demonstrates
   end-to-end interop of 1 broker and 4 vantages with 480 packets in
   8 seconds, zero HMAC failures, and correct triggering of ALARM and
   Byzantine-event detection.  See reference-impl/README.md for usage.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 23, 2026.

Melegassi               Expires November 23, 2026               [Page 1]

Internet-Draft           Coherence-BFD Protocol                 May 2026

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Revised BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Revised BSD License.

Table of Contents

   1.  Introduction
       1.1.  The latency gap
       1.2.  Relationship to BFD
       1.3.  Conventions used in this document
   2.  Protocol Overview
   3.  Session State Machine
   4.  Control Packet Format
       4.1.  Mandatory section
       4.2.  Coherence TLV
   5.  Echo Function for Coherence
   6.  Demand Mode and Polling
   7.  Detection Multiplier and Confirmation
   8.  Negotiated Intervals
   9.  Five Reference Variants
   10. Empirical Results (Wall-Clock Measurements)
   11. Lower Bound Achievement
   12. Security Considerations
   13. IANA Considerations
   14. References
   15. Packet Sizing, MTU, and Network Stack Tuning
       15.1.  Packet size budget (all packet types)
       15.2.  MTU and fragmentation
       15.3.  PPS regimes and OS tuning requirements
       15.4.  Recommended sysctl, ethtool, and queue settings
       15.5.  NUMA and CPU isolation for the broker
   16. Privacy Considerations
   17. Manageability Considerations
   Acknowledgements
   Author's Address

1.  Introduction

   Bidirectional Forwarding Detection [RFC5880] provides sub-second
   failure detection between two endpoints of a forwarding path.  Its
   key mechanisms are:

   o  Asynchronous mode: each endpoint emits Hello packets at a
      negotiated interval T_tx (typically 16-33 ms).
   o  Detection multiplier M: a session is declared Down only after M
      consecutive missed Hello packets.
   o  Demand mode: Hello packets can be suspended when not needed.
   o  Echo function: a packet sent by one endpoint, looped back by the
      other without inspection, used for path verification.

   The Multi-Vantage Path Synchrony framework
   [I-D.melegassi-ippm-mvps-bundle] performs anomaly detection across
   N vantage observers using the Mahalanobis distance D^2 over a
   three-axis coherence vector (C_1, C_2, C_3).  Its baseline tick
   period is 60 s, suitable for path-coherence anomalies but unsuitable
   for sub-second failover.

1.1.  The latency gap

   The two frameworks operate on different time scales:

      BFD typical:         50 ms detection latency
      BE-MVPS baseline:    60 000 ms detection latency
      Gap:                 1200x

   This document closes the gap by adapting BFD mechanisms to drive the
   BE-MVPS detector.  Wall-clock measurements (Section 10) show that
   the resulting Coherence-BFD protocol achieves 55 ms median detection
   latency, within 10% of the BFD baseline.

1.2.  Relationship to BFD

   Coherence-BFD differs from BFD in three respects:

   1.  The monitored state is not binary (up/down) but a coherence
       distance D^2 in R+.  A WATCH threshold and an ALARM threshold
       are defined per session.

   2.  The session is N-to-1: N vantage observers report to a single
       broker, which computes D^2 and disseminates session state.
       Conventional BFD is 1-to-1.

   3.  The Echo packet does not measure RTT; it carries a hash of the
       cell-aggregated coherence sketch and immediately fires an alarm
       if the aggregate has drifted above threshold/2 in transit.

   Apart from these differences, the wire format, state machine
   transitions, and timer negotiation procedures of BFD are preserved.
   Implementations MAY share code with conventional BFD stacks.

1.3.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals.

2.  Protocol Overview

   A Coherence-BFD session consists of:

   o  one Broker process,
   o  N >= 2 Vantage processes,
   o  optionally, k <= N Cell-Coordinator processes that aggregate
      pushes from disjoint subsets of vantages.

   The session transitions through five states:

      AdminDown -> Down -> Init -> WATCH -> ALARM

   AdminDown is the operator-disabled state.  Down indicates no session
   has been established.  Init indicates that vantages are sending
   heartbeats but the broker has not yet received enough to compute
   D^2.  WATCH indicates D^2 has crossed chi^2_{d, 0.95}.  ALARM
   indicates D^2 has crossed chi^2_{d, 0.99}.

   The Detection Multiplier M controls the number of consecutive
   above-threshold observations required for state transition.

3.  Session State Machine

   The five-state machine extends BFD's three-state machine
   (AdminDown / Down / Init+Up).

      +---------+         +------+        +------+
      |AdminDown|<------->| Down |------->| Init |
      +---------+         +------+        +------+
                                              |
                                              v
                          +-------+       +-------+
                          | ALARM |<------| WATCH |
                          +-------+       +-------+
                              |               ^
                              +-------+-------+
                                      v
                            (heartbeat sustained)

   State transitions:

      Down -> Init:      broker has received heartbeats from >= 2
                         vantages within T_negotiated_tx.
      Init -> WATCH:     D^2 > chi^2_{d, 0.95} for M consecutive ticks.
      WATCH -> ALARM:    D^2 > chi^2_{d, 0.99} for M consecutive ticks.
      ALARM -> WATCH:    D^2 < chi^2_{d, 0.95} for M consecutive ticks.
      WATCH -> Init:     no above-threshold observation in 2M ticks.
      any -> AdminDown:  operator action.

4.  Control Packet Format

   The control packet format is a superset of BFD's mandatory section
   ([RFC5880] Section 4.1).

4.1.  Mandatory section

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Vers |  Diag   |Sta|P|F|C|A|D|M|  Detect Mult  |    Length     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       My Discriminator                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Your Discriminator                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Desired Min TX Interval                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Required Min RX Interval                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Required Min Echo RX Interval                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      D^2 (32-bit float)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Sta: 2-bit state field, encoded:

      00 - AdminDown
      01 - Init
      10 - WATCH
      11 - ALARM

   New flag:

      C (bit 5): Coherence flag.  Set if this packet contains a D^2
      value.  When clear, the D^2 field MUST be transmitted as zero.

   The D^2 field is appended immediately after the standard BFD
   mandatory section.  Length increases by 4 octets.
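
   As a non-normative illustration, the layout above can be packed and
   parsed in a few lines of Python.  The sketch below rests on several
   assumptions that this document does not state: the six flag bits
   P, F, C, A, D, M are taken to occupy the low six bits of the second
   octet in diagram order (so C is mask 0x08), Length is taken to count
   only the mandatory section plus the D^2 field, and passing d2=None
   models a packet sent without the D^2 field.  The names pack_control
   and unpack_control are hypothetical and are not taken from the
   reference implementation.

```python
import struct

# 24-octet BFD mandatory section followed by the optional 4-octet D^2.
_FIXED = struct.Struct("!BBBBIIIII")
_D2 = struct.Struct("!f")

# Assumed mask: flags P,F,C,A,D,M fill the low 6 bits in diagram order.
FLAG_C = 0x08


def pack_control(vers, diag, sta, flags, detect_mult, my_disc, your_disc,
                 min_tx, min_rx, min_echo_rx, d2=None):
    """Build a control packet; d2=None yields the 24-octet form."""
    # The C flag mirrors the presence of D^2, whatever the caller passed.
    flags = (flags | FLAG_C) if d2 is not None else (flags & ~FLAG_C)
    length = _FIXED.size + (_D2.size if d2 is not None else 0)
    byte0 = ((vers & 0x07) << 5) | (diag & 0x1F)
    byte1 = ((sta & 0x03) << 6) | (flags & 0x3F)
    pkt = _FIXED.pack(byte0, byte1, detect_mult, length,
                      my_disc, your_disc, min_tx, min_rx, min_echo_rx)
    if d2 is not None:
        pkt += _D2.pack(d2)
    return pkt


def unpack_control(pkt):
    """Parse a control packet into a dict; d2 is None when absent."""
    if len(pkt) < _FIXED.size:
        raise ValueError("packet shorter than mandatory section")
    (byte0, byte1, detect_mult, length, my_disc, your_disc,
     min_tx, min_rx, min_echo_rx) = _FIXED.unpack_from(pkt)
    if length < _FIXED.size or length > len(pkt):
        raise ValueError("inconsistent Length field")
    flags = byte1 & 0x3F
    d2 = None
    if flags & FLAG_C and length >= _FIXED.size + _D2.size:
        (d2,) = _D2.unpack_from(pkt, _FIXED.size)
    return {
        "vers": byte0 >> 5, "diag": byte0 & 0x1F,
        "sta": byte1 >> 6, "flags": flags,
        "detect_mult": detect_mult, "length": length,
        "my_disc": my_disc, "your_disc": your_disc,
        "min_tx": min_tx, "min_rx": min_rx,
        "min_echo_rx": min_echo_rx, "d2": d2,
    }
```

   A receiver written this way never reads past the Length field, so
   trailing TLVs can be handed to a separate parser.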

   Backwards compatibility with [RFC5880] BFD:

   o  A receiver conformant to [RFC5880] but not implementing this
      document MUST honour the Length field and silently skip the
      4-octet D^2 field as opaque trailing data.
   o  A receiver that does not recognise the 'C' flag MUST treat the
      D^2 field as zero and behave per [RFC5880].
   o  A sender supporting this document but interacting with an
      [RFC5880]-only receiver MUST clear the 'C' flag and MUST omit the
      D^2 field; the Length field MUST reflect the original [RFC5880]
      mandatory-section length.
   o  Capability discovery proceeds via the Type 0x00
      (Version-Negotiation) TLV exchanged during session
      initialisation.

4.2.  Coherence TLV

   Optional sections follow the mandatory section.  Each is encoded as
   a Type-Length-Value (TLV) triple:

      Type  Name                 Length  Description
      ----  -------------------  ------  ---------------------
      0x00  Version-Negotiation  3       uint16 supported set
      0xE0  Vantage-Sketch       1+d*4   d float values
      0xE1  Cell-Centroid        1+d*4   d float values
      0xE2  Echo-Hash            34      SHA-256 of cell agg
      0xE3  Watch-Threshold      5       float chi^2_{d,0.95}
      0xE4  Alarm-Threshold      5       float chi^2_{d,0.99}
      0xE5  Vantage-Count-N      5       uint32 N
      0xE6  Cell-Count-k         5       uint32 k
      0xE7  Phase-Label          2       uint8 phase code
      0xE8  Byzantine-Suspect    6       uint32 cell ID + score
      0xE9  AuthHMAC-SHA256      34      authentication

   All TLV type codes used in this document, other than the reserved
   Type 0x00, fall in the Experimental range 0xE0-0xEF.  Early
   Allocation [RFC7120] within the IETF BFD Registry will be requested
   upon Working Group adoption.  Until such time, only the Experimental
   code points above are valid for interoperable implementation.

   Type 0x00 (Version-Negotiation) is RESERVED.  It MUST appear at most
   once per packet.  Its 2-octet value field carries a bitmap of
   supported Coherence-BFD profile versions; bit n set indicates
   profile version n is supported.  This document defines version 0
   only.
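
   The version bitmap can be illustrated with a short, non-normative
   Python sketch.  It assumes that "bit n" means the n-th
   least-significant bit of the 2-octet value and that the value is
   sent in network byte order; neither detail is fixed by the text
   above.  Selecting the highest common version is likewise an assumed
   policy, and the function names are hypothetical.

```python
def encode_version_bitmap(versions):
    """Return the 2-octet value field for a set of profile versions."""
    bitmap = 0
    for v in versions:
        if not 0 <= v <= 15:
            raise ValueError("profile version must fit in 16 bits")
        bitmap |= 1 << v  # assumption: bit n is the n-th LSB
    return bitmap.to_bytes(2, "big")


def decode_version_bitmap(value):
    """Return the set of profile versions advertised in a value field."""
    if len(value) != 2:
        raise ValueError("Version-Negotiation value must be 2 octets")
    bitmap = int.from_bytes(value, "big")
    return {n for n in range(16) if bitmap & (1 << n)}


def negotiate_version(local_versions, remote_value):
    """Pick the highest version both sides support, or None if disjoint."""
    common = set(local_versions) & decode_version_bitmap(remote_value)
    return max(common) if common else None
```

   Under these assumptions an implementation of this document, which
   defines version 0 only, would advertise the value 0x0001.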

   Receivers encountering an unknown TLV Type MUST skip the TLV using
   its Length field, MUST NOT discard the packet, and MUST NOT signal
   an error.

   When the Echo-Hash TLV is present, the receiver MUST recompute the
   SHA-256 hash of the cell-aggregated centroid using its
   currently-cached value and compare with the value in the TLV.  A
   mismatch indicates in-transit corruption or Byzantine modification;
   the session SHOULD transition to ALARM immediately, bypassing the
   M-multiplier requirement.

5.  Echo Function for Coherence

   The conventional BFD echo function ([RFC5880] Section 6.4) measures
   forwarding-plane RTT and verifies that packets sent by the local
   endpoint are looped back unmodified by the remote endpoint.  The
   Coherence-BFD echo function carries an additional payload:

   o  Echo-Hash TLV (Section 4.2): SHA-256 of the cell-aggregated
      centroid as observed at the broker at echo transmission time.
   o  Phase-Label TLV: the broker's current Phi_K classification.

   When the echo packet returns to the broker, the broker MUST verify
   that:

   o  The Echo-Hash TLV value is unchanged.
   o  The Phase-Label TLV value is unchanged.
   o  The total RTT does not exceed Required Min Echo RX Interval *
      Detection Multiplier (early-warning timer).

   A failure of any check transitions the session to ALARM with
   diagnostic code 0x07 (Echo Function Failed).

   The Echo function MAY be performed at sub-tick intervals (e.g.,
   every 25 ms even when T_negotiated_tx = 50 ms).  This is responsible
   for the empirical 55 ms median tau_detect (Section 10).

6.  Demand Mode and Polling

   In Demand mode, vantages do not transmit heartbeats unless
   explicitly polled by the broker.  The broker sends a Poll packet
   (P flag set) when:

   o  D^2 exceeds 0.7 * Watch-Threshold (suspicion threshold), OR
   o  The broker has not received a heartbeat for the current Detection
      Time, indicating possible network partition.
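
   The two poll triggers above reduce to a single predicate.  The
   following non-normative Python sketch assumes that the broker tracks
   the age of the most recent heartbeat and the current Detection Time
   in seconds; the function name, parameter names, and the example
   threshold (the 95th-percentile chi-square quantile for d = 3,
   approximately 7.815) are illustrative choices rather than values
   fixed by this document.

```python
# Assumed example: chi^2_{3, 0.95}, used only to illustrate the call.
CHI2_95_D3 = 7.815


def should_poll(d2, watch_threshold, heartbeat_age_s, detection_time_s,
                suspicion_factor=0.7):
    """Return True when the broker should send a Poll to a vantage.

    d2               -- latest coherence distance for the session
    watch_threshold  -- Watch-Threshold value in force for the session
    heartbeat_age_s  -- seconds since the last heartbeat was received
    detection_time_s -- current Detection Time, in seconds
    """
    # Trigger 1: D^2 has crossed the suspicion threshold.
    suspicious = d2 > suspicion_factor * watch_threshold
    # Trigger 2: no heartbeat for a full Detection Time.
    silent = heartbeat_age_s >= detection_time_s
    return suspicious or silent
```

   For example, should_poll(6.0, CHI2_95_D3, 0.0, 0.15) is true because
   6.0 exceeds 0.7 * 7.815, while the same call with d2 = 5.0 is false.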

   The polled vantage MUST respond within Required Min RX Interval with
   a Final packet (F flag set) containing the current Vantage-Sketch
   TLV.

   Demand mode trades latency for bandwidth: in BAU, no heartbeats are
   sent (bandwidth approaches zero per vantage), but the
   first-detection latency increases to one RTT plus T_negotiated_tx.

7.  Detection Multiplier and Confirmation

   The Detection Multiplier M (default 3) controls the number of
   consecutive above-threshold observations required for state
   transition.  Operators MUST choose M to balance:

   o  False-positive rate: FPR = Pr[D^2 > threshold | BAU]^M.  For
      Gaussian BAU and 95th-percentile threshold, FPR <= 0.05^M; at
      M = 3, FPR <= 1.25 * 10^-4.
   o  Detection latency: tau_detect >= M * T_negotiated_tx + tau_RTT
      (Theorem 9 of [I-D.melegassi-mvps-incremental-be]).

   Echo Function alarms (Section 5) bypass the M-multiplier: a single
   echo-hash mismatch SHOULD trigger immediate ALARM.

8.  Negotiated Intervals

   Each endpoint advertises its Desired Min TX Interval and Required
   Min RX Interval in the mandatory section (Section 4.1).  The
   negotiated T_tx is

      T_negotiated_tx = max(local Desired Min TX Interval,
                            remote Required Min RX Interval).

   Intervals are negotiated at session establishment and MAY be
   renegotiated after an Init -> WATCH transition, allowing operators
   to dynamically increase fidelity during anomalous periods.

9.  Five Reference Variants

   Implementations MAY default to any of the variants below.  The
   benchmark of Section 10 measures all five.

      V0  Baseline:        T_tx = 60 000 ms, M = 1, no echo.  Maps to
                           the BE-MVPS baseline of
                           [I-D.melegassi-mvps-incremental-be].
      V1  Heartbeat-Fast:  T_tx = 50 ms, M = 3, no echo.  Continuous
                           heartbeat; BAU bandwidth high.
      V2  Demand:          T_tx = 1 000 ms, M = 1, demand mode.  BAU
                           bandwidth near zero; latency 1 s.
      V3  Echo:            T_tx = 50 ms, M = 1, echo every 2nd tick.
                           Empirically optimal (55 ms).
      V4  Hybrid:          T_tx = 50 ms, M = 3, push + echo + demand.
                           Highest robustness, comparable latency.

10.  Empirical Results (Wall-Clock Measurements)

   Reference benchmark: scripts/benchmark_coherence_bfd.py in the
   Catellix research repository.  N = 1000 vantages, 50 trials per
   variant, calibrated coherence shock producing D^2 ~ 30 post-shock.

      Variant              tau_detect    FPR          Bandwidth
                           (median ms)   (per 10^4)   (B/s)
      -------------------  -----------   ----------   ---------
      V0 Baseline               60 005            0          32
      V1 Heartbeat-Fast            155            0     118 400
      V2 Demand                  1 005            0       4 000
      V3 Echo                       55            0      39 680
      V4 Hybrid                    155            0      39 680

   V3 (Echo) achieves a 1091x latency reduction over V0 baseline at a
   1240x bandwidth cost.  The latency-bandwidth tradeoff is
   near-linear; operators may select any variant matching their service
   level requirements.

   Compute cost per tick is a few microseconds (3.8-4.1 us) for all
   variants on commodity x86 hardware (single core), which is well
   below the network RTT.  Compute is therefore not the bottleneck.

11.  Lower Bound Achievement

   By Theorem 9 of [I-D.melegassi-mvps-incremental-be], the minimum
   achievable detection latency for any variant with Detection
   Multiplier M, tick period T_tick, and end-to-end RTT tau_RTT is

      tau_detect >= max( M * T_tick + tau_RTT, tau_C4 ).

   Variant V3 with T_tick = 50 ms, M = 1, tau_RTT = 5 ms:

      tau_detect_min = 1 * 50 + 5 = 55 ms.

   Empirical measurement: 55 ms median.  Bound is tight.

   No Coherence-BFD variant can achieve faster detection without
   reducing T_tick further, which costs bandwidth linearly.
   Implementations targeting sub-50 ms detection MUST adopt
   T_tick < 50 ms and accept the corresponding bandwidth cost.

12.  Security Considerations

   o  Echo packets carrying Coherence TLVs are authentication targets.
      Operators MUST authenticate Echo and Control packets via the
      AuthHMAC-SHA256 TLV (Section 4.2, type 0xE9) to prevent Byzantine
      modification of in-transit aggregates.

   o  Demand mode reduces bandwidth but exposes the protocol to
      DoS-by-poll-flood from a malicious broker.  Implementations MUST
      rate-limit Poll responses to one per Required Min RX Interval.

   o  Reducing T_tx below 50 ms allows finer detection but increases
      bandwidth linearly.  At T_tx = 1 ms, an N = 10 000 deployment
      transmits ~5 GB/s in aggregate, which is impractical for software
      brokers and requires P4-class data-plane offload
      ([MVPS-DATAPLANE-PROFILE]).

   o  The five-state machine adds two states (WATCH, ALARM) beyond
      BFD's three (AdminDown/Down/Up).  Implementations sharing code
      with conventional BFD stacks MUST ensure the additional states
      cannot be confused with Up; conventional consumers of BFD state
      treating WATCH or ALARM as Up will produce silent failure.

   o  Transport security between vantages, cell coordinators, and the
      broker.  AuthHMAC-SHA256 TLV provides integrity but NOT
      confidentiality.  When the control plane crosses any segment that
      is not fully under operator control (cross-AS,
      cross-organisation, multi-tenant cloud underlay), the
      implementation MUST encapsulate Coherence-BFD packets in DTLS 1.3
      [RFC9147] or TLS 1.3 [RFC8446] following the recommendations of
      BCP 195 [RFC9325].  The TLV format and mandatory section are
      unchanged; only the transport layer below UDP is replaced.
      Cipher-suite selection MUST follow BCP 195 Section 4.

   o  Long-term key management for the AuthHMAC-SHA256 TLV.  Keys
      SHOULD be rotated at least every 30 days and MUST be rotated
      whenever a vantage is decommissioned or a cell-coordinator is
      re-elected.

12.1.  DDoS resilience (the framework as detector, not victim)

   A frequent misconception is that a high-rate volumetric DDoS against
   the monitored infrastructure would saturate Coherence-BFD itself.
   This is incorrect when the deployment respects the following
   architectural invariants:

   I1.  Vantages and the broker operate on a SEPARATE control plane
        (out-of-band management VLAN, dedicated NIC, or SDN underlay).
        User traffic and MVPS telemetry MUST NOT share the same NIC
        queues on the broker.

   I2.  Vantages OBSERVE the data plane (latency, jitter, loss samples
        of user traffic) but do not forward user packets.  A vantage is
        a probe, not a middlebox.

   I3.  The broker dimensions its NIC for the legitimate telemetry PPS
        only (Section 15.3), independent of user-traffic volume.

   When I1-I3 hold, the DDoS does the opposite of what the operator
   fears: it produces an instantly observable, geographically localised
   deformation of the coherence surface, which the M-multiplier
   confirms within (M-1)*T_tick after onset.

   Empirical validation (scripts/simulate_ddos_resilience.py, summary
   in docs/SIM_DDOS_RESULTS.txt):

      Topology          : 10 000 vantages, 8 regions, T_tick = 50 ms
      Attack            : 10 Mpps volumetric DDoS on region 3
                          (1 250 vantages affected)
      Coherence shock   : cell-wise D^2 jumps from O(1) to >300
      Detection latency : 100 ms after onset (M = 3, T_tick = 50 ms)
      Attribution       : R_cross localises to region 3 with 100%
                          argmax accuracy across 275 windows
      Broker health     : 99% availability (single-broker, Regime C
                          tuned per Section 15)
      Other regions     : remain in BAU; D^2 < 5 throughout the attack

   The detection latency 100 ms equals (M-1)*T_tick = 2*50 ms, matching
   the lower bound of Theorem 9 within the slack permitted by the
   M-multiplier confirmation count.

12.2.  When the framework IS at risk

   The honest negative results:

   o  If invariants I1 or I2 are violated (telemetry shares the
      user-traffic data path), the broker's NIC saturates with attack
      traffic and the framework degrades to default-deny.  This is a
      deployment defect, not a protocol defect.

   o  Byzantine breakdown: an attacker controlling more than
      floor((k-1)/2) of the k cells can move the geometric median and
      minimax aggregator arbitrarily (Theorem 7 of
      [I-D.melegassi-mvps-incremental-be]).  For k = 8 cells, the
      breakdown bound is 3 compromised cells.

   o  Broker NIC at Regime D (>1 Mpps telemetry, e.g. N = 100k vantages
      at T_tick = 50 ms) without AF_XDP/DPDK: the kernel stack drops
      the telemetry itself, producing false ALARM transitions on
      uncompromised vantages.

   o  Replay of historical Coherence TLVs: mitigated by the BFD
      sequence numbers in the mandatory section, but requires strictly
      monotonic implementation; rolling counters MUST NOT wrap within
      the M*T_tick window.

   These are bounded, documented failure modes.  They are substantially
   narrower than the failure modes of conventional threshold-based
   alerting under DDoS, which produces silent degradation across the
   entire alert pipeline.

13.  IANA Considerations

   This document requests the following IANA actions in the
   Coherence-BFD Registry (new, created by this document):

   1.  New protocol code point for the C flag (Section 4.1).

   2.  Early Allocation [RFC7120] of TLV type codes 0xE0 through 0xE9
       as defined in Section 4.2, plus reserved code 0x00 for version
       negotiation.

   3.  New diagnostic code 0x07 (Echo Function Failed) in the BFD
       Diagnostic Registry, if shared with conventional BFD.

   4.  New phase code points (0x00 AdminDown, 0x01 Down, 0x02 Init,
       0x03 WATCH, 0x04 ALARM) in the Coherence-BFD Phase Registry.

14.  References

14.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [I-D.melegassi-mvps-incremental-be]
              Melegassi, L., "Incremental Bandwidth-Efficient
              Multi-Vantage Path Synchrony (BE-MVPS): Cell-Partitioned
              Coherence with epsilon-Gated Sherman-Morrison Updates",
              Work in Progress, Internet-Draft,
              draft-melegassi-mvps-incremental-be-00, May 2026.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007.

   [RFC5706]  Harrington, D., "Guidelines for Considering Operations
              and Management of New Protocols and Protocol Extensions",
              RFC 5706, DOI 10.17487/RFC5706, November 2009.

   [RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
              Morris, J., Hansen, M., and R. Smith, "Privacy
              Considerations for Internet Protocols", RFC 6973,
              DOI 10.17487/RFC6973, July 2013.

   [RFC7120]  Cotton, M., "Early IANA Allocation of Standards Track
              Code Points", BCP 100, RFC 7120, DOI 10.17487/RFC7120,
              January 2014.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS)
              Protocol Version 1.3", RFC 8446, DOI 10.17487/RFC8446,
              August 2018.

   [RFC9127]  Rahman, R., Ed., Zheng, L., Ed., Jethanandani, M., Ed.,
              Pallagatti, S., and G. Mirsky, "YANG Data Model for
              Bidirectional Forwarding Detection (BFD)", RFC 9127,
              DOI 10.17487/RFC9127, October 2021.

   [RFC9147]  Rescorla, E., Tschofenig, H., and N. Modadugu, "The
              Datagram Transport Layer Security (DTLS) Protocol Version
              1.3", RFC 9147, DOI 10.17487/RFC9147, April 2022.

   [RFC9325]  Sheffer, Y., Saint-Andre, P., and T. Fossati,
              "Recommendations for Secure Use of Transport Layer
              Security (TLS) and Datagram Transport Layer Security
              (DTLS)", BCP 195, RFC 9325, DOI 10.17487/RFC9325,
              November 2022.

14.2.  Informative References

   [I-D.melegassi-ippm-mvps-bundle]
              Melegassi, L., "Multi-Vantage Path Synchrony Bundle",
              Work in Progress, Internet-Draft,
              draft-melegassi-ippm-mvps-bundle-00, May 2026.

   [I-D.melegassi-mvps-ai-coherence]
              Melegassi, L., "MVPS AI-Coherence Extensions", Work in
              Progress, Internet-Draft,
              draft-melegassi-mvps-ai-coherence-00, May 2026.

   [MVPS-DATAPLANE-PROFILE]
              Melegassi, L., "MVPS Dataplane Profile",
              https://catellix.com/static/download/
              MVPS_DATAPLANE_PROFILE.txt, 2026.

15.  Packet Sizing, MTU, and Network Stack Tuning

   The protocol is useless on paper if its packets fragment, its
   broker's NIC saturates, or its IRQ handler stalls.  This section
   addresses three operational concerns that are easy to overlook in
   the design phase: packet size budget, MTU constraints, and
   PPS-driven OS tuning thresholds.

15.1.  Packet size budget (all packet types)

   Computed byte-by-byte for IPv4 transport (add +20 octets for IPv6).
   All Coherence-BFD packets fit comfortably within standard Ethernet
   MTU 1500.

      Packet                Composition                         Total
      --------------------  ----------------------------------  --------
      Vantage heartbeat     UDP(8) + IP(20) + BFD(24)           56 B
                            + hash(4)
      Vantage push          UDP(8) + IP(20) + BFD(24) + D^2(4)  116 B
                            + Sketch TLV(26) + HMAC TLV(34)
      Echo packet           UDP(8) + IP(20) + BFD(24)           122 B
                            + Echo-Hash TLV(34)
                            + Phase-Label(2) + HMAC(34)
      Demand Poll / Final   UDP(8) + IP(20) + BFD(24) + D^2(4)  82 B
                            + Sketch(26)
      Cell-Coord -> Broker  UDP(8) + IP(20)                     62 + 30k
                            + k * (id(4) + sketch(26))          (k=10:
                            + HMAC(34)                          362 B)
      Broker -> Subscriber  UDP(8) + IP(20) + BFD(24) + D^2(4)  58 B
                            + Phase(2)

   All single packets are below 500 octets at k <= 14; below 1500 at
   k <= 47.  Cells SHOULD be sized k <= 100 per coordinator, producing
   aggregate packets up to 3062 octets, which exceeds MTU 1500.  In
   that regime, the Cell-Coord MUST either:

   (a)  split its centroid report into multiple sub-packets, or
   (b)  use Jumbo frames (MTU 9000) if the underlying L2 supports them.

   At Jumbo-MTU 9000, a single Cell-Coord packet can carry up to ~300
   cells.

15.2.  MTU and fragmentation

   Implementations MUST set IP DF=1 (don't fragment) on all
   Coherence-BFD packets.  An ICMP Fragmentation Needed response
   indicates an undersized path MTU and MUST trigger:

   o  fallback to MTU 1500 (down from Jumbo), or
   o  cell-split per (a) above.

   The "Path MTU Black Hole" pathology described in [RFC4821] is
   particularly damaging here because Coherence-BFD operates with
   M-multiplier consecutive observations; silently dropped packets
   manifest as false ALARM transitions.  Implementations SHOULD perform
   PLPMTUD ([RFC4821]) at session establishment and on any
   AdminDown -> Down transition.

   For the MVPS bundle envelope of [I-D.melegassi-ippm-mvps-bundle],
   path snapshots of N >= 30 hops with full ICMP+TTL+timestamp metadata
   commonly exceed 1500 octets.  Coherence-BFD does not carry bundles;
   bundles are exchanged out-of-band over TCP or chunked over a
   different control channel.  Bundle MTU concerns are out of scope for
   this document.

15.3.  PPS regimes and OS tuning requirements

   The broker process receives one Vantage packet per tick per vantage.
   Aggregate packets-per-second (PPS) at the broker:

      PPS = N / T_tick_seconds.

   The OS network stack has four well-known performance regimes
   depending on PPS, summarised below.  Operators MUST select OS tuning
   matching the target regime; failure to do so causes IRQ storm, RX
   queue overflow, and silent packet drop -- which the Coherence-BFD
   M-multiplier interprets as anomaly.

      Regime  Target PPS        Tuning required
      ------  ----------------  -----------------------------------
      A       <= 10 000         Default kernel suffices.  Single RX
                                queue acceptable.
      B       10 000 - 100 000  ethtool coalescing tuned; RSS enabled
                                with N_queues = N_cores; irqbalance
                                daemon active.
      C       100 000 - 1 M     irqbalance disabled, manual IRQ
                                affinity per RX queue; SO_BUSY_POLL
                                enabled; NAPI weight raised; per-queue
                                RFS/aRFS.
      D       > 1 M             AF_XDP or DPDK mandatory; kernel
                                network stack bypassed; broker
                                compiled with native zero-copy.

   Operational examples for typical deployments:

      Deployment                N        T_tick  PPS        Regime
      ------------------------  -------  ------  ---------  ------
      Single rack monitor       100      50 ms   2 000      A
      Single-DC monitor         1 000    50 ms   20 000     B
      Multi-DC operator         10 000   50 ms   200 000    C
      HFT / sub-second target   10 000   5 ms    2 000 000  D
      Hyperscaler (full mesh)   100 000  50 ms   2 000 000  D

   Implementers targeting Regime C or D MUST consult the data-plane
   profile [MVPS-DATAPLANE-PROFILE] for hardware-accelerated reference
   designs.

15.4.  Recommended sysctl, ethtool, and queue settings

   The following are minimum recommended settings for a broker running
   in Regime B or C on a Linux 5.10+ host.  Operators MAY relax for
   Regime A or tighten for Regime D.

   In the commands below, <iface> stands for the broker's network
   interface, <irq> for the IRQ number of an RX queue, and <cpu-mask>
   for a hexadecimal CPU affinity mask.

   o  ethtool RX/TX queue sizing:

         ethtool -G <iface> rx 4096 tx 4096

   o  ethtool coalescing (RX side, reduce IRQ rate):

         ethtool -C <iface> adaptive-rx on rx-usecs 50 rx-frames 64

      For Regime C, set adaptive off and tune manually:

         ethtool -C <iface> adaptive-rx off rx-usecs 10 rx-frames 16

   o  Enable RSS hash on UDP source port (for spreading vantages across
      queues):

         ethtool -N <iface> rx-flow-hash udp4 sdfn

   o  Enable RPS / RFS for software hashing on single-queue NICs:

         echo ffff > /sys/class/net/<iface>/queues/rx-0/rps_cpus
         echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
         echo 4096 > /sys/class/net/<iface>/queues/rx-0/rps_flow_cnt

   o  Increase socket receive buffer:

         sysctl -w net.core.rmem_default=33554432
         sysctl -w net.core.rmem_max=268435456

   o  Increase the kernel receive backlog and NAPI budget:

         sysctl -w net.core.netdev_max_backlog=300000
         sysctl -w net.core.netdev_budget=600

   o  For Regime C, enable SO_BUSY_POLL in the broker socket:

         setsockopt(sk, SOL_SOCKET, SO_BUSY_POLL, &usec, sizeof usec);
         /* recommended: usec = 50 to 100 */

   o  Disable irqbalance and pin RX queue IRQs to specific cores:

         systemctl stop irqbalance
         systemctl mask irqbalance
         # per RX queue n, pin to core n:
         echo <cpu-mask> > /proc/irq/<irq>/smp_affinity

   o  TX queueing discipline: replace default pfifo_fast with fq_codel
      (lower latency under load):

         tc qdisc replace dev <iface> root fq_codel

15.5.  NUMA and CPU isolation for the broker

   At Regime C or above, the broker process MUST be NUMA-pinned to the
   same socket as the NIC (verify with lspci -vv).  Cross-NUMA memory
   access doubles latency under load.
   Boot-time CPU isolation:

      isolcpus=2-7 nohz_full=2-7 rcu_nocbs=2-7
      (Linux GRUB cmdline)

   This removes cores 2-7 from the general kernel scheduler; the broker
   process is then pinned to one of these cores via:

      taskset -c 2 ./mvps_broker

   Hugepages reduce TLB pressure for the broker's state arrays
   (typically hundreds of MB at N >= 10 000):

      sysctl -w vm.nr_hugepages=512
      # broker uses mmap(MAP_HUGETLB)

   For multi-broker deployments, SO_REUSEPORT allows multiple broker
   threads to share a single UDP listening port with kernel-side load
   balancing across threads.

16.  Privacy Considerations

   This protocol exposes the geometric coherence state (D^2) of the
   monitored infrastructure to its operators.  While numerical and
   aggregated, the D^2 value and associated TLVs may enable the
   following inferences:

   o  Geographic patterns of usage (per-cell D^2 streams may correlate
      with regional traffic volume).

   o  Topology of customer-facing AS interconnections (visible via
      Cell-Centroid TLV when broker feeds are shared).

   o  Timing of mitigated attacks (visible via Phase-Label TLV
      transitions Init->WATCH->ALARM->Init).

   Implementations:

   o  MUST NOT include payload bytes from observed user traffic in any
      TLV.  Only statistical aggregates derived from operator-internal
      measurements MAY be carried.

   o  SHOULD aggregate D^2 over windows of at least T_tick before
      publication to any non-operator audience, to prevent fine-grained
      timing side-channel inference.

   o  SHOULD redact Vantage-Sketch (0xE0) and Cell-Centroid (0xE1) TLVs
      in cross-organisation telemetry feeds (e.g., operator-CDN
      consortium dashboards) and publish only the scalar D^2 field of
      the mandatory section.

   o  MAY apply differential privacy noise to per-cell D^2 streams
      before publication to community-defence feeds (analogous to MISP
      or AbuseIPDB).

   The privacy considerations framework of [RFC6973] applies.  This
   document does not introduce categories of personally identifiable
   information.

17.  Manageability Considerations

   This section is REQUIRED by [RFC5706] for Routing Area documents.

   Operations.  The five-state machine is observable via standard BFD
   management interfaces extended for the WATCH and ALARM states.  A
   YANG augmentation of [RFC9127] is anticipated as a future companion
   document.

   Faults.  Persistent ALARM without corresponding data-plane outage,
   and persistent WATCH oscillation, both indicate calibration drift
   (Sigma_0 has aged).  Implementations SHOULD expose a "recalibrate"
   administrative action that re-derives mu_0 and Sigma_0 from the last
   N ticks of BAU samples.  Recommended N for production deployments:
   at least 86 400 ticks (24 h at T_tick = 1 s, or 86.4 s at
   T_tick = 1 ms).

   Calibration procedure.  Initial calibration of (mu_0, Sigma_0):

   1.  Bring all vantages online and let the session reach the Up state
       (per [RFC5880]).

   2.  Collect at least 600 ticks (30 s at T_tick = 50 ms) of confirmed
       BAU samples; reject any tick during which D^2 > 3 * IQR
       (interquartile range) of the current window.

   3.  Set mu_0 = sample mean of cell centroids over the collected BAU
       window.

   4.  Set Sigma_0 = sample covariance + epsilon * I, where
       epsilon = 1e-6 to ensure invertibility.

   Implementations SHOULD support online recalibration triggered either
   by operator command or automatically after a detected topology
   change (BGP UPDATE batch above per-AS threshold, anycast catchment
   change).

   Configuration.  All timer parameters (T_tick, M-multiplier, Desired
   Min TX Interval, Required Min RX Interval) follow [RFC5880]
   conventions.  Additional Coherence-BFD parameters:

   o  cell_count_k (default: 8)

   o  byzantine_bound_B (default: floor((k-1)/2))

   o  watch_threshold (default: chi^2_{d, 0.95})

   o  alarm_threshold (default: chi^2_{d, 0.99})

   o  dual_mode_aggregation (default: enabled, per
      [I-D.melegassi-mvps-ddos-resilience] Section 7.2)

   Performance metrics.  Implementations SHOULD expose:

   o  detection_latency_p50, p95, p99 (over rolling 24 h)

   o  false_positive_rate_1h

   o  byzantine_alarm_count_24h

   o  cells_above_watch_threshold (gauge)

   o  vantages_in_session_up (gauge)

Acknowledgements

   The authors thank early reviewers of the MVPS framework, whose
   informal questions during May 2026 shaped this document.  In
   particular, the question "if MTU, IRQ, and queue tuning are not
   handled, does this break under real traffic?" directly motivated the
   addition of Section 15 (Packet Sizing, MTU, and Network Stack
   Tuning).

   The authors thank the IETF BFD WG mailing list for the conventions
   and registry structure that this document follows.

Author's Address

   Leonardo Melegassi
   Catellix
   Andradina, SP
   Brazil

   Email: melegassi@catellix.com
   URI:   https://catellix.com/


Melegassi               Expires November 23, 2026              [Page 14]