
The Substation Network

The previous post ended with a table: six asset classes, six different replacement cadences, all bolted to the same site. The problem is clear — the secondary systems need to refresh on a software cadence while the primary plant carries on through its decades-long lifecycle. The question is how.

The answer has been building since the early 2000s, and it starts with a small device bolted to an existing instrument transformer out in the yard.

The bridge between two timescales

The enabling technology for the retrofit is the stand-alone merging unit — SAMU, specified in IEC 61869-13. A SAMU sits at the existing CT and VT terminals out in the yard, samples the analogue waveforms at 4 kHz, and publishes the result onto a fibre-optic process bus as IEC 61850-9-2 Sampled Values. The current and voltage transformers themselves don’t need replacing. The legacy copper-trunked control cables don’t need digging up in one hit. The new digital protection IEDs subscribe to the SAMU’s sample stream and make their decisions in software.

That is the bridge between the two timescales. A transformer commissioned in 1975 can feed a 2025-vintage protection scheme through a SAMU mounted in its marshalling kiosk. The primary plant doesn’t notice the upgrade. The control room sees a digital, deterministic, network-attached protection system where there used to be racks of fixed-function relays.

Three reasons that bridge matters more now than it did five years ago. The first generation of numerical relays — installed from the late 1990s onward — is dropping out of vendor firmware support, so utilities have to do something with those panels in the next decade. The SAMU and process-bus standards are mature enough to drop into existing yards with interoperable products from multiple vendors. And the compute layer has crossed the price/performance threshold where a hypervisor in a substation outhouse can host the protection functions previously distributed across a wall of relays.

Refresh-when-the-iron-fails was the only viable path when secondary systems had to be procured as integrated vendor packages. Retrofit-on-a-software-cadence is the path that opens up when the secondary system is an Ethernet network, a process-bus stream, and a piece of code.

Three levels, two networks

IEC 61850 organises a digital substation into three logical levels — station, bay, and process — and two logical buses between them.

The station bus carries MMS client/server traffic and GOOSE messages. It’s the slower, more conversational LAN — “breaker 3 is now open”, “please close breaker 7”, reports to SCADA.

The process bus carries Sampled Values and trip GOOSE. This is the hard-real-time LAN. Voltage and current waveforms stream as digital samples at 4 kHz, and a protection IED subscribes to them the way a microservice subscribes to a message bus.

The merging unit introduced above sits at the process level. The detail worth adding here is the sample geometry: 80 samples per AC cycle, which on a 50 Hz grid is 4 000 Hz. IEDs on the bay level subscribe to those Sampled Values frames, reconstruct the waveform in software, and make their protection decisions.
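To make the sample geometry concrete, here is a toy sketch of what "subscribe, reconstruct, decide" means in software. The fake stream below stands in for a merging unit; a real subscriber would be decoding ASN.1/BER-encoded 9-2 frames off the wire, not calling a Python generator.

```python
import math

F_NOMINAL = 50          # Hz, grid frequency
SAMPLES_PER_CYCLE = 80  # 9-2 LE protection profile
SAMPLE_RATE = F_NOMINAL * SAMPLES_PER_CYCLE  # 4000 Hz

def simulated_sv_stream(peak_amps: float, cycles: int):
    """Stand-in for a merging unit: yields current samples at the
    process-bus rate (toy data, not real Sampled Values frames)."""
    for n in range(cycles * SAMPLES_PER_CYCLE):
        t = n / SAMPLE_RATE
        yield peak_amps * math.sin(2 * math.pi * F_NOMINAL * t)

def rms_over_cycle(samples) -> float:
    """What a subscribing IED does: take one cycle's worth of samples
    and compute the RMS of the reconstructed waveform."""
    window = list(samples)[:SAMPLES_PER_CYCLE]
    return math.sqrt(sum(s * s for s in window) / len(window))

peak = 100.0  # amps
rms = rms_over_cycle(simulated_sv_stream(peak, cycles=1))
# RMS of a sine is peak / sqrt(2), about 70.7 A here
```

A real overcurrent element would compare that RMS (or a filtered phasor estimate) against a setting and publish a trip GOOSE if it persists; everything else is the same loop at 4 kHz.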

Once you’ve digitised the process-level measurements, the copper trunks between yard and control room can be replaced by a fibre-optic process bus. That’s been the pitch since about 2004. Adoption has been slow. There are reasons.

Why the network itself is a protection device

This is the part that felt alien at first.

In a cloud context, if the network blips, the application retries. Worst case, you drop a request. In a substation, if the process bus goes down while a fault is developing, the protection relay can’t see the fault and the breaker doesn’t open. The network is part of the safety system.

Which explains the mechanisms that got invented on top of Ethernet to make it fit for this job.

PRP (Parallel Redundancy Protocol) and HSR (High-availability Seamless Redundancy), both defined in IEC 62439-3, achieve zero-recovery-time redundancy by sending every frame twice. PRP does it over two parallel LANs; HSR does it around a ring. The receiver keeps whichever copy arrives first. No spanning-tree reconvergence, no packet loss, no “we’re back up in four seconds.” You lose a LAN, the other LAN keeps carrying traffic, and nothing on top of the network notices.
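The receiver-side logic is simple enough to sketch. IEC 62439-3 nodes tag every frame with a per-source sequence number in a redundancy trailer; the sketch below keeps the first copy and drops the twin, ignoring the standard's sequence-number wrap-around and table ageing for brevity.

```python
class DuplicateDiscard:
    """Toy PRP-style duplicate discard: deliver the first copy of each
    frame, drop the identical twin from the other LAN. Real
    implementations age entries and handle 16-bit sequence wrap."""

    def __init__(self):
        self.seen = set()  # (source_mac, seq_no) pairs already delivered

    def accept(self, source_mac: str, seq_no: int) -> bool:
        key = (source_mac, seq_no)
        if key in self.seen:
            return False   # twin from the other LAN, discard
        self.seen.add(key)
        return True        # first copy, deliver upward

dd = DuplicateDiscard()
assert dd.accept("aa:bb:cc:00:00:01", 7) is True   # copy via LAN A
assert dd.accept("aa:bb:cc:00:00:01", 7) is False  # twin via LAN B
assert dd.accept("aa:bb:cc:00:00:01", 8) is True   # next frame
```

The point of the design is visible in the code: losing either LAN changes nothing above this function, because the surviving copy is always sufficient.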

PTP (Precision Time Protocol), specifically the IEC/IEEE 61850-9-3 Power Utility Profile, gets the clocks on every merging unit synchronised to better than one microsecond over Ethernet. Why microseconds? Because differential protection — “compare the current flowing into this line with the current flowing out, trip if they disagree” — relies on timestamping samples from both ends consistently. A microsecond of clock drift becomes a phase-angle error, which becomes either a nuisance trip or a missed fault.
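The arithmetic behind that claim is worth making concrete. On a 50 Hz system, a clock offset maps directly to a phase angle, and a phase angle between the two ends of a differential scheme looks like current that isn't there:

```python
import math

F = 50.0  # Hz, grid frequency

def phase_error_deg(clock_offset_s: float) -> float:
    """Phase-angle error introduced between two measurement points
    whose clocks disagree by clock_offset_s seconds."""
    return 360.0 * F * clock_offset_s

def spurious_differential(clock_offset_s: float) -> float:
    """Apparent differential current, as a fraction of through
    current, caused purely by timestamp misalignment:
    |I - I*e^{j*theta}| / I = 2*sin(theta/2)."""
    theta = 2 * math.pi * F * clock_offset_s
    return 2 * math.sin(theta / 2)

print(phase_error_deg(1e-6))        # PTP-grade, 1 us: 0.018 degrees
print(spurious_differential(1e-6))  # roughly 0.03% of load current
print(phase_error_deg(5e-3))        # NTP-grade, 5 ms: 90.0 degrees
```

At a microsecond the error is noise; at NTP-grade milliseconds the two ends disagree by tens of degrees and the differential element is unusable. That factor of a few thousand is the whole reason the PTP hardware stack exists.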

If you’ve spent time tuning NTP in a data centre and thought a few milliseconds was fine, PTP in a substation is a different animal. Transparent-clock switches, boundary clocks, PTP-aware NICs — a whole stack of hardware that exists because software NTP over consumer switches isn’t remotely good enough.

The WAN above the LAN

The substation LAN — the process bus and station bus — is where the hard-real-time action happens. But a substation is not an island. Every site connects back to a control centre, and the network that carries that traffic has a physical form that’s easy to miss if you’ve only ever thought about grid infrastructure in terms of copper and iron.

Operational telecoms is the dedicated telecommunications network that transmission operators build and maintain to carry their own operational traffic. Its backbone is typically OPGW (Optical Ground Wire) — a composite cable that doubles as the lightning earth conductor strung along the top of transmission pylons and as a fibre-optic data carrier. The fibres sit inside the steel-and-aluminium sheath that protects the power conductors below from lightning, so every pylon route is also a fibre route. SCADA telemetry, inter-substation protection signalling, and operational voice all ride on the operational telecoms network. Owning the transport gives the operator control over latency and bandwidth — and physical security that is hard to replicate on leased circuits: tapping a fibre inside the earth wire of a 400 kV pylon is not a casual exercise.

That physical security is part of why some operators have historically run SCADA protocols in plain text — the network itself was the security control. It is also why the plain-text argument is weakening: the operational telecoms network increasingly interconnects with IP-routed corporate and cloud networks at the control-centre boundary, and the assumption of a closed physical perimeter no longer holds end to end.

Sitting alongside the SCADA telemetry on the same WAN is wide-area measurement: Phasor Measurement Units (PMUs) producing IEEE C37.118 / IEC 60255-118-1 synchrophasor streams — GPS-timestamped voltage and current phasors reported at 50 or 60 frames per second to a control-centre Phasor Data Concentrator (PDC). The reporting rate is far lower than process-bus Sampled Values (which run at 4 kHz for local protection), because the use case is different: wide-area situational awareness, oscillation detection, and post-event reconstruction rather than sub-cycle tripping. It was PMU data that let Ofgem reconstruct the 9 August 2019 UK blackout in detail that 30-second SCADA polling could never have provided.
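The gap between the two rates is easy to underestimate, so here is the back-of-envelope version. The rates come from the post; the per-message sizes are rough assumptions for illustration, not figures from either standard:

```python
SV_RATE = 4000         # Sampled Values messages/s (80 per cycle at 50 Hz)
PMU_RATE = 50          # synchrophasor frames/s

SV_FRAME_BYTES = 130   # assumed typical 9-2 LE Ethernet frame size
PMU_FRAME_BYTES = 100  # assumed typical C37.118 data frame size

sv_bps = SV_RATE * SV_FRAME_BYTES * 8
pmu_bps = PMU_RATE * PMU_FRAME_BYTES * 8

print(f"process-bus SV stream: ~{sv_bps / 1e6:.1f} Mbit/s per stream")
print(f"synchrophasor stream:  ~{pmu_bps / 1e3:.0f} kbit/s per PMU")
print(f"message-rate ratio:    {SV_RATE // PMU_RATE}x")
```

Eighty times fewer messages, and a few megabits versus a few tens of kilobits: that is why Sampled Values stay on the substation LAN while synchrophasors can ride the WAN to a control centre hundreds of kilometres away.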

The data substrate for that kind of reconstruction — the Wide-Area Monitoring System (WAMS) — depends on the same operational telecoms transport, the same PTP time reference, and the same assumption that the network underneath is trustworthy.

The centralised dream

Once everything is on a digital process bus, a tempting question presents itself: do we still need one protection relay per bay?

ABB’s answer is the SSC600, a device about the size of a DVD player that consolidates the protection logic of thirty distribution-class relays into a single unit at the station level. Feed it merging-unit streams via the process bus, let it subscribe to every bay, and it makes all the protection decisions centrally. Their software release, SSC600 SW, unbundles the logic from ABB hardware so it can run on whichever server platform the utility chooses.

Siemens, GE, and others have their own flavours. There’s a cross-vendor group called the vPAC Alliance working on a shared substrate for virtual protection, automation, and control. CIGRE — the international grid industry’s standards-and-research body — has a working group, B5.84, writing the reference architecture for virtual IEDs in protection, automation and control systems. (B5.70, the one that gets quoted online, is the broader reliability-and-architecture group; B5.84 is the one doing the virtual-IED work specifically.)

This is the bit where my infrastructure instincts start to feel useful again. Containerised workload on a redundant host, shared network, lifecycle managed by software: it’s the same playbook I’ve been running on VMware for over twenty years. The difference is the workload has a three-millisecond deadline and opening the wrong breaker is a national news event.

Three camps and an outlier

Once you go looking for who’s actually building this, the answer fragments into four groups that quietly disagree about almost everything.

The vPAC Alliance is the centrist position: a vendor consortium — ABB, Schneider, Siemens Energy, GE Vernova on one side; Dell, Intel, Advantech, Broadcom (VMware) on the other; AEP and Southern California Edison as utility sponsors — standardising virtualised protection on commodity x86 with whatever hypervisor the utility wants to buy. In practice that has meant a lot of vSphere demonstrations at industry conferences. It is recognisable infrastructure work. It is also, mostly, demos.

The actual production reference is in France. RTE, the French TSO, has been running its first virtualised 63 kV substation continuously since late 2023 on the open-source SEAPATH stack — real-time Linux with KVM, Ceph for shared storage, Pacemaker for HA, Ansible for orchestration. SEAPATH lives under the Linux Foundation Energy umbrella and shipped v1.0 in February 2025 with 700-plus daily latency tests in CI. RTE plans 100 substations on it by 2030.

The third camp is European, DSO-led, and below the noise floor for most people in transmission. The E4S Alliance (Edge for Smart Secondary Substations), formally incorporated as a Brussels non-profit in November 2025, is doing the analogous work for the small distribution-transformer stations that sit between the primary substation and the street. Different latency budget, different members (Iberdrola, Enedis, E-REDES, ABB, Schneider, Capgemini, Intel), different problem (LV management, DER integration, edge metering). Worth knowing it exists; not the same fight as vPAC.

And then there is SEL, which sells more protection relays than almost anyone and is conspicuously not in the vPAC Alliance. SEL's position, articulated in technical papers like Resetting Protection System Complexity, is that protection should stay on dedicated hardened devices with field MTBF measured in centuries — not on shared general-purpose compute. SEL ships its own software stack — Blueframe, a hardened Linux application platform for management, analytics and DMS — and its own deterministic Ethernet fabric — SEL SDN, an OpenFlow-based switching system with sub-100 microsecond failover via pre-engineered flows. What SEL emphatically does not sell is a virtualised protection relay. That is a position, not an oversight.

You can read this two ways. Either SEL is the engineering conscience of an industry that’s about to learn the hard way why determinism matters, or SEL is the established vendor whose business depends on not virtualising the thing it sells. I genuinely don’t know which.

The virtualisation constraints nobody warns you about

The centralised, virtualised approach doesn’t stop at protection. Remote Terminal Units — the boxes that collect SCADA telemetry from substations and forward it up to control-centre systems — are moving the same way. An RTU was historically a physical appliance sitting inside a substation or switching centre. Utilities are increasingly running them as virtual machines alongside the control-centre EMS and ADMS platforms, with the physical site reduced to IEDs, merging units, and a WAN gateway.

Same engineering challenges as virtualised P&C. Determinism under general-purpose hypervisors. PTP across virtual switches — harder than it sounds, because vSwitches are not naturally PTP-transparent and the IEEE 1588 clock hierarchy assumes hardware support along the path. Network isolation between the virtualisation control plane and the grid signals that actually matter.

And the constraint that catches most cloud-thinking up short: the host hardware itself has to pass IEC 61850-3 and IEEE 1613 environmental and EMC testing if it’s actually going to live in the substation rather than the control centre. -40 °C to +85 °C, no fans, conformal coating, surge withstand at 4 kV. A generic 1U blade is not deployable in a 33 kV switching cabinet. The vPAC and SEAPATH camps have spent significant engineering effort on substation-grade x86 hardware specifically for this reason.

Respect for what’s already running

The temptation, coming from cloud, is to look at this stack and see everything that could be replaced. Plain-text protocol? Encrypt it. Bare-metal EMS? Virtualise it. Serial-era RTU protocols on legacy feeders? Rip them out.

I’ve learned to temper that instinct. The reason these systems still look like they do isn’t because the power industry hasn’t noticed Ethernet. It’s because the consequences of getting it wrong are enormous, the asset lifetimes are measured in decades rather than release cycles, and every change has to be validated against a safety case that exists whether or not any given engineer likes the documentation culture.

The quiet transformation happening inside substations — digital process buses, virtualised protection functions, centralised P&C consolidating a rack’s worth of relays into a single unit — is the right kind of change. It’s incremental, it’s backwards-compatible via stand-alone merging units bolted onto existing instrument transformers, and it’s being done with one eye on the standards and the other on the grid’s ability to tolerate a mistake.

Three milliseconds to trip a 400 kV breaker. That deadline isn’t going anywhere. What’s changing is where, and on what kind of machine, the software that meets the deadline now runs.

That’s the interesting part. It’s also the part that makes the security question unavoidable — because once the protection system is an Ethernet network, a process-bus stream, and a piece of code, it inherits all the security problems that Ethernet networks, process-bus streams, and code have always had. The next post is about those problems.

References

  • CIGRE Technical Brochure 760 — Stand-alone merging units for IEC 61850-9-2 process bus (2019)