Building the Data Infrastructure for Distributed Energy Resource Management

November 24, 2025

Managing millions of DER endpoints requires a different architecture than traditional SCADA. Here's how to build it on AWS.


The operational model that has governed utility grid management for a century is being fundamentally disrupted. Traditional grid operations assumed a small number of large, centrally dispatched generation assets and a passive load side that utilities could forecast but not directly control. That model no longer describes the grid.

A mid-sized distribution utility today may have hundreds of thousands of rooftop solar installations, tens of thousands of residential battery systems, and a growing fleet of managed EV chargers within its service territory. Collectively, these distributed energy resources (DER) can swing net load by hundreds of megawatts — more than many peaker plants produce. Managing them as a coherent operational resource requires data infrastructure that traditional SCADA and historian systems were never designed to provide.

This post covers the architectural patterns we use to build DER management data infrastructure on AWS: device connectivity at scale, real-time telemetry ingestion, asset modeling, and the ML foundation needed for net load forecasting and dispatch optimization.

Why Traditional SCADA Doesn’t Scale for DER

SCADA systems are exceptionally well suited for what they were designed to do: monitor and control a bounded set of large, high-value assets (substations, transmission lines, large generators) with deterministic communication protocols (DNP3, Modbus, IEC 61850) over dedicated private networks.

DER management is a different problem in almost every dimension. The device count is orders of magnitude higher — a utility with 200,000 smart meters, 50,000 rooftop solar interconnections, and 10,000 battery storage systems has a population of managed endpoints that no traditional SCADA historian can cost-effectively serve. The communication path is the public internet rather than a private OT network — residential DER communicate over cellular or home broadband. The protocols are different: IEEE 2030.5 (SEP 2.0) governs utility-to-DER communication for residential devices; OpenADR 2.0 is the standard for commercial and industrial demand response.

The data model is also different. A traditional SCADA historian tracks a relatively stable set of tags at high frequency (seconds to minutes). DER telemetry is often lower frequency (5–15 minute intervals) but arrives from a much larger and more dynamic population of devices, with enrollment and unenrollment happening continuously as customers add or remove equipment.

Device Connectivity at Scale

AWS IoT Core is the right foundation for DER device connectivity. It handles millions of concurrent MQTT connections, manages X.509 device certificates for authentication, and routes messages to processing pipelines without requiring any always-on server infrastructure on the utility side.

The connectivity architecture we deploy:

Device certificate management. Each DER endpoint receives a unique X.509 client certificate at enrollment. IoT Core validates the certificate on connection, and the thing registry maintains the mapping between certificate identity and asset metadata (device type, location, program enrollment, rated capacity). Certificate rotation and revocation are managed through the IoT Core certificate management APIs.

Topic structure. IoT Core topic routing rules direct telemetry messages to downstream processors based on topic hierarchy. We organize topics by device class and registration area: der/{utility_id}/{device_class}/{device_id}/telemetry. This structure allows topic-based routing rules to send solar inverter telemetry to a different processing path than battery state-of-charge data without requiring downstream services to filter.
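The topic convention above can be sketched in a few lines. This is an illustrative sketch, not deployment code: the utility ID, device classes, and the routing check (mimicking a rule subscribed to `der/+/{device_class}/+/telemetry`) are assumptions for demonstration.

```python
# Sketch of the DER telemetry topic convention. Segment values are
# illustrative; only the der/{utility_id}/{device_class}/{device_id}/telemetry
# shape comes from the architecture described above.

def telemetry_topic(utility_id: str, device_class: str, device_id: str) -> str:
    """Build the telemetry topic for a DER endpoint."""
    return f"der/{utility_id}/{device_class}/{device_id}/telemetry"

def matches_class(topic: str, device_class: str) -> bool:
    """Check whether a topic belongs to a device class, the way a routing
    rule subscribed to der/+/{device_class}/+/telemetry would match it."""
    parts = topic.split("/")
    return (
        len(parts) == 5
        and parts[0] == "der"
        and parts[2] == device_class
        and parts[4] == "telemetry"
    )

topic = telemetry_topic("acme-utility", "battery", "b-1042")
```

Because the device class is a fixed topic segment, the routing decision is a pure topic-filter match; downstream services never inspect payloads to decide where a message belongs.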

Protocol translation. Not all DER communicate directly with IoT Core. IEEE 2030.5 requires a server-side aggregation point (DCM — Device Capability Manager) that translates the REST-based 2030.5 protocol to MQTT for IoT Core. We deploy this translation layer as a containerized service on ECS, sitting at the utility’s DMZ boundary.
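A minimal sketch of what the translation layer does: accept a 2030.5-style reading posted over REST and re-emit it as an MQTT topic/payload pair. The input field names (`value`, `powerOfTenMultiplier`, `timePeriod`) echo IEEE 2030.5 resource attributes, but the payload shape and scaling convention here are simplifying assumptions, not the full standard.

```python
import json

# Illustrative 2030.5-to-MQTT translation. The reading dict mimics an
# IEEE 2030.5 Reading resource; the output payload schema is an assumption.

def translate_reading(utility_id: str, device_id: str, reading: dict) -> tuple:
    """Map a 2030.5-style reading to an (mqtt_topic, payload_bytes) pair."""
    # 2030.5 encodes values as integer value scaled by 10^powerOfTenMultiplier
    watts = reading["value"] * 10 ** reading["powerOfTenMultiplier"]
    payload = {
        "device_id": device_id,
        "timestamp": reading["timePeriod"]["start"],
        "active_power_w": watts,
    }
    topic = f"der/{utility_id}/solar_inverter/{device_id}/telemetry"
    return topic, json.dumps(payload).encode()
```

Keeping this translation stateless is what lets it run as a horizontally scaled ECS service at the DMZ boundary.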

Real-Time Telemetry Ingestion

Telemetry from IoT Core flows to Kinesis Data Streams for real-time processing. The Kinesis partition key is derived from the device’s grid location (distribution circuit or substation feed) rather than the device ID — this keeps all telemetry from devices on the same circuit in the same shard, which matters for the per-circuit aggregations that grid operations needs.
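The partition-key choice can be shown concretely. The registry dict below is a hypothetical stand-in for the thing-registry metadata lookup; what matters is that the key is the circuit, so two devices on the same circuit always land in the same shard.

```python
# Sketch of deriving the Kinesis partition key from grid location rather
# than device ID. DEVICE_REGISTRY is a hypothetical stand-in for the
# IoT Core thing registry / asset metadata store.

DEVICE_REGISTRY = {
    "b-1042": {"circuit": "CKT-311", "substation": "SUB-04"},
    "s-2210": {"circuit": "CKT-311", "substation": "SUB-04"},
    "s-9981": {"circuit": "CKT-587", "substation": "SUB-09"},
}

def partition_key(device_id: str) -> str:
    """Derive the Kinesis partition key from the device's circuit."""
    return DEVICE_REGISTRY[device_id]["circuit"]

# Devices on the same circuit share a partition key, hence a shard:
assert partition_key("b-1042") == partition_key("s-2210")
```

The trade-off is worth noting: circuit-keyed shards make per-circuit aggregation cheap, but a circuit with unusually many devices becomes a hot shard, so shard capacity planning follows circuit size rather than device count.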

From Kinesis, two processing paths run in parallel:

Hot path (real-time aggregation). An Apache Flink application (Amazon Managed Service for Apache Flink, formerly Kinesis Data Analytics) performs windowed aggregations over the stream: 5-minute and 15-minute sums of active power output by circuit, device class, and program. These aggregated metrics are written to a DynamoDB table that serves as the operational state store — the source of truth for current DER output that the DERMS and grid operators query. DynamoDB’s single-digit millisecond read latency makes it appropriate for operational dashboards and dispatch optimization algorithms that need current state.
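The core of the hot path is a tumbling-window sum keyed by circuit. The plain-Python sketch below simulates what the Flink job computes for the 5-minute window; record fields are illustrative, and a real job would also key by device class and program and handle late-arriving events.

```python
from collections import defaultdict

# Plain-Python simulation of a 5-minute tumbling-window aggregation:
# sum active power per (window_start, circuit). Record shape is assumed.

WINDOW_S = 300  # 5-minute tumbling windows

def aggregate(records: list) -> dict:
    """Sum active_power_w per (window_start_epoch, circuit) key."""
    sums = defaultdict(float)
    for r in records:
        window_start = (r["ts"] // WINDOW_S) * WINDOW_S
        sums[(window_start, r["circuit"])] += r["active_power_w"]
    return dict(sums)

records = [
    {"ts": 1000, "circuit": "CKT-311", "active_power_w": 4000.0},
    {"ts": 1100, "circuit": "CKT-311", "active_power_w": 3500.0},
    {"ts": 1400, "circuit": "CKT-311", "active_power_w": 1000.0},  # next window
]
out = aggregate(records)
```

Each `(window_start, circuit)` result maps naturally onto a DynamoDB item keyed by circuit and window, which is what the operational state store serves back to dashboards.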

Cold path (data lake). The raw telemetry stream is written to S3 via Kinesis Data Firehose, Parquet-formatted and partitioned by date and circuit. This becomes the training dataset for forecasting models and the historical record for settlement and audit purposes. A Glue crawler maintains the schema catalog; Athena queries let operations staff run ad hoc analysis without needing a dedicated analytics database.
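The date-and-circuit partitioning translates to a Hive-style S3 key layout that Glue and Athena can prune on. The prefix and file-naming convention below are assumptions for illustration; the partition columns (`year`/`month`/`day`/`circuit`) follow the scheme described above.

```python
from datetime import datetime, timezone

# Sketch of the Hive-style S3 key layout for the Firehose-delivered
# Parquet files, partitioned by date and circuit. Prefix names assumed.

def s3_key(ts_epoch: int, circuit: str, part: int) -> str:
    """Build the S3 object key for a telemetry Parquet file."""
    d = datetime.fromtimestamp(ts_epoch, tz=timezone.utc)
    return (
        f"telemetry/year={d.year}/month={d.month:02d}/day={d.day:02d}/"
        f"circuit={circuit}/part-{part:05d}.parquet"
    )
```

With this layout, an Athena query filtered on date and circuit scans only the matching partitions instead of the full history, which keeps ad hoc analysis cheap.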

Asset Modeling with AWS IoT SiteWise

Individual device telemetry is necessary but not sufficient for grid operations. Grid operators need to understand DER capacity and output in terms of grid topology — not individual devices, but circuits, feeders, and substations. AWS IoT SiteWise provides the asset hierarchy model that bridges device-level telemetry and grid-level operations.

The SiteWise asset model mirrors the distribution system hierarchy: each distribution circuit is modeled as an asset with child assets representing the DER enrolled in demand response or VPP programs on that circuit. SiteWise’s asset properties compute circuit-level metrics (total enrolled capacity, current aggregate output, available curtailment headroom) as roll-ups from device-level measurements. These computed properties are accessible via the SiteWise API and can be pushed to IoT Core topics for downstream consumers.
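The roll-up logic that SiteWise computed properties express can be sketched as a plain function over child-device measurements. The field names and the headroom definition (current output above each device's program floor) are illustrative assumptions, not SiteWise's own semantics.

```python
# Sketch of circuit-level roll-ups from device-level measurements,
# mirroring what SiteWise computed properties do on the asset hierarchy.
# Device fields (rated_kw, output_kw, min_kw) are illustrative.

def circuit_rollup(devices: list) -> dict:
    """Roll device measurements up to circuit-level metrics."""
    enrolled_kw = sum(d["rated_kw"] for d in devices)
    output_kw = sum(d["output_kw"] for d in devices)
    # Headroom: output above each device's program floor that could be curtailed
    headroom_kw = sum(
        max(d["output_kw"] - d.get("min_kw", 0.0), 0.0) for d in devices
    )
    return {
        "enrolled_capacity_kw": enrolled_kw,
        "aggregate_output_kw": output_kw,
        "curtailment_headroom_kw": headroom_kw,
    }
```

The same computation repeated one level up (circuits rolled into feeders, feeders into substations) is what gives operators grid-topology views without ever querying individual devices.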

This hierarchy also serves the dispatch use case. When the grid operator needs to dispatch a demand response event — reduce load in a specific substation zone by a target amount — the SiteWise model provides the capacity visibility to identify which enrolled devices to target and in what sequence.
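The device-selection step can be sketched with a simple policy. The greedy largest-headroom-first ordering below is an illustrative placeholder for the actual dispatch optimization, which would also weigh response reliability and program constraints.

```python
# Sketch of dispatch selection: choose enrolled devices until the
# curtailment target is covered. Greedy largest-headroom-first is an
# illustrative policy, not the production optimizer.

def select_for_dispatch(devices: list, target_kw: float) -> list:
    """Return device ids to curtail, largest headroom first."""
    chosen = []
    remaining = target_kw
    for d in sorted(devices, key=lambda d: d["headroom_kw"], reverse=True):
        if remaining <= 0:
            break
        chosen.append(d["id"])
        remaining -= d["headroom_kw"]
    return chosen
```

Feeding this selector the SiteWise roll-ups for a substation zone is exactly the "capacity visibility" the hierarchy provides: the operator states a target, and the model supplies the candidate set.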

Net Load Forecasting

The fundamental forecasting challenge in a high-DER environment is net load: gross load minus behind-the-meter generation. On a clear day in a service territory with high solar penetration, net load mid-day can be substantially lower than gross load — and the ramp rate in late afternoon, as solar generation drops while load remains elevated, can be very steep. Forecasting this profile accurately is critical for dispatch scheduling and market commitments.

The net load forecasting model we build for DER-heavy territories is structured as two stacked components: a gross load forecast (weather-driven, well-understood problem) and a behind-the-meter generation forecast (solar production estimate using irradiance forecast and enrolled device capacity data from SiteWise). Net load is the residual.
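The stacked structure reduces to a per-interval residual. The numbers below are made up to show the shape of the curve: midday behind-the-meter solar depresses net load, and the late-afternoon ramp appears as that output falls away while gross load stays elevated.

```python
# Sketch of the residual computation: net load = gross load forecast
# minus behind-the-meter generation forecast, per interval. Values are
# illustrative, not real forecasts.

def net_load_forecast(gross_load_mw: list, btm_gen_mw: list) -> list:
    """Net load = gross load minus behind-the-meter generation."""
    assert len(gross_load_mw) == len(btm_gen_mw)
    return [g - b for g, b in zip(gross_load_mw, btm_gen_mw)]

gross = [500.0, 520.0, 560.0, 600.0]  # afternoon intervals
btm   = [180.0, 150.0,  60.0,   5.0]  # solar output falling toward evening
net = net_load_forecast(gross, btm)   # ramp steepens as btm drops
```

Structuring the model this way also localizes error attribution: a miss traceable to the irradiance input implicates the behind-the-meter component, not the gross load model.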

The behind-the-meter generation forecast is the harder problem. It requires knowing the solar capacity enrolled on each circuit (the enrollment data the SiteWise asset model maintains) and an accurate irradiance forecast for the territory; because behind-the-meter output is never metered directly by the utility, it must be estimated rather than observed.

SageMaker Pipelines orchestrates daily retraining of both components. The feature store (SageMaker Feature Store) holds the pre-computed circuit-level DER availability features derived from the operational state store, making them accessible to training jobs without requiring a separate extraction step.

What This Enables

The data infrastructure described here is the foundation for the operational capabilities that matter for DER program management: real-time visibility into aggregate DER output by grid location, accurate net load forecasting for day-ahead market commitments, and the dispatch optimization that makes virtual power plant programs economically viable.

It also creates a feedback loop that improves over time. Every dispatch event — the utility sends a curtailment signal to enrolled devices and measures the actual demand response — becomes training data that improves both the availability model (which devices actually respond to dispatch signals) and the dispatch optimization (which selection of devices to target produces the most reliable response). The infrastructure that handles telemetry also handles the outcome data from dispatch events, closing the loop automatically.


If your organization is working through the data infrastructure requirements for a DER management program, we’re happy to talk through the architecture.
