20 November 2024 · 18 min read

Smart Metering at Scale: Data Architecture for 2.5M+ Customers

Energy · Data Architecture · Big Data · Analytics

How we built a data platform to process smart meter readings for millions of energy customers at Vattenfall. Time-series data, aggregation strategies, and analytics pipelines.


At Vattenfall, we built a data platform to capture, aggregate, and analyze smart meter readings for over 2.5 million energy customers. This system enabled predictive usage analysis, accurate billing estimations, and real-time operational dashboards.

The Smart Metering Challenge

Smart meters generate data at unprecedented scale compared to traditional monthly meter readings:

Data Volume Analysis

Per meter, per day:

  • 96 readings (15-minute intervals)
  • Multiple data points per reading (consumption, voltage, power factor)
  • Metadata (meter status, communication quality)

For 2.5 million meters:

  • 240 million readings per day
  • 87.6 billion readings per year
  • Multi-year retention requirements for billing disputes and analysis
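
Those headline figures follow directly from the interval count; a quick back-of-envelope check in plain Python:

```python
METERS = 2_500_000
READINGS_PER_METER_PER_DAY = 24 * 4           # 15-minute intervals -> 96/day

readings_per_day = METERS * READINGS_PER_METER_PER_DAY
readings_per_year = readings_per_day * 365

print(f"{readings_per_day:,} readings/day")    # 240,000,000
print(f"{readings_per_year:,} readings/year")  # 87,600,000,000
```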

Business Requirements

  • Billing accuracy: Meter data must be complete and validated before billing cycles
  • Customer portals: Real-time usage visibility for 2.5M+ registered customers
  • Predictive analytics: Usage forecasting for capacity planning and customer engagement
  • Regulatory compliance: Data retention, audit trails, and reporting requirements

Data Architecture Overview

Our architecture separated concerns across specialized data stores:

Ingestion Layer

Apache Kafka served as the central nervous system:

  • Received raw meter data from collection systems
  • Buffered during downstream outages
  • Enabled multiple consumers with different processing needs
  • Provided replay capability for reprocessing
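
A minimal sketch of how a collection gateway might publish normalized readings, assuming the kafka-python client; the broker addresses, topic name, and message fields are illustrative rather than the production configuration:

```python
import json
from kafka import KafkaProducer  # kafka-python

# Illustrative brokers; acks="all" so readings survive a broker failover.
producer = KafkaProducer(
    bootstrap_servers=["kafka-1:9092", "kafka-2:9092"],
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",
    linger_ms=50,          # small batching window for throughput
)

reading = {
    "meter_id": "M-0012345",
    "ts": "2024-11-20T10:15:00Z",
    "consumption_kwh": 0.42,
    "voltage": 231.2,
    "status": "OK",
}

# Keying by meter_id keeps each meter's readings ordered within one partition.
producer.send("meter-readings-raw", key=reading["meter_id"], value=reading)
producer.flush()
```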

Raw Data Storage

Apache Cassandra stored raw meter readings:

  • Optimized for time-series write patterns
  • Linear scalability for growing meter population
  • Tunable consistency (eventual for raw data)
  • Time-based data expiration (TTL)
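
A minimal sketch of that write path with the DataStax Python driver, assuming an illustrative keyspace and table (the table layout itself is sketched in the optimization section further down); the TTL and the relaxed consistency level are the points of interest:

```python
from datetime import date, datetime, timezone
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel

cluster = Cluster(["cassandra-1", "cassandra-2"])   # illustrative contact points
session = cluster.connect("metering")               # hypothetical keyspace

# TTL is a bind marker so retention can differ per data class.
insert = session.prepare(
    "INSERT INTO raw_readings (meter_id, day, ts, consumption_kwh, voltage) "
    "VALUES (?, ?, ?, ?, ?) USING TTL ?"
)
# Eventual consistency is acceptable for raw readings; billing-grade numbers
# come from the validated batch aggregates, not this table.
insert.consistency_level = ConsistencyLevel.ONE

session.execute(insert, (
    "M-0012345",
    date(2024, 11, 20),
    datetime(2024, 11, 20, 10, 15, tzinfo=timezone.utc),
    0.42,
    231.2,
    3 * 365 * 24 * 3600,    # ~3-year retention, an assumed figure
))
```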

Aggregated Data Storage

PostgreSQL housed aggregated and validated data:

  • Daily, weekly, monthly rollups
  • Complex queries for billing and reporting
  • ACID compliance for financial calculations
  • Integration with existing business systems
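
As a sketch of how daily rollups could land in PostgreSQL, assuming psycopg2 and illustrative table names (staged_readings, daily_consumption); the upsert keeps reruns idempotent:

```python
from datetime import date
import psycopg2

# Illustrative connection string and schema; assumes a unique (meter_id, day)
# constraint on daily_consumption.
conn = psycopg2.connect("dbname=billing user=etl")

DAILY_ROLLUP_UPSERT = """
INSERT INTO daily_consumption (meter_id, day, kwh_total, reading_count)
SELECT meter_id, reading_ts::date AS day,
       SUM(consumption_kwh), COUNT(*)
FROM   staged_readings
WHERE  reading_ts >= %(day)s AND reading_ts < %(day)s + interval '1 day'
GROUP  BY meter_id, reading_ts::date
ON CONFLICT (meter_id, day)
DO UPDATE SET kwh_total     = EXCLUDED.kwh_total,
              reading_count = EXCLUDED.reading_count;
"""

with conn, conn.cursor() as cur:
    cur.execute(DAILY_ROLLUP_UPSERT, {"day": date(2024, 11, 20)})
```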

Analytics Layer

ClickHouse powered analytics and dashboards:

  • Columnar storage for analytical queries
  • Real-time aggregations across dimensions
  • Sub-second response for complex queries
  • Efficient compression for historical data

Ingestion Pipeline Deep Dive

Getting data from meters to storage involved multiple processing stages:

Stage 1: Collection

Meters communicated via various protocols (DLMS/COSEM, PRIME, OSGP). Collection systems normalized these into a common format before publishing to Kafka.
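
The common format itself isn't spelled out here, but a plausible shape, expressed as a Python dataclass with purely illustrative fields, looks like this:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class MeterReading:
    """Protocol-agnostic reading as published to Kafka (illustrative fields)."""
    meter_id: str
    ts: datetime                    # reading timestamp, UTC
    interval_minutes: int           # 15 for standard load profiles
    consumption_kwh: float
    voltage: Optional[float]        # not every protocol reports it
    power_factor: Optional[float]
    meter_status: str               # e.g. "OK", "TAMPER_ALERT"
    comm_quality: float             # 0.0-1.0 link quality from the head-end system
    source_protocol: str            # "DLMS", "PRIME", "OSGP", ...
```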

Stage 2: Validation

Before storage, every reading passed through validation:

Technical validation:

  • Timestamp within expected range
  • Values within physical limits (no negative consumption)
  • No gaps in sequence numbers

Business validation:

  • Consumption within historical bounds (detect meter tampering)
  • Meter registered and active in customer database
  • Communication quality above threshold
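
A condensed sketch of what such a validation step can look like; the thresholds, violation codes, and the history/registry lookups are assumptions rather than the production rules, and the stateful sequence-gap check is omitted:

```python
from datetime import datetime, timedelta, timezone

# Illustrative thresholds; the real rules were configurable per meter type.
MAX_CLOCK_SKEW = timedelta(hours=36)
MAX_KWH_PER_INTERVAL = 50.0
MIN_COMM_QUALITY = 0.6

def validate(reading, history, registry):
    """Return a list of violation codes; an empty list means the reading passes."""
    issues = []

    # Technical validation
    now = datetime.now(timezone.utc)
    if not ((now - MAX_CLOCK_SKEW) <= reading.ts <= now + timedelta(minutes=15)):
        issues.append("TIMESTAMP_OUT_OF_RANGE")
    if not 0 <= reading.consumption_kwh <= MAX_KWH_PER_INTERVAL:
        issues.append("VALUE_OUT_OF_PHYSICAL_LIMITS")

    # Business validation
    baseline = history.p99_interval_kwh(reading.meter_id)   # hypothetical lookup
    if baseline and reading.consumption_kwh > 5 * baseline:
        issues.append("CONSUMPTION_ANOMALY")                 # possible tampering
    if not registry.is_active(reading.meter_id):             # hypothetical lookup
        issues.append("METER_NOT_REGISTERED")
    if reading.comm_quality < MIN_COMM_QUALITY:
        issues.append("POOR_COMM_QUALITY")

    return issues
```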

Stage 3: Enrichment

Raw readings were enriched with:

  • Customer account information
  • Tariff structure for cost calculation
  • Geographic data for regional analysis
  • Historical baseline for comparison
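
A sketch of that step, where `customers` and `tariffs` stand in for cached lookups against the CRM and tariff systems; all names and fields are illustrative:

```python
def enrich(reading, customers, tariffs):
    """Attach account, tariff, and regional context to a validated reading."""
    account = customers.by_meter(reading.meter_id)            # hypothetical lookup
    tariff = tariffs.for_account(account.id, at=reading.ts)   # hypothetical lookup
    price = tariff.price_per_kwh(reading.ts)

    return {
        "meter_id": reading.meter_id,
        "ts": reading.ts.isoformat(),
        "consumption_kwh": reading.consumption_kwh,
        "account_id": account.id,
        "segment": account.segment,        # residential / commercial
        "region": account.grid_region,     # enables regional aggregates
        "price_per_kwh": price,
        "cost_estimate": reading.consumption_kwh * price,
    }
```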

Stage 4: Storage

Validated, enriched data flowed to multiple destinations:

  • Cassandra for raw storage
  • Kafka topics for downstream consumers
  • Direct path to real-time dashboards

Aggregation Strategy

Raw data alone doesn't serve business needs. Aggregation makes data useful.

Time-Based Rollups

  • Hourly aggregates: Sum of 15-minute readings, computed in near-real-time
  • Daily aggregates: Computed overnight, validated before customer visibility
  • Monthly aggregates: Official billing data, reconciled with customer accounts

Dimension-Based Aggregates

  • By geography: Regional consumption for capacity planning
  • By customer segment: Residential vs. commercial patterns
  • By tariff type: Usage patterns across pricing structures

Aggregation Implementation

We used two approaches:

  • Real-time aggregation: Kafka Streams computed running totals for dashboards
  • Batch aggregation: Scheduled Spark jobs computed validated aggregates for billing

The key insight: real-time aggregates are approximate; batch aggregates are authoritative. Customers see real-time data with a "provisional" label until batch validation completes.
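
The batch side of that split can be illustrated with a small PySpark job; the paths, column names, and completeness rule are assumptions made for the sketch:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-aggregates").getOrCreate()

# Illustrative source: validated readings landed as Parquet, partitioned by day.
readings = spark.read.parquet("s3://metering/validated/day=2024-11-20/")

daily = (
    readings
    .groupBy("meter_id")
    .agg(
        F.sum("consumption_kwh").alias("kwh_total"),
        F.count("*").alias("reading_count"),
        F.sum(F.when(F.col("estimated"), 1).otherwise(0)).alias("estimated_count"),
    )
    # A meter-day with all 96 intervals present counts as complete.
    .withColumn("complete", F.col("reading_count") >= 96)
)

daily.write.mode("overwrite").parquet("s3://metering/aggregates/daily/day=2024-11-20/")
```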

Handling Data Quality Issues

Smart metering data is messy. Our pipeline handled common issues:

Missing Data

Meters go offline. Communication fails. Data gaps are inevitable.

  • Detection: Hourly jobs identified missing readings
  • Estimation: Interpolation from adjacent readings or historical patterns (sketched below)
  • Flagging: Estimated data marked separately from actual readings
  • Remediation: Backfill when communication was restored
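
A simplified sketch of the estimation step using pandas; linear time interpolation here is only a stand-in for the real estimation rules, which also drew on historical profiles:

```python
import pandas as pd

def fill_gaps(day_readings: pd.DataFrame) -> pd.DataFrame:
    """Estimate missing 15-minute intervals for one meter-day.

    `day_readings` is indexed by timestamp and has a `consumption_kwh` column;
    rows filled here are flagged so downstream consumers can tell estimated
    values from actual readings.
    """
    full_index = pd.date_range(day_readings.index.min().normalize(),
                               periods=96, freq="15min")
    resampled = day_readings.reindex(full_index)

    estimated_mask = resampled["consumption_kwh"].isna()
    filled = resampled["consumption_kwh"].interpolate(method="time")
    filled = filled.ffill().bfill()        # edge gaps fall back to nearest value

    resampled["consumption_kwh"] = filled
    resampled["estimated"] = estimated_mask
    return resampled
```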

Late-Arriving Data

Data sometimes arrived days after the reading timestamp.

  • Handling: Accepted late data up to a configurable threshold
  • Reprocessing: Triggered aggregate recalculation for affected periods
  • Notification: Alerted billing systems if late data affected invoiced periods

Incorrect Data

Faulty meters, data corruption, and human error caused incorrect readings.

  • Manual corrections: Workflow for customer service to adjust readings
  • Audit trail: Complete history of changes with reasons
  • Downstream updates: Automated propagation to affected aggregates

Performance Optimization

Scale demanded careful optimization at every layer.

Cassandra Optimization

  • Partition design: Time-bucketed partitions (meter_id + day); see the schema sketch after this list
  • Compaction strategy: TimeWindowCompactionStrategy for time-series data
  • Read path: Bloom filters and partition key caching
  • Write path: Batched writes, tuned memtable settings
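
Putting the partitioning and compaction choices together, an illustrative CQL definition for the raw-readings table could look like this; names and retention are assumptions, not the production schema:

```python
# Executed with the driver session shown earlier; kept as a string for clarity.
CREATE_RAW_READINGS = """
CREATE TABLE IF NOT EXISTS metering.raw_readings (
    meter_id        text,
    day             date,
    ts              timestamp,
    consumption_kwh double,
    voltage         double,
    PRIMARY KEY ((meter_id, day), ts)
) WITH CLUSTERING ORDER BY (ts ASC)
  AND default_time_to_live = 94608000   -- ~3 years, an assumed retention period
  AND compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': '1'
  };
"""
# session.execute(CREATE_RAW_READINGS)
```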

Query Optimization

  • Pre-aggregation: Most common queries served from pre-computed tables
  • Materialized views: ClickHouse materialized views for dashboard queries (sketched below)
  • Caching: Redis cache for frequently accessed customer data
  • Query routing: Separate read replicas for reporting workloads
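
As an illustration of the materialized-view approach, a ClickHouse view maintaining hourly per-region totals as readings are inserted might look like this; table, view, and column names are assumptions:

```python
from clickhouse_driver import Client   # clickhouse-driver package

HOURLY_REGION_MV = """
CREATE MATERIALIZED VIEW IF NOT EXISTS metering.hourly_region_mv
ENGINE = SummingMergeTree
ORDER BY (region, hour)
AS
SELECT
    region,
    toStartOfHour(ts) AS hour,
    sum(consumption_kwh) AS kwh_total,
    count() AS reading_count
FROM metering.enriched_readings
GROUP BY region, hour
"""

client = Client(host="clickhouse-1")    # illustrative host
client.execute(HOURLY_REGION_MV)
```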

Resource Management

  • Data tiering: Hot data on SSD, warm data on HDD, cold data in object storage
  • Auto-scaling: Kafka consumers scaled based on lag metrics
  • Cost optimization: Regular review of data retention and compression

Lessons Learned

1. Time-Series Databases Have Trade-offs

Cassandra excelled at writes but struggled with ad-hoc queries. ClickHouse excelled at analytics but wasn't designed for point queries. The hybrid approach served different access patterns optimally.

2. Aggregation is Key to Query Performance

Nobody queries 87.6 billion rows. Pre-aggregated data at multiple granularities enabled interactive dashboards and reports.

3. Data Quality Pipelines Are Essential

Garbage in, garbage out. Investing in validation, estimation, and correction workflows paid dividends in billing accuracy and customer trust.

4. Plan for Data Corrections and Reprocessing

Requirements change. Bugs happen. Design systems that can reprocess historical data without downtime.

5. Monitoring Is Non-Negotiable

With billions of readings, problems hide in the noise. Comprehensive monitoring caught issues before they impacted customers.

Results Achieved

After implementation:

  • Sub-second query response for customer portal usage displays
  • 99.8% data completeness through validation and estimation pipelines
  • Predictive accuracy within 5% for usage forecasting
  • Billing disputes reduced 40% through improved data quality
  • Analytics dashboards used daily by operations and customer service
