Anomaly/Outlier Detection in Energy Demand
20 Anomaly/Outlier Detection in Energy Demand
20.1 Rationale & Link to BEYOND Apps
Abnormal power consumption leads to higher electricity use and wasted energy, so identifying it in the collected consumption data allows electricity to be used more efficiently. End-user consumption can be abnormal for many reasons, for example damaged equipment, wasteful behavior (forgetting to switch off equipment after use, or using incorrectly configured equipment), or electricity theft. Finding patterns in data that do not conform to expected or normal behavior is generally referred to as anomaly detection. Detecting abnormal increases in the energy consumption of specific end users can serve as an early warning that helps reduce energy waste in buildings. Anomaly/outlier detection in energy demand will be available in the BEYOND AI Analytics toolkit. This dedicated AI analytic will provide valuable insights to the self-consumption optimization features of the Digital Twin environment for the BEPO application (T6.1) and to the personal energy analytics PEASH application (T6.3).
20.2 Overview of relevant implementations
Anomaly detection methods can be classified into distance-based, density-based, dimensionality reduction-based, and deep learning-based methods.
Among the distance-based methods, the K-Nearest Neighbor (KNN) algorithm is one of the most popular. For each sample point in turn, it computes the average distance to the K nearest samples and uses that distance to decide whether the point is anomalous [1, 2]. Although effective in some cases, the distance-based approach performs better when the anomaly duration and the number of anomalies are known a priori.
The density-based approach examines the density of each power consumption pattern and of its neighbors. One example, the local density cluster-based outlier factor (LDCOF), uses local density to assign anomaly scores. Refs. [3, 4] used density-based spatial clustering of applications with noise (DBSCAN) to detect anomalous power consumption in a wind farm environment. However, the density-based approach cannot take time correlation into account and is therefore not applicable to multivariate time series data.
Methods based on dimensionality reduction can act as a classification step that removes irrelevant power patterns and redundancies at low computational cost [5]. Principal component analysis (PCA) is a multivariate data analysis method that reduces the dimensionality of large volumes of raw data while preserving, as far as possible, the relationships between the data extracted from process measurements [6]. However, such methods are only valid for highly correlated data and require the data to follow a multivariate Gaussian distribution [7].
In recent years, deep learning-based methods have been widely used, and work on anomaly detection in time series data has increased significantly.
First, convolutional neural networks (CNNs) have proven effective in a range of research applications and outperform artificial neural network (ANN) algorithms in detecting anomalies in time series data. In [8], the authors propose an anomaly detection technique, FuseAD, which fuses a statistical ARIMA (Autoregressive Integrated Moving Average) model and a CNN in a residual manner. Their results show that the fusion combines the strengths of both models while compensating for their weaknesses. In addition, deep CNNs applied to two-dimensional (2D) electricity consumption data can accurately distinguish the non-periodicity of electricity theft from the periodicity of normal consumption, addressing the low accuracy of earlier electricity theft detection [9]. In [10], the authors use a CNN for feature extraction and then a random forest to detect electricity theft, helping utilities address inefficient electricity inspection and irregular energy consumption.
Recurrent Neural Networks (RNNs), and especially LSTM (Long Short-Term Memory) networks, also perform very well in time series prediction. In [11], the authors use deep learning algorithms to remove seasonality and trends from the data for better anomaly detection, helping electric utilities minimize the impact of uncaptured errors in their daily work. In [12], the authors propose a power consumption prediction and anomaly detection algorithm based on an LSTM neural network that focuses on seasonal and monthly trends, resulting in a significant improvement in power theft identification. Ref. [13] predicted system energy consumption using pattern decomposition based on the LSTM algorithm and detected abnormal consumption by applying the Grubbs test to the difference between predicted and actual values, which effectively reduced energy waste during system operation. In [14], the authors combined OC-SVM (one-class support vector machine) and SVDD (support vector data description) with the generic structure of LSTM, using modified formulations, to achieve efficient anomaly detection for time series data, including variable-length data sequences.
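As an illustration of the distance-based family described above, the following minimal sketch scores each reading by its mean distance to its K nearest neighbours. It is not taken from the cited works: the synthetic load values, the injected spike, `k=5`, and the 99th-percentile cut-off are all assumptions chosen for demonstration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def knn_anomaly_scores(X, k=5):
    """Return the mean distance from each sample to its k nearest neighbours."""
    # Request k + 1 neighbours: each training point is its own nearest
    # neighbour at distance 0, so the first column is dropped below.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    distances, _ = nn.kneighbors(X)
    return distances[:, 1:].mean(axis=1)


# Synthetic hourly load (kWh): a cluster of normal readings plus one spike.
rng = np.random.default_rng(0)
load = rng.normal(loc=50.0, scale=2.0, size=200)
load[120] = 95.0  # injected anomaly (illustrative assumption)

scores = knn_anomaly_scores(load.reshape(-1, 1), k=5)
threshold = np.quantile(scores, 0.99)  # assumed cut-off: top 1% of scores
anomalies = np.flatnonzero(scores > threshold)
print("indices flagged as anomalous:", anomalies)
```

Because the score depends on the choice of K and on the threshold, this kind of detector benefits from prior knowledge of how many anomalies to expect, as noted above.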
20.3 Implementation in BEYOND
For the implementation in the BEYOND project, we chose the One-Class SVM method.
20.3.1. Data inputs and Analytics Pipeline (incl. assumptions/limitations)
The dataset used for the implementation is the open Building Data Genome 2 (BDG2) dataset, documented in detail by Miller et al. [15]. BDG2 comprises 3,053 energy meters from 1,636 buildings. The time series cover two full years (2016 and 2017) at hourly resolution and include electricity, heating and cooling water, steam, and irrigation meters. The published dataset has already been cleaned: the following readings were replaced with NaN:
- Outliers detected with Seasonal Hybrid ESD (S-H-ESD).
- Zero readings lasting longer than 24 continuous hours: continuous zero readings of that length are assumed to result from a problem in the meter or from the system being down because of the season.
- Zero readings in electricity: an electricity meter reading should never be exactly zero.
This gives a total of 17,544 cleaned data points (two years of hourly timestamps) for our One-Class SVM algorithm. The steps of the analytics pipeline for training are explained below:
1. We split the dataset into training and test sets by timestamp: the first 20 months are used for training and the last 4 months for testing. This is roughly an 80%/20% split (20 of 24 months is about 83% of the data).
2. We train the One-Class SVM using the energy consumption as its feature.
3. Once training is complete, we apply the model to the test set, i.e. the later, unseen readings, and predict for each reading whether it is normal or anomalous.
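The three steps above can be sketched as follows. This is a minimal illustration rather than the project code: the data are synthetic stand-ins for a single BDG2 electricity meter, and the column name `consumption`, the standardization step, the RBF kernel, and `nu=0.01` are assumptions not stated in this section.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in for one hourly electricity meter covering 2016-2017.
index = pd.date_range("2016-01-01 00:00", "2017-12-31 23:00",
                      freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(42)
daily_cycle = 10.0 * np.sin(2 * np.pi * index.hour.to_numpy() / 24)
df = pd.DataFrame(
    {"consumption": 50.0 + daily_cycle + rng.normal(0.0, 2.0, len(index))},
    index=index,
)
# Inject one obvious spike into the test period (illustrative only).
spike_time = pd.Timestamp("2017-10-15 12:00")
df.loc[spike_time, "consumption"] = 150.0

# Step 1: chronological split, first 20 months for training, last 4 for testing.
split = pd.Timestamp("2017-09-01")
train = df[df.index < split]
test = df[df.index >= split]

# Step 2: fit the One-Class SVM on the (standardized) consumption feature.
scaler = StandardScaler().fit(train[["consumption"]])
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.01)  # assumed settings
model.fit(scaler.transform(train[["consumption"]]))

# Step 3: label the test readings: +1 = normal, -1 = anomalous.
labels = model.predict(scaler.transform(test[["consumption"]]))
test_anomalies = test[labels == -1]
print(f"train={len(train)}, test={len(test)}, flagged={len(test_anomalies)}")
```

The `nu` parameter bounds the fraction of training readings treated as outliers, so it would need tuning against the anomaly rate expected for a given meter.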
20.3.2. Analytics Libraries Employed
The libraries employed for this algorithm are:
- pandas
- NumPy
- scikit-learn (sklearn)
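To show how pandas can express one of the cleaning rules from 20.3.1, the sketch below replaces runs of zero readings longer than 24 consecutive hours with NaN. The BDG2 dataset is distributed already cleaned, so this is only an illustration of the rule; the helper name `mask_long_zero_runs` and the sample readings are hypothetical.

```python
import pandas as pd


def mask_long_zero_runs(series, max_hours=24):
    """Replace runs of consecutive zeros longer than max_hours with NaN."""
    is_zero = series.eq(0)
    # Give every run of identical is_zero values its own integer id.
    run_id = (is_zero != is_zero.shift()).cumsum()
    # Length of the run that each sample belongs to.
    run_length = is_zero.groupby(run_id).transform("size")
    return series.mask(is_zero & (run_length > max_hours))


# Hypothetical hourly readings: a short zero run and a long one.
index = pd.date_range("2016-01-01", periods=72, freq=pd.Timedelta(hours=1))
readings = pd.Series(5.0, index=index)
readings.iloc[10:13] = 0.0  # 3-hour zero run: kept as valid data
readings.iloc[30:60] = 0.0  # 30-hour zero run: treated as a meter problem

cleaned = mask_long_zero_runs(readings)
print(cleaned.isna().sum(), "readings replaced with NaN")
```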
References
[1] Sial, Ankur, Amarjeet Singh, and Aniket Mahanti. "Detecting anomalous energy consumption using contextual analysis of smart meter data." Wireless Networks 27 (2021): 4275-4292.
[2] Ghanbari, Maryam, Witold Kinsner, and Ken Ferens. "Anomaly detection in a smart grid using wavelet transform, variance fractal dimension and an artificial neural network." 2016 IEEE electrical power and energy conference (EPEC). IEEE, 2016.
[3] Giannoni, Federico, Marco Mancini, and Federico Marinelli. "Anomaly detection models for IoT time series data." arXiv preprint arXiv:1812.00890 (2018).
[4] Zhou, Yifan, et al. "A semi-supervised anomaly detection method for wind farm power data preprocessing." 2017 IEEE Power & Energy Society General Meeting. IEEE, 2017.
[5] Huang, Tingshan, Harish Sethu, and Nagarajan Kandasamy. "A new approach to dimensionality reduction for anomaly detection in data traffic." IEEE Transactions on Network and Service Management 13.3 (2016): 651-665.
[6] Kudo, Takanori, et al. "PCA-based robust anomaly detection using periodic traffic behavior." 2013 IEEE international conference on communications workshops (ICC). IEEE, 2013.
[7] Dai, Xuewu, and Zhiwei Gao. "From model, signal to knowledge: A data-driven perspective of fault detection and diagnosis." IEEE Transactions on Industrial Informatics 9.4 (2013): 2226-2238.
[8] Munir, Mohsin, et al. "FuseAD: unsupervised anomaly detection in streaming sensors data by fusing statistical and deep learning models." Sensors 19.11 (2019): 2451.
[9] Zheng, Zibin, et al. "Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids." IEEE Transactions on Industrial Informatics 14.4 (2017): 1606-1615.
[10] Li, Shuan, et al. "Electricity theft detection in power grids with deep learning and random forests." Journal of Electrical and Computer Engineering 2019 (2019): 1-12.
[11] Hollingsworth, Keith, et al. "Energy anomaly detection with forecasting and deep learning." 2018 IEEE international conference on big data (Big Data). IEEE, 2018.
[12] Wang, Xiaohui, et al. "Power consumption predicting and anomaly detection based on long short-term memory neural network." 2019 IEEE 4th international conference on cloud computing and big data analysis (ICCCBDA). IEEE, 2019.
[13] Xu, Chengliang, and Huanxin Chen. "Abnormal energy consumption detection for GSHP system based on ensemble deep learning and statistical modeling method." International Journal of Refrigeration 114 (2020): 106-117.
[15] Miller, C., Kathirgamanathan, A., Picchetti, B. et al. "The Building Data Genome Project 2, energy meter data from the ASHRAE Great Energy Predictor III competition." Sci Data 7, 368 (2020). https://doi.org/10.1038/s41597-020-00712-x