# Mid-to-long-term demand forecasting (building/portfolio) / electricity and heating

**14. Load forecasting in the medium-to-long term (7-30 days) and the long term (seasonal/yearly) (V2)**

**14.1 Rationale & Link to BEYOND Apps**

Medium-to-long-term and long-term load forecasts are critical to the strategic management and optimisation of electrical infrastructure. They inform decisions on maintenance, resource allocation and capacity planning, and so support the efficient operation of power systems. Traditional long-term load forecasting methods tend to focus on daily aggregate energy consumption, which gives only a coarse-grained view of power demand.

More recently, however, interest has surged in long-term load forecasting models that offer hourly granularity (Basaran Filik, 2009; Agrawal, 2018). This increased resolution gives a more nuanced understanding of the temporal dynamics of energy consumption, which is essential for effective network asset planning and sizing. In the context of the BEYOND project, and specifically task T5.2, the goal is to minimise potential network congestion and optimise resource distribution. Long-term hourly predictions allow a more accurate assessment of potential congestion events and help identify critical time periods that may require additional attention.

Moreover, the high-resolution forecasts generated by these models can be utilised across other tasks within the WP5 framework, such as T5.1, enhancing the overall efficiency and effectiveness of the project. The versatility of these predictions also allows for their integration into various other applications that might benefit from enriched feature sets in the future. Ultimately, the development and implementation of long-term load forecasting models with hourly granularity can provide valuable insights and contribute to the optimisation of power system management and operation.

**14.2 Overview of relevant implementations**

Load forecasting implementations in the power industry typically combine historical data analysis, weather data and various forecasting models, and often account for factors such as economic growth, energy policies and technological advancements. Some examples follow:

• Independent System Operators (ISOs) and Regional Transmission Organizations (RTOs): these bodies manage the electricity grid and ensure reliable supply. They use load forecasting to guide capacity planning, transmission expansion and resource procurement. For example, the California ISO (CAISO) publishes demand information online [1].

• Energy regulatory agencies: these use load forecasts to assess the adequacy of the electricity supply and to inform energy policies. For example, the U.S. Energy Information Administration (EIA) publishes an Annual Energy Outlook, which includes medium- and long-term forecasts for the United States [2].

Load forecasting, like forecasting in general, is usually treated as a time series problem and approached with either statistical or machine learning (ML) methods. Statistical approaches use techniques such as ARIMA or exponential smoothing [3], while ML approaches use various flavours of neural networks, such as long short-term memory recurrent neural networks (LSTM-RNN) [4], or metaheuristic methods such as particle swarm optimisation [5], among others.

**14.3 Implementation in BEYOND**

For BEYOND we intend to explore both statistical and ML methods in order to find the one that best suits our needs, considering not only accuracy metrics such as the normalised RMSE (nRMSE), NMAE and MAPE but also how well the results can be explained. The goal is to be able to distinguish the trend and seasonal components of the data, which can be extracted by applying statistical operations to the series. This will help us identify other relevant parameters to consider in the training phase. The model will be based on data coming from the pilots, but we also intend to experiment with external variables, such as urban development plans for the area, which might strongly affect the long-term growth of power demand. We also leverage public sources such as e-sios [11], which provides a wide variety of data, including live generation by source (hydraulic, solar and wind power, among others) and live, expected and programmed electricity demand.
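The accuracy metrics named above can be computed as in the following minimal sketch. The function names are illustrative, and the normalisation is an assumption: nRMSE and NMAE are divided here by the range of the observed values, whereas other conventions divide by the mean.

```python
import numpy as np


def _as_arrays(y_true, y_pred):
    """Convert both inputs to float arrays of the same shape."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return y_true, y_pred


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the observed values (assumed convention)."""
    y_true, y_pred = _as_arrays(y_true, y_pred)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())


def nmae(y_true, y_pred):
    """MAE divided by the range of the observed values (assumed convention)."""
    y_true, y_pred = _as_arrays(y_true, y_pred)
    return np.mean(np.abs(y_true - y_pred)) / (y_true.max() - y_true.min())


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; undefined if y_true has zeros."""
    y_true, y_pred = _as_arrays(y_true, y_pred)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


# Toy values for illustration only (not project data).
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
print(nrmse(observed, predicted), nmae(observed, predicted), mape(observed, predicted))
```

Because MAPE divides by the observed value, it becomes unstable when demand is close to zero; the range-normalised metrics do not have that problem.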

We think the best way to generalise across all the data is to use time series features and normalised data, because consumption behaviour should be similar in different countries, whereas weather is a delicate variable that cannot be predicted reliably in the long term. Another reason is that, since we do not have the data from our provider, we must be able to generalise to any case as accurately as possible. We therefore use datetime information (day, month, week, day of year, weekend flag and hour) and normalise the target feature with a Min-Max scaler. We used plotly [10] to build graphics like this one:

The graphic combines four variables in a single plot: hour, month and weekday (these three on the two axes) and electricity demand (orange colour palette). It gives rich insight into the behaviour of the data and into the variables that are relevant for the training phase.
We decided to use the KNN algorithm for its advantages and its easy adaptation to different numbers and types of variables. We test every combination of three or more of the available variables, run a grid search for parameter tuning on each one, and iteratively check the metrics (MSE, R2 and accuracy) so as to retain the best model together with its parameters and its set of variables.
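The feature extraction, target scaling and subset search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project code: the data are synthetic, the column names `Dia_Anio` and `Fin_Semana` and the helper names are hypothetical, Monday is assumed to be weekday 1, and the parameter grid and 3-fold cross-validation are arbitrary choices.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler


def add_datetime_features(df, column="datetime"):
    """Return a copy of df with calendar features derived from a datetime column."""
    out = df.copy()
    ts = pd.to_datetime(out[column])
    out["Mes"] = ts.dt.month                                # month (1-12)
    out["Semana"] = ts.dt.isocalendar().week.astype(int)    # ISO week number
    out["Dia"] = ts.dt.day                                  # day of month
    out["Dia_Semana"] = ts.dt.dayofweek + 1                 # weekday, Monday=1 (assumed)
    out["Dia_Anio"] = ts.dt.dayofyear                       # hypothetical column name
    out["Fin_Semana"] = (ts.dt.dayofweek >= 5).astype(int)  # hypothetical column name
    out["Hora"] = ts.dt.hour                                # hour (0-23)
    return out


def search_knn(df, candidates, target, param_grid, min_features=3):
    """Grid-search a KNN regressor on every feature subset of size >= min_features."""
    train, test = train_test_split(df, test_size=0.2, random_state=0)
    best = None
    for size in range(min_features, len(candidates) + 1):
        for subset in combinations(candidates, size):
            cols = list(subset)
            grid = GridSearchCV(KNeighborsRegressor(algorithm="brute"),
                                param_grid, cv=3, scoring="r2")
            grid.fit(train[cols], train[target])
            # Subsets are compared on the cross-validated score; the held-out
            # test score is only reported for the winner.
            if best is None or grid.best_score_ > best["cv_r2"]:
                best = {
                    "features": cols,
                    "params": grid.best_params_,
                    "cv_r2": grid.best_score_,
                    "test_r2": grid.score(test[cols], test[target]),
                }
    return best


# Synthetic hourly demand for illustration only (not project data).
rng = np.random.default_rng(0)
stamps = pd.date_range("2016-01-01", periods=24 * 90, freq=pd.Timedelta(hours=1))
raw = pd.DataFrame({"datetime": stamps})
raw["demand"] = (1000 + 300 * np.sin(2 * np.pi * stamps.hour.to_numpy() / 24)
                 + rng.normal(0, 20, len(stamps)))

data = add_datetime_features(raw)
# Scaler fitted on the whole series for brevity; fit it on training data only in practice.
data["demand_scaled"] = MinMaxScaler().fit_transform(data[["demand"]]).ravel()

result = search_knn(
    data,
    candidates=["Mes", "Dia", "Hora", "Dia_Semana"],
    target="demand_scaled",
    param_grid={"n_neighbors": [4, 10], "p": [1, 2], "weights": ["uniform", "distance"]},
)
print(result)
```

The number of subsets grows quickly with the number of candidate variables, so an exhaustive loop like this is only practical for a handful of features.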

When using our models on the BEYOND platform, the user must therefore enter a range of datetimes and their installed peak power. The input is a table in which the peak power is repeated on every datetime row. Inside the BEYOND pipeline, all the datetime features used in training are then extracted and used directly for forecasting. We also used electricity demand data without industry to try to categorise consumption behaviours:

• A: intensive and constant use, higher load in winter.

• B: intensive use.

• C: constant use.

• G: standard use (utilised in the prediction model).

• L: low use during the day.

This would have helped users find out which category their consumption falls into, but it was not successful: the classifier reached a low accuracy (0.22), so we discarded the idea. Some plots (simple distribution plots, pie charts and so on) show how the different categories behave over time. Below we can see an image with these graphs: one with all behaviours, a pie chart showing each category's share of total demand, and one distribution chart per class. For readability, the time series are limited to exactly one month (January 2016).

**Results & Discussion**

In this project, we trained two distinct prediction models for actual electricity demand: one with and one without industry data. Both were trained with a KNN regressor, which gave the best results, using a loop that tries different subsets of features. This method shows which characteristics are most relevant to the patterns and seasonalities of each dataset, as discussed in the following sections.

Actual electricity demand (with industry)
The optimal KNN parameters for this dataset are **{'algorithm': 'brute', 'n_neighbors': 10, 'p': 1, 'weights': 'uniform'}**; the number of neighbours counted for each prediction is set to 10.

The selected feature subset is **['Mes', 'Semana', 'Hora', 'Dia_Semana']** (month, week, hour and weekday), which includes two new characteristics: “Semana”, the week number, and “Dia_Semana”, the weekday (1-7).

After the training phase, the metrics on the test set were:

• **R2**: 0.8121035159900712

• **MSE**: 0.004449415920170373

This means that the model explains 81.21% of the variation in the target data from the independent variables (features).
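For reference, the two reported metrics follow the standard definitions below, where $y_i$ are the observed target values in the test set, $\hat{y}_i$ the predictions, $\bar{y}$ the mean of the observed values and $n$ the number of test samples:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

With $R^2 \approx 0.812$, the residual sum of squares is about 18.8% of the total sum of squares. If the MSE is measured on the Min-Max scaled target, which the text suggests but does not state, it corresponds to an RMSE of about 0.067 on the [0, 1] scale.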

Actual electricity demand (excluding industrial buildings)

The optimal KNN parameters for this dataset are {'algorithm': 'brute', 'n_neighbors': 4, 'p': 1, 'weights': 'distance'}. The optimal feature subset is ['Mes', 'Dia', 'Hora'] (month, day and hour), which differs slightly from the previous one but still clearly captures seasonality. After the training phase, the metrics on the test set were:

• **R2**: 0.9806703949173576

• **MSE**: 9.04039574935867e-05

This means that the model explains 98.07% of the variation in the target data from the independent variables (features).

**Discussion**

Each of the two models has its own pipeline, because each one needs different input features and the data must therefore be processed differently. Below we can see part of an example pipeline, specifically the one for electricity demand (industry included).
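As a rough illustration of the input handling described in this section (not the BEYOND pipeline itself), the sketch below builds the input table from a date range and a peak power, extracts the features reported for the with-industry model and runs a prediction. The helper names, the column names `datetime`, `peak_power`, `demand_norm` and `demand_kw`, and the stand-in model trained on synthetic data are all hypothetical. How the peak power enters the forecast is not specified in the text; multiplying the normalised prediction by it is an assumption made here.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Feature subset reported for the model that includes industry.
FEATURES = ["Mes", "Semana", "Hora", "Dia_Semana"]


def build_input(start, end, peak_power_kw):
    """One row per hour in [start, end], with the peak power repeated on every row."""
    stamps = pd.date_range(start, end, freq=pd.Timedelta(hours=1))
    return pd.DataFrame({"datetime": stamps, "peak_power": peak_power_kw})


def extract_features(table):
    """Derive the calendar features the model was trained on."""
    ts = table["datetime"]
    features = pd.DataFrame({
        "Mes": ts.dt.month,
        "Semana": ts.dt.isocalendar().week.astype(int),
        "Hora": ts.dt.hour,
        "Dia_Semana": ts.dt.dayofweek + 1,  # Monday=1 (assumed)
    })
    return features[FEATURES]


# Stand-in model trained on a synthetic normalised demand curve (illustration only).
history = build_input("2016-01-01 00:00", "2016-12-31 23:00", peak_power_kw=1.0)
X_hist = extract_features(history)
y_hist = 0.5 + 0.4 * np.sin(2 * np.pi * X_hist["Hora"] / 24)
model = KNeighborsRegressor(algorithm="brute", n_neighbors=10, p=1, weights="uniform")
model.fit(X_hist, y_hist)

# A user request: one week of hourly rows and an installed peak power of 50 kW.
request = build_input("2017-03-01 00:00", "2017-03-07 23:00", peak_power_kw=50.0)
request["demand_norm"] = model.predict(extract_features(request))
# Assumption: the normalised forecast is rescaled by the installed peak power.
request["demand_kw"] = request["demand_norm"] * request["peak_power"]
print(request.head())
```

The model that excludes industrial buildings would need its own `extract_features` variant, since it was trained on ['Mes', 'Dia', 'Hora'].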

**14.3.1. Analytics Libraries Employed**

We intend to use the following Python libraries, pending the completion of all pipelines and the choice of any further methods:

• Pandas [6]: a data manipulation and analysis tool. It is mainly used for its DataFrame and Series structures, which let us shape our data the way we want and access it easily. It is helpful in data processing tasks and in creating the test files used on the BEYOND platform.

• NumPy [7]: a library that leverages the performance of compiled languages such as C for mathematical operations on arrays, improving efficiency. Pandas uses it as its backend.

• Scikit-learn [8]: a machine learning library supporting supervised and unsupervised learning. It is also used for data processing, model selection and evaluation, and normalisation, among other tasks.

• Pickle [9]: a library for serialising and de-serialising Python objects, used here for saving ML models. The BEYOND platform requires trained models to be uploaded as *.pkl files.

• Plotly [10]: library used for making interactive graphs.
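Saving a trained model as a .pkl file and loading it back can be sketched as follows. The toy training data and the file name `knn_demand_model.pkl` are hypothetical; the platform's actual upload step is not shown.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy training data for illustration only: columns are (month, hour).
X = np.array([[1, 0], [1, 12], [7, 0], [7, 12]])
y = np.array([0.2, 0.6, 0.3, 0.8])
model = KNeighborsRegressor(n_neighbors=2).fit(X, y)

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "knn_demand_model.pkl")  # hypothetical file name

    # Serialise the fitted model to disk.
    with open(path, "wb") as handle:
        pickle.dump(model, handle)

    # De-serialise it again; only unpickle files from a trusted source.
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(np.allclose(model.predict(X), restored.predict(X)))
```

A pickled scikit-learn model is best loaded with the same scikit-learn version that produced it, so it is worth recording that version alongside the file.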

**References**

1. California ISO (CAISO), "Today's Outlook," [Online]. Available: https://www.caiso.com/TodaysOutlook/Pages/default.aspx. [Accessed 4 May 2023].

2. U.S. Energy Information Administration (EIA), "Annual Energy Outlook," [Online]. Available: https://www.eia.gov/outlooks/aeo/data/browser/. [Accessed 4 May 2023].

3. Agrawal et al., "Long term load forecasting with hourly predictions based on long-short-term-memory networks," 2018 IEEE Texas Power and Energy Conference (TPEC), pp. 1-6, 2018.

4. Basaran Filik et al., "Hourly Forecasting of Long Term Electric Energy Demand Using a Novel Modeling Approach," 2009 Fourth International Conference on Innovative Computing, Information and Control, pp. 115-118, 2009.

5. Hafez et al., "Particle swarm optimization for long-term Demand Forecasting," 2016 Eighteenth International Middle East Power Systems Conference (MEPCON), pp. 179-183, 2016.

6. Pandas, "Data structures for statistical computing in Python," [Online]. Available: https://pandas.pydata.org/docs/user_guide/index.html#user-guide.

7. NumPy, "Array programming with NumPy," [Online]. Available: https://numpy.org/doc/stable/user/index.html#user.

8. Scikit-learn, "Machine Learning in Python," [Online]. Available: https://scikit-learn.org/stable/user_guide.html.

9. Pickle, "The Python Library Reference," [Online]. Available: https://docs.python.org/3/library/pickle.html.

10. Plotly, "Collaborative data science," [Online]. Available: https://plotly.com/python/.

11. Sistema de Información del Operador del Sistema (e-sios), "Red Eléctrica de España (REE)," [Online]. Available: https://www.esios.ree.es/es.

12. Rahman et al., "Forecasting the long term energy demand of Bangladesh using SPSS from 2011–2040," 2016 3rd International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), pp. 1-5, 2016.
