Training Set
A training set can be created directly from the time series. A fixed number of consecutive measured values is used as the input, and the value to be predicted (i.e., a future value at some chosen distance after these inputs) is used as the required output. The input part of the time series is called the window; the output part is the predicted value. Shifting the window along the time series creates the individual items of the training set (see figure 4). It is advisable to set aside part of the time series for testing, that is, to exclude it from learning and use it only to check how well the network has learned to predict the data.
The training set obtained in this way can then be adapted to the needs of a particular neural network. For example, it may be necessary to rescale the values to a certain interval, such as (0,1).
Figure 4 - Creating training set
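The window-shifting procedure can be sketched in a few lines of Python. This is only a minimal illustration: the function name make_windows and the parameters window and horizon are assumptions made for this example, and the short integer series stands in for real measurements.

```python
def make_windows(series, window, horizon=1):
    """Build (inputs, target) pairs by sliding a window over the series.

    window  - number of consecutive values used as inputs (assumed name)
    horizon - distance of the predicted value after the window,
              1 means the value immediately following the window
    """
    items = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        items.append((inputs, target))
    return items


# Illustrative data only; the last two values are kept aside for testing.
series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
train_part, test_part = series[:8], series[8:]

training_set = make_windows(train_part, window=3, horizon=1)
for inputs, target in training_set:
    print(inputs, "->", target)
```

Each printed line corresponds to one item of the training set: three consecutive values as inputs and the following value as the required output. A larger horizon moves the required output further into the future and reduces the number of items that fit into the series.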
The available data are often divided into three sets: the learning set, the validation set, and the testing set. These sets can overlap (see figure 5) and do not have to be contiguous. The learning set is the sequence presented to the neural network during the training phase; the network weights are adapted on this set so that the network produces the required outputs. The validation set is used to measure the difference between the network outputs and the required outputs, and this difference determines whether training can be stopped. The last set, the testing set, is then used to check whether the network also works on data that were not used in the previous steps.
To summarize, the learning set is used to create a model, the validation set is used to verify the model, and the testing set is used to test the usability of the model.
Figure 5 - Validation, learning and testing data sets
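The division into three sets, and the use of the validation error to decide when to stop training, can be sketched as follows. The sketch makes simplifying assumptions that are not part of the text above: the three sets are contiguous and do not overlap, the 60/20/20 proportions are arbitrary, and the stopping rule (stop when the validation error has not improved for a given number of epochs) is just one common choice. The names split_series, should_stop and patience are hypothetical.

```python
def split_series(series, learn_frac=0.6, valid_frac=0.2):
    """Split a series chronologically into learning, validation, testing parts.

    Assumption: contiguous, non-overlapping parts; the rest after the
    learning and validation parts becomes the testing set.
    """
    n = len(series)
    learn_end = round(n * learn_frac)
    valid_end = learn_end + round(n * valid_frac)
    return series[:learn_end], series[learn_end:valid_end], series[valid_end:]


def should_stop(valid_errors, patience=3):
    """Return True if the validation error has not improved
    during the last `patience` epochs (assumed stopping rule)."""
    if len(valid_errors) <= patience:
        return False
    best_before = min(valid_errors[:-patience])
    return min(valid_errors[-patience:]) >= best_before


series = list(range(1, 11))            # illustrative data only
learning, validation, testing = split_series(series)
print(learning, validation, testing)

# Validation errors recorded after each training epoch (made-up numbers).
errors = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1]
print(should_stop(errors))
```

In a real training loop, the weights would be updated on the learning set, the error on the validation set would be appended to the list after each epoch, and the testing set would be touched only once, after training has finished.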
Data preprocessing is important as well. For example, it can be useful to remove the trend and other components (such as seasonal components), provided that we are able to detect them. An overview of time-series decomposition can be found in the references.
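One simple way to remove a trend or a seasonal component is differencing, shown here as a hedged sketch. Differencing is chosen for this illustration only; it is not necessarily the decomposition method meant by the references. The function names difference and restore and the parameter lag are assumptions.

```python
def difference(series, lag=1):
    """Return series[i] - series[i - lag] for every i >= lag.

    lag=1 removes a linear trend; lag equal to the season length
    removes a repeating seasonal pattern.
    """
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def restore(predicted_diff, series, lag=1):
    """Convert a predicted difference for the next step back to a value."""
    return series[-lag] + predicted_diff


trend_series = [10, 12, 14, 16, 18]            # illustrative linear trend
detrended = difference(trend_series)           # constant differences remain

seasonal_series = [1, 2, 3, 4, 2, 3, 4, 5]     # pattern repeating every 4 steps
deseasoned = difference(seasonal_series, lag=4)

# If the network predicts a difference of 2, the actual next value is:
next_value = restore(2, trend_series)
print(detrended, deseasoned, next_value)
```

The network is then trained on the differenced series, and its predictions are converted back to the original scale with restore.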
It is important to realize that a neural network whose outputs are limited to a certain interval cannot predict values outside that interval. The data must then be normalized so that the network can produce meaningful outputs.
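A minimal sketch of such a normalization is min-max scaling to the interval from 0 to 1. It assumes that the minimum and maximum are taken from the learning data only; the function names fit_min_max, scale and unscale are hypothetical.

```python
def fit_min_max(values):
    """Find the scaling bounds on the learning data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max scaled")
    return lo, hi


def scale(values, lo, hi):
    """Map values so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(values, lo, hi):
    """Map network outputs back to the original units."""
    return [v * (hi - lo) + lo for v in values]


learning = [10, 20, 30]                 # illustrative data only
lo, hi = fit_min_max(learning)

scaled = scale(learning, lo, hi)        # values now lie between 0 and 1
outside = scale([40], lo, hi)           # larger than anything seen in learning
restored = unscale([0.5], lo, hi)
print(scaled, outside, restored)
```

The value 40 maps to a number above 1, which a network with outputs bounded by 1 could never produce. If future values may exceed the range of the learning data, one option is to choose wider bounds than the observed minimum and maximum.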