Time Series Analytics

tsanalytics-panel

TS Analytics is a module of the Waylay Platform that provides advanced capabilities such as time series metric visual exploration, anomaly detection and prediction on the data stored in the Waylay Time Series Database.

Core concepts

The following core concepts are used within TS Analytics:

Time series data

Time series data is a collection of data points that have been stored in the Waylay Time Series Database. A time series can be accessed as a collection of raw data points within a given time range, or any of a number of aggregation operations (e.g. mean, maximum, minimum, sum) can be applied to a time range by grouping multiple points together into time buckets (e.g. grouping data points by hour).
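
A minimal sketch of the bucketing idea, using pandas as a stand-in (illustrative only; the aggregation itself is performed by the Waylay Time Series Database, and the timestamps and values below are hypothetical):

# Group raw data points into hourly buckets and apply an aggregation operation.
import pandas as pd

points = [("2023-01-01T00:05:00Z", 7.1), ("2023-01-01T00:35:00Z", 7.4),
          ("2023-01-01T01:10:00Z", 7.9)]
series = pd.Series([v for _, v in points],
                   index=pd.to_datetime([t for t, _ in points]))

hourly_mean = series.resample("1h").mean()  # mean per hourly bucket
hourly_max = series.resample("1h").max()    # another aggregation over the same buckets
print(hourly_mean)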

Models

In TS Analytics, the term “model” refers to a mathematical model that describes the behavior of a time series. TS Analytics supports a number of different algorithms which can be trained on time series data to model the behavior of the series.

Trained models provide the underlying functionality that is needed in order to perform anomaly detection and prediction.

Anomaly detection

Anomaly detection involves detecting behavior in a time series that does not conform to the “normal” behavior for that series. The definition of “normal” in this context is dependent on the model that is being used, and the data it has been trained on.

Anomaly detection is typically used to detect the onset of problems with IoT devices, e.g. detecting that measurements coming from a temperature sensor are not in line with the expected temperatures under normal conditions.

Prediction

Prediction involves predicting future values in a time series based on past values of the series.

There are two main underlying functionalities that can be provided by prediction:

  • predict the value of a time series at a given point in time in the future (e.g. what will the engine temperature be in 2 hours from now)
  • predict the amount of time it will take before a given value is reached (e.g. how long will it take until a battery is down to 10% charge)

Time Series Analytics components

tsanalytics-overview

TS Analytics Designer

The TS Analytics Designer is a graphical web application (part of the Waylay Dashboard) that allows you to view and analyze time series data from the Waylay Time Series Database. The Designer is also used for configuring everything related to Anomaly Detection and Prediction in time series.

TS Analytics Sensors

The TS Analytics sensors are sensor plug-ins for the Waylay Rule Engine, which allow you to make use of Anomaly Detection and Prediction within rules. For example, this allows you to configure the rule engine to perform specific actions as a result of detected anomalies.

TS Analytics REST API

All Time Series Analytics functionality is available via a REST API, so that all configuration, anomaly detection, and prediction functionality can be accessed programmatically.

Time Series Analytics Designer

Overview

TS Analytics Designer

Data visualization and analysis

Visualization and basic analysis of time series data is done using the main time series view pane and the overview slider.

When first entering Time Series Analytics by clicking on the Time Series Analytics tile, a resource must be chosen. A resource can be searched for by name, and the list of resources can also be filtered by resource type.

resource selection

Once a resource has been selected, the first numeric time series will be automatically selected and visualized at its natural sampling frequency.

Hovering your mouse cursor over the time series view will show a tooltip that provides the exact values and timestamps of the data points within the time series.

chart hover

The size of the visible portion of the time series can be adjusted by resizing the overview slider via its slider handles. The range of data visualized can be adjusted by dragging the overview slider by the area between the two slider handles.

Aggregation

series panel

The aggregation operation and grouping granularity are shown in the time series description panel. A number of different aggregation operations can be selected from the dropdown menu within the time series description panel. The grouping granularity is determined by the zoom level of the overview slider: visualizing a larger range of data will cause the grouping to become coarser-grained (i.e. data will be grouped in larger aggregation buckets), and visualizing a smaller range of data will cause the grouping to become finer-grained (data will be grouped in smaller aggregation buckets).

A maximum of 2000 data points can be visualized from a single time series.

Interpolation

Additional settings for the data series can be set by expanding the series panel. The most important additional series setting is the interpolation setting. Interpolation takes care of filling in gaps of missing values within a data series. Because some model algorithms require a series without gaps, interpolation is often needed.

expanded series pane

The most common form of interpolation is linear, which simply fits a straight line to fill in missing data points. There are many other interpolation methods available in the interpolation dropdown. A single constant value can also be set as the interpolation value.
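
A small sketch of what interpolation does, assuming a pandas Series with gaps (the data and the pandas calls are illustrative; TS Analytics applies interpolation server-side according to the series settings):

import numpy as np
import pandas as pd

series = pd.Series([7.0, np.nan, np.nan, 8.5, np.nan, 7.2])

linear_filled = series.interpolate(method="linear")  # straight line between known points
constant_filled = series.fillna(7.5)                 # fill gaps with a single constant value
print(linear_filled.tolist())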

Data stats

The data stats table provides some overview summary statistics for the currently-visible time series.

data stats table

Anomaly detection

Anomaly detection in a time series involves recognizing situations that are out of the ordinary for that time series. Anomaly detection in TS Analytics generally involves configuring and training a model to represent the “normal” behavior for a time series, and then comparing the values that are produced by the model with the observed values from the time series. If the difference between an observed data point and a model’s estimate is above a given threshold, that data point is considered to be anomalous.
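
The comparison described above boils down to thresholding a residual. A minimal illustration (with hypothetical observed values, model estimates and threshold):

# Classify points as anomalous when the residual exceeds a fixed threshold.
observed = [7.1, 7.3, 9.6, 7.2]
estimates = [7.0, 7.4, 7.3, 7.2]  # values produced by the trained model
threshold = 1.0                   # maximum acceptable absolute residual

for obs, est in zip(observed, estimates):
    residual = obs - est
    print(f"observed={obs} estimate={est} residual={residual:+.2f} "
          f"anomaly={abs(residual) > threshold}")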

There are many factors to take into account when configuring anomaly detection; they mainly fall into the following categories:

  • the choice of underlying model algorithm to use
  • the parameters for the underlying model
  • the way in which a threshold is defined to compare a model’s estimated values with observed time series values

TS Analytics includes auto-configuration functionality to take care of automatically handling the above points using minimal user input. Once auto-configuration has been run, an anomaly configuration can be further tested and validated, tweaked, and then saved and published for use as a service within the Waylay Rule Engine or via the TS Analytics REST API.

The following sections describe the typical workflow for setting up anomaly detection.

Data selection

The first step is to navigate to a resource that contains a time series for which we want to detect anomalies. For the purposes of this example, we will use pH measurements from an aquarium. As can be seen in the screenshot below, the pH level in the aquarium has a repeating daily pattern in which it cycles from about 7 to 9. However, near the end of the time series the daily peak only rises to less than 8, and after that the series goes completely flat.

fish tank time series

Auto-configuration

Now that we have a time series selected, we can use auto-configuration to configure anomaly detection. In order to use auto-configuration, we need to specify a range of “normal” data. Note that specifying normal data involves selecting the aggregation level (or zoom level), the aggregation operation, and the start and end points of a time range.

In this case we will leave the aggregation operation as median and the aggregation level at 30 minutes.

We then select a range of normal data from earlier in the time series. This range is selected by clicking on the time series and dragging the mouse over the area to be selected. The selection is then highlighted in grey.

fish tank range selection

At this point we can click on the Anomaly Auto-configure button to trigger anomaly configuration.

anomaly auto-configure button

This brings up a dialog box that allows specifying some information about how the anomaly detection is to be configured. There are two parameters which can be chosen here: seasonality, and use of a “fixed” model.

anomaly auto-configure dialog

Seasonality refers to a seasonal (e.g. daily, hourly) repeating pattern in the data. TS Analytics will attempt to automatically detect the presence of seasonality, and will set the auto-detected seasonality (or lack of seasonality) in the auto-configure dialog. If seasonality has been incorrectly detected (or not detected), it can be manually updated in the dialog.

Fixed model refers to whether or not the model that is configured should remain fixed to the general patterns that are present in the training data, or automatically adjust as the time series data changes. A fixed model should be chosen if you are certain that the general behavior of a time series won’t (or shouldn’t) change over time – for example, if the time series represents a controlled environment. Use of a fixed model should be disabled if we know that the behavior of a time series will change over time, and we want the model to automatically adapt to these changes.

We know that the pH level in our aquarium is expected to remain within a strict range and that this range should never change, so we choose a fixed model in this case.

We then click on the Start button to trigger the configuration of anomaly detection. After a number of seconds, a blue line of model estimates will be added to the time series chart. These are the estimates that the model has made based on the selected range of data that was used for auto-configuration.

Note that the aggregation operation and aggregation period are no longer editable, and that a specific model (SeasonalAveraging) has been configured.

locked series panel

Validation

We can now validate the automatically configured settings by selecting an alternate range of data and then clicking the Anomaly Calculate button. As can be seen in the screenshot below, anomalies (signified by red dots) are now detected where the observed values of the time series are not in line with the model estimates. The model estimates are also displayed for the selected range of data.

anomalies detected

Anomaly score visualisation

When running anomaly detection, the Residuals and Anomaly Scores chart is displayed with additional information about the classification of data points as anomalous or non-anomalous.

Internally, each data point receives an “anomaly score”. The specifics of how this score is calculated is dependent on the chosen anomaly method, but in all cases a data point that has an anomaly score of 1 or greater is classified as an anomaly, and a data point that has an anomaly score of less than 1 is classified as non-anomalous.

The anomaly score chart (shown below) displays the anomaly score for each data point. As can be seen, anomalous data points are above the threshold of 1, while non-anomalous data points are below the threshold. This visualisation can assist in understanding why a given data point was (or wasn’t) considered anomalous. It can also be helpful in fine-tuning anomaly thresholds.

anomaly score chart

The residual chart (shown below) provides a complementary view that shows whether each data point falls within or outside of the range of acceptable values based on its residual (i.e. the difference between the observed data point and the model estimate). The threshold lines in this chart are based on the residual thresholds configured for anomaly detection. As with the anomaly score chart, any data points outside the thresholds are classified as anomalous.

residual chart

You can switch between the anomaly score chart and residual chart using the toggle button at the top of the chart.

residual/anomaly score toggle

Visualisation of anomaly estimate selection for residual calculation

As explained in the smooth window configuration reference, calculation of residuals and anomaly scores can be based on selecting the closest model estimate within a given window (instead of only the model estimate at the same timestamp). If a smooth window is configured for anomaly detection, then an anomaly estimate visualisation is also made available in the main chart view when performing anomaly detection. This visualisation is hidden by default, but can be made visible by clicking on Anomaly Estimate in the chart legend.

The anomaly estimate visualisation draws a fine red line indicating which model estimate has been selected for each corresponding observation. This can often help in understanding why a given point has (or hasn’t) been flagged as an anomaly, by showing which model estimate it was compared to.

anomaly estimate visualisation

Further anomaly configuration

We can now further tweak our anomaly configuration if it doesn’t yet perform exactly as we want.

All anomaly settings can be accessed by expanding the Anomaly panel.

expanded anomaly panel

All configured settings for how the difference between the model estimates and observed values should be interpreted as anomalies can be configured here. The specific meaning of each of these configuration settings is explained in the anomaly configuration reference section below.

For the sake of example, we can now change the Consecutive setting to 3, and then re-run anomaly detection on the same time range. With this change, an anomaly will only be flagged if there are at least 3 consecutive “anomalous” data points. The effect can be seen below: a number of points that were previously highlighted in red are now shown in orange, and are classified as “Ignored anomalies”.

ignored anomalies

Selecting a range of “normal” data and then clicking the Calculate button results in no anomalies being detected.

Further anomaly details

More specific information on anomaly detection can be seen in the Anomalies table at the bottom of the screen. This table shows various filtered listings of observations, model estimates, and their potential anomalies.

The dropdown menu above the table can be used to select various filtered views of the table. Selecting All provides an unfiltered view of the table.

The columns of the anomaly table are as follows:

  • Timestamp - timestamp of the observation and related model estimate. This timestamp is shown in the local time zone of your web browser.
  • Observed - the actual observed value from the time series at the timestamp
  • Estimate - the estimate provided by the model at the timestamp
  • Residual - the difference between the observed value and estimated value
  • Score - a numerical score given to the specific point to determine if it is considered anomalous. Any data points with an anomaly score of 1 or more are considered anomalous, and any points with an anomaly score below 1 are not considered anomalous. The calculation of the anomaly score is typically done based on the residual value, and the method used to calculate it depends on the configured anomaly settings.
  • Grouping identifier - this column will only be present if the consecutive setting for anomaly detection is set. The values in this column are period-separated numerical values, with the first value being the anomaly group identifier, and the second value being the index of that observation within its group.

anomaly table

Saving and deploying the configuration

Now that we have a working configuration for anomaly detection, we can save it and deploy it to make use of it within the Waylay Rules Engine.

We first save our current anomaly configuration by clicking on the Save icon, and give the configuration a descriptive name.

save configuration button

save configuration dialog

With our configuration saved, we can now jump directly into the Waylay Rule Designer by clicking on the Designer icon in the Anomaly pane. This brings us into the rule design environment with an anomaly sensor configured to make use of our saved configuration. We can now build a rule to automatically react when an anomaly is detected.

open in designer button

Prediction

Prediction in a time series involves predicting future unknown values of a time series based on past behaviour within the time series.

Like with anomaly detection, prediction involves training a model on past data from a time series, and then using that model to calculate estimates or predictions of future values. Again, like in anomaly detection there are many configuration options that can be set for prediction, so TS Analytics provides auto-configuration functionality to set things up initially.

Auto-configuration

For this example, we will use a battery charge value as a time series. As can be seen below, the values in this time series are gradually decreasing over time.

prediction data selection

We start by selecting a range of training data, and then click on the Auto-configure button to configure the prediction model based on the training data. This brings up the auto-configure dialog, where the (auto-detected) seasonality and trend can be reviewed and manually updated. Seasonality in this context is the same as it is for anomaly auto-configuration. Trend represents the general slope of the time series, after the seasonal effect has been removed.

battery auto-configure dialog

Auto-configuration for prediction can take up to several minutes, as it involves trying multiple differently-configured models to find the one that provides the best predictions.

Once a prediction model has been chosen, the initial predicted values are displayed in purple. Depending on the algorithm that is selected via configuration, there may also be an upper and lower bound of predictions displayed. In this example case, we can see that the Arima model has been selected.

battery predictions

Validation and testing

The prediction ability of the auto-configured model can now be verified by comparing its predictions against known values of the time series. This is done by selecting a range of data, and then clicking the Calculate button in the Prediction pane. Predictions will then be calculated for the range of data following the selected range.

battery predictions validation

The length of the predicted range (along with a number of other settings) can be configured by expanding the Prediction pane. For example, we can use the window setting to get 2 hours of predictions.

prediction settings

We can now also predict future values for which there is no data by selecting a range that ends at the most recent value in the time series.

Saving and deploying the configuration

As with anomaly detection, a prediction configuration can be saved, and then used in the Waylay Rules Engine. After saving the configuration, clicking on the Designer button in the Prediction pane will bring you to the Rules Engine Designer with an initial rule that contains a prediction sensor.

Working with saved configurations

When the TS Analytics designer is first opened, it is always initialized with a blank configuration for the first numeric time series of the selected resource.

Alternate time series metrics can be selected from the dropdown within the breadcrumb.

Previously saved configurations can be selected from the dropdown in the configuration save panel.

A configuration can be saved or re-saved at any time by clicking on the save button. It’s also possible to revert the current configuration to the last-saved state by clicking on the Revert button, or clearing the current configuration by clicking on the Clear button.

save configuration panel

Advanced topics

Model stats

The Model Stats section reports model-fitting statistics. Depending on the algorithm, these are:

  • aic or Akaike information criterion is an estimator of the quality of the model for the given data set. Lower values are better, but only aic values computed on the same dataset can be compared.
  • bic or Bayesian information criterion is related to aic, but uses a bigger penalty for having many parameters in the model.
  • df_model reports the degrees of freedom in the model. When this is high, a model tends to over-fit: it can make predictions with smaller error terms on the training data set, but might fail when used on another (later) sample of the data.
  • sigma_2 is the variance of the residuals (i.e. the differences between predicted and observed values)
  • llf is the value of the log-likelihood function at the fitted parameters. It measures how likely these parameters are given the observed data.
  • nobs gives the number of observations
  • sse is the sum of squared errors

Manual model configuration

The model to use for estimations, and all related settings for models can be configured manually. Configuration can be started by expanding the Model pane, which exposes a dropdown menu of all available model algorithms.

Model algorithms typically contain two (optional) types of parameters:

  • model parameters: these define how the model algorithm behaves when being trained. These are also typically known as hyperparameters
  • trained parameters: these contain the information that is built up for the model during training based on time series data, and are used internally by the model algorithm to calculate estimates

Not all models include model parameters and/or trained (fitted) parameters.

expanded model pane

Selecting a model algorithm from the dropdown exposes all editable model parameters for that model algorithm. These parameters can then be manually filled in, and then estimates over a selected range of data can be calculated by clicking on the estimate button. If the selected model makes use of trained parameters, then the train button will need to be clicked first (with a range of training data selected) before the estimate button can be used.

model pane buttons

Configuration reference

Models

The following section describes each of the currently available model algorithms in TS Analytics.

More detailed descriptions can be found in the api documentation.

This release only supports univariate models, where we model a single metric of a resource, given its previous values. Upcoming releases will expand to multivariate models, where we integrate data from multiple resources and metrics.

The available model algorithms can be grouped into simple models, linear interpolation models, and other models.

Fixed

Fixed is the simplest algorithm. It lets you select a single expected value via the value model parameter. While it seems very naive, you can use and compare it as a robust alternative to the more elaborate algorithms. The example below shows how we use algorithm.model_params.value=25 as a baseline against which to compare incoming temperature measurements. In this case we chose to signal an anomaly when the measurement deviates by more than 1 degree 4 times in a row.

MeanStd

MeanStd works like the Fixed algorithm, but computes the expected value as the mean of all values in the data window. Predictions in the future are hence fixed to this mean. The anomaly detection uses the distribution around this mean value to detect outliers.
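
A sketch of the MeanStd idea with hypothetical numbers (the actual algorithm is configured and executed by TS Analytics):

import statistics

window = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2]  # the data window
mean = statistics.mean(window)                  # the expected value
std = statistics.stdev(window)

new_value = 22.7
level = 3  # e.g. the "std" anomaly method with level=3
print("anomaly:", abs(new_value - mean) > level * std)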

RollingMeanStd

RollingMeanStd is a variant of MeanStd that computes the expected value as the mean of all values in a rolling window of fixed size. This window size is a model parameter algorithm.model_params.window. In the example below we used a 3 hour (PT3H) window size. Predictions are the same as when using MeanStd with a data window of 3 hours.

But anomaly detection looks at the residuals (the difference between observed values and the now more variable predictions), which are distributed more narrowly than in the MeanStd model. So with the same (low) anomaly percentile threshold of 5%, we get more anomalies.
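
A rolling-mean sketch with pandas (illustrative only; the window size corresponds to algorithm.model_params.window, e.g. PT3H):

import pandas as pd

values = pd.Series([20.1, 19.8, 20.4, 22.7, 20.0, 19.9])
rolling_mean = values.rolling(window=3).mean().shift(1)  # mean of the 3 previous points
residuals = values - rolling_mean                        # what anomaly detection looks at
print(residuals.tolist())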

OLS

The OLS (Ordinary Least Squares) linear algorithm fits the data linearly with time. It is suited to detect a linear trend, and report anomalies when values start to deviate from that trend.
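
A sketch of the OLS idea using numpy (hypothetical data; the fit is a plain least-squares line against time):

import numpy as np

t = np.arange(10)  # time index
y = 3.0 + 0.5 * t + np.array([0, .1, -.1, 0, .2, -.2, 0, .1, 2.5, 0])

slope, intercept = np.polyfit(t, y, deg=1)   # least-squares linear fit
residuals = y - (slope * t + intercept)
print(np.round(residuals, 2))                # the 2.5 outlier stands out against the trend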

Arima

The Arima, or auto-regressive integrated moving average, algorithm models a time series as a linear combination of the previous (lagged) values and estimation errors. To remove trends in the data, it uses differencing, i.e. it replaces the time series with the difference between consecutive measurements.

The model parameters for an ARIMA(p,d,q) model, to be given by the user, are:

  • p: the order of the auto-regressive (AR) part: how many previous values are used in the linear combination that predicts the next value
  • d: the order of differencing (I): how many times differences are taken to eliminate non-stationarity. In the created model, these differences must be integrated again (hence I)
  • q: the order of the moving average (MA) part: how many previous error terms are used in the linear combination that predicts the current error term

The model training call will then search for fitted parameters, i.e. the p+q+1 coefficients of the linear combinations that best fit the data in a given data window. These are the 𝜑ᵢ, 𝜃ᵢ and 𝛿 in the following formula, where 𝐿 is the lag operator and 𝑋ₜ are the time series values.

arima model formula

When any of the parameters are zero you get a simplified model:

  • ARIMA(0,0,0) is a white noise model, like MeanStd: i.e. a fixed value with an error term. arima model formula
  • ARIMA(p,0,0) is an auto-regressive model, where the output depends linearly on the p previous values and an error term: arima model formula
  • ARIMA(0,1,0) is a random walk model, where each value equals the previous value plus an error term and a constant drift
  • ARIMA(0,1,1) is the same as basic exponential smoothing
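
For intuition, here is a sketch of fitting an ARIMA(p,d,q) model with the open-source statsmodels library as a stand-in (TS Analytics performs the equivalent training server-side; the data and orders below are placeholders):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))  # a random-walk-like series

fitted = ARIMA(y, order=(1, 1, 1)).fit()  # p=1, d=1, q=1
print(fitted.params)                      # the fitted AR/MA coefficients
print(fitted.forecast(steps=5))           # predictions for the next 5 steps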

Sarima

The Sarima, or seasonal auto-regressive integrated moving average, algorithm expands the Arima model with seasonal components. The model parameters of a Sarima(p,d,q,P,D,Q,s) model have the following extra elements:

  • seasonal_window (or seasonal_periods) models the period (e.g. daily or weekly) for recurring patterns
  • the seasonal_order parameters P, D, Q prescribe how to handle the influence of observations with a lag of one or more seasons.

AR (p/P) and MA (q/Q) orders can be specified as

  • an integer, indicating the order of the lag polynomial: if 3, then influence of the three previous data points (in steps or seasons) is modeled
  • an array of 0/1 indicators specifying which terms of the lag polynomial must be fitted: if e.g. [1,0,0,0,0,1], only the influence from the previous and the six-steps-removed data points is fitted. This format allows modelling seasonality over multiple windows, or reducing the order (complexity, and hence fit time) of the model, while still modelling influences that are further in the past.

Exponential Smoothing

Exponential smoothing uses an average of all previous observations with exponentially decreasing weights. In its simplest form, it uses a fitted parameter smoothing_level 𝛼 so that the prediction for the value at time t+1 is x̂(t+1) = 𝛼·x(t) + (1−𝛼)·x̂(t). Exponential smoothing can be applied recursively, to handle trends and/or seasonality in the data.
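
The recursion is short enough to write out directly; a minimal sketch (𝛼 plays the role of the fitted smoothing_level parameter, and the value 0.6 is purely illustrative):

def simple_exponential_smoothing(values, alpha=0.6):
    estimate = values[0]               # initialize with the first observation
    predictions = [estimate]
    for x in values[1:]:
        estimate = alpha * x + (1 - alpha) * estimate
        predictions.append(estimate)   # prediction for the next time step
    return predictions

print(simple_exponential_smoothing([10, 12, 11, 13, 12]))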

The model parameters that a user can set are:

  • trend, which can be set to:
    • none no trend, so we assume the time series is distributed around some fixed value in the long term
    • additive e.g. a value increases by 0.59085 each day
    • multiplicative e.g. a value decreases by 1.034% at each observation
  • seasonal, which can be set to:
    • none no seasonality
    • additive e.g. each day, the value at 10:00 is about 5 higher than at 09:00
    • multiplicative e.g. each day, the value at 10:00 is about 3% lower than that at 09:00
  • seasonal_window specifies the seasonality of the model; by default it is one day, P1D (alternatively one can use seasonal_periods)
  • damped, which, if true, will model the trend component with an additional damping factor

The fitted parameters that are computed when training the model are:

  • smoothing_level or 𝛼, the factor with which the weights of previous values decrease.
  • smoothing_slope or 𝛽, the factor with which the weights of the differences decrease to estimate the next difference
  • smoothing_seasonal or 𝛾, the factor that smooths the differences of a value with its seasonal average
  • damping_slope or 𝜑, the factor that dampens the trend component

The following example uses a daily seasonality to predict hourly averages of the occupancy of traffic at a given point of the Ghent R40 road:

es model demo

The same data set, now averaged 3-hourly and analysed with a weekly seasonality

es model demo

BitMapDetector

The BitMapDetector algorithm specializes in anomaly detection; it does not support the predict use case. It uses LinkedIn’s luminol library, which implements an anomaly detection method described in the Assumption-Free Anomaly Detection paper. The algorithm constructs bitmap ‘fingerprints’ of data changes on a small scale, and decides which of the fingerprints stand out. The fit use case computes anomaly scores rather than predictions and residuals. The optional model parameters are:

  • precision, which controls the number of value buckets (each of width (max-min)/precision) that values are categorized into
  • lag_window_size: number of events in the lagging window (default 2% of the series length)
  • future_window_size: number of events in the future window (default 2% of the series length). Note that the total window size must be at least 50 data points.
  • chunksize

SeasonalAveraging

SeasonalAveraging is a very simple yet very powerful model for data that exhibits a clear seasonal pattern; a typical example is traffic congestion for a given section of road. The SeasonalAveraging algorithm is based on taking the mean value of repeating time buckets over multiple seasons.

In terms of modeling something like traffic patterns, this basically equates to calculating the mean rate of traffic at 9:00 am on Monday mornings over the past N weeks, and then using the mean of these values to provide the estimate for 9:00 am on a Monday morning in the future.
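
A sketch of that seasonal-averaging idea with pandas (hypothetical data; the real bucketing is governed by the seasonal_window and sliding parameters described below):

import pandas as pd

index = pd.date_range("2023-01-02", periods=24 * 28, freq="h")  # 4 weeks of hourly data
traffic = pd.Series(range(24 * 28), index=index)                # hypothetical occupancy values

# group by position in the weekly season: (weekday, hour)
seasonal_means = traffic.groupby([traffic.index.dayofweek, traffic.index.hour]).mean()
print(seasonal_means.loc[(0, 9)])  # estimate for a future Monday 09:00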

The model parameters of SeasonalAveraging are:

  • fixed: if true, the algorithm uses aggregated seasonal data fixed at fitting time to estimate all predictions; otherwise, seasonal estimates are derived from the data window just before the estimation timestamp
  • seasonal_window: positive ISO-8601 period to express the granularity with which data is sampled
  • sliding: If true, each seasonal bucket is a sliding window around the data point that is a whole number of seasons away from the estimation point (e.g. 1 hour before and after 08:23 if 08:23 is the estimation timestamp). If false, seasonal buckets are anchored to a fixed granularity (e.g. from 08:00 to 09:00 in the example above).

Anomaly configuration reference

anomaly methods

Currently, TS Analytics always compares the anomaly score of an observation (in most cases the residual with respect to a prediction) with an upper or lower bound. The following methods can be used to establish these bounds, each having parameters that influence the sensitivity of the anomaly detection:

  • in the absolute method the user specifies a maximum (absolute) size of the residual: i.e. when set to 3.2, each observation that deviates more than 3.2 from the fitted value will be considered an anomaly.
  • the std method uses the standard deviation of all the residuals in the data window as a basis. The user specifies a level; e.g. if the level is set to 1.98, an observation is anomalous if it deviates more than 1.98 standard deviations from the predicted value.
  • the percentile method uses quantile bounds derived from all residuals in the data window. The user specifies an alpha (𝛼) value between 0 and 1, indicating the proportion of the residuals in the full data window that can be marked anomalous. This test can be single-sided or two-sided (depending on the tail setting). When tail=both (the default), the quantile interval [𝛼/2, 1-𝛼/2] is tested; for tail=low this is [𝛼,1] and for tail=high [0,𝛼]. E.g. when 𝛼=0.01, in the long term one in a hundred observations should be reported anomalous. The algorithm might find [-0.23, 0.5] as boundaries for anomalous residuals, when only 1 in 200 residuals were smaller than -0.23 and only 1 in 200 were bigger than 0.5. The sketch after this list illustrates the std and percentile bounds.
  • the threshold method directly specifies the lower and/or higher bound of the score/residuals, either as a value (low, high) or as a proportion of the range of values that the score takes (alpha). This alpha works as in the percentile method, but with the assumption of a uniform distribution of the scores between their min and max value.
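
A sketch of how the std and percentile methods derive bounds from the residuals in the data window (hypothetical residuals and settings):

import numpy as np

residuals = np.array([-0.1, 0.2, -0.3, 0.1, 0.0, 0.4, -0.2, 2.1])

# std method: the bound is a multiple (level) of the residual standard deviation
level = 1.98
std_bound = level * residuals.std()
print("std anomalies:", residuals[np.abs(residuals) > std_bound])

# percentile method with tail=both: test the [alpha/2, 1-alpha/2] quantile interval
alpha = 0.25
low, high = np.quantile(residuals, [alpha / 2, 1 - alpha / 2])
print("percentile anomalies:", residuals[(residuals < low) | (residuals > high)])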

anomaly tail option

The tail option configures which tail(s) of the residual distribution must be checked for anomalies. When tail=both (the default), observations on both sides of the predictions are tested for anomalies. If high, only observations that are (much) larger than the prediction are considered; when low, only observations (much) smaller than the prediction are considered.

anomaly window

Only observations in the anomaly window are checked for anomalies. Normally this is a small window up to the present (e.g. anomalies in the last 5 minutes).

consecutive option

The consecutive option is a threshold for the number of consecutive anomalies that need to be present before the result is reported as anomalous. If set to e.g. 3, there need to be 3 consecutive anomalies to declare the result as anomaly=true. Each of the anomalies gets a consecutive_anomaly_index that indicates its position in a group of consecutive anomalies.
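
A sketch of the consecutive logic (the flags below are a hypothetical per-point anomaly classification):

def filter_consecutive(flags, consecutive=3):
    reported = [False] * len(flags)
    run_start = None
    for i, flag in enumerate(flags + [False]):   # sentinel closes the last run
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= consecutive:     # run is long enough to report
                reported[run_start:i] = [True] * (i - run_start)
            run_start = None
    return reported

print(filter_consecutive([False, True, True, False, True, True, True, False]))
# the run of two is "ignored"; only the run of three is reported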

smooth window

The smooth window option allows selecting the nearest of any model estimate within a window when calculating a residual value. By default, the smooth window is 0, meaning that the model estimate at the same timestamp as an observation is always used for calculating the residual value (and therefore the anomaly score). However, if for example the smooth window is set to 2, then the two model estimates before an observation’s timestamp and the two model estimates after it (in addition to the model estimate at the observation’s timestamp) are used to calculate the residual. The minimum residual over all points within the smoothing window is selected as the residual for that observation. This setting makes anomaly detection much more robust against small time shifts, particularly in seasonal data.
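
A sketch of the minimum-residual selection within a smooth window (hypothetical values; the real calculation is done by TS Analytics):

def smoothed_residual(observed, estimates, index, smooth_window=2):
    lo = max(0, index - smooth_window)
    hi = min(len(estimates), index + smooth_window + 1)
    # smallest absolute difference against any estimate inside the window
    return min(abs(observed - est) for est in estimates[lo:hi])

estimates = [7.0, 7.5, 8.0, 8.5, 9.0]
print(smoothed_residual(observed=8.4, estimates=estimates, index=1))  # matched to the 8.5 estimate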

Prediction configuration reference

Prediction window

The window over which predictions are to be calculated. This window is expressed as an ISO-8601 duration.

This setting is mutually-exclusive with the prediction periods setting.

Prediction periods

The number of sampling periods over which predictions are to be calculated. The length of a sampling period is defined by the sampling rate of the underlying time series; for example, if the underlying time series is sampled once every minute, then specifying a periods setting of 10 will result in 10 prediction samples being calculated, spaced one minute apart.

This setting is mutually-exclusive with the prediction window setting.

Alpha

The alpha setting is the alpha level used for reporting the confidence intervals of predictions. Note that this is only used if the underlying prediction model algorithm supports reporting of confidence intervals (currently only the Arima model supports this).

Accumulate

If specified, the accumulate option will add accumulated values for the predictions using the given accumulation/aggregation method, instead of reporting the underlying raw prediction values from the model.

TS Analytics sensors

The following sensors use the tsanalytics client in the Waylay SDK to retrieve anomalies and predictions, and map them to states that the Waylay rule engine can reason about:

Analytics Option Configuration

The Analytics Option Configuration is the JSON object that gets stored when you click the Save button in the Analytics Designer. When saving under a metric config name my_metric_config, it is stored in the resource metadata under /tsa/configs[name='my_metric_config']/options. It is then available in the sensors (fill in my_metric_config in the metric field).

This corresponds to REST calls on the TS Analytics Server:

  • GET /anomaly/<my_resource>/<my_metric_config> for anomaly detection
  • GET /predict/<my_resource>/<my_metric_config> for forecast/prediction
  • GET /config/<my_resource>/<my_metric_config> to retrieve the option configuration

See TS Analytics REST API for a detailed description of the content of these option configurations, and how they are used in the API. An example:

{  
    "metric":"lightAmbi",
    "window":"P21D",
    "algorithm":{  
        "name":"ExponentialSmoothing",
        "model_params":{  
            "seasonal":"additive",
            "seasonal_window":"P7D",
            "trend":"none"
        },
        "fitted_params":{  
            "smoothing_slope":0,
            "damping_slope":null,
            "smoothing_level":0.6315789449456448,
            "smoothing_seasonal":0.3684210502088028
        }
    },
    "anomaly":{  
        "method":"std",
        "level":3,
        "tail":"both",
        "consecutive":1
    },
    "predict":{  
        "alpha":0.05
    }
}
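
A hedged sketch of calling these endpoints with a generic HTTP client (the base URL and token are placeholders, not the actual values; see the TS Analytics REST API documentation for the exact host, authentication and query parameters):

import requests

BASE_URL = "https://ts-analytics.example.waylay.io"  # placeholder
HEADERS = {"Authorization": "Bearer <your-token>"}   # placeholder

resource = "my_resource"
config = "my_metric_config"

anomalies = requests.get(f"{BASE_URL}/anomaly/{resource}/{config}", headers=HEADERS)
prediction = requests.get(f"{BASE_URL}/predict/{resource}/{config}", headers=HEADERS)
stored_config = requests.get(f"{BASE_URL}/config/{resource}/{config}", headers=HEADERS)

print(anomalies.json())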

Limitations

Currently, all requests are handled synchronously, and some limitations are imposed on the amount of data and processing time that can be used. As of writing, these limitations are:

  • No more than 2000 data points (after aggregation) can be used as input observations
  • A timeout of 120 seconds is enforced (exceeding it results in a 504: Gateway Timeout response)

Note that model training calls can lead to long processing times, and are only meant to be used while setting up an analytics workflow. The predict and anomaly use cases are normally more efficient, but must be tested for performance, especially when using larger data windows.

When having problems with response times, consider the following adaptations:

  • Check whether the request is computationally heavy on its own (e.g. fit calls), or whether it was queued up because of high load. In the latter case, load reduction or allocation of more computing resources to TS Analytics should resolve the issue.
  • Reduce the size of the data window
  • Increase the aggregation frequency
  • Split up the anomaly detection or forecasting into separate time horizons: one for the short term (e.g. anomaly detection based on a last-hour window), and one for the mid- or long term (e.g. anomaly detection of daily means)
  • Reduce the complexity of the model: e.g. in Arima, you should use the lowest p,d,q values that minimize the aic criterion.
  • Check the processing time using the Grafana statistics