
An AI model with a finger on the time series pulse

IBM’s new foundation model, TSPulse, can go beyond standard forecasting tasks to detect anomalies, fill in missing values, classify data, and search recurring patterns. It’s also tiny enough to run on a laptop.

Foundation models have provided a powerful new way to analyze historical data and make predictions about the future, whether that’s estimating if a network can withstand peak demand or if a company will hit its monthly sales target.

Historical data, however, hold insights that go beyond forecasts. IBM’s new time-series foundation model, TSPulse, was designed to go after them. It can pick out abnormal events in a time series, fill in missing values, classify data into categories, and tease out similar-looking patterns. And it can do all of this with greater accuracy than models 10 to 100 times larger, when measured on leading benchmarks in the field.

At 1 million parameters, TSPulse is the newest member of IBM’s family of lean, high-performing time-series foundation models, complementing its Tiny Time Mixers forecasting specialists. While other leading time-series models are built on transformers, TSPulse uses IBM’s earlier TSMixer architecture as its backbone, alternating multi-layer perceptron blocks with selective “gated” attention blocks.

This ultra-efficient hybrid architecture allows TSPulse to be easily tuned and served on devices as small as a laptop, without any specialized hardware. IBM researchers also built in several new features to help the model extract more information from time-series data to improve its versatility.

The innovations include the ability to recognize complex patterns in a time series, allowing TSPulse to pick up on subtle, sporadic signals, as well as signals that are easiest to spot from either a high-level view of the series or a close-up one.

Statistical models have been the dominant way that people mine historical data for insights. But foundation models pre-trained on raw time-series data are quickly closing the gap.

On TSB-AD, the leading academic benchmark for anomaly detection, TSPulse beat the state of the art in both categories, outperforming powerful statistical models by 24% and larger foundation models by at least 33%. (The TSPulse results are reported in a paper now under peer review.)

Representing a time series from multiple angles

Masked reconstruction has become the default training regimen for foundation models — whether the raw data comes in words, pixels, or numerical values arranged in time. The model is given a partially blacked out dataset and tasked with filling in the blanks to make it whole. In the process of completing the puzzle, the model learns the underlying structure of the data, improving its ability to generalize later.
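
As a rough sketch of the idea (not TSPulse’s actual training code), masking a time series for reconstruction can be as simple as hiding random time steps and scoring the model only on the values it was asked to recover:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy univariate series: a noisy sine wave with 512 time steps.
series = np.sin(np.linspace(0, 8 * np.pi, 512)) + 0.1 * rng.standard_normal(512)

# Hide 30% of the time steps at random.
mask = rng.random(512) < 0.3
masked_series = np.where(mask, 0.0, series)

def reconstruction_loss(predicted: np.ndarray, original: np.ndarray,
                        mask: np.ndarray) -> float:
    # Pre-training penalizes the model only on the hidden positions.
    return float(np.mean((predicted[mask] - original[mask]) ** 2))
```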

IBM researchers used masked reconstruction to teach TSPulse to fill in missing time-step values in their raw training data, as other foundation models are trained to do. But the researchers varied their masking technique to teach the model to interpret the data from different perspectives.

TSPulse can detect a wide range of anomalies in raw time-series data by scoring deviations from several perspectives and computing a final combined score. The figure above shows TSPulse applied to the IOPS dataset, an anonymized collection of performance metrics from five internet companies. TSPulse flagged subtle deviations in the KPI (in orange) relative to observed historical patterns (in red) that may indicate emerging performance issues. Credit: IBM/Sumanta Mukherjee.

Using the fast Fourier transform (FFT) algorithm, they converted their time-series training data to the frequency domain and had the model fill in the missing frequency values, too, essentially teaching it to see any given time step from two perspectives: time and frequency. They then projected and fused both views into a numerical representation known as an embedding.

Applied to entire datasets, these hybrid embeddings help different trends pop out. If a time series were a song, time could be thought of as its ‘beat,’ and frequency, its ‘rhythm.’

“Sudden spikes are easier to detect in the time domain, while more subtle, periodic patterns are easier to see in the frequency domain,” said Vijay Ekambaram, an IBM researcher who focuses on AI time series analysis. “Since the model captures and integrates this complementary information, it can learn a better representation.”
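
A minimal sketch of those two views, using NumPy’s FFT rather than anything from TSPulse itself, might pair a window’s normalized values with its frequency magnitudes and fuse them into one feature vector:

```python
import numpy as np

def dual_view_features(window: np.ndarray) -> np.ndarray:
    # Time view: the raw values, normalized so scale doesn't dominate.
    time_view = (window - window.mean()) / (window.std() + 1e-8)

    # Frequency view: FFT magnitudes, which expose periodic structure
    # that is hard to see one time step at a time.
    freq_view = np.abs(np.fft.rfft(time_view))

    # A real model would fuse the two views with learned projection
    # layers; simple concatenation stands in for that here.
    return np.concatenate([time_view, freq_view])

window = np.sin(np.linspace(0, 4 * np.pi, 64))
features = dual_view_features(window)  # 64 time values + 33 frequency bins
```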

The researchers introduced a second twist to the masked reconstruction task. They varied the length of the hidden segments, alternately removing long and short stretches of the time series. This time, their goal was to teach the model to fill in missing values at complementary scales — to pick out patterns from both a bird’s eye and a worm’s eye view.

“Imagine you’re preparing a student for an exam by blanking out both entire words in a paragraph and individual letters,” said Ekambaram. “Each masking style teaches the brain to fill in different kinds of missing information.”
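
A toy version of that dual-scale masking, with span lengths invented for illustration, might look like this:

```python
import numpy as np

rng = np.random.default_rng(1)

def apply_span_masks(series: np.ndarray, span: int, n_spans: int) -> np.ndarray:
    # Blank out `n_spans` contiguous stretches of length `span`.
    masked = series.copy()
    for _ in range(n_spans):
        start = rng.integers(0, len(series) - span)
        masked[start:start + span] = 0.0
    return masked

series = np.sin(np.linspace(0, 8 * np.pi, 512))
fine = apply_span_masks(series, span=4, n_spans=20)    # "missing letters"
coarse = apply_span_masks(series, span=64, n_spans=2)  # "missing words"
```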

A high-level understanding of the data can be useful when the task calls for screening out noise, for things like grouping data into logical categories or searching it for similar-looking segments. A granular understanding can be useful when the task involves filling in individual values missing from the series or detecting subtle anomalies.

The global and local views of each time segment, with their embedded frequency information, are then projected into a pair of embeddings: one representing the high-level view, and the other, the detailed view. At inference time, the model can summon the most relevant representation for the task at hand.

“The model learns high-level and low-level features together,” said Ekambaram. “Combined with the frequency information that’s already been integrated, this is when the magic happens.”
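
In code, routing each task to the right view might be sketched as follows; the encode and pick_view functions below are illustrative stand-ins, not TSPulse’s API:

```python
import numpy as np

def encode(window: np.ndarray) -> dict:
    # Stand-ins for learned embeddings: a coarse summary of the whole
    # window, and a fine-grained, step-by-step representation.
    return {
        "global": np.array([window.mean(), window.std()]),
        "local": np.diff(window),
    }

def pick_view(embeddings: dict, task: str) -> np.ndarray:
    # Summary view for classification and similarity search;
    # detailed view for imputation and subtle anomaly detection.
    coarse_tasks = {"classification", "search"}
    return embeddings["global"] if task in coarse_tasks else embeddings["local"]

window = np.sin(np.linspace(0, 2 * np.pi, 32))
vec = pick_view(encode(window), task="classification")
```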

A general-purpose time-series model

The model’s greatest strength lies in its ability to grasp the global meaning of a set of historical events or to key in on local signals that may be noteworthy. TSPulse pivots between these complementary perspectives based on the task: it picks the summary view for classification tasks or when searching for recurring patterns, and the detailed view for imputing missing values or identifying subtle anomalies.

TSPulse also includes several task-specific features to improve its accuracy at inference time. To detect a diverse range of anomalies, it has three specialist processes (called “heads”) that check the model’s work before it outputs an answer.

One head handles time reconstruction; the second, frequency reconstruction; and the third, short-horizon forecasting. At inference time, each head’s output is compared with the original signal to compute an anomaly score, and the model combines the three scores into a final answer.

“Each head catches different types of issues,” said Ekambaram. “When combined, they cover blind spots and improve detection across a wider range of anomalies.”
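
A simplified, hypothetical version of this three-head scoring scheme, with a plain averaging rule standing in for whatever fusion TSPulse actually uses, might look like this:

```python
import numpy as np

def combined_anomaly_score(original, time_recon, freq_recon, forecast):
    # Time head: pointwise reconstruction error.
    time_err = np.abs(original - time_recon)

    # Frequency head: compare FFT magnitudes and spread the gap
    # back over the window as a per-step score.
    gap = np.abs(np.abs(np.fft.rfft(original)) - np.abs(np.fft.rfft(freq_recon)))
    freq_err = np.full(len(original), gap.mean())

    # Forecast head: error of a short-horizon prediction.
    fcst_err = np.abs(original - forecast)

    # Normalize each head's scores, then average them so every head
    # gets a vote; each one catches different kinds of anomalies.
    scores = np.stack([time_err, freq_err, fcst_err])
    scores = scores / (scores.max(axis=1, keepdims=True) + 1e-8)
    return scores.mean(axis=0)
```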

TSPulse also features a lightweight module that can be added on during fine-tuning to improve its accuracy on other tasks. This task-specific “lens” essentially magnifies the most relevant data for a given application. “It’s like giving the model a custom pair of glasses for each task so it can see exactly what it needs to,” said Ekambaram. “For reading fine print, you might need a magnifying glass, or binoculars to see a distant object, or tinted glasses to see in different lighting conditions.”
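
As a hedged sketch of what such an add-on might look like, here is an invented TaskLens adapter (not TSPulse’s actual lens) that re-weights embedding features for one task while the pre-trained backbone stays frozen:

```python
import torch
import torch.nn as nn

class TaskLens(nn.Module):
    """Invented example: a small adapter trained during fine-tuning
    while the pre-trained backbone stays frozen."""

    def __init__(self, dim: int, n_classes: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(dim))  # learned "magnification"
        self.head = nn.Linear(dim, n_classes)       # task-specific output

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        # Re-weight the embedding's features, then classify.
        return self.head(embedding * self.scale)

lens = TaskLens(dim=128, n_classes=5)
logits = lens(torch.randn(32, 128))  # a batch of 32 backbone embeddings
```

Because only the lens’s few parameters would be updated during fine-tuning, an adapter of this sort keeps the procedure cheap enough to run on modest hardware.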

With the help of this task-specific lens, TSPulse outperformed larger state-of-the-art models at classification tasks by 5% to 16%. It did even better at filling in values missing from weather and electricity-use benchmark datasets, outperforming the next-best foundation models, as well as statistical models, by more than 50%.

Small models first

Statistical models are cheap to run and remain the preferred way of analyzing historical data, so IBM made efficiency its top priority when designing its first time-series foundation models several years ago. Enterprises could run those early forecasting models on a laptop. With TSPulse, they still can, and now they have the tools to solve many more types of problems.

Researchers are now gearing up to integrate TSPulse into IBM's IT and industry automation applications, to help with tasks ranging from physical asset management to IT operations.

“Many enterprises run expensive machines that produce more sensor data than they know what to do with,” said Jayant Kalagnanam, director of AI applications at IBM Research. “TSPulse gives them the ability to search the data and identify patterns they may be seeing currently and to connect those patterns with past failures to avert similar problems.”
