
Cisco Released Cisco Time Series Model: Their First Open-Weights Foundation Model based on Decoder-only Transformer Architecture

Cisco and Splunk have released the Cisco Time Series Model, a univariate zero-shot time series foundation model designed for observability and security metrics. It is released as an open-weights checkpoint on Hugging Face under an Apache 2.0 license, and it targets forecasting workloads without task-specific fine-tuning. The model extends TimesFM 2.0 with an explicit multiresolution architecture that fuses coarse and fine history in a single context window.

https://arxiv.org/pdf/2511.19841

Why observability needs multiresolution context

Production metrics are not simple single-scale signals. Weekly patterns, long-term growth and saturation are visible only at coarse resolutions. Saturation events, traffic spikes and incident dynamics show up at 1-minute or 5-minute resolution. Common time series foundation models operate at a single resolution with context windows between 512 and 4096 points, while TimesFM 2.5 extends this to 16384 points. For 1-minute data this still covers at most a few weeks, and often less.

This is a problem in observability, where data platforms typically retain older data only in aggregated form. Fine-grained samples expire and survive only as 1-hour rollups. Cisco Time Series Model is built for this storage pattern. It treats coarse history as a first-class input that improves forecasts at the fine resolution. The architecture operates directly on a multiresolution context instead of pretending that all inputs live on a single grid.


Multiresolution input and forecasting objective

Formally, the model consumes a pair of contexts, (x_c, x_f). The coarse context x_c and the fine context x_f each have length up to 512. The spacing of x_c is fixed at 60 times the spacing of x_f. A typical observability setup uses 512 hours of 1-hour aggregates and 512 minutes of 1-minute values. Both series terminate at the same forecast cut point. The model predicts a horizon of 128 points at the fine resolution, with a mean and a set of quantiles from 0.1 to 0.9.
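The context construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the official preprocessing code; the function name and the use of hourly means for the coarse rollup are assumptions.

```python
import numpy as np

COARSE_FACTOR = 60   # coarse spacing is fixed at 60x the fine spacing
MAX_CONTEXT = 512    # each context holds up to 512 points

def build_multires_context(minute_series: np.ndarray):
    """Split a long 1-minute history into a coarse context of hourly
    aggregates and a fine context of raw minutes, both ending at the
    same forecast cut point."""
    # Fine context: the most recent 512 minutes.
    fine = minute_series[-MAX_CONTEXT:]
    # Coarse context: hourly means over the most recent 512 hours.
    n_hours = min(len(minute_series) // COARSE_FACTOR, MAX_CONTEXT)
    coarse_span = minute_series[-n_hours * COARSE_FACTOR:]
    coarse = coarse_span.reshape(n_hours, COARSE_FACTOR).mean(axis=1)
    return coarse, fine

# 30 days of synthetic 1-minute data with a daily cycle.
history = np.sin(np.arange(30 * 24 * 60) * 2 * np.pi / 1440)
x_c, x_f = build_multires_context(history)
print(x_c.shape, x_f.shape)  # (512,) (512,)
```

Note that both contexts end at the same timestamp, so the coarse series supplies roughly three weeks of history that the fine series alone cannot cover.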

Architecture, TimesFM core with resolution embeddings

Internally, Cisco Time Series Model reuses the TimesFM patch-based decoder stack. Inputs are normalized, patched into non-overlapping chunks, and passed through a residual embedding block. The transformer core consists of 50 decoder-only layers. A final residual block maps tokens back to the horizon. The research team removes positional embeddings and instead relies on patch ordering, the multiresolution structure and a new resolution embedding to encode structure.

Two additions make the architecture multiresolution-aware. A special token, called ST in the report, is inserted between the coarse and fine token streams. It lives in sequence space and marks the boundary between resolutions. Resolution embeddings, called RE, are added in model space. One embedding vector is used for all coarse tokens and another for all fine tokens. Ablation studies in the paper show that both components improve quality, especially in long-context scenarios.
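The interplay of ST and RE can be sketched on toy tensors. This is an illustrative reconstruction under stated assumptions (random learned vectors, a toy model width), not the model's actual code.

```python
import numpy as np

D_MODEL = 16  # toy model width for illustration
rng = np.random.default_rng(0)

st_token = rng.normal(size=(1, D_MODEL))  # separator token ST, sequence space
re_coarse = rng.normal(size=(D_MODEL,))   # resolution embedding for coarse tokens
re_fine = rng.normal(size=(D_MODEL,))     # resolution embedding for fine tokens

def assemble_sequence(coarse_tokens, fine_tokens):
    """Add the per-resolution embedding to every token of its stream,
    then concatenate: [coarse | ST | fine]."""
    coarse_tokens = coarse_tokens + re_coarse  # one RE shared by all coarse tokens
    fine_tokens = fine_tokens + re_fine        # another RE shared by all fine tokens
    return np.concatenate([coarse_tokens, st_token, fine_tokens], axis=0)

coarse = rng.normal(size=(4, D_MODEL))  # 4 coarse patch tokens
fine = rng.normal(size=(8, D_MODEL))    # 8 fine patch tokens
seq = assemble_sequence(coarse, fine)
print(seq.shape)  # (13, 16): 4 coarse + 1 separator + 8 fine
```

The key point is that ST acts in sequence space (an extra position) while RE acts in model space (a vector added to every token of a stream), so the two mechanisms are complementary.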

The decoding procedure is also multiresolution. The model outputs mean and quantile forecasts for the fine-resolution horizon. During long-horizon decoding, newly predicted fine points are appended to the fine context, and aggregates of those predictions update the coarse context. This creates an autoregressive loop in which both resolutions evolve together during forecasting.
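The loop described above can be sketched as follows. `forecast_step` is a hypothetical stand-in for the real model call, and the choice of mean aggregation for the coarse update is an assumption.

```python
import numpy as np

COARSE_FACTOR, HORIZON = 60, 128

def forecast_step(coarse, fine):
    # Placeholder for the model: repeat the last fine value for 128 steps.
    return np.full(HORIZON, fine[-1])

def long_horizon_forecast(coarse, fine, total_steps):
    preds = []
    pending = np.empty(0)  # fine predictions not yet folded into coarse
    while sum(len(p) for p in preds) < total_steps:
        step = forecast_step(coarse, fine)
        preds.append(step)
        # New fine predictions extend the fine context (capped at 512).
        fine = np.concatenate([fine, step])[-512:]
        pending = np.concatenate([pending, step])
        # Complete 60-point blocks of predictions update the coarse context.
        while len(pending) >= COARSE_FACTOR:
            block, pending = pending[:COARSE_FACTOR], pending[COARSE_FACTOR:]
            coarse = np.concatenate([coarse, [block.mean()]])[-512:]
    return np.concatenate(preds)[:total_steps]

out = long_horizon_forecast(np.zeros(512), np.ones(512), 300)
print(out.shape)  # (300,)
```

Because both contexts roll forward together, a long rollout never loses the coarse view of its own predictions, which is the point of the multiresolution decode.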


Training data and recipe

Cisco Time Series Model is trained by continued pretraining on top of TimesFM weights. The final model has 500 million parameters. Training uses AdamW for biases, norms and embeddings, and Muon for the hidden layers, with cosine learning rate schedules. The loss combines mean squared error on the mean forecast with quantile loss over the quantiles from 0.1 to 0.9. The team trains for 20 epochs and picks the best checkpoint by validation loss.
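The combined objective can be written down concretely. This is a minimal sketch assuming an unweighted sum of the two terms; the paper's exact weighting may differ.

```python
import numpy as np

QUANTILES = np.linspace(0.1, 0.9, 9)  # quantile levels 0.1 .. 0.9

def pinball_loss(y_true, y_pred, q):
    """Standard quantile (pinball) loss for a single quantile level q."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

def combined_loss(y_true, mean_pred, quantile_preds):
    """MSE on the mean forecast plus averaged pinball loss.
    quantile_preds has shape (9, horizon), one row per quantile."""
    mse = np.mean((y_true - mean_pred) ** 2)
    ql = np.mean([pinball_loss(y_true, quantile_preds[i], q)
                  for i, q in enumerate(QUANTILES)])
    return mse + ql

# Perfect predictions drive both terms to zero.
y = np.linspace(0.0, 1.0, 128)
loss = combined_loss(y, y, np.tile(y, (9, 1)))
print(loss)  # 0.0
```

The pinball term penalizes under- and over-prediction asymmetrically per quantile level, which is what pushes the nine output heads toward a calibrated forecast distribution.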

The dataset is large and skewed toward observability. The Splunk team reports about 400 million metric time series from their own Splunk Observability Cloud deployments, collected at 1-minute resolution over 13 months and partly aggregated to 5-minute resolution. The research team states that the final corpus contains more than 300 billion unique data points, with about 35 percent 1-minute observability, 16.5 percent 5-minute observability, 29.5 percent GIFT-Eval pretraining data, 4.5 percent Chronos datasets and 14.5 percent synthetic KernelSynth series.

Benchmark results on observability and GIFT-Eval

The research team evaluates the model on two main benchmarks. The first is an observability dataset derived from Splunk metrics at 1-minute and 5-minute resolution. The second is a filtered version of GIFT-Eval, from which datasets that leak TimesFM 2.0 training data have been removed.

On observability data at 1-minute resolution with 512 fine steps, Cisco Time Series Model with a 512-point multiresolution context reduces mean absolute error from 0.6265 for TimesFM 2.5 and 0.6315 for TimesFM 2.0 to 0.4788, with comparable improvements in mean absolute scaled error and continuous ranked probability score. Similar gains appear at 5-minute resolution. Across both resolutions, the model outperforms Chronos-2, Chronos-Bolt, Toto and AutoARIMA baselines under the normalized metrics used in the paper.

On the filtered GIFT-Eval benchmark, Cisco Time Series Model matches the base TimesFM 2.0 model and performs competitively with TimesFM 2.5, Chronos-2 and Toto. The key claim is not universal dominance but preservation of general forecasting quality while adding a strong advantage on long context windows and observability workloads.


Key Takeaways

  1. Cisco Time Series Model is a univariate zero-shot time series foundation model that extends the TimesFM 2.0 decoder-only backbone with a multiresolution architecture for observability and security metrics.
  2. The model consumes a multiresolution context, with a coarse series and a fine series, each up to 512 steps long, where the coarse resolution is 60 times the fine resolution, and it predicts 128 fine-resolution steps with mean and quantile outputs.
  3. Cisco Time Series Model is trained on more than 300B data points, with more than half from observability, mixing Splunk machine data, GIFT-Eval, Chronos datasets and synthetic KernelSynth series, and it has about 0.5B parameters.
  4. On observability benchmarks at 1-minute and 5-minute resolutions, the model achieves lower error than TimesFM 2.0, Chronos and other baselines, while retaining competitive performance on the general-purpose GIFT-Eval benchmark.

Check out the Paper, Blog and Model Card on Hugging Face. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter, and don't forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. You can also connect with us on Telegram.

The post Cisco Released Cisco Time Series Model: Their First Open-Weights Foundation Model based on Decoder-only Transformer Architecture appeared first on MarkTechPost.
