Meta AI Open-Sources OpenZL: A Format-Aware Compression Framework with a Universal Decoder

How much compression ratio and throughput could you recover by training a format-aware graph compressor and shipping only a self-describing graph to a universal decoder? Meta AI has released OpenZL, an open-source framework that builds specialized, format-aware compressors from high-level data descriptions and emits a self-describing wire format that a universal decoder can read, decoupling compressor evolution from reader rollouts. The approach is grounded in a graph model of compression that represents pipelines as directed acyclic graphs (DAGs) of modular codecs.

So, what’s new?
OpenZL formalizes compression as a computational graph: nodes are codecs/graphs, edges are typed message streams, and the finalized graph is serialized with the payload. Any frame produced by any OpenZL compressor can be decompressed by the universal decoder, because the graph specification travels with the data. This design aims to combine the ratio/throughput advantages of domain-specific codecs with the operational simplicity of a single, stable decoder binary.
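The payoff of the graph model can be illustrated with a toy sketch that assumes nothing about OpenZL's real API: each node is an invertible transform, and even a two-node chain that knows the data is an array of 32-bit integers can beat a one-size-fits-all codec. The `transpose_u32` transform below is a generic byte-plane split used for illustration, not an OpenZL codec name.

```python
import random
import zlib

# Toy sketch of the graph-of-codecs idea (not OpenZL's API): nodes are
# invertible transforms; here the "graph" is a two-node chain.
def transpose_u32(data: bytes) -> bytes:
    """Byte-plane split: all 0th bytes of each u32, then all 1st bytes, etc."""
    return b"".join(data[j::4] for j in range(4))

def untranspose_u32(data: bytes) -> bytes:
    """Inverse node: re-interleave the four byte planes."""
    n = len(data) // 4
    planes = [data[j * n:(j + 1) * n] for j in range(4)]
    return b"".join(bytes(p[i] for p in planes) for i in range(n))

random.seed(0)
# 4096 little-endian u32 values that all fit in one byte: structure a
# format-aware transform can exploit but a byte-oriented codec cannot see.
raw = b"".join(v.to_bytes(4, "little") for v in
               (random.randrange(256) for _ in range(4096)))

generic = zlib.compress(raw)                # one-size-fits-all baseline
shaped = zlib.compress(transpose_u32(raw))  # format-aware chain

assert untranspose_u32(transpose_u32(raw)) == raw  # the chain is invertible
print(len(generic), len(shaped))  # the shaped pipeline yields fewer bytes
```

Grouping bytes of like significance turns three of the four planes into long zero runs, which the downstream entropy stage compresses almost for free; this is the kind of structural win a DAG of small, typed transforms makes composable.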
How does it work?
- Describe data → build a graph. Developers supply a data description; OpenZL composes parse/group/transform/entropy stages into a DAG tailored to that structure. The result is a self-describing frame: compressed bytes plus the graph spec.
- Universal decode path. Decoding procedurally follows the embedded graph, removing the need to ship new readers when compressors evolve.
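The two steps above can be sketched in miniature. The frame layout, codec names, and JSON graph spec below are assumptions for illustration, not OpenZL's actual wire format: the point is that the decoder only consults the embedded spec and a codec registry, never the compressor that produced the frame.

```python
import json
import struct
import zlib

def delta_enc(data: bytes) -> bytes:
    """Encode u32 values as successive differences (wrapping)."""
    vals = struct.unpack(f"<{len(data) // 4}I", data)
    out, prev = [], 0
    for v in vals:
        out.append((v - prev) & 0xFFFFFFFF)
        prev = v
    return struct.pack(f"<{len(out)}I", *out)

def delta_dec(data: bytes) -> bytes:
    """Invert delta_enc by cumulative summation."""
    deltas = struct.unpack(f"<{len(data) // 4}I", data)
    out, acc = [], 0
    for d in deltas:
        acc = (acc + d) & 0xFFFFFFFF
        out.append(acc)
    return struct.pack(f"<{len(out)}I", *out)

# Shared registry of (encode, decode) pairs; names are hypothetical.
REGISTRY = {
    "delta_u32": (delta_enc, delta_dec),
    "zlib": (zlib.compress, zlib.decompress),
}

def compress(data: bytes, graph: list[str]) -> bytes:
    """Apply each codec in order, then prepend the serialized graph spec."""
    for name in graph:
        data = REGISTRY[name][0](data)
    spec = json.dumps(graph).encode()
    return struct.pack("<I", len(spec)) + spec + data  # frame = spec + payload

def universal_decompress(frame: bytes) -> bytes:
    """Follow the embedded graph in reverse; no knowledge of the producer."""
    (spec_len,) = struct.unpack_from("<I", frame)
    graph = json.loads(frame[4:4 + spec_len])
    data = frame[4 + spec_len:]
    for name in reversed(graph):
        data = REGISTRY[name][1](data)
    return data

timestamps = struct.pack("<8I", *range(1000, 9000, 1000))
frame_a = compress(timestamps, ["delta_u32", "zlib"])  # specialized pipeline
frame_b = compress(timestamps, ["zlib"])               # different pipeline
assert universal_decompress(frame_a) == timestamps     # one decoder opens both
assert universal_decompress(frame_b) == timestamps
```

Because the spec travels inside the frame, `frame_a`'s compressor can evolve (add stages, reorder them) without any change to the reader, which is the rollout-decoupling property the framework is built around.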
Tooling and APIs
- SDDL (Simple Data Description Language): Built-in components and APIs let you decompose inputs into typed streams from a pre-compiled data description; available in C and Python surfaces under `openzl.ext.graphs.SDDL`.
- Language bindings: The core library and bindings are open-sourced; the repo documents C/C++ and Python usage, and the ecosystem is already adding community bindings (e.g., Rust `openzl-sys`).
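To make the "data description → typed streams" step concrete, here is a hypothetical stand-in using Python `struct` format codes as the description; SDDL's actual syntax and the `openzl` Python API differ. The idea shown is only that a declarative record layout is enough to split packed records into one homogeneous stream per field.

```python
import struct
import zlib

# Hypothetical description of a record layout: (field name, struct code).
# This is NOT SDDL syntax, just an illustration of description-driven parsing.
DESCRIPTION = [("id", "I"), ("score", "f"), ("flags", "B")]

def split_streams(records: bytes, description):
    """Decompose packed records into one typed byte stream per field."""
    fmt = "<" + "".join(code for _, code in description)
    size = struct.calcsize(fmt)
    rows = [struct.unpack_from(fmt, records, off)
            for off in range(0, len(records), size)]
    return {name: struct.pack(f"<{len(rows)}{code}", *(r[i] for r in rows))
            for i, (name, code) in enumerate(description)}

def join_streams(streams, description, n):
    """Inverse: re-interleave the per-field streams back into records."""
    cols = [struct.unpack(f"<{n}{code}", streams[name])
            for name, code in description]
    fmt = "<" + "".join(code for _, code in description)
    return b"".join(struct.pack(fmt, *vals) for vals in zip(*cols))

rows = [(i, i * 0.5, i % 2) for i in range(100)]
records = b"".join(struct.pack("<IfB", *r) for r in rows)

streams = split_streams(records, DESCRIPTION)
assert join_streams(streams, DESCRIPTION, 100) == records  # lossless split

# Each homogeneous column can now feed its own downstream codec.
packed = {name: zlib.compress(data) for name, data in streams.items()}
```

Homogeneous per-field streams (all u32 ids together, all f32 scores together) are exactly what downstream transform and entropy stages want, which is why a small description can stand in for a hand-written parser.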
How does it perform?
The research team reports that OpenZL achieves superior compression ratios and speeds versus state-of-the-art general-purpose codecs across multiple real-world datasets. It also notes internal deployments at Meta with consistent size and/or speed improvements and shorter compressor development timelines. The public materials do not assign a single universal numeric factor; results are presented as Pareto improvements that depend on the data and pipeline configuration.
Editorial Comments
OpenZL makes format-aware compression operationally practical: compressors are expressed as DAGs, embedded as a self-describing graph in every frame, and decoded by a universal decoder, eliminating reader rollouts. Overall, OpenZL encodes a codec DAG in every frame and decodes via a universal reader; Meta reports Pareto gains over zstd/xz on real datasets.
Check out the Paper, GitHub Page and Technical details.
The submit Meta AI Open-Sources OpenZL: A Format-Aware Compression Framework with a Universal Decoder appeared first on MarkTechPost.