
v2.7.0

Released by @KodiaqQ · 16 Nov 14:59

Post-training Quantization:

Features:

  • (OpenVINO) Added support for data-free 4-bit weights compression through NF4 and INT4 data types (compress_weights(…) pipeline); see the sketch after this list.
  • (OpenVINO) Added support for IF operation quantization.
  • (OpenVINO) Added dump_intermediate_model parameter support for AccuracyAwareAlgorithm (quantize_with_accuracy_control(…) pipeline).
  • (OpenVINO) Added support for SmoothQuant and ChannelAlignment algorithms for HyperparameterTuner algorithm (quantize_with_tune_hyperparams(…) pipeline).
  • (PyTorch) Post-training Quantization is now supported through the quantize(…) pipeline and the common implementation of quantization algorithms; the create_compressed_model() method is deprecated for Post-training Quantization (see the sketch after this list).
  • Added new types (AvgPool, GroupNorm, LayerNorm) to the ignored scope for the ModelType.Transformer scheme.
  • QuantizationPreset.Mixed was set as the default for the ModelType.Transformer scheme.
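
Below is a minimal sketch of the data-free 4-bit weight compression flow, assuming an OpenVINO IR model on disk; the model path, the NF4 mode choice, and the output name are illustrative, not taken from this release.

```python
import openvino as ov
import nncf

# Load an OpenVINO IR model (path is a placeholder).
model = ov.Core().read_model("model.xml")

# Data-free 4-bit weight compression: no calibration dataset is required.
# NF4 is shown here; CompressWeightsMode also exposes INT4 variants.
compressed = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.NF4)

ov.save_model(compressed, "model_nf4.xml")
```

And a sketch of the PyTorch Post-training Quantization path that replaces the deprecated create_compressed_model() route; the ResNet-18 model and the random calibration data are placeholders.

```python
import torch
import torchvision
import nncf

# Any torch.nn.Module works; ResNet-18 is only a stand-in.
model = torchvision.models.resnet18(weights=None).eval()

# nncf.Dataset wraps an iterable of samples; the transform maps one item
# to the model input.
calibration_data = [torch.randn(1, 3, 224, 224) for _ in range(10)]
calibration_dataset = nncf.Dataset(calibration_data, lambda x: x)

# Common PTQ entry point. For transformer models, model_type=
# nncf.ModelType.TRANSFORMER applies the scheme described above.
quantized_model = nncf.quantize(model, calibration_dataset)
```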

Fixes:

  • (OpenVINO, ONNX, PyTorch) Aligned and added patterns across backends (SE block, MVN layer, multiple activations, etc.) to restore performance and accuracy metrics.
  • Fixed patterns for ModelType.Transformer to align with the quantization scheme.

Improvements:

  • Improved UX with a new progress bar for the pipeline, new exceptions, and updates to the .dot graph visualization.
  • (OpenVINO) Optimized the WeightsCompression algorithm (compress_weights(…) pipeline) execution time for LLM quantization and added ignored scope support (see the sketch after this list).
  • (OpenVINO) Optimized AccuracyAwareQuantization algorithm execution time with multi-threaded approach while calculating ranking score (quantize_with_accuracy_control(…) pipeline).
  • (OpenVINO) Added extract_ov_subgraph tool for large IR subgraph extraction.
  • (ONNX) Optimized the quantization pipeline (up to 1.15x speedup).
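
The ignored scope support mentioned above can be sketched as follows; the IR path, the INT4_SYM mode, and the layer names/patterns being excluded are hypothetical placeholders.

```python
import openvino as ov
import nncf

model = ov.Core().read_model("llm.xml")  # placeholder IR path

# Keep the listed layers at their original precision during compression.
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    ignored_scope=nncf.IgnoredScope(
        names=["lm_head"],           # hypothetical node name
        patterns=[".*embedding.*"],  # hypothetical name pattern
    ),
)
```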

Tutorials:

Known issues:

  • (ONNX) The quantize(...) method can produce inaccurate int8 results for models with BatchNormalization layers that contain biases. For best accuracy, use the do_constant_folding=True option when exporting from PyTorch to ONNX (see the sketch below).
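
A sketch of this workaround at export time; the model and input shape are placeholders.

```python
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Constant folding bakes BatchNormalization parameters into the exported
# graph, avoiding the int8 inaccuracy described above.
torch.onnx.export(model, dummy_input, "model.onnx", do_constant_folding=True)
```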

Compression-aware training:

Fixes:

  • (PyTorch) Fixed the Hessian trace calculation to resolve issue #2155.

Requirements:

  • Updated PyTorch version (2.1.0).
  • Updated numpy version (<1.27).

Deprecations/Removals:

  • (PyTorch) Removed legacy external quantizer storage names.
  • (PyTorch) Removed support for torch < 2.0.