
v2.7.0

Released by @KodiaqQ · 16 Nov 14:59

Post-training Quantization:

Features:

  • (OpenVINO) Added support for data-free 4-bit weights compression through NF4 and INT4 data types (compress_weights(…) pipeline); see the sketch after this list.
  • (OpenVINO) Added support for IF operation quantization.
  • (OpenVINO) Added dump_intermediate_model parameter support for AccuracyAwareAlgorithm (quantize_with_accuracy_control(…) pipeline).
  • (OpenVINO) Added support for SmoothQuant and ChannelAlignment algorithms for HyperparameterTuner algorithm (quantize_with_tune_hyperparams(…) pipeline).
  • (PyTorch) Post-training Quantization is now supported through the quantize(…) pipeline and the common implementation of quantization algorithms; the create_compressed_model() method is deprecated for Post-training Quantization (see the sketch after this list).
  • Added new types (AvgPool, GroupNorm, LayerNorm) to the ignored scope for the ModelType.Transformer scheme.
  • QuantizationPreset.Mixed was set as the default for the ModelType.Transformer scheme.
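
Below is a minimal sketch of the data-free 4-bit weight compression flow, assuming an OpenVINO IR model on disk; the model path, the NF4 mode choice, and the output name are illustrative, not taken from this release.

```python
import openvino as ov
import nncf

# Load an OpenVINO IR model (path is a placeholder).
model = ov.Core().read_model("model.xml")

# Data-free 4-bit weight compression: no calibration dataset is required.
# NF4 is shown here; CompressWeightsMode also exposes INT4 variants.
compressed = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.NF4)

ov.save_model(compressed, "model_nf4.xml")
```

And a sketch of the PyTorch Post-training Quantization path that replaces the deprecated create_compressed_model() route; the ResNet-18 model and the random calibration data are placeholders.

```python
import torch
import torchvision
import nncf

# Any torch.nn.Module works; ResNet-18 is only a stand-in.
model = torchvision.models.resnet18(weights=None).eval()

# nncf.Dataset wraps an iterable of samples; the transform maps one item
# to the model input.
calibration_data = [torch.randn(1, 3, 224, 224) for _ in range(10)]
calibration_dataset = nncf.Dataset(calibration_data, lambda x: x)

# Common PTQ entry point. For transformer models, model_type=
# nncf.ModelType.TRANSFORMER applies the scheme described above.
quantized_model = nncf.quantize(model, calibration_dataset)
```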

Fixes:

  • (OpenVINO, ONNX, PyTorch) Aligned and added patterns across backends (SE block, MVN layer, multiple activations, etc.) to restore performance and accuracy metrics.
  • Fixed patterns for ModelType.Transformer to align with the quantization scheme.

Improvements:

  • Improved UX with a new progress bar for the pipeline, new exceptions, and updates to the .dot graph visualization.
  • (OpenVINO) Optimized the WeightsCompression algorithm (compress_weights(…) pipeline) execution time for LLM quantization and added ignored scope support (see the sketch after this list).
  • (OpenVINO) Optimized AccuracyAwareQuantization algorithm execution time with multi-threaded approach while calculating ranking score (quantize_with_accuracy_control(…) pipeline).
  • (OpenVINO) Added extract_ov_subgraph tool for large IR subgraph extraction.
  • (ONNX) Optimized the quantization pipeline (up to 1.15x speedup).
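
The ignored scope support mentioned above can be sketched as follows; the IR path, the INT4_SYM mode, and the layer names/patterns being excluded are hypothetical placeholders.

```python
import openvino as ov
import nncf

model = ov.Core().read_model("llm.xml")  # placeholder IR path

# Keep the listed layers at their original precision during compression.
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    ignored_scope=nncf.IgnoredScope(
        names=["lm_head"],           # hypothetical node name
        patterns=[".*embedding.*"],  # hypothetical name pattern
    ),
)
```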

Tutorials:

Known issues:

  • (ONNX) The quantize(...) method can produce inaccurate int8 results for models with BatchNormalization layers that contain biases. For best accuracy, use the do_constant_folding=True option when exporting from PyTorch to ONNX (see the sketch below).
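
A sketch of this workaround at export time; the model and input shape are placeholders.

```python
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Constant folding bakes BatchNormalization parameters into the exported
# graph, avoiding the int8 inaccuracy described above.
torch.onnx.export(model, dummy_input, "model.onnx", do_constant_folding=True)
```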

Compression-aware training:

Fixes:

  • (PyTorch) Fixed the Hessian trace calculation to resolve issue #2155.

Requirements:

  • Updated PyTorch version (2.1.0).
  • Updated numpy version (<1.27).

Deprecations/Removals:

  • (PyTorch) Removed legacy external quantizer storage names.
  • (PyTorch) Removed support for torch < 2.0.