
Neuron SDK Release - December 20, 2024

@awsjoshir awsjoshir released this 21 Dec 07:34

Overview: Neuron 2.21.0 introduces support for AWS Trainium2 and Trn2 instances, including the trn2.48xlarge instance type and Trn2 UltraServer. The release adds new capabilities for both training and inference of large-scale models. It introduces NxD Inference (beta), a PyTorch-based library for model deployment; Neuron Profiler 2.0 (beta); PyTorch 2.5 support across the Neuron SDK; and Logical NeuronCore Configuration (LNC) for optimizing NeuronCore allocation. The release enables Llama 3.1 405B model inference on a single trn2.48xlarge instance.

NxD Inference: NxD Inference (NxDI, beta) is a new PyTorch-based inference library for deploying large-scale models on AWS Inferentia and Trainium instances. It enables PyTorch model onboarding with minimal code changes and integrates with vLLM. NxDI supports various model architectures, including Llama text models (Llama 2, Llama 3, Llama 3.1, Llama 3.2, and Llama 3.3), Llama 3.2 multimodal, and Mixture-of-Experts (MoE) architectures including Mixtral and DBRX. The library supports quantization methods, includes dynamic sampling, and is compatible with HuggingFace checkpoints and the generate() API. NxDI also supports distributed strategies including tensor parallelism, and incorporates speculative decoding techniques (draft model and EAGLE). The release includes a sample for Llama 3.1 405B model inference on a single trn2.48xlarge instance.

For more information, see the NxD Inference documentation and the NxD Inference GitHub repository: aws-neuron/neuronx-distributed-inference

Transformers NeuronX (TNx): This release introduces several new features, including flash decoding support for speculative decoding and on-device generation in speculative decoding flows. It adds EAGLE speculative decoding with greedy and lossless sampling, as well as support for CPU compilation and sharded model saving. Performance improvements include optimized MLP and QKV for Llama models with sequence parallel norm, and control over the number of concurrent compilation workers.
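For readers new to draft-model speculative decoding with greedy sampling, the idea can be sketched in a few lines of plain Python. This is an illustrative toy, not the TNx or NxD Inference implementation: `speculative_step`, `toy_target`, and `toy_draft` are hypothetical names, the "models" are simple functions, and a real system would verify all draft tokens in one batched forward pass of the target model rather than in a loop.

```python
from typing import Callable, List

# Hypothetical model type: maps a token sequence to the greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     tokens: List[int], k: int) -> List[int]:
    """Run one speculative step: draft proposes k tokens, target verifies them."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(tokens)
    for _ in range(k):
        t = draft(ctx)
        proposed.append(t)
        ctx.append(t)

    # 2. The target model checks each proposal in order (sequential here
    #    for clarity; batched in practice).
    out = list(tokens)
    for t in proposed:
        expected = target(out)
        if expected != t:
            # First mismatch: keep the target's token and stop.
            out.append(expected)
            return out
        out.append(t)

    # 3. All proposals accepted: the target contributes one bonus token.
    out.append(target(out))
    return out


def toy_target(seq: List[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    """Toy draft model: agrees with the target except after token 3."""
    return 0 if seq[-1] == 3 else (seq[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step(toy_target, toy_draft, [1], k=4))
```

With greedy verification the output always matches what the target model alone would have produced; the draft only affects how many tokens are accepted per step.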

Training Highlights: NxD Training in this release adds support for HuggingFace Llama 3/3.1 70B on Trn2 instances, introduces DPO support for post-training model alignment, and adds support for Mixture-of-Experts (MoE) models including Mixtral 7B. The release includes improved checkpoint conversion capabilities and supports MoE with Tensor, Sequence, Pipeline, and Expert parallelism.
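As background on DPO (Direct Preference Optimization), the per-example loss from the original DPO formulation can be written as -log sigmoid(beta * margin), where the margin compares how much more the policy prefers the chosen response over the rejected one, relative to a frozen reference model. The sketch below is a standalone scalar version for illustration only; `dpo_loss` and its parameter names are hypothetical and do not reflect the NxD Training API.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss from sequence log-probabilities."""
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable way.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Policy identical to the reference: loss is log(2).
    print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
    # Policy favors the chosen response more than the reference does: lower loss.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss falls as the policy shifts probability toward the chosen response and away from the rejected one, with `beta` controlling how strongly deviations from the reference are weighted.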

ML Frameworks: Neuron 2.21.0 adds PyTorch 2.5 support, which brings improved support for eager mode, FP8, and Automatic Mixed Precision. JAX support extends to version 0.4.35, including support for JAX caching APIs.

Logical NeuronCore Configuration (LNC): This release introduces LNC for Trainium2 instances, optimizing NeuronCore allocation for ML applications. LNC offers two configurations: default (LNC=2) combining two physical cores, and alternative (LNC=1) mapping each physical core individually. This feature allows users to efficiently manage resources for large-scale model training and deployment through runtime variables and compiler flags.
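The effect of the two LNC settings on the number of cores visible to an application can be sketched as simple arithmetic. The helper below is hypothetical, and the physical core count of 128 used in the example is an assumption about trn2.48xlarge that is not stated in these notes. The environment variable name `NEURON_LOGICAL_NC_CONFIG` is likewise an assumption; check the Neuron runtime documentation for the exact runtime variable and compiler flag.

```python
def logical_core_count(physical_cores: int, lnc: int = 2) -> int:
    """Return how many logical NeuronCores are exposed for a given LNC setting."""
    if lnc not in (1, 2):
        raise ValueError("this release describes only LNC=1 and LNC=2")
    if physical_cores % lnc != 0:
        raise ValueError("physical core count must be divisible by the LNC value")
    # LNC=2 groups two physical cores into one logical core;
    # LNC=1 maps each physical core to its own logical core.
    return physical_cores // lnc


def lnc_environment(lnc: int) -> dict:
    """Build environment overrides for a chosen LNC value.

    The variable name is an assumption, not taken from these release notes.
    """
    return {"NEURON_LOGICAL_NC_CONFIG": str(lnc)}


if __name__ == "__main__":
    physical = 128  # assumed physical NeuronCore count, for illustration only
    for lnc in (2, 1):
        print(lnc, logical_core_count(physical, lnc), lnc_environment(lnc))
```

Under these assumptions, the default LNC=2 halves the number of cores an application sees while giving each logical core the resources of two physical cores.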

Neuron Profiler 2.0: The new profiler provides system and device-level profiling, timeline annotations, container integration, and support for distributed workloads. It also includes trace export for Perfetto visualization, integration with the JAX and PyTorch profilers, and support for Logical NeuronCore Configuration (LNC).

Neuron Kernel Interface (NKI): NKI now supports Trainium2, including Logical NeuronCore Configuration (LNC), adds SPMD capabilities for multi-core operations, and includes new modules and APIs, among them support for the float8_e5m2 datatype.
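As a reminder of what float8_e5m2 encodes: it has 1 sign bit, 5 exponent bits, and 2 mantissa bits, which makes it bit-compatible with the upper byte of an IEEE half-precision value. The sketch below uses that property to decode a raw byte with the Python standard library. It is independent of NKI, and `decode_e5m2` is a hypothetical helper name.

```python
import struct


def decode_e5m2(byte: int) -> float:
    """Decode one float8_e5m2 byte (1 sign, 5 exponent, 2 mantissa bits)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    # e5m2 shares its sign and exponent layout with float16, so padding the
    # byte with eight zero mantissa bits yields the equivalent half float.
    return struct.unpack("<e", struct.pack("<H", byte << 8))[0]


if __name__ == "__main__":
    for b in (0x3C, 0x3D, 0xC0, 0x7B):
        print(hex(b), decode_e5m2(b))
```

With only two mantissa bits, the representable values between consecutive powers of two are spaced a quarter of the way apart (for example 1.0, 1.25, 1.5, 1.75), which is the precision trade-off made in exchange for the wide exponent range.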

Deep Learning AMIs (DLAMIs): This release expands support for JAX 0.4 within the Multi Framework DLAMI. It also introduces NeuronX Distributed Training (NxDT), Inference (NxDI), and Core (NxD) with PyTorch 2.5 support. Additionally, a new Single Framework DLAMI for TensorFlow 2.10 on Ubuntu 22 is now available.

Deep Learning Containers (DLCs): This release introduces new DLCs for JAX 0.4 training and PyTorch 2.5.1 inference and training. All DLCs have been updated to Ubuntu 22, and the pytorch-inference-neuronx DLC now supports both NxD Inference and TNx libraries.

Documentation: Documentation updates include architectural details about Trainium2 and NeuronCore-v3, along with specifications and topology information for the trn2.48xlarge instance type and Trn2 UltraServer.

Software Maintenance: This release includes the following announcements:

  • Announcing migration of NxD Core examples from NxD Core repository to NxD Inference repository in next release
  • Announcing end of support for Neuron DET tool starting next release
  • PyTorch Neuron versions 1.9 and 1.10 no longer supported
  • Announcing end of support for PyTorch 2.1 for Trn1, Trn2 and Inf2 starting next release
  • Announcing end of support for PyTorch 1.13 for Trn1 and Inf2 starting next release
  • Announcing end of support for Python 3.8 in future releases
  • Announcing end of support for Ubuntu20 DLCs and DLAMIs

Amazon Q: Use Q Developer as your Neuron Expert for general technical guidance and to jumpstart your NKI kernel development.