OpenVINO 2024.1 Release

OpenVINO 2024.1 has been released.
https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/whats-new.html

The OpenVINO™ toolkit version 2024.1 release enhances generative AI accessibility with improved large language model (LLM) performance and expanded model coverage. It also boosts portability and performance for deployment anywhere: at the edge, in the cloud, or locally.

So, the official release notes are here.
A Japanese page is also available.

Stable Diffusion 1.5, ChatGLM3-6B, and Qwen-7B models optimized for improved inference speed on Intel® Core™ Ultra processors with integrated GPU.

So it says. I would like to try generating images with Stable Diffusion locally.

Below is a copy-paste of the release notes.

What’s new

More Gen AI coverage and framework integrations to minimize code changes.

Mixtral and URLNet models optimized for performance improvements on Intel® Xeon® processors.
Stable Diffusion 1.5, ChatGLM3-6B, and Qwen-7B models optimized for improved inference speed on Intel® Core™ Ultra processors with integrated GPU.
Support for Falcon-7B-Instruct, a GenAI Large Language Model (LLM) ready-to-use chat/instruct model with superior performance metrics.
New Jupyter Notebooks added: YOLO V9, YOLO V8 Oriented Bounding Boxes Detection (OBB), Stable Diffusion in Keras, MobileCLIP, RMBG-v1.4 Background Removal, Magika, TripoSR, AnimateAnyone, LLaVA-Next, and RAG system with OpenVINO and LangChain.

Broader LLM model support and more model compression techniques.

LLM compilation time reduced through additional optimizations with compressed embedding. Improved 1st token performance of LLMs on 4th and 5th generations of Intel® Xeon® processors with Intel® Advanced Matrix Extensions (Intel® AMX).
Better LLM compression and improved performance with oneDNN, INT4, and INT8 support for Intel® Arc™ GPUs.
Significant memory reduction for select smaller GenAI models on Intel® Core™ Ultra processors with integrated GPU.

More portability and performance to run AI at the edge, in the cloud, or locally.

The preview NPU plugin for Intel® Core™ Ultra processors is now available in the OpenVINO open-source GitHub repository, in addition to the main OpenVINO package on PyPI.
The JavaScript API is now more easily accessible through the npm repository, giving JavaScript developers seamless access to the OpenVINO API.
FP16 inference on ARM processors is now enabled by default for Convolutional Neural Networks (CNNs).

OpenVINO™ Runtime

Common

Unicode file paths for cached models are now supported on Windows.
A Pad pre-processing API has been added to extend the input tensor on its edges with constants.
A fix for inference failures of certain image generation models has been implemented (fused I/O port names after transformation).
Compiler’s warnings-as-errors option is now on, improving the coding criteria and quality. Build warnings will not be allowed for new OpenVINO code and the existing warnings have been fixed.
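To make the Pad pre-processing item above concrete, here is a minimal pure-Python sketch of what "extending a tensor on its edges with constants" means for a 2D input. The helper name pad_constant and its argument layout are hypothetical and are not the OpenVINO API; the sketch assumes a non-empty, rectangular list of rows.

```python
def pad_constant(image, pads_begin, pads_end, value=0):
    """Pad a 2D list of rows with a constant value.

    pads_begin = (rows added on top, columns added on the left)
    pads_end   = (rows added at the bottom, columns added on the right)
    Hypothetical helper for illustration only; assumes a non-empty image.
    """
    top, left = pads_begin
    bottom, right = pads_end
    width = len(image[0]) + left + right

    # Rows made entirely of the constant go above the original data.
    padded = [[value] * width for _ in range(top)]
    # Each original row gets constants on both sides.
    for row in image:
        padded.append([value] * left + list(row) + [value] * right)
    # Rows made entirely of the constant go below the original data.
    padded.extend([value] * width for _ in range(bottom))
    return padded


padded_image = pad_constant([[1, 2], [3, 4]], (1, 1), (1, 1), value=0)
```

In a real pipeline the padding would be declared as a pre-processing step so that it becomes part of the compiled model rather than running in Python.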

AUTO Inference Mode

Returning the ov::enable_profiling value from ov::CompiledModel is now supported.

CPU Device Plugin

1st token performance of LLMs has been improved on the 4th and 5th generations of Intel® Xeon® processors with Intel® Advanced Matrix Extensions (Intel® AMX).
LLM compilation time and memory footprint have been improved through additional optimizations with compressed embeddings.
Performance of MoE (e.g. Mixtral), Gemma, and GPT-J has been improved further.
Performance has been improved significantly for a wide set of models on ARM devices.
FP16 inference precision is now the default for all types of models on ARM devices.
CPU architecture-agnostic build has been implemented, to enable unified binary distribution on different ARM devices.

GPU Device Plugin

LLM first token latency has been improved on both integrated and discrete GPU platforms.
For the ChatGLM3-6B model, average token latency has been improved on integrated GPU platforms.
For Stable Diffusion 1.5 FP16 precision, performance has been improved on Intel® Core™ Ultra processors.

NPU Device Plugin

NPU Plugin is now part of the OpenVINO GitHub repository. All the most recent plugin changes will be immediately available in the repo. Note that NPU is part of Intel® Core™ Ultra processors.
New OpenVINO™ notebook “Hello, NPU!” introducing NPU usage with OpenVINO has been added.
Version 22H2 or later is required for Microsoft Windows® 11 64-bit to run inference on NPU.

OpenVINO Python API

GIL-free creation of RemoteTensors is now used. Holding the GIL makes the process unsuitable for multithreading, so releasing it increases performance, which is critical for the concept of Remote Tensors.
Packed data type BF16 on the Python API level has been added, opening a new way of supporting data types not handled by numpy.
‘pad’ operator support for ov::preprocess::PrePostProcessorItem has been added.
ov.PartialShape.dynamic(int) definition has been provided.
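The BF16 item above mentions data types that numpy does not handle. As a rough illustration of what a packed BF16 buffer contains, the sketch below converts float32 values to 16-bit BF16 patterns using only the standard library. It assumes simple truncation of the low 16 bits (real converters usually round), and all function names are hypothetical rather than part of the OpenVINO Python API.

```python
import struct


def float_to_bf16_bits(value: float) -> int:
    """Return the 16-bit BF16 pattern for a float, by truncating float32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # BF16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits.
    return bits >> 16


def bf16_bits_to_float(bits: int) -> float:
    """Expand a 16-bit BF16 pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return value


def pack_bf16(values) -> bytes:
    """Pack a sequence of floats into a little-endian BF16 byte buffer."""
    return struct.pack(f"<{len(values)}H", *(float_to_bf16_bits(v) for v in values))


buffer = pack_bf16([1.0, -2.0, 3.14159])
```

Each value takes two bytes instead of four, at the cost of mantissa precision: 3.14159 comes back as 3.140625 after a round trip.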

OpenVINO C API

Two new pre-processing APIs for scale and mean have been added.

OpenVINO Node.js API

New methods to align the JavaScript API with the CPP API have been added, such as CompiledModel.exportModel(), core.import_model(), Core set/get property, Tensor.get_size(), and Model.is_dynamic().
Documentation has been extended to help developers start integrating JavaScript applications with OpenVINO™.

TensorFlow Framework Support

tf.keras.layers.TextVectorization tokenizer is now supported.
Conversion of models with Variable and HashTable (dictionary) resources has been improved.
8 NEW operations have been added (see the list here, marked as NEW).
10 operations have received complex tensor support.
Input tensor names for TF1 models have been adjusted to have a single name per input.
Hugging Face model support coverage has increased significantly, due to the following fixes:
- extraction of the input signature of a model in memory,
- reading of variable values for a model in memory.

PyTorch Framework Support

ModuleExtension, a new type of extension for PyTorch models, is now supported (PR #23536).
22 NEW operations have been added.
Experimental support for models produced by torch.export (FX graph) has been added (PR #23815).

OpenVINO Model Server

The OpenVINO™ Runtime backend used is now 2024.1.
OpenVINO™ models with String data type on output are supported. Now, OpenVINO™ Model Server can support models with input and output of the String type, so developers can take advantage of the tokenization built into the model as the first layer. Developers can also rely on any postprocessing embedded into the model which returns text only. Check the demo on string input data with the universal-sentence-encoder model and the String output model demo.
MediaPipe Python calculators have been updated to support relative paths for all related configuration and Python code files. Now, the complete graph configuration folder can be deployed in an arbitrary path without any code changes.
KServe REST API support has been extended to properly handle the string format in JSON body, just like the binary format compatible with NVIDIA Triton™.
A demo showcasing a full RAG algorithm fully delegated to the model server has been added.
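As a sketch of the string handling in the KServe REST API item above, the snippet below builds a JSON request body carrying plain strings. It assumes the KServe v2 inference protocol conventions (datatype "BYTES" for strings, POST to /v2/models/<model_name>/infer). The input name "my_input" and the helper name are hypothetical and depend on the deployed model.

```python
import json


def build_string_infer_body(input_name, texts):
    """Build a KServe v2 style inference request body for a batch of strings."""
    return {
        "inputs": [
            {
                "name": input_name,      # hypothetical; must match the model's input name
                "shape": [len(texts)],   # one string per batch element
                "datatype": "BYTES",     # the v2 protocol type used for string data
                "data": list(texts),     # strings are placed directly in the JSON body
            }
        ]
    }


body = build_string_infer_body("my_input", ["hello world", "openvino"])
payload = json.dumps(body)
```

The resulting payload would be sent as the body of a POST request to the model server's REST endpoint. Output names and shapes in the response depend on the served model.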

Neural Network Compression Framework

Model subgraphs can now be defined in the ignored scope for INT8 Post-training Quantization, nncf.quantize(), which simplifies excluding accuracy-sensitive layers from quantization.
A batch size of more than 1 is now partially supported for INT8 Post-training Quantization, speeding up the process. Note that it is not recommended for transformer-based models as it may impact accuracy. Here is an example demo.
Now it is possible to apply fine-tuning on INT8 models after Post-training Quantization to improve model accuracy and make it easier to move from post-training to training-aware quantization. Here is an example demo.

OpenVINO Tokenizers

TensorFlow support has been extended – TextVectorization layer translation:
- Aligned existing ops with TF ops and added a translator for them.
- Added new ragged tensor ops and string ops.
A new tokenizer type, RWKV, is now supported:
- Added Trie tokenizer and Fuse op for ragged tensors.
- A new way to get OV Tokenizers: build a vocab from file.
Tokenizer caching has been redesigned to work with the OpenVINO™ model caching mechanism.

Other Changes and Known Issues

Jupyter Notebooks

The default branch for the OpenVINO™ Notebooks repository has been changed from ‘main’ to ‘latest’. The ‘main’ branch of the notebooks repository is now deprecated and will be maintained until September 30, 2024.

The new branch, ‘latest’, offers a better user experience and simplifies maintenance due to significant refactoring and an improved directory naming structure.

Use the local README.md file and OpenVINO™ Notebooks at GitHub Pages to navigate through the content.

The following notebooks have been updated or newly added:

Known Issues

Component – CPU Plugin

ID – N/A

Description:

The default CPU pinning policy on Windows has been changed to follow Windows' policy instead of controlling CPU pinning in the OpenVINO plugin. This may cause some dynamic behavior or performance variance on Windows. Developers can use ov::hint::enable_cpu_pinning to enable or disable CPU pinning explicitly.
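A minimal sketch of setting the pinning hint explicitly from Python follows. It assumes that "ENABLE_CPU_PINNING" is the string key behind ov::hint::enable_cpu_pinning and that the openvino package is installed when compile_on_cpu is called. Both helper names are hypothetical.

```python
def cpu_pinning_config(enable: bool) -> dict:
    """Build a compile-time config that turns CPU pinning on or off explicitly."""
    # Assumption: "ENABLE_CPU_PINNING" is the string form of ov::hint::enable_cpu_pinning.
    return {"ENABLE_CPU_PINNING": enable}


def compile_on_cpu(model_path: str, enable_pinning: bool):
    """Compile a model for the CPU device with an explicit pinning choice."""
    # Imported lazily so the config helper above can be used without OpenVINO installed.
    import openvino as ov

    core = ov.Core()
    return core.compile_model(model_path, "CPU", cpu_pinning_config(enable_pinning))
```

Setting the hint explicitly in either direction takes the decision away from the platform default, which can help when comparing benchmark runs on Windows.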

Component – Hardware Configuration

ID – N/A

Description:

Reduced performance for LLMs may be observed on newer CPUs. To mitigate this, modify the default BIOS settings to configure the system as a 2-NUMA-node system:

1. Enter the BIOS configuration menu.
2. Select EDKII Menu -> Socket Configuration -> Uncore Configuration -> Uncore General Configuration -> SNC.
3. The SNC setting is set to AUTO by default. Change the SNC setting to disabled to configure one NUMA node per processor socket upon boot.
4. After system reboot, confirm the NUMA node setting using: numactl -H. Expect to see only nodes 0 and 1 on a 2-socket system with the following mapping:

Node   0    1
0      10   21
1      21   10
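The check in step 4 can be automated. The sketch below parses the "node distances" section of numactl -H output and compares it with the mapping above. The sample text is an illustrative approximation of the tool's output, not captured from a real machine, and the function names are hypothetical.

```python
SAMPLE_OUTPUT = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 128000 MB
node 1 cpus: 4 5 6 7
node 1 size: 128000 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
"""


def parse_node_distances(text):
    """Return {node: {other_node: distance}} from numactl -H style output."""
    lines = text.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.strip() == "node distances:"), None
    )
    if start is None:
        raise ValueError("no 'node distances:' section found")

    # The header row lists the column node IDs after the word "node".
    columns = [int(n) for n in lines[start + 1].split()[1:]]
    distances = {}
    for line in lines[start + 2:]:
        if ":" not in line:
            break
        label, values = line.split(":", 1)
        distances[int(label)] = dict(zip(columns, map(int, values.split())))
    return distances


def is_expected_two_node_layout(distances):
    """True when only nodes 0 and 1 exist with the 10/21 mapping shown above."""
    return distances == {0: {0: 10, 1: 21}, 1: {0: 21, 1: 10}}


distances = parse_node_distances(SAMPLE_OUTPUT)
```

On a real system the text could come from subprocess.run(["numactl", "-H"], capture_output=True, text=True).stdout, assuming numactl is installed.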

Deprecation And Support

Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.

Discontinued in 2024

Runtime components:
- Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
- OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
- All ONNX Frontend legacy API (known as ONNX_IMPORTER_API).
- PerformanceMode.UNDEFINED property as part of the OpenVINO Python API.
Tools:
- Deployment Manager. See installation and deployment guides for current distribution options.
- Accuracy Checker.
- Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
- A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
- Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.

Deprecated and to be removed in the future

The OpenVINO™ Development Tools package (pip install openvino-dev) will be removed from installation options and distribution channels beginning with OpenVINO 2025.
Model Optimizer will be discontinued with OpenVINO 2025.0. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
OpenVINO property Affinity API will be discontinued with OpenVINO 2025.0. It will be replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
OpenVINO Model Server components:
- “auto shape” and “auto batch size” (reshaping a model in runtime) will be removed in the future. OpenVINO’s dynamic shape models are recommended instead.
The following notebooks have been deprecated and will be removed. For an up-to-date listing of available notebooks, refer to OpenVINO™ Notebook index (openvinotoolkit.github.io).
- Handwritten OCR with OpenVINO™
  - See alternative: Optical Character Recognition (OCR) with OpenVINO™,
  - See alternative: PaddleOCR with OpenVINO™,
  - See alternative: Handwritten Text Recognition Demo
- Image In-painting with OpenVINO™
  - See alternative: Image Inpainting Python Demo
- Interactive Machine Translation with OpenVINO
  - See alternative: Machine Translation Python* Demo
- Open Model Zoo Tools Tutorial
  - No alternatives, demonstrates deprecated tools.
- Super Resolution with OpenVINO™
  - See alternative: Super Resolution with PaddleGAN and OpenVINO
  - See alternative: Image Processing C++ Demo
- Image Colorization with OpenVINO Tutorial
- Interactive Question Answering with OpenVINO™
  - See alternative: BERT Question Answering Embedding Python* Demo
  - See alternative: BERT Question Answering Python* Demo
- Vehicle Detection And Recognition with OpenVINO™
  - See alternative: Security Barrier Camera C++ Demo
- The attention center model with OpenVINO™
- Image Generation with DeciDiffusion
- Image generation with DeepFloyd IF and OpenVINO™
- Depth estimation using VI-depth with OpenVINO™
- Instruction following using Databricks Dolly 2.0 and OpenVINO™
  - See alternative: LLM Instruction-following pipeline with OpenVINO
- Image generation with FastComposer and OpenVINO™
- Video Subtitle Generation with OpenAI Whisper
  - See alternative: Automatic speech recognition using Distil-Whisper and OpenVINO
- Introduction to Performance Tricks in OpenVINO™
- Speaker Diarization with OpenVINO™
- Subject-driven image generation and editing using BLIP Diffusion and OpenVINO
- Text Prediction with OpenVINO™
- Training to Deployment with TensorFlow and OpenVINO™
- Speech to Text with OpenVINO™
- Convert and Optimize YOLOv7 with OpenVINO™
- Quantize Data2Vec Speech Recognition Model using NNCF PTQ API
  - See alternative: Quantize Speech Recognition Models with accuracy control using NNCF PTQ API
- Semantic segmentation with LRASPP MobileNet v3 and OpenVINO
- Video Recognition using SlowFast and OpenVINO™
  - See alternative: Live Action Recognition with OpenVINO™
- Semantic Segmentation with OpenVINO™ using Segmenter
- Programming Language Classification with OpenVINO
- Stable Diffusion Text-to-Image Demo
  - See alternative: Stable Diffusion v2.1 using Optimum-Intel OpenVINO and multiple Intel Hardware
- Text-to-Image Generation with Stable Diffusion v2 and OpenVINO™
  - See alternative: Stable Diffusion v2.1 using Optimum-Intel OpenVINO and multiple Intel Hardware
- Image generation with Segmind Stable Diffusion 1B (SSD-1B) model and OpenVINO
- Data Preparation for 2D Medical Imaging
- Train a Kidney Segmentation Model with MONAI and PyTorch Lightning
- Live Inference and Benchmark CT-scan Data with OpenVINO™
  - See alternative: Quantize a Segmentation Model and Show Live Inference
- Live Style Transfer with OpenVINO™

Legal Information

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein.

You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at http://www.intel.com/ or from the OEM or retailer.

No computer system can be absolutely secure.

Intel, Atom, Arria, Core, Movidius, Xeon, OpenVINO, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

Other names and brands may be claimed as the property of others.

For more complete information about compiler optimizations, see our Optimization Notice.

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Satoshi Masuda

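After working in industrial image-processing equipment development,
game console development, and as a semiconductor engineer,
I now work as a web engineer and marketer.
My favorite area is around the boundary between hardware and software.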