Formula Recognition demo

Formula Recognition, which converts an image into LaTeX, was introduced in OpenVINO 2021, so let's give it a try.

Test environment

CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
MemTotal:       4002276 kB
OS: Ubuntu 20.04 LTS
Running on VMware

Downloading the model

/opt/intel/openvino_2021/deployment_tools/tools/model_downloader/downloader.py --list models.lst -o ~/openvino_models/

Minimum options required to run the demo

Options:
  -h, --help            Show this help message and exit.
  -m_encoder M_ENCODER  Required. Path to an .xml file with a trained encoder part of the model
  -m_decoder M_DECODER  Required. Path to an .xml file with a trained decoder part of the model
  -i INPUT, --input INPUT
                        Required. Path to a folder with images or path to an image files
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Optional. Path to file where to store output. If not mentioned, result will be stored in the console.
  --vocab_path VOCAB_PATH
                        Required. Path to vocab file to construct meaningful phrase

Among these, --vocab_path is required.
This time, I download vocab.json from
https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/formula-recognition-medium-scan-0001
and use that.
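If you are curious what the decoder can emit, a few lines of Python are enough to peek inside vocab.json. The key layout can differ between model releases, so this sketch only inspects the file instead of assuming a structure:

import json

# Peek at vocab.json; the exact structure depends on the model release,
# so print the top-level layout before relying on any particular key.
with open("vocab.json") as f:
    vocab = json.load(f)

print(type(vocab))
print(list(vocab)[:10] if isinstance(vocab, dict) else vocab[:10])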

For the input image, I use the sample file shown below, which ships with the demo.

Running the demo

/opt/intel/openvino_2021/deployment_tools/open_model_zoo/demos/python_demos/formula_recognition_demo/formula_recognition_demo.py -m_encoder ~/openvino_models/intel/formula-recognition-medium-scan-0001/formula-recognition-medium-scan-0001-im2latex-encoder/FP16/formula-recognition-medium-scan-0001-im2latex-encoder.xml -m_decoder ~/openvino_models/intel/formula-recognition-medium-scan-0001/formula-recognition-medium-scan-0001-im2latex-decoder/FP16/formula-recognition-medium-scan-0001-im2latex-decoder.xml --vocab_path vocab.json -i sample.png

Result

Formula: 4 7 4 W ^ { 1 } + 7 . 1 9 o ^ { 4 } - 6 - 0 . 9 6 L ^ { 1 } y

The formula seems to be recognized correctly.

Some of the options are a little hard to understand, so I plan to follow up with a second article once my investigation has progressed.

Running text_to_speech_demo (macOS edition)

Introduction

Let's try text_to_speech_demo, one of the demos included in the Open Model Zoo.

Environment

This time I run it on macOS (the steps are, of course, equivalent on other operating systems).

MacBook Pro (13-inch, 2018, Four Thunderbolt 3 Ports)
2.7 GHz quad-core Intel Core i7, 16 GB memory
macOS Big Sur 11.1
Python 3.7.7
openvino 2021.2.185

Checking the models

Open models.lst to check which models are used. Four models are required. If you do not have them yet, fetch them with the Model Downloader (an example command follows the list).

# This file can be used with the --list option of the model downloader.
forward-tacotron-duration-prediction
forward-tacotron-regression
wavernn-rnn
wavernn-upsampler
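As a hedged example, the download command is the same style as in the Formula Recognition section. Note that the downloader places each model in an intel/ or public/ subdirectory of the output folder, so you may need to adjust -o or move the files to match the model paths used later in this article:

% python3 /opt/intel/openvino_2021/deployment_tools/tools/model_downloader/downloader.py --list models.lst -o ./models/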

Checking the help

% python3 text_to_speech_demo.py -h
usage: text_to_speech_demo.py [-h] -m_duration MODEL_DURATION -m_forward
                              MODEL_FORWARD -m_upsample MODEL_UPSAMPLE -m_rnn
                              MODEL_RNN -i INPUT [-o OUT]
                              [--upsampler_width UPSAMPLER_WIDTH] [-d DEVICE]

Options:
  -h, --help            Show this help message and exit.
  -m_duration MODEL_DURATION, --model_duration MODEL_DURATION
                        Required. Path to ForwardTacotron`s duration
                        prediction part (*.xml format).
  -m_forward MODEL_FORWARD, --model_forward MODEL_FORWARD
                        Required. Path to ForwardTacotron`s mel-spectrogram
                        regression part (*.xml format).
  -m_upsample MODEL_UPSAMPLE, --model_upsample MODEL_UPSAMPLE
                        Required. Path to WaveRNN`s part for mel-spectrogram
                        upsampling by time axis (*.xml format).
  -m_rnn MODEL_RNN, --model_rnn MODEL_RNN
                        Required. Path to WaveRNN`s part for waveform
                        autoregression (*.xml format).
  -i INPUT, --input INPUT
                        Text file with text.
  -o OUT, --out OUT     Required. Path to an output .wav file
  --upsampler_width UPSAMPLER_WIDTH
                        Width for reshaping of the model_upsample. If -1 then
                        no reshape. Do not use with FP16 model.
  -d DEVICE, --device DEVICE
                        Optional. Specify the target device to infer on; CPU,
                        GPU, FPGA, HDDL, MYRIAD or HETERO is acceptable. The
                        sample will look for a suitable plugin for device
                        specified. Default value is CPU

Running the demo

Feed it a text file and it outputs a .wav file. First, let's have it speak a famous quote by Vincent van Gogh.

Your life would be very empty if you had nothing to regret.
% python3 text_to_speech_demo.py -m_duration ./models/forward-tacotron-duration-prediction/FP16/forward-tacotron-duration-prediction.xml -m_forward ./models/forward-tacotron-regression/FP16/forward-tacotron-regression.xml -m_upsample ./models/wavernn-upsampler/FP16/wavernn-upsampler.xml -m_rnn ./models/wavernn-rnn/FP16/wavernn-rnn.xml -i test.txt -o test.wav
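A quick way to sanity-check the generated file is the standard-library wave module; this small script simply prints the sample rate, channel count and duration of test.wav:

import wave

# Print basic properties of the synthesized waveform.
with wave.open("test.wav", "rb") as w:
    rate = w.getframerate()
    frames = w.getnframes()
    print(f"{rate} Hz, {w.getnchannels()} channel(s), {frames / rate:.1f} s")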

OK, it spoke. Next, as a multi-sentence example, let's have it speak a famous quote by Steve Jobs. The parameters are the same as above, so the command is omitted.

Your time is limited, so don't waste it living someone else's life. Don't be trapped by dogma — which is living with the results of other people's thinking. Don't let the noise of others' opinions drown out your own inner voice. And most important, have the courage to follow your heart and intuition. They somehow already know what you truly want to become. Everything else is secondary.

It speaks quite smoothly.

Running colorization_demo (macOS edition)

Introduction

Let's try colorization_demo, one of the demos included in the Open Model Zoo.

Environment

This time I run it on macOS (the steps are, of course, equivalent on other operating systems).

MacBook Pro (13-inch, 2018, Four Thunderbolt 3 Ports)
2.7 GHz quad-core Intel Core i7, 16 GB memory
macOS Big Sur 11.1
Python 3.7.7
openvino 2021.2.185

Checking the models

Open models.lst to check which models are used. You can see that two models are available: colorization-v2 and colorization-siggraph. If you do not have them yet, fetch them with the Model Downloader.

# This file can be used with the --list option of the model downloader.
colorization-v2
colorization-siggraph

Checking the help

% python3 colorization_demo.py -h
usage: colorization_demo.py [-h] -m MODEL [-d DEVICE] -i "<path>" [--no_show]
                            [-v] [-u UTILIZATION_MONITORS]

Options:
  -h, --help            Help with the script.
  -m MODEL, --model MODEL
                        Required. Path to .xml file with pre-trained model.
  -d DEVICE, --device DEVICE
                        Optional. Specify target device for infer: CPU, GPU,
                        FPGA, HDDL or MYRIAD. Default: CPU
  -i "<path>", --input "<path>"
                        Required. Input to process.
  --no_show             Optional. Disable display of results on screen.
  -v, --verbose         Optional. Enable display of processing logs on screen.
  -u UTILIZATION_MONITORS, --utilization_monitors UTILIZATION_MONITORS
                        Optional. List of monitors to show initially.

Running the demo

I used a Charlie Chaplin movie as input and checked each model in turn (a rough sketch of how these models work in Lab color space follows the commands).

% python3 colorization_demo.py \
   -m /public/colorization-v2/FP16/colorization-v2.xml \
   -i Charlie_Chaplin_Mabels_Strange_Predicament.avi
% python3 colorization_demo.py \
   -m /public/colorization-siggraph/FP16/colorization-siggraph.xml \
   -i Charlie_Chaplin_Mabels_Strange_Predicament.avi
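For intuition, here is a rough Python sketch of the Lab-color-space idea behind these colorization models: the lightness channel of the input is kept, the network predicts the two color channels, and the result is converted back to BGR. This is my own illustration, not the demo's actual code, and predict_ab stands in for the network call:

import cv2
import numpy as np

def colorize_frame(frame_bgr, predict_ab):
    # predict_ab is a hypothetical callable wrapping the colorization network.
    h, w = frame_bgr.shape[:2]
    img = frame_bgr.astype(np.float32) / 255.0
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2Lab)
    l_channel = lab[:, :, 0]                  # lightness, kept from the input
    ab = predict_ab(l_channel)                # network predicts the two color channels
    ab = cv2.resize(ab, (w, h))               # back to the original resolution
    lab_out = np.dstack([l_channel, ab]).astype(np.float32)
    bgr = cv2.cvtColor(lab_out, cv2.COLOR_Lab2BGR)
    return (bgr * 255).clip(0, 255).astype(np.uint8)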

Let's also check the results as video.

colorization-v2

colorization-siggraph

Getting depth data with monodepth, part 2

Introduction

Let's try the monodepth demo (Python version), one of the demos included in the Open Model Zoo. This is a follow-up to the earlier monodepth article.

Environment

This time I run it on macOS.

MacBook Pro (13-inch, 2018, Four Thunderbolt 3 Ports)
2.7 GHz quad-core Intel Core i7, 16 GB memory
macOS Big Sur 11.1
Python 3.7.7
openvino 2021.2.185

Checking the models

Open models.lst to check which models are used. You can see that two models are available: fcrn-dp-nyu-depth-v2-tf and midasnet. If you do not have them yet, fetch them with the Model Downloader (see the previous article).

# This file can be used with the --list option of the model downloader.
fcrn-dp-nyu-depth-v2-tf
midasnet

Checking the help

% python3 monodepth_demo.py -h 
usage: monodepth_demo.py [-h] -m MODEL -i INPUT [-l CPU_EXTENSION] [-d DEVICE]

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Required. Path to an .xml file with a trained model
  -i INPUT, --input INPUT
                        Required. Path to a input image file
  -l CPU_EXTENSION, --cpu_extension CPU_EXTENSION
                        Optional. Required for CPU custom layers. Absolute
                        MKLDNN (CPU)-targeted custom layers. Absolute path to
                        a shared library with the kernels implementations
  -d DEVICE, --device DEVICE
                        Optional. Specify the target device to infer on; CPU,
                        GPU, FPGA, HDDL or MYRIAD is acceptable. Sample will
                        look for a suitable plugin for device specified.
                        Default value is CPU

Running the demo

I used the image below as input: a Suzuki Bandit 250V.

Running with the fcrn-dp-nyu-depth-v2-tf model

% python3 monodepth_demo.py -m /public/fcrn-dp-nyu-depth-v2-tf/FP16/fcrn-dp-nyu-depth-v2-tf.xml -i IMG_0958.jpg 
[ INFO ] creating inference engine
[ INFO ] Loading network
[ INFO ] preparing input blobs
[ INFO ] Image is resized from (768, 1024) to (228, 304)
[ INFO ] loading model to the plugin
[ INFO ] starting inference
[ INFO ] processing output blob
[ INFO ] Disparity map was saved to disp.pfm
[ INFO ] Color-coded disparity image was saved to disp.png
[ INFO ] This demo is an API example, for any performance measurements please use the dedicated benchmark_app tool from the openVINO toolkit

Running with the midasnet model

% python3 monodepth_demo.py -i IMG_0958.jpg -m /public/midasnet/FP16/midasnet.xml 
[ INFO ] creating inference engine
[ INFO ] Loading network
[ INFO ] preparing input blobs
[ INFO ] Image is resized from (768, 1024) to (384, 384)
[ INFO ] loading model to the plugin
[ INFO ] starting inference
[ INFO ] processing output blob
[ INFO ] Disparity map was saved to disp.pfm
[ INFO ] Color-coded disparity image was saved to disp.png
[ INFO ] This demo is an API example, for any performance measurements please use the dedicated benchmark_app tool from the openVINO toolkit
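Besides the color-coded disp.png, the raw disparity values are saved to disp.pfm. Below is a minimal, hedged sketch of a reader for that file, assuming a standard single-channel PFM ("Pf" header, a dimensions line, a scale line whose sign encodes endianness, then float32 rows stored bottom-to-top):

import numpy as np

def read_pfm(path):
    # Minimal grayscale PFM reader; no error handling.
    with open(path, "rb") as f:
        header = f.readline().decode().strip()
        assert header == "Pf", "expected a single-channel PFM"
        width, height = map(int, f.readline().decode().split())
        scale = float(f.readline().decode())          # negative scale = little-endian
        dtype = "<f4" if scale < 0 else ">f4"
        data = np.fromfile(f, dtype=dtype, count=width * height)
    return np.flipud(data.reshape(height, width))     # PFM rows are stored bottom-up

disp = read_pfm("disp.pfm")
print(disp.shape, float(disp.min()), float(disp.max()))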

Trying various images

Let's run monodepth_demo.py on various images with both the fcrn-dp-nyu-depth-v2-tf and midasnet models.

Tower of the Sun
fcrn-dp-nyu-depth-v2-tf
midasnet
Zojo-ji Temple
fcrn-dp-nyu-depth-v2-tf
midasnet
Konnyaku Park
fcrn-dp-nyu-depth-v2-tf
midasnet
Tomioka Silk Mill
fcrn-dp-nyu-depth-v2-tf
midasnet
Tomioka Silk Mill
fcrn-dp-nyu-depth-v2-tf
midasnet
Tomioka Silk Mill
fcrn-dp-nyu-depth-v2-tf
midasnet

Running openvino_2021.2.185 on macOS Big Sur

Introduction

I have been testing OpenVINO on Windows 10 and Ubuntu, but this time, for the first time in a while, I run openvino_2021.2.185 on macOS Big Sur.

This article documents my trial-and-error process for getting OpenVINO running. I hope it offers some hints for making it work in your own environment.

The hardware used is as follows.

  Model Name: MacBook Pro
  Model Identifier: MacBookPro15,2
  Processor Name: Quad-Core Intel Core i7
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 4
  L2 Cache (per Core): 256 KB
  L3 Cache: 8 MB
  Hyper-Threading Technology: Enabled
  Memory: 16 GB

Download 2021.2.185 from the official OpenVINO page and install it. This article assumes that the setup process has already been completed.

Dealing with setupvars.sh warnings at terminal startup

Last login: Thu Dec 24 18:34:05 on ttys000
[setupvars.sh] WARNING: Can not find OpenVINO Python binaries by path /Users/python
[setupvars.sh] WARNING: OpenVINO Python environment does not set properly
[setupvars.sh] OpenVINO environment initialized

When I launch the terminal, I get the messages above. Hmm, the Python path has become /Users/python. Let's check the environment variables.

~ % echo $INTEL_OPENVINO_DIR                    
/Users

$INTEL_OPENVINO_DIR looks wrong; it should not be /Users. What is going on? I tried adding export INTEL_OPENVINO_DIR="/opt/intel/openvino_2021/" to .zshrc, but nothing changed (because the variable is set inside setupvars.sh).

So instead, I set the install directory directly inside setupvars.sh.
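For reference, this is roughly what my edit looked like. The exact variable names inside setupvars.sh can differ between releases, so treat this as a sketch rather than the canonical fix:

# excerpt from /opt/intel/openvino_2021/bin/setupvars.sh, install path hard-coded by hand
INSTALLDIR=/opt/intel/openvino_2021
export INTEL_OPENVINO_DIR=$INSTALLDIR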

When I open the terminal again, the message has changed.

Last login: Fri Dec 25 10:12:12 on ttys000
[setupvars.sh] WARNING: Can not find OpenVINO Python module for python3.9 by path /opt/intel/openvino_2021/python/python3.9
[setupvars.sh] WARNING: OpenVINO Python environment does not set properly
[setupvars.sh] OpenVINO environment initialized

Hmm, 3.9? There is no Python 3.9 on this machine.

Edit setupvars.sh and set python_version to 3.7.
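Again as a sketch, the edit just pins the version instead of letting the script detect it; the variable was called python_version in my copy, but check your own setupvars.sh:

# in setupvars.sh: skip auto-detection and pin the Python version
python_version=3.7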

Restart the terminal. The warnings are gone.

Running human_pose_estimation_demo

Now that setupvars.sh seems to work, I try running a Python demo, but import cv2 fails with an error (I forgot to capture the screen).

The rpath did not seem to be resolved, so I add the missing paths with install_name_tool.

% sudo install_name_tool -add_rpath /opt/intel/openvino_2021.2.185/opencv/lib/ /opt/intel/openvino_2021/python/python3/cv2.so 
% sudo install_name_tool -add_rpath /opt/intel/openvino_2021.2.185/deployment_tools/inference_engine/lib/intel64/ /opt/intel/openvino_2021/python/python3/cv2.so  

After running the commands above, import cv2 no longer produces an error.

% python3
Python 3.7.7 (default, Nov 12 2020, 17:58:53) 
[Clang 12.0.0 (clang-1200.0.32.21)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> 
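If you want to confirm which rpath entries are now embedded in cv2.so, otool (shipped with the Xcode command line tools) can list the load commands:

% otool -l /opt/intel/openvino_2021/python/python3/cv2.so | grep -A 2 LC_RPATH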

Running human_pose_estimation.py

Let's run human_pose_estimation.py.

% python3 human_pose_estimation.py -h
Traceback (most recent call last):
  File "human_pose_estimation.py", line 30, in <module>
    from human_pose_estimation_demo.model import HPEAssociativeEmbedding, HPEOpenPose
  File "/Users//omz_demos_build/python_demos/human_pose_estimation_demo/human_pose_estimation_demo/model.py", line 23, in <module>
    import ngraph as ng
  File "/opt/intel/openvino_2021/python/python3.7/ngraph/__init__.py", line 26, in <module>
    from ngraph.impl import Node
  File "/opt/intel/openvino_2021/python/python3.7/ngraph/impl/__init__.py", line 35, in <module>
    from _pyngraph import Dimension
ImportError: dlopen(/opt/intel/openvino_2021/python/python3.7/_pyngraph.cpython-37m-darwin.so, 2): Library not loaded: @rpath/libonnx_importer.dylib
  Referenced from: /opt/intel/openvino_2021/python/python3.7/_pyngraph.cpython-37m-darwin.so
  Reason: image not found

The error still appears. Using install_name_tool again, I add the missing rpath entry.

% sudo install_name_tool -add_rpath /opt/intel/openvino_2021.2.185/deployment_tools/ngraph/lib /opt/intel/openvino_2021/python/python3.7/_pyngraph.cpython-37m-darwin.so

OK, -h now works.

human_pose_estimation_demo % python3 human_pose_estimation.py -h                                                                                                                                   
usage: human_pose_estimation.py [-h] -i INPUT -m MODEL -at {ae,openpose}
                                [--tsize TSIZE] [-t PROB_THRESHOLD] [-r]
                                [-d DEVICE] [-nireq NUM_INFER_REQUESTS]
                                [-nstreams NUM_STREAMS]
                                [-nthreads NUM_THREADS] [-loop LOOP]
                                [-no_show] [-u UTILIZATION_MONITORS]

Options:
  -h, --help            Show this help message and exit.
  -i INPUT, --input INPUT
                        Required. Path to an image, video file or a numeric
                        camera ID.
  -m MODEL, --model MODEL
                        Required. Path to an .xml file with a trained model.
  -at {ae,openpose}, --architecture_type {ae,openpose}
                        Required. Type of the network, either "ae" for
                        Associative Embedding or "openpose" for OpenPose.
  --tsize TSIZE         Optional. Target input size. This demo implements
                        image pre-processing pipeline that is common to human
                        pose estimation approaches. Image is resize first to
                        some target size and then the network is reshaped to
                        fit the input image shape. By default target image
                        size is determined based on the input shape from IR.
                        Alternatively it can be manually set via this
                        parameter. Note that for OpenPose-like nets image is
                        resized to a predefined height, which is the target
                        size in this case. For Associative Embedding-like nets
                        target size is the length of a short image side.
  -t PROB_THRESHOLD, --prob_threshold PROB_THRESHOLD
                        Optional. Probability threshold for poses filtering.
  -r, --raw_output_message
                        Optional. Output inference results raw values showing.
  -d DEVICE, --device DEVICE
                        Optional. Specify the target device to infer on; CPU,
                        GPU, FPGA, HDDL or MYRIAD is acceptable. The sample
                        will look for a suitable plugin for device specified.
                        Default value is CPU.
  -nireq NUM_INFER_REQUESTS, --num_infer_requests NUM_INFER_REQUESTS
                        Optional. Number of infer requests
  -nstreams NUM_STREAMS, --num_streams NUM_STREAMS
                        Optional. Number of streams to use for inference on
                        the CPU or/and GPU in throughput mode (for HETERO and
                        MULTI device cases use format
                        <device1>:<nstreams1>,<device2>:<nstreams2> or just
                        <nstreams>)
  -nthreads NUM_THREADS, --num_threads NUM_THREADS
                        Optional. Number of threads to use for inference on
                        CPU (including HETERO cases)
  -loop LOOP, --loop LOOP
                        Optional. Number of times to repeat the input.
  -no_show, --no_show   Optional. Don't show output
  -u UTILIZATION_MONITORS, --utilization_monitors UTILIZATION_MONITORS
                        Optional. List of monitors to show initially.

Let's run the demo.

% python3 human_pose_estimation.py -i walking.mov -m ./human-pose-estimation-0001/FP16/human-pose-estimation-0001.xml -at openpose
[ INFO ] Initializing Inference Engine...
[ INFO ] Loading network...
[ INFO ] Using USER_SPECIFIED mode
[ INFO ] Reading network from IR...
[ INFO ] Loading network to plugin...
[ INFO ] Reading network from IR...
[ INFO ] Loading network to plugin...
[ INFO ] Starting inference...
To close the application, press 'CTRL+C' here or switch to the output window and press ESC key
To switch between min_latency/user_specified modes, press TAB key in the output window
[ INFO ] 
[ INFO ] Mode: USER_SPECIFIED
[ INFO ] FPS: 10.7
[ INFO ] Latency: 64.1 ms

It ran successfully!

Summary

The key points this time are the following two. I hope they are useful.

  • Get setupvars.sh to run correctly.
  • Resolve the rpath issues.

OpenVINO 2021.2 Release

OpenVINO 2021.2 has been released.
https://software.intel.com/en-us/openvino-toolkit
https://software.intel.com/content/www/us/en/develop/articles/openvino-relnotes.html

The release summary from Intel is reproduced below.

Executive Summary

  • Integrates the Deep Learning Workbench with the Intel® DevCloud for the Edge as a Beta release. Graphically analyze models using the Deep Learning Workbench on the Intel® DevCloud for the Edge (instead of a local machine only) to compare, visualize and fine-tune a solution against multiple remote hardware configurations.
  • Introduces support for Red Hat Enterprise Linux (RHEL) 8.2. See System Requirements for more info.
  • Introduces per-channel quantization support in the Model Optimizer for models quantized with TensorFlow Quantization-Aware Training containing per-channel quantization for weights, which improves performance by model compression and latency reduction.
  • Pre-trained models and support for public models to streamline development: Public Models: Yolov4 (for object detection), AISpeech (for speech recognition), and DeepLabv3 (for semantic segmentation)
  • Pre-trained Models: Human Pose Estimation (update), Formula Recognition Polynomial Handwritten (new), Machine Translation (update), Common Sign Language Recognition (New), and Text-to-Speech (new)
  • New OpenVINO™ Security Add-on, which controls access to model(s) through secure packaging and execution. Based on KVM Virtual machines and Docker* containers and compatible with the OpenVINO™ Model Server, this new add-on enables packaging for flexible deployment and controlled model access.
  • PyPI project moved from openvino-python to openvino, and 2021.1 version to be removed in the default view. The specific version is still available for users depending on this exact version by using openvino-python==2021.1

At OpenVINO.jp, we will continue to run benchmarks and other tests.