Formula Recognition demo

Formula Recognition, which converts an image into LaTeX, was added in OpenVINO 2021, so let's try it out.

Test environment

CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
MemTotal:       4002276 kB
OS: Ubuntu 20.04LTS
Running inside VMware

Downloading the model

/opt/intel/openvino_2021/deployment_tools/tools/model_downloader/downloader.py --list models.lst -o ~/openvino_models/

Minimum options required to run the demo

Options:
  -h, --help            Show this help message and exit.
  -m_encoder M_ENCODER  Required. Path to an .xml file with a trained encoder part of the model
  -m_decoder M_DECODER  Required. Path to an .xml file with a trained decoder part of the model
  -i INPUT, --input INPUT
                        Required. Path to a folder with images or path to an image file
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Optional. Path to file where to store output. If not mentioned, result will be stored in the console.
  --vocab_path VOCAB_PATH
                        Required. Path to vocab file to construct meaningful phrase

Among these options, --vocab_path is required.
This time, I download vocab.json from
https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/formula-recognition-medium-scan-0001
and use it.
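As a quick optional sanity check that the downloaded vocab.json is a valid JSON file, you can load it with Python. This is only a minimal sketch: it does not inspect the internal structure, it just confirms the file parses and shows its size.

import json

# Quick check: only verifies that vocab.json parses as JSON.
with open('vocab.json') as f:
    vocab = json.load(f)
print(type(vocab).__name__, len(vocab))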

For the input image, I use the sample image below that comes with the demo.

Running the demo

/opt/intel/openvino_2021/deployment_tools/open_model_zoo/demos/python_demos/formula_recognition_demo/formula_recognition_demo.py -m_encoder ~/openvino_models/intel/formula-recognition-medium-scan-0001/formula-recognition-medium-scan-0001-im2latex-encoder/FP16/formula-recognition-medium-scan-0001-im2latex-encoder.xml -m_decoder ~/openvino_models/intel/formula-recognition-medium-scan-0001/formula-recognition-medium-scan-0001-im2latex-decoder/FP16/formula-recognition-medium-scan-0001-im2latex-decoder.xml --vocab_path vocab.json -i sample.png

Result

Formula: 4 7 4 W ^ { 1 } + 7 . 1 9 o ^ { 4 } - 6 - 0 . 9 6 L ^ { 1 } y

The formula seems to be recognized correctly.
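As a quick way to eyeball the result, the recognized token string can be rendered with matplotlib's mathtext. This is a minimal sketch under the assumption that matplotlib is installed; it works here only because this particular output happens to stay inside the LaTeX subset that mathtext supports.

import matplotlib
matplotlib.use('Agg')  # render without a display
import matplotlib.pyplot as plt

# Token string as printed by the demo; joining the tokens yields a LaTeX snippet.
tokens = "4 7 4 W ^ { 1 } + 7 . 1 9 o ^ { 4 } - 6 - 0 . 9 6 L ^ { 1 } y"
latex = tokens.replace(' ', '')

fig = plt.figure(figsize=(4, 1))
fig.text(0.5, 0.5, f'${latex}$', ha='center', va='center', fontsize=16)
fig.savefig('formula_check.png')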

Some of the options are a little hard to understand, so I plan to follow up with a part 2 once my investigation has progressed.

Open Model Zoo (2021.1)

Open Model Zoo is a set of demo applications for getting to know OpenVINO.

In OpenVINO 2021.1 the demos are installed under
/opt/intel/openvino_2021/deployment_tools/open_model_zoo/demos
and multiple applications are stored there.
Compared with OpenVINO 2020.x, some demos have been added and others removed.

Demo name / Location / Description
Classification C++ Demo ./classification_demo/ Shows an example of using neural networks for image classification.
Crossroad Camera C++ Demo ./crossroad_camera_demo/ Person Detection followed by the Person Attributes Recognition and Person Reidentification Retail, supports images/video and camera inputs.
Gaze Estimation C++ Demo ./gaze_estimation_demo/ Face detection followed by gaze estimation, head pose estimation and facial landmarks regression.
Human Pose Estimation C++ Demo ./human_pose_estimation_demo/ Human pose estimation demo.
Interactive Face Detection C++ Demo ./interactive_face_detection_demo/ Face Detection coupled with Age/Gender, Head-Pose, Emotion, and Facial Landmarks detectors. Supports video and camera inputs.
Mask R-CNN C++ Demo for TensorFlow* Object Detection API ./mask_rcnn_demo/ Inference of instance segmentation networks created with TensorFlow* Object Detection API.
Multi-Channel C++ Demos ./multi_channel/ Several demo applications for multi-channel scenarios.
Object Detection for Faster R-CNN C++ Demo ./object_detection_demo_faster_rcnn/ Inference of object detection networks like Faster R-CNN (the demo supports only images as inputs).
Object Detection for SSD C++ Demo ./object_detection_demo_ssd_async/ Demo application for SSD-based Object Detection networks, new Async API performance showcase, and simple OpenCV interoperability (supports video and camera inputs).
Object Detection for YOLO V3 C++ Demo ./object_detection_demo_yolov3_async/ Demo application for YOLOV3-based Object Detection networks, new Async API performance showcase, and simple OpenCV interoperability (supports video and camera inputs).
Pedestrian Tracker C++ Demo ./pedestrian_tracker_demo/ Demo application for pedestrian tracking scenario.
Security Barrier Camera C++ Demo ./security_barrier_camera_demo/ Vehicle Detection followed by the Vehicle Attributes and License-Plate Recognition, supports images/video and camera inputs.
Image Segmentation C++ Demo ./segmentation_demo/ Inference of image segmentation networks like FCN8 (the demo supports only images as inputs).
Smart Classroom C++ Demo ./smart_classroom_demo/ Face recognition and action detection demo for classroom environment.
Super Resolution C++ Demo ./super_resolution_demo/ Super Resolution demo (the demo supports only images as inputs). It enhances the resolution of the input image.
Text Detection C++ Demo ./text_detection_demo/ Text Detection demo. It detects and recognizes multi-oriented scene text on an input image and puts a bounding box around detected area.
Action Recognition Python* Demo ./python_demos/action_recognition/ Demo application for Action Recognition algorithm, which classifies actions that are being performed on input video.
BERT Question Answering Python* Demo ./python_demos/bert_question_answering_demo/
BERT Question Answering Embedding Python* Demo ./python_demos/bert_question_answering_embedding_demo/ The demo demonstrates how to run BERT based models for question answering task.
Colorization Python* Demo ./python_demos/colorization_demo/ Colorization demo colorizes input frames.
Formula Recognition Python* Demo ./python_demos/formula_recognition_demo/ The demo demonstrates how to run Im2latex formula recognition models and recognize latex formulas.
Handwritten Japanese Recognition Python* Demo ./python_demos/handwritten_japanese_recognition_demo/ The demo demonstrates how to run Handwritten Japanese Recognition models.
3D Human Pose Estimation Python* Demo ./python_demos/human_pose_estimation_3d_demo/ 3D human pose estimation demo.
Image Inpainting Python Demo ./python_demos/image_inpainting_demo/ Demo application for GMCNN inpainting network.
Image Retrieval Python* Demo ./python_demos/image_retrieval_demo/ The demo demonstrates how to run Image Retrieval models using OpenVINO™.
Instance Segmentation Python* Demo ./python_demos/instance_segmentation_demo/ Inference of instance segmentation networks trained in `Detectron` or `maskrcnn-benchmark`.
Machine Translation Python* Demo ./python_demos/machine_translation_demo/ The demo demonstrates how to run non-autoregressive machine translation models.
Monodepth Python* Demo ./python_demos/monodepth_demo/ The demo demonstrates how to run monocular depth estimation models.
Multi-Camera Multi-Target Tracking Python* Demo ./python_demos/multi_camera_multi_target_tracking/ Demo application for multiple targets (persons or vehicles) tracking on multiple cameras.
Object Detection for CenterNet Python* Demo ./python_demos/object_detection_demo_centernet/ Demo application for CenterNet object detection network.
Object Detection for RetinaFace Python* Demo ./python_demos/object_detection_demo_retinaface/ Demo application for RetinaFace face detection model.
Single Human Pose Estimation Python* Demo ./python_demos/single_human_pose_estimation_demo/ 2D human pose estimation demo.
Sound Classification Python* Demo ./python_demos/sound_classification_demo/ Demo application for sound classification algorithm.
Speech Recognition Python* Demo ./python_demos/speech_recognition_demo/ Speech recognition demo: takes audio file with an English phrase on input, and converts it into text.
Text Spotting Python* Demo ./python_demos/text_spotting_demo/ The demo demonstrates how to run Text Spotting models.

Building the C++ demos

The demos can be built with the build_demos.sh script in the
/opt/intel/openvino_2021/deployment_tools/open_model_zoo/demos
directory.
Running this script creates omz_demos_build under your home directory, and the executables are built into
omz_demos_build/intel64/Release

Getting depth data with monodepth

Let's try the monodepth demo (Python version) included in the Open Model Zoo demos.

cd /opt/intel/openvino_2021/deployment_tools/open_model_zoo/demos/python_demos/monodepth_demo/

Download the model using the models.lst file in this folder:

python3 /opt/intel/openvino_2021/deployment_tools/tools/model_downloader/downloader.py --list models.lst -o ~/openvino_models/

The model is downloaded to a local folder, but it is a PyTorch model, so it needs to be converted.
Convert it with the following command:

python3 /opt/intel/openvino_2021/deployment_tools/tools/model_downloader/converter.py --name midasnet -d ~/openvino_models/ --precisions=FP16
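As an optional check, you can confirm that the converted IR loads with the Inference Engine Python API. This is only a minimal sketch, assuming the OpenVINO environment has been initialized (for example via setupvars.sh) and that the output path matches where converter.py placed the files.

from pathlib import Path
from openvino.inference_engine import IECore

# Path used later in this article; adjust if your output directory differs.
model_xml = Path.home() / 'openvino_models/public/midasnet/FP16/midasnet.xml'

ie = IECore()
net = ie.read_network(model=str(model_xml), weights=str(model_xml.with_suffix('.bin')))
print('inputs :', list(net.input_info))
print('outputs:', list(net.outputs))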

The conversion should complete without any problems. Next, prepare a suitable image file and run the demo.
This time I use the article's featured image as the input.

python3 /opt/intel/openvino_2021/deployment_tools/open_model_zoo/demos/python_demos/monodepth_demo/monodepth_demo.py -m ~/openvino_models/public/midasnet/FP16/midasnet.xml -i DSCN0958_s.jpeg

It looks like a depth image has been obtained.
Strictly speaking, the demo outputs a floating-point disparity map as a PFM file; the image shown here is the PNG that is written out at the same time.
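If you want to work with the raw disparity values rather than the PNG, the PFM file can be read with a few lines of numpy. This is a minimal sketch of a generic PFM reader, assuming a single-channel "Pf" or three-channel "PF" file; the file name at the bottom is just an example.

import numpy as np

def read_pfm(path):
    """Read a PFM (Portable FloatMap) file into a numpy array."""
    with open(path, 'rb') as f:
        header = f.readline().decode('ascii').strip()
        if header not in ('PF', 'Pf'):
            raise ValueError('not a PFM file')
        channels = 3 if header == 'PF' else 1
        width, height = map(int, f.readline().decode('ascii').split())
        scale = float(f.readline().decode('ascii').strip())
        endian = '<' if scale < 0 else '>'  # negative scale means little-endian
        data = np.fromfile(f, dtype=endian + 'f4', count=width * height * channels)
    shape = (height, width, channels) if channels == 3 else (height, width)
    return np.flipud(data.reshape(shape))  # PFM rows are stored bottom-to-top

disparity = read_pfm('disp.pfm')  # example file name
print(disparity.shape, disparity.min(), disparity.max())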