Playing with Text Recognition: Text Detection C++ Demo

Text recognition with the Text Detection C++ Demo

Let's experiment with text recognition using the Text Detection C++ Demo,
which ships with the demos in Open Model Zoo.

Execution environment

CPU: Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz
MemTotal: 16318440 kB
OS: Ubuntu 16.04LTS

Downloading the models

 text-detection-0003, which is a detection network for finding text.
 text-detection-0004, which is a lightweight detection network for finding text.
 text-recognition-0012, which is a recognition network for recognizing text.
 handwritten-score-recognition-0001, which is a recognition network for recognizing handwritten score marks like <digit> or <digit>.<digit>.

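Four models can be used with this demo.
text-detection-0003 is the standard model for locating text in an image,
and text-detection-0004 appears to be a lightweight version of it.
text-recognition-0012 is the model that reads the text itself.
handwritten-score-recognition-0001 is a model for recognizing handwritten numeric scores.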

/opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name text-detection-0003 -o ~/models

/opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name text-detection-0004 -o ~/models

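The two commands above download both text-detection-0003 and text-detection-0004. The text-recognition-0012 model used later can be downloaded the same way by passing its name to --name.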

 klf@skullcanyon:~$ du -h models/intel/ -d 1 | grep detection
 66M models/intel/text-detection-0003
 43M models/intel/text-detection-0004

Comparing the sizes, text-detection-0003 is roughly 20 MB larger (66 MB versus 43 MB).
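The download step earlier in this section repeats the same command once per model, so it can be scripted. The following is a minimal sketch, assuming the downloader path and the `--name` / `-o` flags exactly as they appear in the commands above; the helper name `build_download_commands` is hypothetical and only for illustration.

```python
import os

# Downloader path copied from the commands in this article.
DOWNLOADER = (
    "/opt/intel/openvino/deployment_tools/open_model_zoo"
    "/tools/downloader/downloader.py"
)


def build_download_commands(model_names, output_dir):
    """Return one downloader argument list per model name (hypothetical helper)."""
    return [
        [DOWNLOADER, "--name", name, "-o", output_dir]
        for name in model_names
    ]


if __name__ == "__main__":
    models = ["text-detection-0003", "text-detection-0004"]
    out_dir = os.path.expanduser("~/models")
    for cmd in build_download_commands(models, out_dir):
        # Only print the command; pass it to subprocess.run(cmd, check=True)
        # on a machine where OpenVINO is installed to actually download.
        print(" ".join(cmd))
```

Keeping the commands as argument lists rather than one shell string avoids quoting problems if the output directory contains spaces.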

Running text recognition

Now let's actually test text recognition.
The input is a photo of a product package (test_images/DSC_2785_s.JPG).

omz_demos_build/intel64/Release/text_detection_demo  -m_td /home/klf/models/intel/text-detection-0004/FP16/text-detection-0004.xml -m_tr /home/klf/models/intel/text-recognition-0012/FP16/text-recognition-0012.xml -dt image -i test_images/DSC_2785_s.JPG

This runs the demo on the package photo with the downloaded models, text-detection-0004.xml and text-recognition-0012.xml.

The result shows that both the positions of the text and the text itself are recognized properly.

If you do not want the result window to be displayed, add the -no_show option.

omz_demos_build/intel64/Release/text_detection_demo  -m_td /home/klf/models/intel/text-detection-0004/FP16/text-detection-0004.xml -m_tr /home/klf/models/intel/text-recognition-0012/FP16/text-recognition-0012.xml -dt image -i test_images/DSC_2785_s.JPG -no_show 
 InferenceEngine: 0x7febbeb97040
 [ INFO ] Parsing input parameters
 [ INFO ] Loading Inference Engine
 [ INFO ] Device info: 
 CPU
 MKLDNNPlugin version ......... 2.1
 Build ........... 37988
 

 [ INFO ] Loading network files
 [ INFO ] Reading input
 [ INFO ] Starting inference
 To close the application, press 'CTRL+C' here
 text detection model inference (ms) (fps): 103 9.70874
 text detection postprocessing (ms) (fps): 123 8.13008
 

 text recognition model inference (ms) (fps): 9.54545 104.762
 text recognition postprocessing (ms) (fps): 0.0315455 31700.3
 

 text crop (ms) (fps): 0.0533636 18739.4

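With the -no_show option no window is displayed, and the processing time of each stage is printed to the console.
However, this output does not tell us which text was recognized at which position, so we also add the -r option.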

omz_demos_build/intel64/Release/text_detection_demo  -m_td /home/klf/models/intel/text-detection-0004/FP16/text-detection-0004.xml -m_tr /home/klf/models/intel/text-recognition-0012/FP16/text-recognition-0012.xml -dt image -i test_images/DSC_2785_s.JPG -no_show -r
 InferenceEngine: 0x7ff6594e2040
 [ INFO ] Parsing input parameters
 [ INFO ] Loading Inference Engine
 [ INFO ] Device info: 
 CPU
 MKLDNNPlugin version ......... 2.1
 Build ........... 37988
 

 [ INFO ] Loading network files
 [ INFO ] Reading input
 [ INFO ] Starting inference
 To close the application, press 'CTRL+C' here
 153,101,153,68,236,68,236,101,intell
 216,334,216,133,764,133,764,334,neural
 725,432,725,351,763,351,763,432,2
 210,438,210,355,503,355,503,438,compute
 525,435,525,355,710,355,710,435,stick
 573,481,572,450,664,447,665,479,myriad
 681,475,681,448,761,448,761,475,xvpu
 330,483,330,450,365,450,365,483,by
 369,478,369,450,437,450,437,478,intele
 441,475,441,450,557,450,557,475,movidius
 210,480,210,453,326,453,326,480,powered

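The recognized text is printed together with its coordinates: each line contains eight comma-separated integers followed by the recognized string.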
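To use the -r output in another program, each line has to be split into coordinates and text. The sketch below assumes that the eight integers are four (x, y) corner points of the detected region; this is inferred from the output above and is not confirmed by the demo documentation here. The function names are hypothetical.

```python
def parse_detection_line(line):
    """Split one '-r' output line into four (x, y) points and the text.

    Assumption: the first eight fields are x1,y1,...,x4,y4.
    """
    # Split at most 8 times so a comma inside the text would be preserved.
    fields = line.strip().split(",", 8)
    if len(fields) != 9:
        raise ValueError(f"unexpected line format: {line!r}")
    coords = [int(value) for value in fields[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    return points, fields[8]


def bounding_box(points):
    """Return the axis-aligned box (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    sample = " 573,481,572,450,664,447,665,479,myriad"
    points, text = parse_detection_line(sample)
    print(text, points, bounding_box(points))
```

The axis-aligned box is convenient for cropping, while the four raw points keep the slight rotation seen in lines such as the one for "myriad".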

Comparing text-detection-0003 and 0004

 klf@skullcanyon:~$ omz_demos_build/intel64/Release/text_detection_demo  -m_td /home/klf/models/intel/text-detection-0003/FP16/text-detection-0003.xml -m_tr /home/klf/models/intel/text-recognition-0012/FP16/text-recognition-0012.xml -dt image -i test_images/DSC_2785_s.JPG -no_show
 InferenceEngine: 0x7f46c4d4d040
 [ INFO ] Parsing input parameters
 [ INFO ] Loading Inference Engine
 [ INFO ] Device info: 
 CPU
 MKLDNNPlugin version ......... 2.1
 Build ........... 37988
 

 [ INFO ] Loading network files
 [ INFO ] Reading input
 [ INFO ] Starting inference
 To close the application, press 'CTRL+C' here
 text detection model inference (ms) (fps): 180 5.55556
 text detection postprocessing (ms) (fps): 127 7.87402
 

 text recognition model inference (ms) (fps): 9.46154 105.691
 text recognition postprocessing (ms) (fps): 0.0366923 27253.7
 

 text crop (ms) (fps): 0.0591538 16905.1
 

 klf@skullcanyon:~$ omz_demos_build/intel64/Release/text_detection_demo  -m_td /home/klf/models/intel/text-detection-0004/FP16/text-detection-0004.xml -m_tr /home/klf/models/intel/text-recognition-0012/FP16/text-recognition-0012.xml -dt image -i test_images/DSC_2785_s.JPG -no_show 
 InferenceEngine: 0x7f333afca040
 [ INFO ] Parsing input parameters
 [ INFO ] Loading Inference Engine
 [ INFO ] Device info: 
 CPU
 MKLDNNPlugin version ......... 2.1
 Build ........... 37988
 

 [ INFO ] Loading network files
 [ INFO ] Reading input
 [ INFO ] Starting inference
 To close the application, press 'CTRL+C' here
 text detection model inference (ms) (fps): 106 9.43396
 text detection postprocessing (ms) (fps): 123 8.13008
 

 text recognition model inference (ms) (fps): 9.54545 104.762
 text recognition postprocessing (ms) (fps): 0.0314545 31791.9
 

 text crop (ms) (fps): 0.0565455 17684.9

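The two runs differ mainly in the text detection model inference time: 180 ms (5.56 fps) with 0003 versus 106 ms (9.43 fps) with 0004. The other stages are nearly identical.
0004 presumably runs faster because it is the lighter model.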
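The comparison can also be done programmatically. The following sketch parses the timing lines from the two logs above and computes how much faster detection is with 0004. The regular expression is written against the log format shown in this article and may need adjusting for other versions of the demo. Judging from the numbers, the fps column appears to be simply 1000 divided by the millisecond value.

```python
import re

# Matches lines such as:
#  text detection model inference (ms) (fps): 180 5.55556
TIMING_RE = re.compile(r"^\s*(.+?) \(ms\) \(fps\):\s*([\d.]+)\s+([\d.]+)\s*$")


def parse_timings(log_text):
    """Map each stage name to (milliseconds, fps) from the console log."""
    timings = {}
    for line in log_text.splitlines():
        match = TIMING_RE.match(line)
        if match:
            stage, ms, fps = match.groups()
            timings[stage] = (float(ms), float(fps))
    return timings


# Timing lines copied from the two runs in this article.
LOG_0003 = """\
 text detection model inference (ms) (fps): 180 5.55556
 text detection postprocessing (ms) (fps): 127 7.87402
"""
LOG_0004 = """\
 text detection model inference (ms) (fps): 106 9.43396
 text detection postprocessing (ms) (fps): 123 8.13008
"""

STAGE = "text detection model inference"
ms_0003 = parse_timings(LOG_0003)[STAGE][0]
ms_0004 = parse_timings(LOG_0004)[STAGE][0]
print(f"0003: {ms_0003} ms, 0004: {ms_0004} ms, ratio: {ms_0003 / ms_0004:.2f}")
```

For the measurements above, detection with 0003 takes about 1.7 times as long as with 0004. This is a single run on one image, so it is a rough indication rather than a benchmark.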