Environment used here: CentOS 7, Python 3.
pytesseract is only a Python interface to tesseract-ocr, so tesseract-ocr itself (the well-known open-source OCR engine) has to be installed first.
Install the build dependencies:
```shell
yum install -y automake autoconf libtool gcc gcc-c++
yum install -y libpng-devel libjpeg-devel libtiff-devel giflib-devel
```
Install leptonica, the image-processing library that tesseract depends on:
```shell
wget http://www.leptonica.com/source/leptonica-1.72.tar.gz
tar -xzvf leptonica-1.72.tar.gz
cd leptonica-1.72
./configure
make && make install
```
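Install tesseract-ocr:
```shell
# -O gives the downloaded GitHub archive a predictable file name
wget -O Tesseract3.04.00.tar.gz https://github.com/tesseract-ocr/tesseract/archive/3.04.00.tar.gz
tar -xvf Tesseract3.04.00.tar.gz
cd tesseract-3.04.00/
./autogen.sh    # generates ./configure (uses the automake/autoconf/libtool installed above)
./configure
make && make install
ldconfig        # refresh the shared-library cache after installing libtesseract
```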
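To confirm the build worked, `tesseract --version` should report 3.04.00. The Python sketch below runs that command and parses the version number. `parse_tesseract_version` and `installed_tesseract_version` are hypothetical helper names. The sketch captures stderr along with stdout, on the assumption that older releases may print the version banner to stderr.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_tesseract_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric version from `tesseract --version` text, e.g. 'tesseract 3.04.00'."""
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", output, re.IGNORECASE)
    if match is None:
        return None
    # '3.04.00' -> (3, 4, 0)
    return tuple(int(part) for part in match.group(1).split("."))


def installed_tesseract_version() -> Optional[Tuple[int, ...]]:
    """Run the tesseract binary found on PATH and return its version, or None if it is missing."""
    if shutil.which("tesseract") is None:
        return None
    # Assumption: the banner may go to stderr on older releases, so merge it into stdout.
    result = subprocess.run(
        ["tesseract", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_tesseract_version(result.stdout)


if __name__ == "__main__":
    print(installed_tesseract_version())
```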
Install the language packs:
```shell
wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata      # English (default pack)
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata  # Simplified Chinese
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_tra.traineddata  # Traditional Chinese
cp *.traineddata /usr/local/share/tessdata/  # copy (or mv) the downloaded packs into /usr/local/share/tessdata/; moving them by hand also works
```
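To see which packs tesseract can find, list the `*.traineddata` files in the tessdata directory. This is a minimal sketch that assumes the `/usr/local/share/tessdata` path used above. The helper names `installed_languages` and `missing_languages` are illustrative and are not part of pytesseract.

```python
import os
from typing import Iterable, List

# Assumed location from the install step above; adjust it if tesseract uses another prefix.
DEFAULT_TESSDATA_DIR = "/usr/local/share/tessdata"


def installed_languages(tessdata_dir: str = DEFAULT_TESSDATA_DIR) -> List[str]:
    """Return the language codes (e.g. 'eng', 'chi_sim') whose .traineddata file is present."""
    if not os.path.isdir(tessdata_dir):
        return []
    suffix = ".traineddata"
    return sorted(
        name[: -len(suffix)]
        for name in os.listdir(tessdata_dir)
        if name.endswith(suffix)
    )


def missing_languages(wanted: Iterable[str], tessdata_dir: str = DEFAULT_TESSDATA_DIR) -> List[str]:
    """Return the requested language codes that have no .traineddata file yet."""
    present = set(installed_languages(tessdata_dir))
    return [lang for lang in wanted if lang not in present]


if __name__ == "__main__":
    print(installed_languages())
    print(missing_languages(["eng", "chi_sim", "chi_tra"]))
```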
Install pytesseract:
```shell
pip install Pillow         # use pip3 if pip points at the system Python 2
pip install pytesseract
```
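Before running OCR, confirm that the interpreter you installed into can see everything. The sketch below uses a hypothetical `check_ocr_environment` helper. It looks for the `tesseract` binary on `PATH` and checks that `pytesseract` and Pillow (imported as `PIL`) are importable. If the binary is installed but not on `PATH`, pytesseract can be pointed at it directly through `pytesseract.pytesseract.tesseract_cmd`.

```python
import importlib.util
import shutil
from typing import Dict, Optional


def check_ocr_environment() -> Dict[str, Optional[str]]:
    """Report whether the pieces installed above are visible to this Python interpreter."""
    return {
        # Path of the tesseract binary on PATH (make install defaults to /usr/local/bin).
        "tesseract": shutil.which("tesseract"),
        # find_spec returns None when a module cannot be found; PIL is Pillow's import name.
        "pytesseract": "ok" if importlib.util.find_spec("pytesseract") else None,
        "Pillow": "ok" if importlib.util.find_spec("PIL") else None,
    }


if __name__ == "__main__":
    for name, status in check_ocr_environment().items():
        print("{:12} {}".format(name, status or "MISSING"))
```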
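Installation is now complete. Basic usage:
```python
import pytesseract
from PIL import Image

image = Image.open("port_img.jpg")
text = pytesseract.image_to_string(image)  # no lang given, so tesseract uses its default English pack
print(text)
```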
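The example above relies on tesseract's default English pack. To recognize Chinese with the packs downloaded earlier, pass `lang` to `image_to_string`. Tesseract combines several packs with `+`, for example `chi_sim+eng`. The sketch below assumes the `/usr/local/share/tessdata` location. `build_lang_arg` and `ocr_image` are hypothetical helpers that fail early with a clear error when a pack is missing.

```python
import os
from typing import Iterable

# Assumed tessdata location from the install steps; change it if yours differs.
TESSDATA_DIR = "/usr/local/share/tessdata"


def build_lang_arg(langs: Iterable[str], tessdata_dir: str = TESSDATA_DIR) -> str:
    """Join language codes with '+' (tesseract's multi-language syntax) after checking each pack exists."""
    langs = list(langs)
    if not langs:
        raise ValueError("at least one language code is required")
    for lang in langs:
        path = os.path.join(tessdata_dir, lang + ".traineddata")
        if not os.path.isfile(path):
            raise FileNotFoundError("missing language pack: " + path)
    return "+".join(langs)


def ocr_image(image_path: str, langs=("chi_sim", "eng")) -> str:
    """Recognize text in an image with the given packs (requires Pillow and pytesseract)."""
    # Imported here so build_lang_arg works even without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=build_lang_arg(langs))
```

Usage, with the same sample image: `print(ocr_image("port_img.jpg"))`.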