pytesseract OCR notes
Installation
- Python dependencies
pip install pillow
pip install pytesseract
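- Download the installer for the Tesseract recognition program
- During installation, select the language packs you need (Additional language data)
Language packs (.traineddata files)
Configuring the program path used by the script
In the directory where pip installed pytesseract, find pytesseract.py
Change tesseract_cmd in that file
Set it to the path of tesseract.exe in the directory where the exe installer put Tesseract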
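Editing pytesseract.py works, but the change is lost whenever the package is reinstalled. As an alternative sketch, the same tesseract_cmd attribute can be assigned from your own script at runtime. The helper names and the first candidate path below are assumptions for illustration; the second path is the install directory that appears in the error message later in these notes.

```python
import shutil
from pathlib import Path

# Hypothetical install locations; adjust to where the installer put Tesseract.
CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"D:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=CANDIDATES):
    """Return a path to the tesseract executable, or None if none is found."""
    # Prefer whatever is already reachable through PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the first candidate file that exists.
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


def configure_pytesseract(candidates=CANDIDATES):
    """Point pytesseract at the executable without editing pytesseract.py."""
    import pytesseract  # imported lazily so find_tesseract works without it

    exe = find_tesseract(candidates)
    if exe is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Calling configure_pytesseract() once at the start of the script, before any image_to_string call, should have the same effect as the manual edit.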
Problems encountered
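-
tesseract is not installed or it's not in your path
tesseract-OCR.exe, the main recognition program, is not installed (or tesseract_cmd does not point to it).
-
TesseractError: (3221225477, '')
3221225477 is 0xC0000005, the Windows access-violation code, so the recognition program itself crashed. Installing a newer version fixed it.
-
1, 'Error opening data file D:\\Program Files (x86)\\Tesseract-OCR\\chi_sim.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'chi_sim\' Tesseract couldn\'t load any languages! Could not initialize tesseract.'
The language data is not where Tesseract looks for it. With a 5.x version, I had to drop the .traineddata file directly into the directory that contains the exe. The message itself names the other option: set TESSDATA_PREFIX to the directory that holds the file.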
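The language-data error can be checked before calling Tesseract at all. The sketch below uses only the standard library; missing_languages and use_tessdata_dir are hypothetical helper names, and it assumes that pytesseract's subprocess inherits the environment of the current process.

```python
import os
from pathlib import Path


def missing_languages(tessdata_dir, lang_spec):
    """Return the codes in lang_spec (e.g. "chi_sim+eng") that have no
    <code>.traineddata file in tessdata_dir."""
    directory = Path(tessdata_dir)
    return [
        lang
        for lang in lang_spec.split("+")
        if not (directory / f"{lang}.traineddata").is_file()
    ]


def use_tessdata_dir(tessdata_dir):
    """Set TESSDATA_PREFIX for this process and the subprocesses it starts."""
    os.environ["TESSDATA_PREFIX"] = str(tessdata_dir)
```

For example, missing_languages(r"D:\Program Files (x86)\Tesseract-OCR", "chi_sim") returning ["chi_sim"] would confirm that the file from the error message is absent from that directory.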
Usage
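import pytesseract
from PIL import Image

path = "xxx"  # path to the image file
# Image is a module and cannot be called; Image.open loads the file
img = Image.open(path)
# lang must match an installed .traineddata file (chi_sim = Simplified Chinese)
res = pytesseract.image_to_string(img, lang="chi_sim")
print(res)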
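The same steps can be wrapped in a small function. This is only a sketch: ocr_file is a hypothetical name, and the early file check is an added convenience so that a wrong image path is reported before Tesseract is started.

```python
from pathlib import Path


def ocr_file(path, lang="chi_sim"):
    """Run OCR on one image file and return the recognized text."""
    image_path = Path(path)
    if not image_path.is_file():
        # Fail early with a clear message instead of a Pillow or Tesseract error.
        raise FileNotFoundError(f"image not found: {image_path}")

    # Imported here so the path check above works even without the OCR stack.
    import pytesseract
    from PIL import Image

    # The with-block closes the image file once recognition is done.
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)
```

A call such as print(ocr_file("xxx")) would then replace the five lines above.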