GlacialMan & anmol77 wrote:
What is the file osd_pxvocr.dat?
osd_pxvocr.dat seems to be related to the Orientation & Script Detection (OSD)
function of Tesseract OCR, which uses osd.traineddata
(download from: https://github.com/justin/tesseract-ocr ... raineddata).
It appears to be PDF-XChange Viewer's ("pxv") amended, proprietary equivalent of osd.traineddata.
OSD is used by the Tesseract OCR engine to analyze the script of the text (writing direction, text-line order) as well as the dominant page orientation of the image to be OCR'ed.
If the OSD data file is missing, the application may fail to accurately OCR images whose text is arranged in different directions and orders (e.g. tables, essays with indented paragraphs/quotes, maps).
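
To give a rough idea of what OSD reports, here is a minimal Python sketch that parses the kind of "Key: value" text Tesseract prints when run in OSD-only mode (`--psm 0`). The sample report is made up for illustration, its layout is an assumption based on stock Tesseract, and `parse_osd_report` is a hypothetical helper name. Nothing here reads or depends on osd_pxvocr.dat itself, whose internal format I don't know.

```python
def parse_osd_report(report: str) -> dict:
    """Turn the 'Key: value' lines of an OSD text report into a dict of strings."""
    result = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            # Skip blank lines or anything that is not a 'Key: value' pair.
            continue
        result[key.strip()] = value.strip()
    return result


# Illustrative sample only (assumed layout, invented values).
SAMPLE_REPORT = """\
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 4.21
Script: Latin
Script confidence: 1.11
"""

info = parse_osd_report(SAMPLE_REPORT)
print("Rotate by:", info["Rotate"], "degrees; script:", info["Script"])
```

With a real Tesseract install, the report text would come from the engine (for example via `tesseract page.png stdout --psm 0`) instead of the hard-coded sample, and that step needs osd.traineddata to be present.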