Fixed an out-of-memory (OOM) issue in the OCR phase.
Changes:
- Introduced a `PDFDataset` to represent a PDF file
- Enabled `pin_memory` for faster inference and the ability to swap out
- Changed the batch size selection rule to work even on small machines
All changes were evaluated on multiple inputs, ranging from a few-page article to a whole book (B. Stroustrup, PP&P).
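
The new batch size selection rule is not spelled out in this description. As an illustration only, a memory-based rule might look like the sketch below. The function name `choose_batch_size`, its parameters, the 50% safety margin, and the cap of 32 are all assumptions for the example, not the code in this PR.

```python
def choose_batch_size(free_bytes: int, bytes_per_page: int,
                      max_batch: int = 32, safety: float = 0.5) -> int:
    """Hypothetical rule: use at most `safety` of free memory for one batch."""
    if bytes_per_page <= 0:
        raise ValueError("bytes_per_page must be positive")
    # Memory budget for a single batch of page images.
    budget = int(free_bytes * safety)
    # Number of whole pages that fit into the budget.
    fit = budget // bytes_per_page
    # Clamp to [1, max_batch]: a small machine still gets a batch of one page
    # instead of failing, and a large machine is capped at max_batch.
    return max(1, min(max_batch, fit))


if __name__ == "__main__":
    # Hypothetical page size: a 3 x 1024 x 1024 float32 image.
    page_bytes = 3 * 1024 * 1024 * 4
    for free_gib in (0.01, 1, 16):
        free = int(free_gib * 1024 ** 3)
        print(f"{free_gib} GiB free -> batch size {choose_batch_size(free, page_bytes)}")
```

A value chosen this way could then be passed as `batch_size` to `torch.utils.data.DataLoader`, alongside `pin_memory=True`. The lower bound of one page is what would keep such a rule usable on machines with very little free memory.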