I've run into another issue with PDF processing. It only happens for certain documents, not all of them.
The code below first extracts the text and then the images. For certain PDFs, we are noticing the exact same page conten…... We are using OCR to scan the image, hence ending...leading to duplication when using OCR. This can happen due to the way...
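
One possible way to guard against this kind of duplication, sketched here under assumptions and not taken from the code in this post: compare each OCR result against the text already extracted from the same page, and keep the OCR output only when it adds something new. The function names (`normalize`, `is_duplicate`, `merge_page_text`) and the `0.9` similarity threshold are hypothetical and would need tuning for real documents. The sketch uses only the Python standard library and expects the page text and OCR strings to come from whatever extraction and OCR tools are already in use.

```python
import re
from difflib import SequenceMatcher
from typing import List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_duplicate(page_text: str, ocr_text: str, threshold: float = 0.9) -> bool:
    """Return True when the OCR output largely repeats the extracted page text.

    `threshold` is an assumed cut-off, not a recommended value.
    """
    a, b = normalize(page_text), normalize(ocr_text)
    if not b:
        return True  # empty OCR output adds nothing
    if b in a:
        return True  # OCR text is already contained in the page text
    # Fall back to a fuzzy comparison to tolerate small OCR errors.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def merge_page_text(page_text: str, ocr_texts: List[str], threshold: float = 0.9) -> str:
    """Append only those OCR results that are not duplicates of the page text."""
    parts = [page_text]
    for ocr_text in ocr_texts:
        if not is_duplicate(page_text, ocr_text, threshold):
            parts.append(ocr_text)
    return "\n".join(parts)


if __name__ == "__main__":
    page = "Invoice 42\nTotal: 100 USD"
    ocr_results = ["invoice 42   total: 100 usd", "Company logo: Acme"]
    # The first OCR result repeats the page text and is dropped;
    # the second one is new and is kept.
    print(merge_page_text(page, ocr_results))
```

The containment check handles the case where an image is a rendering of part of the page, while the fuzzy ratio is meant to absorb minor OCR misreads. Whether either check fits depends on how the affected PDFs are actually built.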