While developing a Java program on OS X, I am extracting text from a PDF file. The code is fairly straightforward: it iterates over the PDF page by page and extracts the text on each page, like below:
for (com.aspose.pdf.Page page : (Iterable<c…...the extraction had unicode artifacts (“text”: “ﻓﺼﻠﻴﺔ - ﺗﺼﺪر ﻋﻦ...extracted, and the unicode artifacts appear in their position instead...
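The sample output above appears to consist of code points from the Arabic Presentation Forms blocks (the positional glyph variants) rather than ordinary Arabic letters. If that is what the "artifacts" are, one possible post-processing step is Unicode NFKC normalization, which maps those presentation forms back to their base letters. The following is a minimal sketch under that assumption; the class and method names are hypothetical, it uses only the JDK (nothing from Aspose), and it does not address any reordering of characters that the extraction may also have introduced.

```java
import java.text.Normalizer;

public class PresentationFormNormalizer {

    // Returns true if the text contains any code point from the
    // Arabic Presentation Forms-A or Forms-B Unicode blocks.
    static boolean hasPresentationForms(String text) {
        return text.codePoints().anyMatch(cp -> {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            return block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_A
                || block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_B;
        });
    }

    // Applies NFKC compatibility normalization, which replaces
    // presentation-form glyphs with their base Arabic letters.
    // Assumption: the extracted text is otherwise in the right order.
    static String normalize(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Five presentation-form code points, written as escapes,
        // standing in for a word taken from extracted PDF text.
        String extracted = "\uFED3\uFEBC\uFEE0\uFEF4\uFE94";
        String cleaned = normalize(extracted);

        System.out.println("before: " + hasPresentationForms(extracted));
        System.out.println("after:  " + hasPresentationForms(cleaned));
        System.out.println(cleaned);
    }
}
```

Running `hasPresentationForms` on the extracted string before normalizing is a quick way to confirm whether this is the problem at all; if it returns false, the artifacts have a different cause and this step will not help.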