Chemlak
Senior Member
 
Join Date: Aug 2012
Posts: 432

April 14th, 2016, 09:18 AM
It's not copy protection or a fault with PDFs; it's an unfortunate side effect of ligatures in text.

My understanding is that when you put text into a PDF and save it, the text is stored as positioned glyphs (shapes taken from the font) rather than as a plain run of characters, which is why it can be viewed equally well regardless of the reading program: PDF stands for "Portable Document Format" for a reason. When you extract text, the program takes a "best guess" as to which characters those glyphs are meant to be and where the word breaks fall. A ligature such as "fi" is a single glyph, so extraction sometimes fails to correctly separate common letter pairs, and it sometimes inserts spaces between characters that weren't in the original text and don't appear to exist in the PDF.

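The only real fix is for the PDF to retain the details of the original text. The format does allow for this (a font can carry a mapping from its glyphs back to Unicode characters), but plenty of PDF-producing software leaves that mapping out or gets it wrong, so it comes down to how the file was generated.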
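
For text that has already been extracted with ligature characters in it, a post-processing pass can at least expand them. Here's a minimal sketch in Python, assuming the extractor emitted the Unicode ligature code points (U+FB00 to U+FB04 and U+FB06). The name expand_ligatures is my own, not part of any PDF tool, and this won't help where the ligature came out as a wrong or missing character, or where stray spaces were inserted.

```python
# Map each Unicode Latin ligature code point to its plain-letter expansion.
# Assumption: the extracted text actually contains these code points.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb06": "st",
}

# str.translate wants a table keyed by code point (int).
_TABLE = {ord(lig): plain for lig, plain in LIGATURES.items()}


def expand_ligatures(text: str) -> str:
    """Replace ligature characters with their component letters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    extracted = "The o\ufb03ce \ufb01le"  # as it might come out of a PDF
    print(expand_ligatures(extracted))
```

I used an explicit table rather than unicodedata.normalize("NFKC", ...) because NFKC also rewrites other characters (non-breaking spaces, superscripts and so on), which may or may not be what you want.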

Chief Calendar Champion Chemlak

Join the unofficial Realm Works IRC channel! Join #realm-works