@Chemlak is close, but not entirely correct. Ligatures are definitely a complicating factor, but PDFs are not "images" (although they can contain them). The text portion of a PDF is really just a bunch of characters placed on a page. The best analogy I can think of is this:
1. Heat up a bowl of alphabet soup
2. Pull out the letters to form a sentence
3. Arrange them appropriately on the table in front of you
4. You now have the contents of a PDF
Yes, that's correct. As far as text goes, a PDF is nothing more than individual letters positioned on a page. That's it. There are no words, no sentences, and no paragraphs stored as such.
So when you try to pull the text out of a PDF, all you have are the raw characters. Since each character is explicitly positioned on the page, the only way to get the "text" back out is to look at the relative positions of those characters and make a "best guess" as to whether there's a space between them or not. Unfortunately, different fonts have different sizing and spacing characteristics. To get it right, the calculation would have to account for each font's particular metrics, which means understanding the details of each font and the individual symbols within it, all of which are mathematically defined. That gets incredibly complicated, so in practice hardly anybody does it.
The net result is that simple logic is used to determine whether one character is "next to" another and whether a space should be inserted between them. That simple logic works all right for basic fonts, apart from complications like ligatures, but it fails horribly for more "interesting" fonts, which RPG publishers love to use to "dress up" their products.
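To make that concrete, here's a rough Python sketch of the kind of gap test I'm talking about. To be clear, this is my own illustration and not the code from any real PDF tool: the `Glyph` class, the `lay_out` helper, the 0.3 threshold, and all of the font measurements are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    char: str     # the character this glyph represents
    x: float      # left edge on the page, in points
    width: float  # how wide the glyph is, in points

def extract_line(glyphs, gap_ratio=0.3):
    """Rebuild one line of text by guessing where the spaces are."""
    ordered = sorted(glyphs, key=lambda g: g.x)
    out = []
    prev = None
    for g in ordered:
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            # The "simple logic": a gap wider than a fixed fraction of the
            # previous glyph's width is assumed to be a space.
            if gap > gap_ratio * prev.width:
                out.append(" ")
        out.append(g.char)
        prev = g
    return "".join(out)

def lay_out(text, width, letter_gap, word_gap):
    """Hypothetical helper: place the letters of `text` the way a font might.

    Spaces are never stored; they only show up as extra distance.
    """
    glyphs = []
    x = 0.0
    for i, word in enumerate(text.split(" ")):
        if i > 0:
            x += word_gap  # extra distance between words
        for ch in word:
            glyphs.append(Glyph(ch, x, width))
            x += width + letter_gap
    return glyphs

# A plain font: tight letters, clear word gaps.
print(extract_line(lay_out("roll a die", 6, 0.5, 3)))  # roll a die

# A "dressed up" font with wide letter spacing: every gap looks like a space.
print(extract_line(lay_out("roll a die", 6, 2.5, 3)))  # r o l l a d i e

# A condensed font with narrow word gaps: no gap looks like a space.
print(extract_line(lay_out("roll a die", 6, 0.2, 1)))  # rolladie
```

The same threshold that works for the first font breaks the other two, and tuning it for one of them just breaks a different one.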
That's why text sometimes extracts the way @daplunk showed above.
Hope this helps to explain what you're seeing!
