Lone Wolf Development Forums  

daplunk
Senior Member
 
Join Date: Jan 2016
Location: Adelaide, Australia
Posts: 2,294

Old April 14th, 2016, 01:43 AM
Has anyone had experience with copying pre-made PDFs into Realm Works where the copy and paste goes all funky and adds spaces after every character?

Example: c h a r a c t e r s a r e c h i ld r e n

The issue being that this is obviously not going to be efficient to enter into the tool.

Is there a way to remove the excess spaces while ensuring a space remains after words?
daplunk is offline   #1 Reply With Quote
Acenoid
Senior Member
 
Join Date: Dec 2013
Posts: 798

Old April 14th, 2016, 02:42 AM
The problem seems to be with the PDF then. You can only copy what is there. Or does copy-paste into Notepad look better?

Join the (unofficial) Realm-Works IRC Chat: #realm-works on the Rizon Network (https://wiki.rizon.net/index.php?title=Servers)
-> Browser Client: https://kiwiirc.com/client/irc.rizon.net
Acenoid is offline   #2 Reply With Quote
daplunk
Senior Member
 
Join Date: Jan 2016
Location: Adelaide, Australia
Posts: 2,294

Old April 14th, 2016, 02:54 AM
It's the same in Notepad. I tried Notepad++ and displayed all characters. It's got spaces in everything.
daplunk is offline   #3 Reply With Quote
ich.pdf
Member
 
Join Date: Mar 2016
Location: Cologne, Germany
Posts: 46

Old April 14th, 2016, 04:01 AM
I have a similar issue when pasting from Shadowrun PDFs. It is very odd, because it affects only words with certain sequences of letters. AFAIK it is everything with "fi" and "fl". The sequence will be replaced with two spaces upon pasting it into any other program.

I guess that is some sort of copy protection. I don't really have a clue about programming, but probably in your case the pdf actually contains all those spaces, but somewhere in the code it tells the pdf reader to not display single spaces and display only one if there are two...
Just my guess :-D

Haven't found a workaround for that. If I am right, then a "conditioned paste option" would work theoretically. You would have to figure out what the pdf "really" looks like and what the pdf reader is told to ignore and then apply that to a paste mechanism.

I guess there are more urgent things for LWD to care about, but if a talented programmer out there has too much time :-D
ich.pdf is offline   #4 Reply With Quote
Acenoid
Senior Member
 
Join Date: Dec 2013
Posts: 798

Old April 14th, 2016, 04:11 AM
I often wondered why there are no PDF readers that can hide the layout and show only the text formatting. But there are likely too many layout issues with that.

If you use the option to save as txt in Adobe Reader, you will see. PDFs look like crap.

Acenoid is offline   #5 Reply With Quote
adzling
Senior Member
 
Join Date: Apr 2015
Posts: 343

Old April 14th, 2016, 07:40 AM
I've run into the "fi" and "fl" issue with Srun PDFs as well.
It affects all of them.

But then I take it as an opportunity to read the material I'm copypastaing into my realm, editing as I go.

adzling is offline   #6 Reply With Quote
Chemlak
Senior Member
 
Join Date: Aug 2012
Posts: 432

Old April 14th, 2016, 09:18 AM
It's not copy protection or a fault with PDFs, it's an unfortunate side-effect of ligatures in text.
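As a rough illustration of the ligature side of this: when a pasted ligature survives as a single Unicode character (for example U+FB01 for "fi"), it can be expanded back into plain letters. The sketch below assumes that case only; the function name `expand_ligatures` is made up for the example, and it will not help when the ligature arrives as blank spaces, as reported earlier in the thread.

```python
# Map the common Latin ligature code points back to plain letters.
# Assumption: the ligature survived the paste as one Unicode character.
LIGATURES = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
})

def expand_ligatures(text: str) -> str:
    """Replace single-character ligatures with their component letters."""
    return text.translate(LIGATURES)

if __name__ == "__main__":
    pasted = "A \ufb01re\ufb02y in the o\ufb03ce"
    print(expand_ligatures(pasted))
```

An explicit table is used here rather than a blanket Unicode normalization so that nothing else in the pasted text is altered.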

My understanding is that when you put text into a PDF and save it, the text is converted into images (which is why they can be viewed equally well regardless of the reading program: PDF stands for "portable document format" for a reason). When you extract text from them, the reader takes a "best guess" as to what those images are meant to be, and it sometimes fails to correctly separate common letter pairs, and sometimes inserts spaces between characters that weren't in the original text and don't appear to exist in the PDF.

The only way for it to be corrected is for PDF to be altered as a document format to retain the details of the original text, something which hasn't been done in decades for some reason.

Chief Calendar Champion Chemlak

Join the unofficial Realm Works IRC channel! Join #realm-works
Chemlak is offline   #7 Reply With Quote
AEIOU
Senior Member
 
Join Date: Jan 2012
Posts: 1,147

Old April 14th, 2016, 11:18 AM
Chemlak nailed it. Some fonts are better than others for ligatures and translation from PDFs but there's a lot of voodoo and even a touch of black magic involved. Welcome to the world of typography.

For cases like the spacing issue, I've sometimes resorted to replacing double spaces with "xx", replacing single spaces with no space, then replacing "xx" with a single space. That works sometimes but it's a pain. Most of the time I retype and move on.
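The three-step replace described above could be scripted roughly like this. It is a minimal sketch, not a tested tool: the function name `collapse_spaced_text` is invented for the example, and a NUL character stands in for the "xx" placeholder on the assumption that it never occurs in pasted text. Like the manual method, it only works when real word gaps came through as double spaces.

```python
def collapse_spaced_text(text: str, placeholder: str = "\x00") -> str:
    """Remove the stray space after every character, keeping word gaps.

    Assumes genuine word gaps were pasted as two (or more) spaces and
    that `placeholder` does not appear anywhere in the text.
    """
    protected = text.replace("  ", placeholder)  # step 1: protect double spaces
    squeezed = protected.replace(" ", "")        # step 2: drop single spaces
    return squeezed.replace(placeholder, " ")    # step 3: restore word gaps

if __name__ == "__main__":
    pasted = "c h a r a c t e r s  a r e  c h i l d r e n"
    print(collapse_spaced_text(pasted))
```

If the word gaps were pasted as single spaces too, there is nothing left to tell a word break from noise, which is probably why this only works sometimes.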
AEIOU is offline   #8 Reply With Quote
MNBlockHead
Senior Member
 
Join Date: Dec 2014
Location: Twin Cities Area, MN, USA
Posts: 1,325

Old April 14th, 2016, 01:01 PM
If you do a lot of copying from messy PDFs, it may be worth your time to become familiar with Notepad++, UltraEdit, or a similar text editor that can search for and remove/replace unwanted spaces, characters, and line breaks. A little bit of RegEx can go a long way.
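To give a feel for the kind of RegEx meant here, below is a small sketch using Python's `re` module. The function names are made up for the example, and the patterns assume that word gaps show up as two or more spaces and that paragraphs are separated by blank lines; real PDFs may need different rules.

```python
import re

def strip_single_spaces(text: str) -> str:
    """Delete lone spaces between characters, then tidy the word gaps."""
    # A space with a non-space character on both sides is treated as noise.
    no_singles = re.sub(r"(?<=\S) (?=\S)", "", text)
    # Runs of two or more spaces are treated as one real word gap.
    return re.sub(r" {2,}", " ", no_singles)

def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines, leaving blank-line paragraph breaks alone."""
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)

if __name__ == "__main__":
    print(strip_single_spaces("R e a l m  W o r k s"))
    print(unwrap_lines("line one\nline two\n\nnew paragraph"))
```

The patterns themselves are ordinary lookaround expressions, so something similar should be usable in an editor's regex search-and-replace, though the exact syntax an editor accepts may differ.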

RW Project: Dungeons & Dragons 5th edition homebrew world
Other Tools: CampaignCartographer, Cityographer, Dungeonographer, Evernote
MNBlockHead is offline   #9 Reply With Quote
rob
Senior Member
Lone Wolf Staff
 
Join Date: May 2005
Posts: 8,232

Old April 14th, 2016, 01:06 PM
@Chemlak is close, but he's not entirely correct. Ligatures are a definite complicating factor. But PDFs are not "images" (although they can contain them). The text portion of PDFs is actually just a bunch of characters on a page. The best analogy I can think of is the following...

1. Heat up a bowl of alphabet soup
2. Pull out the letters to form a sentence
3. Arrange them appropriately on the table in front of you
4. You now have the contents of a PDF

Yes, that's correct. A PDF is nothing more than individual letters positioned on a page. That's it. Nothing more.

So when you try to pull the text out of a PDF, all you have are the raw characters. Since each character is explicitly positioned on the page, the only way to get the "text" back out is to interpret the relative positions of those characters and calculate a "best guess" whether there's a space between them or not. Unfortunately, different fonts have different sizing and spacing characteristics. This means that the calculations for each font must take into consideration the wide-ranging font characteristics, which means understanding the details of each font and the individual symbols within it, which are all mathematically defined. That gets incredibly complicated, so nobody does it.

The net result is that simple logic is used to determine whether one character is "next to" another and whether a space should be inserted between them. That simple logic works alright for basic fonts, except when things like ligatures complicate things, and it fails horribly for more "interesting" fonts, which RPG publishers love to use to "dress up" their products.
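The "simple logic" can be pictured with a toy model. This is only a sketch under stated assumptions: `Glyph`, `lay_out`, and `extract_text` are hypothetical names, every letter gets the same width, and the fixed gap threshold is an invented stand-in for whatever rule a real PDF reader applies.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    char: str
    x: float      # left edge on the page, in points
    width: float  # width of the drawn letter, in points

def lay_out(text, width, tracking, word_gap):
    """Place letters left to right, as a PDF would. Spaces are not stored."""
    glyphs, x = [], 0.0
    for ch in text:
        if ch == " ":
            x += word_gap  # a word gap is just extra empty distance
            continue
        glyphs.append(Glyph(ch, x, width))
        x += width + tracking
    return glyphs

def extract_text(glyphs, gap_threshold):
    """Guess spaces purely from the distance between neighbouring letters."""
    out, prev = [], None
    for g in sorted(glyphs, key=lambda g: g.x):
        if prev is not None and g.x - (prev.x + prev.width) > gap_threshold:
            out.append(" ")
        out.append(g.char)
        prev = g
    return "".join(out)

if __name__ == "__main__":
    plain = lay_out("are children", width=5.0, tracking=0.5, word_gap=3.0)
    fancy = lay_out("are children", width=5.0, tracking=1.5, word_gap=3.0)
    print(extract_text(plain, gap_threshold=1.0))  # are children
    print(extract_text(fancy, gap_threshold=1.0))  # a r e c h i l d r e n
```

With the tightly set font the guess comes out right. With the loosely spaced one, every letter gap crosses the threshold, so each character gets its own space and the real word break is no longer distinguishable.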

That's why you end up with text sometimes extracting like @daplunk cited above.

Hope this helps to explain what you're seeing!
rob is offline   #10 Reply With Quote
All times are GMT -8.

