My Solution for PDF formatting (Visual Studio)

DMG

Hi Folks,

I'm importing the Strange Aeons AP & thought I'd share my solution here for formatting text.

The biggest problem when copying & pasting text from PDFs is that the line breaks come across exactly as they are in the PDF. So when text is broken into multiple shorter lines to run around an image etc., pasting into RW keeps those line breaks.

So a (reasonably) simple solution I found was to paste the raw text into Visual Studio & let it "fix" the text.

Any modern version of Visual Studio should work for this (I'm using the 2015 Community edition).

1. Create a new Visual Basic Windows Forms project.
2. Add 2 RichTextBox controls & a Button.
3. Create a click event for the button.
4. The click event code:

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    RichTextBox2.Text = RichTextBox1.Text
End Sub

When you run the project, paste the raw text into RichTextBox1 & click the button. When you do that, the cleaned text will show in RichTextBox2.

You can then select all of the text in RichTextBox2 (Ctrl-A) & paste it into Realm Works & it should have removed all of those pesky line breaks.

It doesn't preserve bolded / italicised text etc, but that's easy to fix up in RW.

Hopefully somebody will get some use from what I've found. It's sure saving me a lot of time!

Cheers,
Jim.
 
This is no different than pasting and copying back out of Notepad or another text editor, as the Text property of a RichTextBox reduces the text to a String. Whether that means line breaks disappear is going to depend on what the source program puts on the Clipboard vs. what Clipboard formats a RichTextBox can accept.

If you want to get rid of newlines, get rid of them:

Code:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    ' Replace each line feed with a space so the words on either side don't run together.
    RichTextBox2.Text = RichTextBox1.Text.Replace(vbLf, " ")
End Sub

or more thoroughly:

Code:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim convertedText As String = RichTextBox1.Text

    ' Handle the two-character line endings first, then any stray single characters.
    ' Each break becomes a space so adjacent words don't run together.
    convertedText = convertedText.Replace((vbCr & vbLf), " ")
    convertedText = convertedText.Replace((vbLf & vbCr), " ")
    convertedText = convertedText.Replace(vbLf, " ")
    convertedText = convertedText.Replace(vbCr, " ")

    RichTextBox2.Text = convertedText
End Sub
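
Stripping every line break also flattens the paragraph breaks, so a middle ground may be more useful when pasting whole pages. The following is only a sketch: `UnwrapLines` and the `LineUnwrap` module are hypothetical names, and it assumes that paragraphs in the pasted text are separated by at least one blank line. It joins the wrapped lines inside each paragraph with a single space and keeps one blank line between paragraphs. It does not try to rejoin words that the PDF hyphenated across a line end.

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LineUnwrap
    ' Hypothetical helper (not from the original posts): joins wrapped lines
    ' with a space and keeps blank-line paragraph breaks.
    Public Function UnwrapLines(raw As String) As String
        ' Normalise CR+LF and bare CR line endings to a single LF.
        Dim normalized As String = raw.Replace(vbCrLf, vbLf).Replace(vbCr, vbLf)

        ' One or more blank (or whitespace-only) lines separate paragraphs.
        Dim paragraphs As String() = Regex.Split(normalized, "\n(?:[ \t]*\n)+")

        Dim cleaned As New List(Of String)
        For Each paragraph As String In paragraphs
            ' Collapse each remaining line break, plus any spaces around it, into one space.
            Dim joined As String = Regex.Replace(paragraph.Trim(), "\s*\n\s*", " ")
            If joined.Length > 0 Then cleaned.Add(joined)
        Next

        ' Put a single blank line between the rebuilt paragraphs.
        Return String.Join(vbLf & vbLf, cleaned)
    End Function
End Module
```

In the button handler this would be called as `RichTextBox2.Text = UnwrapLines(RichTextBox1.Text)`.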
 
There's a key combination in RW that gets rid of all the extra line breaks.

I copy from the PDF, paste into a text editor, double space between paragraphs, paste into RW, split the snippet at each blank line, then hit the key-combo for each snippet.

The biggest problem I have is the misspellings. Words with a double f will only have one. Or words with an fi or an fl will only have the f. Or sometimes, words with a double f will have a question mark. Also, words with a Th will miss the h.

For example:
"coffee" becomes "cofee" or "co?ee" or "cof?ee"
"find" becomes "fnd"
"flip" becomes "fip"
"The" becomes "Te"

The pattern isn't always 100%, so I can't write a reliable algorithm for it. I just have to run it through a spellchecker to catch as much as I can, and do some manual searches to find the main gotchas I'm aware of.
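
One of those manual searches can be partly automated. As a rough sketch (the `FindSuspectWords` name is hypothetical), a regular expression can list every word that has a question mark between two word characters, such as "co?ee". It will not catch silently dropped letters like "fnd" or "Te", and it may flag legitimate text where a question mark is followed directly by another word with no space.

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module GotchaFinder
    ' Hypothetical helper: returns each distinct word that has a "?" between
    ' two word characters, e.g. "co?ee" or "cof?ee".
    Public Function FindSuspectWords(source As String) As List(Of String)
        Dim suspects As New List(Of String)
        For Each m As Match In Regex.Matches(source, "\w+\?\w+")
            ' Report each damaged word only once.
            If Not suspects.Contains(m.Value) Then suspects.Add(m.Value)
        Next
        Return suspects
    End Function
End Module
```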
 
Quote:
There's a key combination in RW that gets rid of all the extra line breaks.

Heh, I'd forgotten about that. (Ctrl-Alt-R, according to the snippet menus.)

Quote:
The biggest problem I have is the misspellings. Words with a double f will only have one. Or words with an fi or an fl will only have the f. Or sometimes, words with a double f will have a question mark. Also, words with a Th will miss the h.

Sounds like non-standard ligatures. If they're common in your text, you could do something like the above, but first check what's actually coming out of your PDF viewer, then replace the ligatures with their component characters. The program might have to dig text out of the clipboard rather than starting from text pasted into a text box, though.
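
A sketch of both steps, with hypothetical names (`DescribeNonAscii`, `ExpandLigatures`). It assumes the viewer puts the actual Unicode ligature characters (U+FB00 to U+FB04, and U+FB06 for "st") into the copied text. If the viewer has already dropped the glyph or substituted "?", as in the examples quoted above, the information is gone and a replacement table can't recover it. Unicode has no code point for a "Th" ligature, so that case isn't covered either.

```vb
Imports System.Text

Module LigatureFix
    ' Lists each non-ASCII character with its code point (for example "U+FB01"),
    ' so you can see what the PDF viewer actually produced.
    Public Function DescribeNonAscii(raw As String) As String
        Dim report As New StringBuilder()
        For Each c As Char In raw
            If AscW(c) > 127 Then
                report.AppendLine(c & " = U+" & AscW(c).ToString("X4"))
            End If
        Next
        Return report.ToString()
    End Function

    ' Replaces the standard Unicode ligature characters with their component letters.
    Public Function ExpandLigatures(raw As String) As String
        Dim codePoints As Integer() = {&HFB00, &HFB01, &HFB02, &HFB03, &HFB04, &HFB06}
        Dim letters As String() = {"ff", "fi", "fl", "ffi", "ffl", "st"}

        Dim result As String = raw
        For i As Integer = 0 To codePoints.Length - 1
            result = result.Replace(ChrW(codePoints(i)).ToString(), letters(i))
        Next
        Return result
    End Function
End Module
```

Running `DescribeNonAscii` on a pasted sample first shows whether `ExpandLigatures` has anything to work with.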

Here it'd be nice if Realm Works exposed a .NET object model. (I like scripting. :)
 
Yes, line breaks are the least of my worries. Spaces between the letters of a single word are much more problematic. Misspelled words are also an issue, as EightBitz said. I have almost given up on entering a PDF exactly as it was originally. Now I just summarize key elements and bring over required stats. The time it takes to enter it all and then correct it all once entered is just more than I can invest.
 