Lone Wolf Development Forums

DMG September 14th, 2016 04:50 PM

My Solution for PDF formatting (Visual Studio)
 
Hi Folks,

I'm importing the Strange Aeons AP & thought I'd share my solution here for formatting text.

The biggest problem when copying & pasting text from PDFs is that the line breaks come across exactly as they appear in the PDF. So when text is broken into multiple shorter lines to run around an image etc., pasting into RW keeps those line breaks.

So a (reasonably) simple solution I found was to paste the raw text into Visual Studio & let it "fix" the text.

Any modern version of Visual Studio should work for this (I'm using the 2015 Community edition).

1. Create a new Visual Basic Windows Forms project.
2. Add 2 RichTextBox controls & a Button.
3. Create a click event for the button.
4. The click event code:

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    RichTextBox2.Text = RichTextBox1.Text
End Sub

When you run the project, paste the raw text into RichTextBox1 & click the button. When you do that, the cleaned text will show in RichTextBox2.

You can then select all of the text in RichTextBox2 (Ctrl-A), copy it (Ctrl-C) & paste it into Realm Works, & it should have removed all of those pesky line breaks.

It doesn't preserve bolded / italicised text etc, but that's easy to fix up in RW.

Hopefully somebody will get some use from what I've found. It's sure saving me a lot of time!

Cheers,
Jim.

Parody September 14th, 2016 07:06 PM

This is no different than pasting and copying back out of Notepad or another text editor, as the Text property of a RichTextBox reduces the text to a String. Whether that means line breaks disappear is going to depend on what the source program puts on the Clipboard vs. what Clipboard formats a RichTextBox can accept.

If you want to get rid of newlines, get rid of them:

Code:

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    RichTextBox2.Text = RichTextBox1.Text.Replace(vbLf, "")
End Sub

or more thoroughly:

Code:

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim convertedText As String = RichTextBox1.Text

    ' Remove two-character line endings first, then any lone LF or CR left over.
    convertedText = convertedText.Replace(vbCr & vbLf, "")
    convertedText = convertedText.Replace(vbLf & vbCr, "")
    convertedText = convertedText.Replace(vbLf, "")
    convertedText = convertedText.Replace(vbCr, "")

    RichTextBox2.Text = convertedText
End Sub
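
The replacements above remove each break outright, so the last word of one line runs into the first word of the next unless the source line happened to end in a space. Here is a minimal sketch of an alternative that joins wrapped lines with a single space and keeps blank lines as paragraph breaks. It assumes paragraphs in the pasted text are separated by at least one blank line; the module and function names are hypothetical, and words hyphenated across a line break are not repaired.

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LineBreakCleaner
    ' Hypothetical helper: joins hard-wrapped lines with one space and
    ' keeps blank lines as paragraph breaks.
    Function UnwrapParagraphs(rawText As String) As String
        ' Normalize CRLF and lone CR to LF so only one case remains.
        Dim normalized As String = rawText.Replace(vbCrLf, vbLf).Replace(vbCr, vbLf)

        ' A newline followed by one or more blank lines ends a paragraph.
        Dim paragraphs As String() = Regex.Split(normalized, "\n(?:[ \t]*\n)+")

        Dim cleaned As New List(Of String)
        For Each paragraph As String In paragraphs
            ' Any newline left inside a paragraph is a wrap point: replace it,
            ' and the spaces or tabs around it, with exactly one space.
            Dim joined As String = Regex.Replace(paragraph, "[ \t]*\n[ \t]*", " ").Trim()
            If joined.Length > 0 Then cleaned.Add(joined)
        Next

        ' Put one blank line between paragraphs.
        Return String.Join(vbCrLf & vbCrLf, cleaned)
    End Function
End Module
```

In the click handler this would be used as RichTextBox2.Text = LineBreakCleaner.UnwrapParagraphs(RichTextBox1.Text).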


EightBitz September 15th, 2016 01:33 AM

There's a key combination in RW that gets rid of all the extra line breaks.

I copy from the PDF, paste into a text editor, double space between paragraphs, paste into RW, split the snippet at each blank line, then hit the key-combo for each snippet.

The biggest problem I have is the misspellings. Words with a double f will only have one. Words with an fi or an fl will only have the f. Sometimes, words with a double f will have a question mark. Also, words with a Th will be missing the h.

For example:
"coffee" becomes "cofee" or "co?ee" or "cof?ee"
"find" becomes "fnd"
"flip" becomes "fip"
"The" becomes "Te"

The pattern isn't always 100%, so I can't write a reliable algorithm for it. I just have to run it through a spellchecker to catch as much as I can, and do some manual searches to find the main gotchas I'm aware of.
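
The manual searches can be partly automated without trying to auto-correct anything. This is a rough sketch that only flags candidates for review: any word with a question mark between two word characters, plus a short list of known-bad spellings. The list here is just the examples from this post and is meant to be extended; the module and function names are hypothetical.

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SuspectWordFinder
    ' Hypothetical helper: returns distinct suspicious words in order of
    ' first appearance. It flags text for review and corrects nothing.
    Function FindSuspectWords(sourceText As String) As List(Of String)
        ' First alternative: a question mark embedded in a word ("co?ee").
        ' Second alternative: known-bad spellings, matched case-sensitively.
        Dim pattern As String = "\w+\?\w+|\b(?:cofee|fnd|fip|Te)\b"

        Dim suspects As New List(Of String)
        For Each m As Match In Regex.Matches(sourceText, pattern)
            If Not suspects.Contains(m.Value) Then suspects.Add(m.Value)
        Next
        Return suspects
    End Function
End Module
```

A genuine question mark at the end of a sentence is not flagged, because it is followed by a space rather than a letter.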

Parody September 15th, 2016 04:58 AM

Quote:

Originally Posted by EightBitz (Post 234591)
There's a key combination in RW that gets rid of all the extra line breaks.

Heh, I'd forgotten about that. (Ctrl-Alt-R, according to the snippet menus.)

Quote:

Originally Posted by EightBitz (Post 234591)
The biggest problem I have is the misspellings. Words with a double f will only have one. Words with an fi or an fl will only have the f. Sometimes, words with a double f will have a question mark. Also, words with a Th will be missing the h.

Sounds like non-standard ligatures. If they're common in your text, you could look into doing something like the above but seeing what's actually coming out of your PDF viewer and replacing them with their component characters. The program might have to dig text out of the clipboard rather than starting from text pasted into a text box, though.

Here it'd be nice if Realm Works exposed a .NET object model. (I like scripting. :)
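
The ligature idea above could be sketched roughly like this. The first function lists the non-ASCII characters in a sample with their code points, so you can see what the viewer actually produced. The second expands the five standard Unicode f-ligatures (U+FB00 to U+FB04). This assumes the viewer emits those code points at all; if the ligature has already arrived as a "?" or a dropped letter, no replacement table can recover it, and a Th ligature has no standard code point, so it would need its own entry once identified. The module and function names are hypothetical.

```vb
Imports System.Collections.Generic
Imports System.Text
Imports Microsoft.VisualBasic

Module LigatureTools
    ' Hypothetical helper: one line per distinct non-ASCII character,
    ' in the form "U+FB01 <character>".
    Function DescribeNonAscii(sourceText As String) As String
        Dim report As New StringBuilder()
        Dim seen As New List(Of Char)
        For Each c As Char In sourceText
            If AscW(c) > 127 AndAlso Not seen.Contains(c) Then
                seen.Add(c)
                report.AppendLine("U+" & AscW(c).ToString("X4") & " " & c)
            End If
        Next
        Return report.ToString()
    End Function

    ' Hypothetical helper: replaces U+FB00..U+FB04 with their component letters.
    Function ExpandLigatures(sourceText As String) As String
        ' Index i corresponds to code point U+FB00 + i.
        Dim components As String() = {"ff", "fi", "fl", "ffi", "ffl"}
        Dim result As String = sourceText
        For i As Integer = 0 To components.Length - 1
            result = result.Replace(ChrW(&HFB00 + i).ToString(), components(i))
        Next
        Return result
    End Function
End Module
```

Running DescribeNonAscii on a pasted sample first shows whether ExpandLigatures has anything to work with. To read straight from the clipboard instead of a text box, Clipboard.GetText(TextDataFormat.UnicodeText) from System.Windows.Forms is one option.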

meek75 September 15th, 2016 05:01 AM

Yes, line breaks are the least of my worries. Spaces between letters of a single word are much more problematic. Misspelled words are also an issue, as EightBitz said. I have almost given up on entering a PDF exactly as it was originally. Now, I just summarize key elements and bring over required stats. The time it takes to enter it all and then correct it all once entered is just more than I can invest.


All times are GMT -8.