Lone Wolf Development Forums  

DMG
Senior Member
 
Join Date: Jun 2007
Location: Queensland, Australia
Posts: 195

September 14th, 2016, 04:50 PM
Hi Folks,

I'm importing the Strange Aeons AP & thought I'd share my solution here for formatting text.

The biggest problem when copying & pasting text from the PDFs is that the line breaks come across exactly as they are in the PDF. So when text is broken into multiple shorter lines to run around an image etc, pasting into RW keeps those line breaks.

So a (reasonably) simple solution I found was to paste the raw text into Visual Studio & let it "fix" the text.

Any modern version of Visual Studio should work for this (I'm using the 2015 Community edition).

1. Create a new Visual Basic Windows Forms project.
2. Add 2 RichTextBox controls & a Button.
3. Create a click event for the button.
4. The click event code:

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    RichTextBox2.Text = RichTextBox1.Text
End Sub

When you run the project, paste the raw text into RichTextBox1 & click the button. The cleaned text will then show in RichTextBox2.

You can then select all of the text in RichTextBox2 (Ctrl-A), copy it & paste it into Realm Works. All of those pesky line breaks should be gone.

It doesn't preserve bolded / italicised text etc, but that's easy to fix up in RW.

Hopefully somebody will get some use from what I've found. It's sure saving me a lot of time!

Cheers,
Jim.
Parody
Senior Member
 
Join Date: Jan 2013
Location: Rochester, MN
Posts: 1,515

September 14th, 2016, 07:06 PM
This is no different than pasting and copying back out of Notepad or another text editor, as the Text property of a RichTextBox reduces the text to a String. Whether that means line breaks disappear is going to depend on what the source program puts on the Clipboard vs. what Clipboard formats a RichTextBox can accept.

If you want to get rid of newlines, get rid of them:

Code:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    RichTextBox2.Text = RichTextBox1.Text.Replace(vbLf, "")
End Sub
or more thoroughly:

Code:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim convertedText As String = RichTextBox1.Text

    convertedText = convertedText.Replace((vbCr & vbLf), "")
    convertedText = convertedText.Replace((vbLf & vbCr), "")
    convertedText = convertedText.Replace(vbLf, "")
    convertedText = convertedText.Replace(vbCr, "")

    RichTextBox2.Text = convertedText
End Sub
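
Removing every line break outright can glue the last word of one line to the first word of the next, and it also throws away paragraph breaks. The following is a minimal sketch of a gentler variant, not a tested tool: it assumes paragraphs are separated by at least one blank line, and the names LineBreakCleaner and Unwrap are made up for this example.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LineBreakCleaner
    ' Joins lines that were split by the PDF layout, but keeps paragraph breaks.
    ' Assumption: a paragraph break is one or more blank lines.
    Function Unwrap(raw As String) As String
        ' Reduce CR+LF and lone CR to LF so only one kind of break remains.
        Dim normalized As String = raw.Replace(vbCrLf, vbLf).Replace(vbCr, vbLf)

        ' Split on blank lines (two line breaks with optional whitespace between).
        Dim paragraphs As String() = Regex.Split(normalized, "\n\s*\n")

        For i As Integer = 0 To paragraphs.Length - 1
            ' Inside a paragraph, a line break plus any spaces around it becomes one space.
            paragraphs(i) = Regex.Replace(paragraphs(i).Trim(), "[ \t]*\n[ \t]*", " ")
        Next

        ' Put the paragraphs back together with one blank line between them.
        Return String.Join(vbLf & vbLf, paragraphs)
    End Function

    Sub Main()
        Dim sample As String = "The quick" & vbLf & "brown fox." & vbLf & vbLf & "Next paragraph."
        Console.WriteLine(Unwrap(sample))
    End Sub
End Module
```

In the form above, the button handler would then read RichTextBox2.Text = Unwrap(RichTextBox1.Text). Words hyphenated across a line break are left alone here, since removing those hyphens automatically would also damage genuinely hyphenated words.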


Last edited by Parody; September 14th, 2016 at 07:41 PM.
EightBitz
Senior Member
 
Join Date: May 2013
Posts: 1,458

September 15th, 2016, 01:33 AM
There's a key combination in RW that gets rid of all the extra line breaks.

I copy from the PDF, paste into a text editor, double space between paragraphs, paste into RW, split the snippet at each blank line, then hit the key-combo for each snippet.

The biggest problem I have is the misspellings. Words with a double f will only have one. Or words with an fi or an fl will only have the f. Or sometimes, words with a double f will have a question mark. Also, words with a Th will miss the h.

For example:
"coffee" becomes "cofee" or "co?ee" or "cof?ee"
"find" becomes "fnd"
"flip" becomes "fip"
"The" becomes "Te"

The pattern isn't always 100%, so I can't write a reliable algorithm for it. I just have to run it through a spellchecker to catch as much as I can, and do some manual searches to find the main gotchas I'm aware of.
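
The question-mark cases are at least easy to search for. Here is a rough sketch that lists every token with a "?" stuck between two letters so it can be checked by hand. It assumes the viewer really emits an ordinary "?" character; the names SuspectWordFinder and FindSuspectWords are invented for the example. It does nothing for dropped letters such as "fnd", which still need the spellchecker.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SuspectWordFinder
    ' Returns each distinct token that has a question mark between letters, e.g. "co?ee".
    Function FindSuspectWords(text As String) As List(Of String)
        Dim found As New List(Of String)
        ' \p{L}+ matches a run of letters; the ? must have letters on both sides.
        For Each m As Match In Regex.Matches(text, "\p{L}+\?\p{L}+")
            If Not found.Contains(m.Value) Then found.Add(m.Value)
        Next
        Return found
    End Function

    Sub Main()
        Dim sample As String = "Pour the co?ee. Is it hot? Yes, the cof?ee is hot."
        For Each word As String In FindSuspectWords(sample)
            Console.WriteLine(word)
        Next
    End Sub
End Module
```

A question mark that ends a sentence is followed by a space or the end of the text, so it is not reported.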
Parody
Senior Member
 
Join Date: Jan 2013
Location: Rochester, MN
Posts: 1,515

September 15th, 2016, 04:58 AM
Quote:
Originally Posted by EightBitz
There's a key combination in RW that gets rid of all the extra line breaks.
Heh, I'd forgotten about that. (Ctrl-Alt-R, according to the snippet menus.)

Quote:
Originally Posted by EightBitz
The biggest problem I have is the misspellings. Words with a double f will only have one. Or words with an fi or an fl will only have the f. Or sometimes, words with a double f will have a question mark. Also, words with a Th will miss the h.
Sounds like non-standard ligatures. If they're common in your text, you could look into doing something like the above but seeing what's actually coming out of your PDF viewer and replacing them with their component characters. The program might have to dig text out of the clipboard rather than starting from text pasted into a text box, though.

Here it'd be nice if Realm Works exposed a .NET object model. (I like scripting. :)
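
A minimal sketch of the ligature idea, under a big assumption: that the ligature characters actually survive the copy as the standard Unicode ligatures U+FB00 to U+FB04. If the PDF uses private or non-standard code points, or the viewer has already dropped the character, this mapping will not match anything. DescribeNonAscii is there to show what is really in the text so the table can be extended. All names (LigatureTools, DescribeNonAscii, ExpandLigatures) are made up for the example.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports Microsoft.VisualBasic

Module LigatureTools
    ' Lists every non-ASCII character with its code point, one per line,
    ' to reveal what the PDF viewer really put into the text.
    Function DescribeNonAscii(text As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In text
            If AscW(c) > 127 Then sb.AppendLine("U+" & AscW(c).ToString("X4") & " " & c)
        Next
        Return sb.ToString()
    End Function

    ' Replaces the standard Unicode f-ligatures with their component letters.
    ' Assumption: only these five code points occur; add others as you find them.
    Function ExpandLigatures(text As String) As String
        Dim map As New Dictionary(Of Char, String) From {
            {ChrW(&HFB00), "ff"},
            {ChrW(&HFB01), "fi"},
            {ChrW(&HFB02), "fl"},
            {ChrW(&HFB03), "ffi"},
            {ChrW(&HFB04), "ffl"}
        }

        Dim sb As New StringBuilder(text.Length)
        For Each c As Char In text
            Dim expanded As String = Nothing
            If map.TryGetValue(c, expanded) Then
                sb.Append(expanded)   ' known ligature: write its letters
            Else
                sb.Append(c)          ' anything else passes through unchanged
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim sample As String = "co" & ChrW(&HFB00) & "ee and " & ChrW(&HFB01) & "nd"
        Console.Write(DescribeNonAscii(sample))
        Console.WriteLine(ExpandLigatures(sample))
    End Sub
End Module
```

Unicode has no standard code point for a Th ligature, so that one would have to be identified from the DescribeNonAscii output and added to the table by hand. To work from the clipboard rather than a text box, a Windows Forms program could pass the result of Clipboard.GetText() to these functions.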

meek75
Senior Member
 
Join Date: May 2007
Location: Louisville, Ky
Posts: 330

September 15th, 2016, 05:01 AM
Yes, line breaks are the least of my worries. Spaces between letters of a single word are much more problematic. Misspelled words are also an issue, as EightBitz said. I have almost given up on entering a PDF exactly as it was originally. Now, I just summarize key elements and bring over required stats. The time it takes to enter it all and then correct it all once entered is just more than I can invest.
All times are GMT -8.