Senior Member
Join Date: Jun 2007
Location: Queensland, Australia
Posts: 195
Hi Folks,
I'm importing the Strange Aeons AP & thought I'd share my solution for formatting text. The biggest problem when copying & pasting text from the PDFs is that the line breaks are kept exactly as they are in the PDF. So when text is broken into multiple short lines to run around an image etc., pasting it into RW keeps those line breaks.

A (reasonably) simple solution I found was to paste the raw text into a small Visual Studio program & let it "fix" the text. Any modern version of Visual Studio should work for this (I'm using the 2015 Community edition).

1. Create a new Visual Basic Windows Forms project.
2. Add 2 RichTextBox controls & a Button.
3. Create a click event for the button.
4. Add this code to the click event:

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    RichTextBox2.Text = RichTextBox1.Text
End Sub

When you run the project, paste the raw text into RichTextBox1 & click the button. The cleaned text will show in RichTextBox2. You can then select all of the text in RichTextBox2 (Ctrl-A), copy it (Ctrl-C) & paste it into Realm Works, & it should have removed all of those pesky line breaks. It doesn't preserve bold / italic text etc., but that's easy to fix up in RW.

Hopefully somebody will get some use out of this. It's sure saving me a lot of time!

Cheers,
Jim.
#1
Senior Member
Join Date: Jan 2013
Location: Rochester, MN
Posts: 1,516
This is no different from pasting into and copying back out of Notepad or another text editor, as the Text property of a RichTextBox returns only the plain text, as a String. Whether that means line breaks disappear is going to depend on what the source program puts on the Clipboard vs. what Clipboard formats a RichTextBox can accept.
If you want to get rid of newlines, get rid of them:
Code:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    RichTextBox2.Text = RichTextBox1.Text.Replace(vbLf, " ")
End Sub
Code:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim convertedText As String = RichTextBox1.Text
    ' Replace with a space, not "", so words either side of a break don't run together.
    ' The two-character pairs go first so each becomes one space rather than two.
    convertedText = convertedText.Replace((vbCr & vbLf), " ")
    convertedText = convertedText.Replace((vbLf & vbCr), " ")
    convertedText = convertedText.Replace(vbLf, " ")
    convertedText = convertedText.Replace(vbCr, " ")
    RichTextBox2.Text = convertedText
End Sub
Last edited by Parody; September 14th, 2016 at 07:41 PM.
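Stripping every line break also merges separate paragraphs into one block. Below is a minimal sketch that keeps them apart, assuming the paragraphs in the pasted text are separated by a blank line (you may have to add those by hand): it joins the wrapped lines inside each paragraph with a space and keeps one blank line between paragraphs. The module and function names (PdfTextCleanup, UnwrapParagraphs) are hypothetical, not from any existing library.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Public Module PdfTextCleanup
    ' Joins lines wrapped inside a paragraph but keeps blank lines as
    ' paragraph breaks. Assumes paragraphs are separated by an empty line.
    Public Function UnwrapParagraphs(rawText As String) As String
        ' Normalise CRLF, LFCR and lone CR to LF so one pattern covers them all.
        Dim text As String = rawText.Replace(vbCrLf, vbLf).Replace(vbLf & vbCr, vbLf).Replace(vbCr, vbLf)

        ' Split on a blank line (which may hold spaces/tabs) plus any extra whitespace.
        Dim paragraphs As String() = Regex.Split(text.Trim(), "\n[ \t]*\n\s*")

        For i As Integer = 0 To paragraphs.Length - 1
            ' Turn each remaining line break, and any spaces around it, into one space
            ' so the words on either side don't run together.
            paragraphs(i) = Regex.Replace(paragraphs(i), "[ \t]*\n[ \t]*", " ")
        Next

        ' Put the paragraphs back together with a blank line between each.
        Return String.Join(vbLf & vbLf, paragraphs)
    End Function
End Module
```

In the form from the first post, the click handler would become RichTextBox2.Text = UnwrapParagraphs(RichTextBox1.Text). It doesn't try to rejoin words hyphenated across a line break, since the code can't tell those apart from real hyphens.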
#2
Senior Member
Join Date: May 2013
Posts: 1,458
There's a key combination in RW that gets rid of all the extra line breaks.
I copy from the PDF, paste into a text editor, double-space between paragraphs (add a blank line), paste into RW, split the snippet at each blank line, then hit the key combo for each snippet.

The biggest problem I have is the misspellings. Words with a double f will only have one. Words with an fi or an fl will only have the f. Sometimes words with a double f will have a question mark in place of one or both f's. Also, words with a Th will miss the h. For example:
"coffee" becomes "cofee" or "co?ee" or "cof?ee"
"find" becomes "fnd"
"flip" becomes "fip"
"The" becomes "Te"

The pattern isn't 100% consistent, so I can't write a reliable algorithm for it. I just have to run it through a spellchecker to catch as much as I can, and do some manual searches for the main gotchas I'm aware of.
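The dropped letters line up with common typographic ligatures (ff, fi, fl and Th), which is likely why they go missing on copy. Fixing them automatically isn't reliable, but the search step can be scripted. A minimal sketch, assuming a hand-maintained gotcha list: it reports every word with a "?" between letters plus any whole-word hit from the list, for manual review. LigatureCheck, FindSuspectWords and the KnownGotchas entries are hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module LigatureCheck
    ' Hypothetical starter list of known gotchas; extend it as you find more.
    Public ReadOnly KnownGotchas As String() = {"cofee", "fnd", "fip", "Te"}

    ' Returns each distinct suspect word once, in the order found:
    ' first words with a "?" between letters, then hits from the gotcha list.
    Public Function FindSuspectWords(text As String, gotchas As IEnumerable(Of String)) As List(Of String)
        Dim suspects As New List(Of String)()

        ' A "?" with letters on both sides (e.g. "co?ee") can't be normal punctuation.
        For Each m As Match In Regex.Matches(text, "\b\w+\?\w+\b")
            If Not suspects.Contains(m.Value) Then suspects.Add(m.Value)
        Next

        ' Whole-word, case-sensitive search, so "Te" won't match "Ten" or "the".
        For Each word As String In gotchas
            If Regex.IsMatch(text, "\b" & Regex.Escape(word) & "\b") AndAlso Not suspects.Contains(word) Then
                suspects.Add(word)
            End If
        Next

        Return suspects
    End Function
End Module
```

A flagged word isn't necessarily wrong ("Te" could be part of a name), so it's meant to point you at spots to check rather than to replace anything blindly.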
#3
Senior Member
Join Date: Jan 2013
Location: Rochester, MN
Posts: 1,516
Here it'd be nice if Realm Works exposed a .NET object model. (I like scripting. :)
#4
Senior Member
Join Date: May 2007
Location: Louisville, Ky
Posts: 330
Yes, line breaks are the least of my worries. Spaces between the letters of a single word are much more problematic. Misspelled words are also an issue, as EightBitz said. I have almost given up on entering a PDF exactly as it was written. Now I just summarize the key elements and bring over the required stats. The time it takes to enter everything and then correct it is more than I can invest.
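For words spaced out letter by letter (e.g. "S t r a n g e"), a rough heuristic is to join any run of three or more single letters separated by single spaces. This is a minimal sketch with hypothetical names; it will miss unevenly spaced words such as "Str ange", and it could wrongly join a genuine run of one-letter words.

```vb
Imports System
Imports System.Text.RegularExpressions

Public Module SpacedLetterFix
    ' Matches 3+ single letters separated by single spaces, e.g. "S t r a n g e".
    ' \b at each end ensures the first and last letters are one-letter words.
    Private ReadOnly SpacedRun As New Regex("\b[A-Za-z](?: [A-Za-z]){2,}\b")

    ' Removes the spaces inside each matched run and leaves other text alone.
    Public Function CollapseSpacedLetters(text As String) As String
        Return SpacedRun.Replace(text, Function(m As Match) m.Value.Replace(" ", ""))
    End Function
End Module
```

Ordinary text such as "I am a cat" is left unchanged, because no run of three single letters occurs in it.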
#5