There's a key combination in RW that gets rid of all the extra line breaks.
I copy from the PDF, paste into a text editor, double space between paragraphs, paste into RW, split the snippet at each blank line, then hit the key-combo for each snippet.
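For anyone doing the same cleanup outside RW, here is a minimal Python sketch of the split-and-unwrap step, assuming paragraphs are already separated by blank lines as described above. The function name `unwrap_paragraphs` is made up for illustration, and this is not how RW does it internally.

```python
import re

def unwrap_paragraphs(text: str) -> list[str]:
    """Split text at blank lines and join the hard-wrapped lines of each chunk."""
    # One or more blank (or whitespace-only) lines mark a paragraph boundary.
    chunks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for chunk in chunks:
        # Drop empty lines and rejoin the rest with single spaces.
        lines = [line.strip() for line in chunk.splitlines() if line.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    return paragraphs

if __name__ == "__main__":
    pasted = "This line was\nbroken by the PDF.\n\nSecond paragraph,\nalso broken."
    for paragraph in unwrap_paragraphs(pasted):
        print(paragraph)
```

It does not rejoin words that were hyphenated across a line break; those would still need a manual pass.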
The biggest problem I have is the misspellings. Words with a double f will only have one. Words with an fi or an fl will only have the f. Sometimes words with a double f will have a question mark instead. Also, words with a Th will be missing the h.
For example:
"coffee" becomes "cofee" or "co?ee" or "cof?ee"
"find" becomes "fnd"
"flip" becomes "fip"
"The" becomes "Te"
The pattern isn't 100% consistent, so I can't write a reliable algorithm for it. I just have to run it through a spellchecker to catch as much as I can, and do some manual searches to find the main gotchas I'm aware of.
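The dropped letters all sit in pairs that are commonly typeset as ligatures (ff, fi, fl, Th), so the manual searches can at least be narrowed down. Below is a rough Python sketch that only suggests candidates for review and does not auto-correct anything. It assumes you have a word list to check against (`vocab` here is a tiny stand-in), and the names `REPAIRS`, `candidates`, and `suggest` are made up for illustration.

```python
import re

# (what survives in the pasted text, what it may originally have been)
REPAIRS = [
    ("?", "ff"), ("?", "fi"), ("?", "fl"),
    ("f?", "ff"),
    ("f", "ff"), ("f", "fi"), ("f", "fl"), ("f", "ffi"), ("f", "ffl"),
    ("T", "Th"),
]

def candidates(word: str, vocab: set[str]) -> list[str]:
    """Try each repair at each position; keep guesses found in vocab."""
    found = set()
    for broken, fixed in REPAIRS:
        start = 0
        while (i := word.find(broken, start)) != -1:
            guess = word[:i] + fixed + word[i + len(broken):]
            if guess.lower() in vocab:
                found.add(guess)
            start = i + 1
    return sorted(found)

def suggest(text: str, vocab: set[str]) -> dict[str, list[str]]:
    """Map each unknown word to its possible repairs, for manual review."""
    suggestions = {}
    # A "?" counts as part of a word only when letters follow it.
    for word in re.findall(r"[A-Za-z]+(?:\?[A-Za-z]+)*", text):
        if word.lower() in vocab:
            continue
        options = candidates(word, vocab)
        if options:
            suggestions[word] = options
    return suggestions

if __name__ == "__main__":
    vocab = {"the", "coffee", "is", "find", "it"}  # stand-in word list
    print(suggest("Te cofee is co?ee, fnd it?", vocab))
```

This still misses damaged words that happen to be real words themselves (a "flat" that came through as "fat", for example), so it is an aid to the spellcheck-and-search routine, not a replacement for it.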