Senior Member
Join Date: Dec 2010
Location: Virginia, USA
Posts: 335
|
This file that I was pulling from was from 2011. A couple of the more recent things I purchased in the last year or two don't seem to be suffering from the same issue. It seems to be related to stuff produced before 2014, FWIW.
Minutus cantorum, minutus balorum, minutus carborata descendum pantorum. |
#21 |
Senior Member
Join Date: Dec 2013
Posts: 798
|
I scan books with ABBYY FineReader. If you're a student you'll get a discount, and it gives good OCR results. Of course you need a good book scanner (without a border); I only found one relatively cheap one. It still takes time, and theoretically a 2-3% error rate on a page is still a lot. I would recommend proofreading only the numbers and formulas; the remainder you can leave.
Not sure if ABBYY can OCR directly from a PDF... should try one day.
Join the (unofficial) Realm-Works IRC Chat: #realm-works on the Rizon Network (https://wiki.rizon.net/index.php?title=Servers) -> Browser Client: https://kiwiirc.com/client/irc.rizon.net |
#22 |
Senior Member
Join Date: Aug 2010
Location: Calgary, Alberta
Posts: 385
|
I have noticed that it is usually only one or two main character combinations that are a problem with each pdf. I usually set the search and replace to cover and correct the worst instance and then do a quick scan and repair of the others. Not the most elegant solution but I find I get about 95% of the errors that way.
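A minimal sketch of that kind of search-and-replace pass, assuming Python and two made-up character combinations (`th e`, `w ith`). The pairs and the function name `fix_combinations` are hypothetical; substitute whatever combinations your particular PDF actually breaks.

```python
import re

# Hypothetical (bad, good) pairs for illustration only; each PDF tends to
# have its own one or two problem combinations.
PAIRS = [
    ("th e", "the"),
    ("w ith", "with"),
]

def fix_combinations(text, pairs):
    """Apply each (bad, good) pair in order; return the text and hit counts."""
    counts = {}
    for bad, good in pairs:
        # \b keeps the match from starting or ending in the middle of a word.
        pattern = re.compile(r"\b" + re.escape(bad) + r"\b")
        text, hits = pattern.subn(good, text)
        counts[bad] = hits
    return text, counts

fixed, counts = fix_combinations("th e keep w ith th e gate", PAIRS)
print(fixed)   # the keep with the gate
print(counts)  # {'th e': 2, 'w ith': 1}
```

The hit counts give a rough idea of which combination is the worst offender, so the remaining stragglers can be fixed by hand.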
|
#23 |
Senior Member
Join Date: Jan 2016
Location: Adelaide, Australia
Posts: 2,294
|
I found a workable solution to this by saving images of the text and then using an online OCR tool. It looks at the image and converts it to text, and the spaces all end up in the right place!
Going to be a PITA to do the whole module (HOTDQ) but now that I'm hooked on RW it simply has to be done. |
#24 |
Senior Member
Join Date: Jan 2016
Location: Adelaide, Australia
Posts: 2,294
|
Going to expand on this because I'm really happy with the result.
1. Open the PDF in Adobe Pro and convert it to PNG files. File > Save As Other > Image > PNG
2. Open Google Drive and set it up to convert files. Google Drive > Settings > Convert uploads > Tick
3. Copy the image files created by Adobe Pro into Google Drive
4. Right-click the uploaded files > Open With > Google Docs
5. This will open the file in a separate tab. The image will be at the top of the file, with an added page underneath containing the extracted text.
You will need to scan for errors and fix them manually, but ultimately the end result is really good. |
#25 |
Senior Member
Join Date: Jan 2016
Location: Adelaide, Australia
Posts: 2,294
|
Example extract using this method:
Original: T h e k e e p h a s a s a l ly p o r t a lo n g th e w e s t w a l l fo r c o u n t e r a t ta c k in g f o e s w h o b r in g a b a t t e r in g r am a g a in s t th e g a te s . D u r in g th e n ig h t w h i le c h a r a c t e r s a r e in th e k e e p , r a id e r s a p p r o a c h th e o ld g a te , f o r c e it o p e n , a n d r u s h th r o u g h . E s c o b e r t d is c o v e r s th em a n d r a c e s in to th e c o u r t y a r d t o s o u n d th e a la rm a h e a d o f th e in f i lt r a to r s .
Output: The kccp has a sally port along the west wall for counterattacking focs who bring a battering ram against the gates. During the night while characters are in the keep, raiders approach the old gate, force it open, and rush through, Escobert discovers them and races into the courtyard to sound the alarm ahead of the infiltrators. Enough defenders are available to deal with the immediate threat from raiders loose in the keep, since it's more a probe that got out of hand rather than a full-scale assault, Escobert is most concerned about resealing the sally port, and he seeks out the characters for that job. To secure the sally port, |
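The OCR output above still contains a few misreads ("kccp", "focs"). One way to speed up the manual check is to flag words that are not in a word list. This is only a sketch: the function name `flag_unknown_words` is hypothetical, and the tiny vocabulary is a stand-in for a real dictionary file.

```python
import re

def flag_unknown_words(text, vocabulary):
    """Return lowercased words not found in vocabulary, in order, without repeats."""
    seen = set()
    flagged = []
    for word in re.findall(r"[A-Za-z']+", text):
        key = word.lower()
        if key not in vocabulary and key not in seen:
            seen.add(key)
            flagged.append(key)
    return flagged

# Stand-in vocabulary for illustration only; a real run would load a full
# word list (plus module-specific names such as "Escobert").
VOCAB = {
    "the", "has", "a", "sally", "port", "along", "west", "wall",
    "for", "counterattacking",
}

sample = "The kccp has a sally port along the west wall for counterattacking focs"
print(flag_unknown_words(sample, VOCAB))  # ['kccp', 'focs']
```

Proper nouns and game terms will be flagged too unless they are added to the vocabulary, so treat the result as a list of places to look rather than a list of errors.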
#26 |
Senior Member
Join Date: Jan 2013
Location: Rochester, MN
Posts: 1,517
|
Here is the same text from the Encounters version (straight copy-paste from a PDF made by WotC):
Quote:
It's unfortunate that most 5th edition products are not available as PDFs. :(
Last edited by Parody; April 17th, 2016 at 06:02 PM. |
|
#27 |
Senior Member
Join Date: Jan 2016
Location: Adelaide, Australia
Posts: 2,294
|
Yeah a much cleaner PDF. Shame there isn't an official one for the whole module.
|
#28 |
Senior Member
Join Date: Dec 2013
Posts: 798
|
Try downloading ImageMagick. With this tool you might be able to batch-convert the PDF into page images in one go. For the second part I don't have a suggestion... yet.
Example command line: convert -density 300 -depth 8 -quality 96 a.pdf a.png |
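If you would rather script the conversion, here is a rough Python wrapper around the same command. It assumes ImageMagick's `convert` is on the PATH and that Ghostscript is installed, which ImageMagick relies on to read PDFs. The function names and the `pages` output folder are made up for this sketch, and the `%03d` in the output name asks ImageMagick to number each page image.

```python
import subprocess
from pathlib import Path

def build_convert_command(pdf_path, out_dir, density=300):
    """Build the ImageMagick argument list without running it."""
    pdf = Path(pdf_path)
    # One PNG per page: a-000.png, a-001.png, ...
    out_pattern = Path(out_dir) / f"{pdf.stem}-%03d.png"
    return [
        "convert",
        "-density", str(density),
        "-depth", "8",
        "-quality", "96",
        str(pdf),
        str(out_pattern),
    ]

def convert_pdf(pdf_path, out_dir):
    """Create out_dir if needed, then run the conversion."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(pdf_path, out_dir), check=True)

print(build_convert_command("a.pdf", "pages"))
```

Calling `convert_pdf("a.pdf", "pages")` would then write the page images into `pages`. Newer ImageMagick releases expose the same functionality through the `magick` command, so the first list element may need changing on those installs.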
#29 |
|
|