Lone Wolf Development Forums


Valyar March 19th, 2018 03:49 AM

Exporting basics and managing exports/imports
 
In my recent endeavor to optimize my monolithic realms, I am considering moving the published adventures into their own realms and creating temporary ones for specific adventures on demand. Enter Manage Exports and Imports :D

While I was reading the documentation that came with RW, I found that it is incomplete when it comes to advanced usage of the tool. The sections related to export and data manipulation are TBD, empty or lacking.

So, I figured out the following:
  1. There is a hidden "Export" tag domain that is not visible in the list in the Tags section.
  2. The name of each tag under the Export domain is equal to the name of the export. For example, "Realm 1 Full Export" generates a tag with the same name.
  3. A tag is generated for a "Structure only" export too, and since you can't tag categories or tags yourself, I assume this tag is applied automatically to each Topic/Article in the Category and to the tags in the Tag section (as tags are also exported).
  4. Tagging is not that smart. Unless it is configured to tag a resource on creation, it will not apply the full realm export tag, and unless action is taken, exports can be incomplete.
  5. Custom exports are subject to the same rules: a tag is created and you must manually assign it to the content you want exported.
  6. Exporting the structure is required prior to exporting realm data, as I don't clearly see the structure in the generated files.

Questions:
  1. Anything missing or incorrectly captured in the list above?
  2. Is there documentation on the structure of the output XML files produced when you export with "Compact Output" and/or "Full Output", so that volunteer data file contributors can move content between RW and HL or FG, for example, with less pain and agony?

kbs666 March 19th, 2018 11:08 AM

You have a lot wrong.

A structure only export does not rely on tags.
Tagging is very smart. It uses an existing feature rather than adding an entirely new one. If you designate an export as full, it will be full and the export tag will be applied everywhere.

Partial exports do work the same way. The easiest way to apply the export tag is to create the export but not run it yet, then create a view of the material you want in that export and use the bulk tagging feature to apply the tag to the entire view. The bulk tagging feature was added expressly for this usage.

Structure exports are not required prior to exporting data. Any export of a realm will also include the structure of the realm.

To the best of my knowledge, no one has produced a DTD or XSD of the export file XML. According to what we've been told, compact is for consumption by programs other than RW, while full is for RW's use.
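Without a schema, a contributor can still take an inventory of what a given export file actually contains. Below is a minimal sketch using only Python's standard library; `inventory_elements` is a hypothetical helper, and the sample document is invented for illustration, so it does not reflect Realm Works' real element names.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def inventory_elements(source) -> Counter:
    """Count how often each element path (e.g. 'a/b/c') occurs in an XML file.

    iterparse streams the file, so large exports are not loaded all at once.
    """
    counts: Counter = Counter()
    path: list[str] = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            counts["/".join(path)] += 1
        else:
            path.pop()
            elem.clear()  # drop the finished element's content to save memory
    return counts


# Invented sample; a real export will use different element names.
sample = b"<export><topic><snippet/><snippet/></topic><topic/></export>"
for element_path, n in sorted(inventory_elements(io.BytesIO(sample)).items()):
    print(f"{n:6}  {element_path}")
```

`iterparse` also accepts a file path, so the same helper can be pointed at an actual export file instead of the in-memory sample.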

Valyar March 20th, 2018 01:20 AM

How did I get "a lot wrong" when only points 4 and 6 are off target? :) The Quick Help here was useful.
I wonder whether a DTD or XSD will ever be provided to facilitate integration of RW with other tools (I am currently thinking of Fantasy Grounds, as we play there from time to time).

Anyway, the tag for the full export is not applied everywhere automatically: 20% of my articles lacked the full realm export tag, including manually created articles (I initially assumed it was due to bulk-uploaded content). I haven't had time to check whether the untagged content will be exported without the tag, but that is not important, since with bulk tagging I can easily add whatever is missing. Custom Exports are clear.

kbs666 March 20th, 2018 04:06 AM

Did you do the full realm export? Was it all in the new realm when you imported? Did it all have the tag there?

Valyar March 30th, 2018 11:52 PM

Finally, I found time to fool around with the Full/Partial Export/Import features of Realm Works. The realm I exported has ~2200 topics/articles, and the whole database with all realms is 533MB. The export/import operations were between different disks/arrays, and from disk to NAS over a 1Gbit network.

So, some observations:

Full Realm Export (Full output)
  • Prior to exporting, the software bulk-tags all articles with the full realm export tag. I intentionally deleted this tag and found it added back later. So all exports are based on tags, and without the tag nothing gets exported. Tags I deleted were added back before the full export.
  • The export of this realm alone is ~580MB, which is ~50MB more than the whole database (533MB), which also hosts other realms. I am not sure what bloats it, but nobody argues over those megs in 2018.
  • The export was slow. My workstation is equipped with a 500GB SSD (chosen for its higher IOPS compared with smaller SSDs), and it took ~10 minutes to export half a GB of database. Since other software I work with saves this amount of data almost instantly through various copy/export/dump operations, I presume the data manipulation that produces the XML file is not optimized, does not make good use of SMP, or just "works the way it works". I was monitoring with Process Explorer and VMMap, and Realm Works didn't even try to put the system resources to good use... :(
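The tag-driven behavior observed above (each export owns a tag named after the export, only tagged items are exported, and a full export re-applies its tag to everything first) can be modeled in a few lines. This is a toy Python sketch of the observed behavior, not Realm Works' actual implementation; `Topic`, `ExportDefinition`, `run_export`, and the sample topic names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    tags: set[str] = field(default_factory=set)


@dataclass
class ExportDefinition:
    # As observed in the thread, the export's name is also the name of
    # its tag in the hidden "Export" tag domain.
    name: str
    full: bool = False


def run_export(topics: list[Topic], export: ExportDefinition) -> list[str]:
    """Return the names of the topics this export would include."""
    if export.full:
        # Observed behavior: a full export (re)tags every topic before
        # exporting, which is why deleted tags come back.
        for topic in topics:
            topic.tags.add(export.name)
    # Partial/custom exports include only what was tagged, e.g. via bulk tagging.
    return [t.name for t in topics if export.name in t.tags]


realm = [Topic("Goblin Cave", {"Chapter 1"}), Topic("Town Map")]
print(run_export(realm, ExportDefinition("Chapter 1")))               # ['Goblin Cave']
print(run_export(realm, ExportDefinition("Full Export", full=True)))  # ['Goblin Cave', 'Town Map']
```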

Full Realm Import
  • The import was also slow, taking longer than 10 minutes. Again, the machine was close to idle... :(
  • The size of the master database quadrupled... to 2+GB, and after running the compress action it came down to the expected size of ~1GB. I understand that some temporary tables are used, but an automatic post-import cleanup might be a good idea.
  • The structure was indeed imported correctly; there is no need to get the structure file first.
  • An Import tag domain matching the name of the export tag was created. Those tags can't be removed with the bulk utility, only by hand... and obviously they can't be added back later. I am not sure whether removing a tag from an article and importing the same export again will cause an issue or a crash; too lazy to experiment now.
  • All tags were preserved. Importing other realms with the same structure and with conflicting structure gave the expected results, but I would be happy if the conflicting parts were not ignored but somehow "put aside" for manual review inside the realm.

Partial Export/Imports
  • Much faster, with the same tagging behavior. The downside is that I can't delete Custom/Partial export definitions, only modify them, which prevents cleanup when the list becomes big or entries become unnecessary. I hope deletion will be added some day.
  • The same applies to Imports: I would like to be able to delete the import and keep the data, but this is not critical at all.

I don't know why these full export/import operations are slow or what is happening behind the scenes. I don't dare think what it will be like when I reach GBs of content someday...

Farling March 31st, 2018 12:27 AM

The increase in size of various fields over the base database is due to all binary data being stored in the XML file in base64 format (so every 3 bytes of an image/file take up 4 bytes in the XML file).

In a realm with a lot of files or images, there will be a lot of binary to base64 encoding going on.
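The 3-to-4 ratio is easy to check. Below is a short Python sketch using the standard `base64` module; `base64_size` is a hypothetical helper that covers standard padded base64 without line breaks (whether Realm Works wraps its base64 text is not known here, and wrapping would add a little more).

```python
import base64


def base64_size(n_bytes: int) -> int:
    """Length of standard, padded base64 output (no line breaks) for n_bytes of input."""
    # Every started group of 3 input bytes becomes 4 output characters.
    return (n_bytes + 2) // 3 * 4


# Cross-check the formula against the standard library encoder.
for n in (0, 1, 2, 3, 4, 1000):
    assert len(base64.b64encode(bytes(n))) == base64_size(n)

# 3 MB of image data becomes 4 MB of text in the XML file (~33% larger).
print(base64_size(3_000_000))  # 4000000
```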

I think removing the import tag will cause confusion when a later import is created from the same realm, since RW will probably create duplicate topics.
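Farling's point can be illustrated with a toy model in which the import tag is what links a topic back to its source, so the importer can only update topics that still carry it. This Python sketch encodes that assumption; it is not confirmed to be how Realm Works matches topics, and `source_id`, `import_topics`, and the sample data are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    source_id: str  # hypothetical ID carried over from the exporting realm
    name: str
    tags: set[str] = field(default_factory=set)


def import_topics(realm: list[Topic], incoming: list[tuple[str, str]], import_tag: str) -> None:
    """Merge (source_id, name) pairs into realm.

    Assumption: an existing topic is recognized only if it still carries
    import_tag; anything unrecognized is added as a new topic.
    """
    known = {t.source_id: t for t in realm if import_tag in t.tags}
    for source_id, name in incoming:
        if source_id in known:
            known[source_id].name = name  # re-import updates the existing topic
        else:
            realm.append(Topic(source_id, name, {import_tag}))


realm: list[Topic] = []
import_topics(realm, [("42", "Goblin Cave")], "Adventure Export")
import_topics(realm, [("42", "Goblin Cave")], "Adventure Export")
print(len(realm))  # 1: matched via the import tag

realm[0].tags.discard("Adventure Export")  # the user removes the import tag
import_topics(realm, [("42", "Goblin Cave")], "Adventure Export")
print(len(realm))  # 2: no match, so a duplicate is created
```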

Valyar March 31st, 2018 01:54 AM

Thanks for the clarification. Binary-to-base64 encoding makes sense; I have all the tables as images taken from the books. :)

kbs666 March 31st, 2018 04:20 AM

One of the reasons for the lack of performance on modern workstations, which should handle tasks like RW very well, is that RW is a 32-bit application and simply cannot take advantage of most of those system resources even when they are available. The program also doesn't appear to be multi-threaded at all. Things like import or export don't create spikes in the usage of any cores/threads not already in use.

Farling March 31st, 2018 07:40 AM

Quote:

Originally Posted by kbs666 (Post 264984)
One of the reasons for the lack of performance on modern workstations, which should handle tasks like RW very well, is that RW is a 32-bit application and simply cannot take advantage of most of those system resources even when they are available. The program also doesn't appear to be multi-threaded at all. Things like import or export don't create spikes in the usage of any cores/threads not already in use.

To generate a single XML file, in which everything is necessarily serialised, would add a reasonable amount of complexity to make it multi-threaded. (I know from trying to multi-thread the XML output tool :) )

The infrequent nature of creating XML exports wouldn't make it a high priority for optimisation.

kbs666 March 31st, 2018 09:57 AM

Quote:

Originally Posted by Farling (Post 264991)
To generate a single XML file, in which everything is necessarily serialised, would add a reasonable amount of complexity to make it multi-threaded. (I know from trying to multi-thread the XML output tool :) )

The infrequent nature of creating XML exports wouldn't make it a high priority for optimisation.

That depends on how you approach the task.

<warning for those not interested in such technical matters>

Certain parts of the XML can be generated in any order, such as the bulk of the topics and articles. Threads could be spun off to process individual topics, with the master thread writing the result of each processing thread into the output file. This should take advantage of all the processing power available while having minimal impact on systems with fewer cores.

Of course, depending on how much overhead there is in creating and destroying a thread, this might be slower than doing it single-threaded. While I've done a lot of work in C#, relatively little of it has been any sort of heavily optimized processing. When I've known the program was going to have a heavy workload, I've gone to a better solution.
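The approach described above (workers serialize individual topics, and a master thread writes the fragments to one file in order) can be sketched with Python's `concurrent.futures`. This illustrates the idea only and is not Realm Works' code, which is C#; the `export`/`topic`/`body` element names and the dict-based topic records are invented for the example.

```python
import io
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def serialize_topic(topic: dict) -> str:
    """Worker: render one topic as a standalone XML fragment."""
    elem = ET.Element("topic", name=topic["name"])
    ET.SubElement(elem, "body").text = topic["body"]
    return ET.tostring(elem, encoding="unicode")


def write_export(topics: list[dict], out: io.TextIOBase, max_workers: int = 4) -> None:
    """Master: fan topics out to worker threads, write fragments in order."""
    out.write("<export>")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order regardless of which
        # worker finishes first, so the output file is deterministic.
        for fragment in pool.map(serialize_topic, topics):
            out.write(fragment)
    out.write("</export>")


buf = io.StringIO()
write_export([{"name": "Goblin Cave", "body": "Damp."}, {"name": "Town Map", "body": "Busy."}], buf)
print(buf.getvalue())
# <export><topic name="Goblin Cave"><body>Damp.</body></topic><topic name="Town Map"><body>Busy.</body></topic></export>
```

The pool reuses up to `max_workers` threads instead of creating and destroying one per topic, which addresses the overhead concern. In CPython, pure-Python serialization is held back by the global interpreter lock, so a real speed-up would likely need `ProcessPoolExecutor`, which offers the same `map` interface.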


All times are GMT -8.

wolflair.com copyright ©1998-2016 Lone Wolf Development, Inc.