Using restxq. I was hoping to speed things up with parallel processing :-).
We are using some new indices to speed things up and more can be done. The main issue is that we process a lot of files and there are multiple levels of processing:
1- Apply 1st level
2- Save to db
3- Apply 2nd level
4- Save to db
5- Apply 3rd level
We work by level so that we can search content after it has been processed in a level. That means the indices need to be refreshed between levels. For each level I apply everything I can before I need to re-index.
The levels look something like this (with some variations):
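To make the level idea concrete, here is a rough sketch in Python (not our actual code, which lives in .xqm). `run_levels`, `Step` and the `reindex` callback are made-up names; `reindex` just stands in for "save to db and refresh the indices". The sketch also re-indexes after the last level, which may not be needed in practice.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]
Step = Callable[[Doc], Doc]


def run_levels(
    docs: List[Doc],
    levels: List[List[Step]],
    reindex: Callable[[List[Doc]], None],
) -> List[Doc]:
    """Apply every step of a level to all docs, then re-index once per level."""
    for steps in levels:
        for step in steps:
            # Everything that does not need fresh indices goes in the same level.
            docs = [step(doc) for doc in docs]
        # Hypothetical hook: save to db and refresh indices before the next level.
        reindex(docs)
    return docs
```

The point is only that the expensive re-index happens once per level, not once per step.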
1- Add ids to all elements (content coming from authors through webdav doesn't always have all the required ids)
2- Aggregate content for a publication... That means resolving references recursively until all the pieces that create a larger publication are aggregated
3- Filter out content that doesn't apply to the current configuration (done after aggregation because we may use the same aggregate for multiple filter combinations. For example, we may have a publication for 2 similar products where the same content is used but a few lines here and there are different... Getting the same publication out for 2 different OS versions would be a good example. Same content, tiny differences here and there.)
4- Apply transformation to the filtered aggregate (to one or more formats: HTML, PDF, CSV, RSS, all of them, or whatever is needed)
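A simplified sketch of steps 1 to 3, in Python with the standard `xml.etree.ElementTree` module. This is only an illustration: the `ref` element, the `href` and `product` attributes, the `gen-N` id format and the in-memory `store` dict are all assumptions made for the example, not our real markup or db layout.

```python
import copy
import itertools
import xml.etree.ElementTree as ET


def add_ids(root, counter):
    """Step 1: give an id to every element that lacks one."""
    for el in root.iter():
        if "id" not in el.attrib:
            el.set("id", f"gen-{next(counter)}")
    return root


def aggregate(name, store, seen=()):
    """Step 2: resolve <ref href="..."/> recursively into one tree."""
    if name in seen:
        raise ValueError(f"circular reference: {name}")
    root = copy.deepcopy(store[name])
    _expand(root, store, seen + (name,))
    return root


def _expand(parent, store, seen):
    for i, child in enumerate(list(parent)):
        if child.tag == "ref":
            resolved = aggregate(child.get("href"), store, seen)
            resolved.tail = child.tail  # keep text that followed the reference
            parent[i] = resolved
        else:
            _expand(child, store, seen)


def filter_for(root, product):
    """Step 3: drop elements whose product attribute excludes this product."""
    out = copy.deepcopy(root)  # the aggregate stays reusable for other filters
    for parent in list(out.iter()):
        for child in list(parent):
            applies = child.get("product")
            if applies is not None and product not in applies.split():
                parent.remove(child)
    return out
```

Because `filter_for` works on a copy, the same aggregate can be filtered once per product or OS version before step 4 transforms each result.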
If I am outputting the publication in HTML and PDF for 26 of the 52 languages, I was hoping to be able to apply filters and aggregates on the 26 db pairs (base + staging) at once. Maybe I need 26 instances of BaseX where each instance has a lang... Then my js could call each instance individually. That's a lot of ports... and also, again, not easy for clients to just add a language. If it means parallel processing, it may be worth it.
Then I'd need to figure out how to handle processes that use more than one instance of BaseX... like the translation processes. A lot of files would need to go outside of BaseX through the .js. I might need a node.js layer. I can't imagine the .js client doing all the work... So far the client has been pretty light, with the controlling split between .js and .xqm. I thought moving the lang loop outside of the .xqm would mean parallel processing just because each call to the .xqm function would be separate, each with its own $lang. As you know, that didn't do it. Oupsy.
Optimizing performance is key for us at this point... so any clue is welcome.
The 2 most time-intensive processes are creating the aggregates and transforming files to XLIFF for translation. What these processes have in common is that they hold the dbs while they run... If I can stop holding the dbs when these run, I'm good.
I'm even considering writing all the small outputs to the file system and then importing the results back once the process is over. Most operations would become read-only as far as BaseX is concerned... not my favorite approach, but it might do the trick...
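For what it's worth, the "one instance per language" idea would look roughly like this from the caller's side. It is sketched in Python rather than js, and everything specific here is hypothetical: the port numbering, the `/publish/{lang}` path and the injected `call` function (which would do the actual HTTP request). Whether the calls really run in parallel still depends on what each instance locks.

```python
from concurrent.futures import ThreadPoolExecutor


def instance_url(lang, langs, base_port=9000, host="localhost"):
    """Map a language to its own instance; port = base_port + position in langs."""
    return f"http://{host}:{base_port + langs.index(lang)}/publish/{lang}"


def run_all(langs, call, max_workers=8):
    """Fire one request per language concurrently and collect results by lang."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the input order, so results line up with langs.
        results = pool.map(lambda lang: call(instance_url(lang, langs)), langs)
        return dict(zip(langs, results))
```

Adding a language then means adding an instance and a port, which is exactly the maintenance cost I'm worried about.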
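A minimal sketch of that file-system detour, again in Python and under assumptions: outputs are XML strings keyed by a relative path, and `import_one` is a hypothetical callback that would add one document to the db in a single write phase at the end.

```python
from pathlib import Path


def stage_outputs(outputs, staging_dir):
    """Write each generated output to disk instead of updating the db."""
    staging = Path(staging_dir)
    written = []
    for rel_path, content in outputs.items():
        target = staging / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written


def import_staged(staging_dir, import_one):
    """Once processing is over, push every staged XML file back in one pass."""
    count = 0
    for path in sorted(Path(staging_dir).rglob("*.xml")):
        rel = path.relative_to(staging_dir).as_posix()
        import_one(rel, path.read_text(encoding="utf-8"))  # hypothetical db import
        count += 1
    return count
```

While `stage_outputs` runs, the db is only read; the write happens once, in `import_staged`.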