Hi
This topic has been mentioned a few times before, but it is a key aspect of my application, so I will ask anyway.
Question:
What is the recommended approach for an application in which multiple document producers store or modify documents in one shared collection?
Background
- I have feeds which may deliver new messages.
- Multiple preprocessing processes prepare the messages before they are stored in the collection (Python, celery and RabbitMQ, so that larger volumes of incoming messages can be processed in parallel).
- A single process has write access to my BaseX collection (arranging this has not been easy for me so far).
- There are some read-only queries over the stored messages. These cause no problems, as it should be easy to run multiple read-only queries concurrently.
Assumptions
If I am wrong here, please correct me.
- One collection can be written to or modified by only one process at a given moment; other attempts to write at that time are rejected and fail.
In other words, one collection can be opened by only one process (or are multiple clients allowed to open it as long as they do not modify the database concurrently?).
- Read-only queries are easy, as they can run concurrently without being blocked or rejected.
Current approach
- Multiple preprocessing processes concurrently send documents into a single basex-storage queue in RabbitMQ.
- A single one-process worker consumes documents from the basex-storage queue and does all the work of adding documents to the given collection.
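To make the layout above concrete, here is a minimal sketch of the single-writer pattern. It is only an illustration under stated assumptions: Python's standard queue and threading modules stand in for RabbitMQ and celery, a plain list stands in for the BaseX collection, and all names (preprocess, producer, single_writer, run) are made up for this example rather than taken from my real code.

```python
import queue
import threading

STOP = object()  # sentinel that tells the writer to shut down


def preprocess(raw):
    # Hypothetical preprocessing step: trim the message and wrap it in XML.
    return "<message>{}</message>".format(raw.strip())


def producer(feed, out_queue):
    # Each producer preprocesses its own feed and pushes the results
    # onto the shared queue (stand-in for the basex-storage queue).
    for raw in feed:
        out_queue.put(preprocess(raw))


def single_writer(in_queue, collection):
    # The only place that ever modifies the collection.
    while True:
        doc = in_queue.get()
        if doc is STOP:
            break
        # Stand-in for adding the document to the BaseX collection.
        collection.append(doc)


def run(feeds):
    docs = queue.Queue()
    collection = []  # assumption: a list replaces the real database

    writer = threading.Thread(target=single_writer, args=(docs, collection))
    writer.start()

    producers = [
        threading.Thread(target=producer, args=(feed, docs)) for feed in feeds
    ]
    for p in producers:
        p.start()
    for p in producers:
        p.join()

    # All producers are done, so the writer can drain the queue and stop.
    docs.put(STOP)
    writer.join()
    return collection
```

Calling run([["a", "b"], ["c"]]) stores three documents, in whatever order the producers happened to deliver them. The point of the sketch is that producers never touch the collection directly, so write conflicts cannot occur, at the cost of one extra moving part to supervise.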
Wishes
Managing a single write process adds considerable complexity to the whole solution.
It would be nice if I could write to a single collection from multiple processes at once and expect success.
Any comments are welcome.
Best regards
Jan