Some complementary notes (others may be able to tell you more about their experiences with large data sets):

a GiST index would have to be built there, to allow full-text searches; PostgreSQL is picked

You could also have a look at Elasticsearch or its predecessors.

there might be a leak in the BaseX implementation of XQuery.

I assume you are referring to the SQL Module? Feel free to attach the OOM stack trace; it might give us more insight.

I would recommend writing the SQL commands or an SQL dump to disk (see the BaseX File Module for more information) and running/importing this file in a second step. This is probably faster than sending hundreds of thousands of single SQL commands via JDBC, no matter if you are using XQuery or Java.
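To illustrate the two-step approach on the Java side, here is a minimal sketch that writes all INSERT statements to a single file instead of sending them one by one. The table `docs(id, title)`, the file name `dump.sql` and the class `SqlDumpWriter` are made up for this example and are not part of BaseX or of your setup. The quoting only doubles single quotes, which assumes standard SQL string literals; adapt it if your data or database settings need more.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class SqlDumpWriter {

    /** Quotes a value as an SQL string literal by doubling single quotes. */
    public static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds one INSERT statement for the hypothetical table docs(id, title). */
    public static String insertStatement(int id, String title) {
        return "INSERT INTO docs (id, title) VALUES (" + id + ", " + quote(title) + ");";
    }

    /** Writes all statements to one file, wrapped in a single transaction. */
    public static void writeDump(Path target, List<String> titles) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("BEGIN;");
            out.newLine();
            for (int i = 0; i < titles.size(); i++) {
                // Ids are simply numbered from 1 in this sketch.
                out.write(insertStatement(i + 1, titles.get(i)));
                out.newLine();
            }
            out.write("COMMIT;");
            out.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Path.of("dump.sql"); // hypothetical output file
        writeDump(target, List.of("First title", "O'Reilly"));
        System.out.println("Wrote " + target.toAbsolutePath());
    }
}
```

The resulting file can then be imported in a second step with the database's own client, for example `psql -f dump.sql` for PostgreSQL. Wrapping everything in one transaction is a design choice of this sketch; for very large imports you may prefer to commit in batches.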