> It should probably look as follows?
> fBaseXClient.replace(fPathName, fInputStream);
Yes, sorry, that's my fault; I took the source from my investigative code. The original code was exactly as you've stated. I'll try the array input, thanks.
> However, I assume that the bottleneck is not really BaseX, but rather the environment in which it is used.
The environment I'm using, though, isn't restrictive: it's a Windows 7 machine running on an i7 with 8 GB of memory. Can you clarify what you mean by "environment"? I don't see how that assumption can be made on the evidence so far. Can you confirm what the socket buffer size is on the server side? My tests show that a single write call is where all of the time is going. Can you also point me to the server source where the socket read is done to take the XML off the socket, please?
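For reference, here is a trimmed-down sketch of the investigative code I'm timing with. It uses the BaseXClient example class from your site; the file name, database name and credentials are placeholders, and the socket probe only reports the client-side defaults on my machine, not the server's buffer size:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.Socket;

public class ReplaceTiming {
    public static void main(String[] args) throws Exception {
        // Client-side socket buffer defaults for this JVM/OS (not the server's).
        try (Socket probe = new Socket()) {
            System.out.println("default send buffer:    " + probe.getSendBufferSize());
            System.out.println("default receive buffer: " + probe.getReceiveBufferSize());
        }

        // Placeholder connection details, database and file names.
        BaseXClient client = new BaseXClient("localhost", 1984, "admin", "admin");
        try (InputStream in = new BufferedInputStream(new FileInputStream("large.xml"))) {
            client.execute("open mydb");
            long start = System.nanoTime();
            client.replace("large.xml", in);
            System.out.println("replace took " + (System.nanoTime() - start) / 1000000 + " ms");
        } finally {
            client.close();
        }
    }
}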
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: 16 March 2015 11:27 To: Jonathan Clarke Cc: Lizzi, Vincent; BaseX Subject: Re: [basex-talk] Large Document Upload Performance
Hi Jonathan,
> fBaseXClient.replace(fPathName, fXMLSource.getBytes());
It should probably look as follows?
fBaseXClient.replace(fPathName, fInputStream);
The following code snippet may be a bit faster...
import org.basex.io.in.ArrayInput;
...
String xml = "<xml>...</xml>";
fBaseXClient.replace(fPathName, new ArrayInput(xml));
However, I assume that the bottleneck is not really BaseX, but rather the environment in which it is used.
Hope this helps, Christian
On Mon, Mar 16, 2015 at 11:55 AM, Jonathan Clarke jonathan.m.clarke@dsl.pipex.com wrote:
Hi Vincent,
Many thanks for this. As you may have seen, I've just posted a response to Christian with source that's already pretty similar to yours, aside from the libraries themselves. My findings suggest that it's a socket buffer problem, but I'll wait to hear what Christian says before replacing my implementation with your suggestions below.
Jonathan.
-----Original Message----- From: Lizzi, Vincent [mailto:Vincent.Lizzi@taylorandfrancis.com] Sent: 13 March 2015 21:30 To: Jonathan Clarke; 'Christian Grün' Cc: 'BaseX' Subject: RE: [basex-talk] Large Document Upload Performance
Hi Jonathan,
A few months ago I needed to import XML documents that were over 50 MB into BaseX. After a few attempts to speed up the process, I found that using Saxon's s9api and Xerces2 as shown below performed the best. The bottleneck appeared not to be in BaseX itself, but in getting the data to BaseX efficiently. Here is the Java code.
protected void loadXmlDocument(BaseXClient client, File xmlFile) throws Exception {
    DocumentBuilder docBuilder = sxProcessor.newDocumentBuilder();
    SAXSource source = prepareSaxSource(xmlFile);
    XdmNode doc = docBuilder.build(source);
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        Serializer ser = new Serializer(baos);
        ser.setOutputProperty(Serializer.Property.ENCODING, "UTF-8");
        ser.serializeNode(doc);
        try (InputStream is = new ByteArrayInputStream(baos.toByteArray())) {
            client.replace(path, is);
        }
    }
}
protected SAXSource prepareSaxSource(File xmlFile)
        throws ParserConfigurationException, SAXException, MalformedURLException {
    SAXParserFactory saxFactory = SAXParserFactory.newInstance();
    saxFactory.setNamespaceAware(true);
    saxFactory.setXIncludeAware(true);
    saxFactory.setValidating(false);
    SAXParser saxParser = saxFactory.newSAXParser();
    XMLReader reader = saxParser.getXMLReader();

    CatalogResolver resolver = new CatalogResolver(catalogManager);
    reader.setEntityResolver(resolver);
    SAXSource source = new SAXSource();
    source.setInputSource(new InputSource(xmlFile.toURI().toURL().toExternalForm()));
    source.setXMLReader(reader);
    return source;
}
I tried to make the above code self-contained by cobbling together the relevant parts, so it's untested, but it carries the idea.
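If it helps, calling it would look roughly like this (again untested; the connection details and file name are placeholders, and it assumes the methods above live in a class where sxProcessor, catalogManager and path are already initialized):

BaseXClient client = new BaseXClient("localhost", 1984, "admin", "admin");
try {
    client.execute("open mydb");                     // placeholder database name
    loadXmlDocument(client, new File("large.xml"));  // placeholder file
} finally {
    client.close();
}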
I hope this helps.
Vincent
-----Original Message----- From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On Behalf Of Jonathan Clarke Sent: Friday, March 13, 2015 3:50 PM To: 'Christian Grün' Cc: 'BaseX' Subject: Re: [basex-talk] Large Document Upload Performance
Hi Christian,
I wouldn't be able to provide you with the data itself, but I'm not using a query; I'm simply using the BaseXClient that's provided on your site. It's just a connection opened to the server and then a call to the replace function. What's the typical time you would expect to see for a file of that size? Some research online has suggested that the delay is caused by the document indexing that gets underway at the point of update. In the meantime, I'll try to construct a non-descript file of similar size that we can use. Are there any other performance-enhancing settings that you've advised others to use for similar reports, like the flushing? And am I able to postpone or turn off the document indexing until I'm ready to call the function explicitly?
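To make the flushing question concrete, this is roughly what I was hoping I could do, going by the options listed in the documentation (a sketch only; the database name, file name and credentials are placeholders):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

public class BulkReplace {
    public static void main(String[] args) throws Exception {
        BaseXClient client = new BaseXClient("localhost", 1984, "admin", "admin");
        try {
            client.execute("set autoflush false");   // don't flush to disk after every update
            client.execute("open mydb");

            try (InputStream in = new BufferedInputStream(new FileInputStream("large.xml"))) {
                client.replace("large.xml", in);
            }

            client.execute("flush");      // write the pending data to disk explicitly
            client.execute("optimize");   // rebuild the indexes once, when we're ready
        } finally {
            client.close();
        }
    }
}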
Jonathan.
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: 13 March 2015 19:12 To: Jonathan Clarke Cc: BaseX Subject: Re: [basex-talk] Large Document Upload Performance
Hi Jonathan,
> I hope you can help me. I am using BaseX to underpin a complex distributed system, which also requires storage of XML documents in soft real time. At the moment, I'm getting storage times of about 500 ms for a 4 MB XML file. Can you advise how I might be able to bring that down by at least 75%, please?
We'll probably need more information on your queries etc.
> I also tried to use AddCache, and that just crashed the latest production release of the server.
If you find out how we can reproduce this, your feedback is welcome.
Best, Christian