Hi,
I hope you can help me. I am using BaseX to underpin a complex distributed system, which also requires storage of XML documents in soft real time. At the moment, I'm getting storage times of about 500 ms for a 4 MB XML file. Can you advise how I might be able to bring that down by at least 75%, please? I have tried tweaking the settings of BaseX 8, including turning off autoflush, and each setting that I try actually seems to increase the processing time. I also tried to use the ADDCACHE option, and that just crashed the latest production release of the server.
Many thanks for your help in advance,
Dr Jonathan Clarke.
Hi Jonathan,
> I hope you can help me. I am using BaseX to underpin a complex distributed system, which also requires storage of xml document in soft real-time. At the moment, I'm getting storage times for a 4Mb XML file of about 500ms. Can you advise how I might be able to bring that down, please, by at least 75%?
We'll probably need more information on your queries etc.
> I also tried to use AddCache, and that just crashed the latest production release of the server.
If you find out how we can reproduce this, your feedback is welcome.
Best, Christian
Hi Christian,
I wouldn't be able to provide you with the data itself, but I'm not using a query; I'm simply using the BaseXClient that's provided on your site. It's just a connection opened to the server, and then a call to the replace function. What's the typical time you would expect to see for a file of that size? Some research online has suggested that the delay is caused by the document indexing that gets under way at the point of update. In the meantime, I'll try to construct a nondescript file of similar size that we can use. Are there any other performance-enhancing settings that you've advised others to use for similar reports, like the flushing? And am I able to postpone or turn off the document indexing until I'm ready to call the replace function explicitly?
Jonathan.
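For reference, the flushing and indexing behaviour asked about above can be switched off and deferred with BaseX commands; a sketch of such a command script (the database name `mydb` and the paths are placeholders, and option names are as in BaseX 8):

```
SET AUTOFLUSH false
SET TEXTINDEX false
SET ATTRINDEX false
OPEN mydb
REPLACE docs/large.xml /data/large.xml
CREATE INDEX text
CREATE INDEX attribute
FLUSH
```

With AUTOFLUSH disabled, an explicit FLUSH persists the data at a convenient moment, and the value indexes can likewise be rebuilt with CREATE INDEX once the bulk update is done.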
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: 13 March 2015 19:12 To: Jonathan Clarke Cc: BaseX Subject: Re: [basex-talk] Large Document Upload Performance
Hi Jonathan,
A few months ago I needed to import XML documents that were over 50 MB into BaseX. After a few attempts to speed up the process, I found that using Saxon's s9api and Xerces2 as shown below performed the best. The bottleneck appeared not to be in BaseX itself but in making the process of sending the data to BaseX efficient. Here is the Java code.
// Requires Saxon s9api (net.sf.saxon.s9api.*) and the Apache XML Commons
// resolver (org.apache.xml.resolver.*); sxProcessor, catalogManager and
// path are fields of the surrounding class.
protected void loadXmlDocument(BaseXClient client, File xmlFile) throws Exception {
    DocumentBuilder docBuilder = sxProcessor.newDocumentBuilder();
    SAXSource source = prepareSaxSource(xmlFile);
    XdmNode doc = docBuilder.build(source);
    // Serialize the document into memory, then stream it to BaseX.
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        Serializer ser = new Serializer(baos);
        ser.setOutputProperty(Serializer.Property.ENCODING, "UTF-8");
        ser.serializeNode(doc);
        try (InputStream is = new ByteArrayInputStream(baos.toByteArray())) {
            client.replace(path, is);
        }
    }
}

protected SAXSource prepareSaxSource(File xmlFile)
        throws ParserConfigurationException, SAXException, MalformedURLException {
    SAXParserFactory saxFactory = SAXParserFactory.newInstance();
    saxFactory.setNamespaceAware(true);
    saxFactory.setXIncludeAware(true);
    saxFactory.setValidating(false);
    SAXParser saxParser = saxFactory.newSAXParser();
    XMLReader reader = saxParser.getXMLReader();
    // Resolve external entities through an XML catalog.
    CatalogResolver resolver = new CatalogResolver(catalogManager);
    reader.setEntityResolver(resolver);
    SAXSource source = new SAXSource();
    source.setInputSource(new InputSource(xmlFile.toURI().toURL().toExternalForm()));
    source.setXMLReader(reader);
    return source;
}
I tried to make the above code self-contained by cobbling together relevant parts of the code, so this is untested but carries the idea.
I hope this helps.
Vincent
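Vincent's snippet relies on Saxon's s9api. The same serialize-to-memory-then-stream pattern can be sketched with JDK classes alone; this is an illustrative rewrite, not Vincent's code, and the final BaseXClient.replace call is left out so the snippet stays self-contained:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class SerializeToStream {
    // Parse XML, serialize it into an in-memory byte array, and return an
    // InputStream of the kind that BaseXClient.replace(path, is) accepts.
    public static InputStream toInputStream(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.transform(new DOMSource(doc), new StreamResult(baos));
        return new ByteArrayInputStream(baos.toByteArray());
    }
}
```

Buffering the whole serialized document in memory before the client call keeps the socket write phase as short as possible, which is the point of both variants.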
-----Original Message----- From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On Behalf Of Jonathan Clarke Sent: Friday, March 13, 2015 3:50 PM To: 'Christian Grün' Cc: 'BaseX' Subject: Re: [basex-talk] Large Document Upload Performance
Hi Vincent,
Many thanks for this. As you may see, I've just posted a response to Christian, with source that's pretty similar to yours already, aside from the libraries themselves. My findings suggest that it's a socket buffer problem, but I'll wait to hear what Christian says before replacing my implementation with your suggestions below.
Jonathan.
-----Original Message----- From: Lizzi, Vincent [mailto:Vincent.Lizzi@taylorandfrancis.com] Sent: 13 March 2015 21:30 To: Jonathan Clarke; 'Christian Grün' Cc: 'BaseX' Subject: RE: [basex-talk] Large Document Upload Performance
Hi Jonathan,
> fBaseXClient.replace(fPathName, fXMLSource.getBytes());
It should probably look as follows?
fBaseXClient.replace(fPathName, fInputStream);
The following code snippet may be a bit faster...
import org.basex.io.in.ArrayInput;
...
String xml = "<xml>...</xml>";
fBaseXClient.replace(fPathName, new ArrayInput(xml));
However, I assume that the bottleneck is not really BaseX, but rather the environment in which it is used.
Hope this helps, Christian
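Christian's hunch can be made concrete. If a client writes a 4 MB document one byte at a time to an unbuffered stream, every byte is a separate call (and, on a socket, potentially a separate system call). The counting sketch below is illustrative only, not taken from BaseX; it shows how a BufferedOutputStream collapses per-byte writes (the exact counts depend on its 8 KB default buffer):

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class WriteCount {
    // OutputStream that counts every call that reaches it.
    static class CountingStream extends OutputStream {
        long calls = 0;
        @Override public void write(int b) { calls++; }
        @Override public void write(byte[] b, int off, int len) { calls++; }
    }

    // Copy `data` one byte at a time, optionally through a buffer,
    // and return how many calls reached the underlying stream.
    public static long copy(byte[] data, boolean buffered) throws IOException {
        CountingStream sink = new CountingStream();
        OutputStream out = buffered ? new BufferedOutputStream(sink) : sink;
        for (byte b : data) out.write(b);  // per-byte, as in the symptom described
        out.flush();
        return sink.calls;
    }
}
```

On a real socket the difference shows up as system-call and latency overhead rather than call counts, which matches the symptom Jonathan reports around the single write.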
On Mon, Mar 16, 2015 at 11:55 AM, Jonathan Clarke jonathan.m.clarke@dsl.pipex.com wrote:
> It should probably look as follows? fBaseXClient.replace(fPathName, fInputStream);
Yes, sorry, that's my fault: I took the source from my investigative code. The original code was exactly as you've stated. I'll try the array input, thanks.
> However, I assume that the bottleneck is not really BaseX, but rather the environment in which it is used.
The environment I'm using, though, isn't restrictive: it's a Windows 7 machine running on an i7 with 8 GB of memory. Can you clarify what you mean by "environment"? I don't know how that assumption can be made on the face of the evidence so far. Can you confirm what the socket buffer size is on the server side, as my tests show that a single write call is where all the time is going? And can you point me to the server source where the socket read is done to take the XML off the socket, please?
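On the client side, at least, the socket send buffer can be inspected and resized from Java; whether the server clamps its receive buffer would need checking in the BaseX source. A small probe (it uses an unconnected socket, so it reports only the platform default):

```java
import java.io.IOException;
import java.net.Socket;

public class SendBuf {
    // Return the platform's default SO_SNDBUF, after requesting a larger one.
    public static int probe() throws IOException {
        try (Socket s = new Socket()) {      // unconnected; options can still be set
            int def = s.getSendBufferSize(); // default send buffer size
            s.setSendBufferSize(1 << 20);    // request 1 MB; the OS may clamp this
            return def;
        }
    }
}
```

Requesting a larger buffer before connecting is only a hint: setSendBufferSize is advisory, and the operating system decides the value actually granted.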
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: 16 March 2015 11:27 To: Jonathan Clarke Cc: Lizzi, Vincent; BaseX Subject: Re: [basex-talk] Large Document Upload Performance
> Can you point me to the server source where the socket read is being done to take the xml off the socket, please?
Sure. Here, the input stream is requested (and wrapped into a buffered input stream):
https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
Hope this helps, Christian
Hi Jonathan,
> I wouldn't be able to provide you with the data itself, but I'm not using a query, I'm simply using the BaseXClient that's provided on your site, it's just a connection open to the server, and then a call to the replace function.
Could you please post the lines of code you have been using so far to replace documents?
Besides that, you could check out the e-mail from Simon Chatelain [1]: in many cases, you can speed up the import of documents by using our internal parser (INTPARSE = true).
Best, Christian
[1] https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg05911.htm...
Hi Christian,
Please find below the code I use. As you can see, I make a call to the BaseXClient found at the GitHub location, which I leave unchanged: https://github.com/BaseXdb/basex/blob/master/basex-examples/src/main/java/or.... I have spent some time breaking down the "send" function in the BaseXClient and have tracked the delay down to the "bos.write" call, which is attached to the raw socket created in the constructor. I also tested the function using a byte array rather than a single byte, and the problem persists, suggesting a socket buffering problem on the server side itself. I increased the buffer size in the BaseXClient socket, and that made no difference either. The XML data files that I'm sending vary, so the delay isn't associated with a single structure within one of them. I also tried the internal parser, but that made no difference; and if the lower-level server buffers are the cause anyway, I wouldn't have thought it would make any difference either.
-----------------------------------
// Document builder initialisation
DocumentBuilderFactory fDocumentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder fDocumentBuilder = fDocumentBuilderFactory.newDocumentBuilder();
DOMImplementation fDOMImplementation = fDocumentBuilder.getDOMImplementation();

// Generate XML from select object fields.
Document fDocument = fDOMImplementation.createDocument(null, this.getClass().getName(), null);
Element fElement = fDocument.getDocumentElement();
fMyObject.toXML(fDocument, fElement);

// Do transformation
DOMSource fDomSource = new DOMSource(fDocument);
StringWriter fStringWriter = new StringWriter();
StreamResult fStringStreamResult = new StreamResult(fStringWriter);
fTransformer = fTransformerFactory.newTransformer();
fTransformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
fTransformer.transform(fDomSource, fStringStreamResult);

String fXMLSource = fStringWriter.toString();

// BaseXClient Start
final BaseXClient fBaseXClient = new BaseXClient(fHost, fPort, fUserName, fPassword);
try {
    fBaseXClient.execute("open " + fIdentity);
    doSignal(fBaseXClient.info());
    InputStream fInputStream = new ByteArrayInputStream(fXMLSource.getBytes());
    fBaseXClient.replace(fPathName, fXMLSource.getBytes());
    doSignal(fBaseXClient.info());
} finally {
    fBaseXClient.close();
}
// BaseXClient Finish.
-----------------------------------
Jonathan.
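One hazard in the code above, independent of speed: the transformer is asked to emit ISO-8859-1, but fXMLSource.getBytes() then encodes the string with the platform default charset, so the declared and actual encodings can disagree. Serializing the DOM straight to bytes avoids the String round trip; a minimal sketch with JAXP only, where the MyObject element stands in for the generated document:

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class Iso88591Bytes {
    // Serialize a freshly built DOM directly to ISO-8859-1 bytes,
    // so the encoding declaration and the actual bytes always match.
    public static byte[] serialize() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("MyObject"));
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
        t.transform(new DOMSource(doc), new StreamResult(baos));
        return baos.toByteArray();
    }
}
```

The resulting byte array can be wrapped in a ByteArrayInputStream and handed to the client, mirroring the unused fInputStream in the original snippet.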
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: 13 March 2015 22:30 To: Jonathan Clarke Cc: BaseX Subject: Re: [basex-talk] Large Document Upload Performance