Hi,
I am having difficulties populating a BaseX database. I have a large number
of XML files (about half a million, ranging in size from several kilobytes
to hundreds of kilobytes).
I use the BaseX Java API and, for each file, ultimately call
org.basex.core.cmd.Add.
The files conform to 22 different XSD definitions, so I import them into 22
separate databases on a single BaseX server.
I have plenty of RAM and CPU power, and I monitor both processes (the BaseX
server and my client program) in JVisualVM: the JVM reaches its CPU limit,
but RAM is never exhausted.
Before importing, I need to enrich the XML data with additional information
taken from an SQL database.
I have written a multithreaded Groovy program that uses the BaseX Java API
and makes heavy use of the GPars library. Simply put, the program:
1. has several producer threads -- each producer reads its portion of the
SQL database and provides the additional information
2. has several consumer threads -- each consumer takes an original file,
wraps it with the additional information and finally calls the
org.basex.core.cmd.Add command.
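To make the structure above concrete, here is a simplified Java sketch of the pipeline. It is not my actual Groovy/GPars code: the class name `PipelineSketch`, the `Enrichment` type, the queue capacity, the generated document IDs and the `sink` callback (which stands in for the call to Add) are all placeholders made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class PipelineSketch {

    /** Placeholder for the additional information read from the SQL database. */
    static final class Enrichment {
        final String docId;
        final String extraInfo;

        Enrichment(String docId, String extraInfo) {
            this.docId = docId;
            this.extraInfo = extraInfo;
        }
    }

    /** Sentinel telling a consumer that no more work will arrive. */
    private static final Enrichment DONE = new Enrichment("", "");

    /** Runs the pipeline; every wrapped document is passed to the sink exactly once. */
    static void run(int producers, int consumers, int docsPerProducer, Consumer<String> sink) {
        BlockingQueue<Enrichment> queue = new ArrayBlockingQueue<>(100); // arbitrary capacity
        List<Thread> producerThreads = new ArrayList<>();
        List<Thread> consumerThreads = new ArrayList<>();

        for (int p = 0; p < producers; p++) {
            final int portion = p; // each producer handles its own portion
            producerThreads.add(new Thread(() -> {
                try {
                    for (int i = 0; i < docsPerProducer; i++) {
                        String id = "DOC_" + portion + "_" + i;
                        queue.put(new Enrichment(id, "extra-for-" + id));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (int c = 0; c < consumers; c++) {
            consumerThreads.add(new Thread(() -> {
                try {
                    for (Enrichment item = queue.take(); item != DONE; item = queue.take()) {
                        // The real program would read the original file here and wrap it.
                        sink.accept("<wrapper id=\"" + item.docId + "\"><extra>"
                                + item.extraInfo + "</extra></wrapper>");
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        producerThreads.forEach(Thread::start);
        consumerThreads.forEach(Thread::start);
        try {
            for (Thread t : producerThreads) t.join();
            for (int c = 0; c < consumers; c++) queue.put(DONE); // one sentinel per consumer
            for (Thread t : consumerThreads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        }
    }

    public static void main(String[] args) {
        AtomicInteger added = new AtomicInteger();
        run(3, 4, 50, xml -> added.incrementAndGet());
        System.out.println("documents handed to the sink: " + added.get());
    }
}
```

In the real program the sink is the Add call on a thread-bound session, so a document is lost whenever that call throws and the item is not queued again.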
Tests with less data (up to several thousand files) give good results: no
data is lost, and both the BaseX server and my client program behave as
they should.
Unfortunately, when I try to import all of the files, the program starts
fine, but once it gets "warm" I get SIGPIPE errors in the log from time to
time (as I said, there is plenty of RAM and CPU available); please see the
attachment.
Comments on the picture:
1. I am adding a document with ID ISPOP_166007 -- this ID is indeed missing
from the final database
2. just a simple call to Add:
    Closure add = { session ->
        def cmd = new org.basex.core.cmd.Add(dsn, enhancedXml)
        session.execute(cmd)
    }
3. I am reusing the session: it is bound to the current thread and is
never closed until the thread (consumer) finishes
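Point 3 can be sketched roughly as follows. This is a simplified Java stand-in for my session registry, not the real code: the class name `SessionRegistrySketch`, the `closeForCurrentThread` method and the generic session type `S` (standing in for a BaseX client session) are placeholders; only the name `withThreadBoundSession` matches my program.

```java
import java.util.function.Function;
import java.util.function.Supplier;

class SessionRegistrySketch<S extends AutoCloseable> {

    private final Supplier<S> factory;                      // opens a new session on demand
    private final ThreadLocal<S> bound = new ThreadLocal<>();

    SessionRegistrySketch(Supplier<S> factory) {
        this.factory = factory;
    }

    /** Runs the action with the calling thread's session, opening one on first use. */
    <T> T withThreadBoundSession(Function<S, T> action) {
        S session = bound.get();
        if (session == null) {
            session = factory.get();
            bound.set(session);
        }
        return action.apply(session);
    }

    /** Called once by a consumer thread when it finishes. */
    void closeForCurrentThread() {
        S session = bound.get();
        if (session != null) {
            bound.remove();
            try {
                session.close();
            } catch (Exception e) {
                throw new IllegalStateException("could not close session", e);
            }
        }
    }

    public static void main(String[] args) {
        SessionRegistrySketch<AutoCloseable> registry =
                new SessionRegistrySketch<AutoCloseable>(() -> () -> { });
        Object first = registry.withThreadBoundSession(s -> s);
        Object second = registry.withThreadBoundSession(s -> s);
        System.out.println("same session reused: " + (first == second));
        registry.closeForCurrentThread();
    }
}
```

With this arrangement each consumer keeps one long-lived connection, so if the server side of that connection goes away, every later call on that thread writes to a dead socket until a new session is opened.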
There is nothing wrong in the BaseX server log; other documents are added
just fine, and there is no trace of document ISPOP_166007.
Just for reference, the complete stack trace follows:
- - - -
ERROR basex.support.AddResourcesSupport - unable to consume ISPOP_166007
java.net.SocketException: Broken pipe (SIGPIPE)
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.8.0_45]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[na:1.8.0_45]
at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_45]
at org.basex.io.out.BufferOutput.flush(BufferOutput.java:60) ~[basex-8.2.jar!/:8.2]
at org.basex.io.out.BufferOutput.write(BufferOutput.java:54) ~[basex-8.2.jar!/:8.2]
at org.basex.io.out.PrintOutput.write(PrintOutput.java:66) ~[basex-8.2.jar!/:8.2]
at java.io.OutputStream.write(OutputStream.java:116) ~[na:1.8.0_45]
at java.io.OutputStream.write(OutputStream.java:75) ~[na:1.8.0_45]
at org.basex.api.client.ClientSession.send(ClientSession.java:238) ~[basex-8.2.jar!/:8.2]
at org.basex.api.client.ClientSession.execute(ClientSession.java:160) ~[basex-8.2.jar!/:8.2]
at org.basex.api.client.ClientSession.execute(ClientSession.java:167) ~[basex-8.2.jar!/:8.2]
at org.basex.api.client.Session.execute(Session.java:36) ~[basex-8.2.jar!/:8.2]
at org.basex.api.client.Session$execute.call(Unknown Source) ~[na:na]
at basex.support.AddResourcesSupport$_consume_closure9$_closure17.doCall(AddResourcesSupport.groovy:255) ~[basex-1.0.jar!/:na]
at sun.reflect.GeneratedMethodAccessor368.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_45]
at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_45]
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93) [groovy-2.4.4.jar!/:2.4.4]
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325) [groovy-2.4.4.jar!/:2.4.4]
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294) [groovy-2.4.4.jar!/:2.4.4]
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1019) [groovy-2.4.4.jar!/:2.4.4]
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42) [groovy-2.4.4.jar!/:2.4.4]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125) [groovy-2.4.4.jar!/:2.4.4]
at basex.BasexSessionRegistry.withThreadBoundSession(BasexSessionRegistry.groovy:79) ~[basex-1.0.jar!/:na]
at basex.BasexSessionRegistry$withThreadBoundSession$0.call(Unknown Source) ~[na:na]
at basex.support.AddResourcesSupport$_consume_closure9.doCall(AddResourcesSupport.groovy:257) ~[basex-1.0.jar!/:na]
at sun.reflect.GeneratedMethodAccessor327.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_45]
at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_45]
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93) [groovy-2.4.4.jar!/:2.4.4]
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325) [groovy-2.4.4.jar!/:2.4.4]
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294) [groovy-2.4.4.jar!/:2.4.4]
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1019) [groovy-2.4.4.jar!/:2.4.4]
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42) [groovy-2.4.4.jar!/:2.4.4]
at org.codehaus.groovy.runtime.callsite.BooleanReturningMethodInvoker.invoke(BooleanReturningMethodInvoker.java:51) ~[groovy-2.4.4.jar!/:2.4.4]
at org.codehaus.groovy.runtime.callsite.BooleanClosureWrapper.call(BooleanClosureWrapper.java:53) ~[groovy-2.4.4.jar!/:2.4.4]
at org.codehaus.groovy.runtime.DefaultGroovyMethods.find(DefaultGroovyMethods.java:3908) ~[groovy-2.4.4.jar!/:2.4.4]
at org.codehaus.groovy.runtime.dgm$191.invoke(Unknown Source) ~[groovy-2.4.4.jar!/:2.4.4]
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274) ~[groovy-2.4.4.jar!/:2.4.4]
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56) ~[groovy-2.4.4.jar!/:2.4.4]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125) [groovy-2.4.4.jar!/:2.4.4]
at basex.support.AddResourcesSupport.consume(AddResourcesSupport.groovy:251) ~[basex-1.0.jar!/:na]
- - - -
Best Regards,
Martin