High cpu utilization

Sateesh

12 Jul 2013

8:06 a.m.

Hi Basex,

We have written an XQuery that runs on a collection of 600 XML files with a total size of 670 MB. When we execute the query, CPU utilization goes up to 30-40%. Around 20-40 requests for this query arrive at a time, and because a single query uses so much CPU, the processing of the remaining queries is delayed.

Could you please check whether something is wrong in the query, or do you have any suggestions that could improve the performance? Please find the query and a sample XML file attached.

Thanks & regards

Sateesh.A

Attachments:

attachment.html (text/html — 2.4 KB)
Sample.zip (application/octet-stream — 3.9 KB)


Christian Grün

12 Jul 2013

8:44 a.m.

Hi Sateesh,

I’m not sure about the details of your query (e.g. I wonder if you really want to check if the string ACH99223852 contains the order number, which feels a little bit weird), but the attached query may do what you need. Do you store all your documents in a database, or is this no option?

Christian ___________________________________

let $no := 'ACH99223852'
let $itemized := doc('Sample.xml')/SUBCUSTBRK[ORDRNO/@NO = $no]
  /ITEMIZED/TRANSACTION[@Type="OrderCharges"]
  /ORDER_TYPES[@TYPE="Order to Mobile"]/DATA/R
return <DOCUMENT>{
  for $row in $itemized
  return <R>
    { $row/* }
    <ORDRNO>{ $no }</ORDRNO>
  </R>
}</DOCUMENT>

Sateesh

15 Jul 2013

7:37 a.m.

Hi Christian,

Thank you so much for your reply. We modified our query as per your suggestion and have seen CPU utilization drop by 5%. The modified queries and XML files are attached to this mail.

Let us explain what the query is meant to do. We have created a collection by adding multiple XML documents. From those documents we need all the itemized data, that is, the <R></R> elements under the path /SUBCUSTBRK/ITEMIZED/TRANSACTION/ORDER_TYPES. A single XML file contains multiple TRANSACTION elements, but we do not want data from all of them, only from the one whose 'type' attribute matches our selection. Within that transaction, we want data only from the ORDER_TYPES elements whose 'type' attribute matches our selection.

Is there a way to reduce the CPU utilization further by modifying the attached query? The attachment contains two sample XML files and the query.

Also, could you please tell us which optimization techniques we can follow when writing queries? If you can share a URL or book on writing optimized queries, it would really help us.

To answer the last question in your mail below: yes, we are adding all our XML documents to a database (collection).

Thanks & regards Sateesh.A


Christian Grün

10 a.m.

As indicated in my last mail, the contains() function is probably not the best choice, both in terms of correctness and performance. Instead, you should switch to the equality operator, which can often be rewritten for index access:

let $no := ('ACH99223853', 'ACH99223852')
let $subbrk := collection('Sample')/SUBCUSTBRK[ORDRNO/@NO = $no]
...

To ensure that the index is used, please check the output in the GUI InfoView (or use -V on the command line), and ensure that the attribute index of your database is up-to-date (Database → Properties…). ___________________________
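The difference between the two comparisons can be seen in a small self-contained sketch. The three inline documents and their NO values are made up for illustration and stand in for documents in the collection; the contains() predicate is only a guess at the shape of the original test, since the original query is not shown in this thread. The point is that a substring test may select records that the equality test rejects.

```xquery
(: Hypothetical inline documents standing in for the real collection. :)
let $docs := (
  document { <SUBCUSTBRK><ORDRNO NO="ACH99223852"/></SUBCUSTBRK> },
  document { <SUBCUSTBRK><ORDRNO NO="ACH99"/></SUBCUSTBRK> },
  document { <SUBCUSTBRK><ORDRNO NO="XYZ00000001"/></SUBCUSTBRK> }
)
let $no := ('ACH99223853', 'ACH99223852')

(: Substring test (assumed shape): true whenever the stored order number
   occurs anywhere inside one of the search strings, so "ACH99" matches. :)
let $by-contains :=
  $docs/SUBCUSTBRK[some $n in $no satisfies contains($n, ORDRNO/@NO)]

(: General comparison: true only if the stored order number is exactly
   equal to one of the search strings. :)
let $by-equality := $docs/SUBCUSTBRK[ORDRNO/@NO = $no]

return <RESULT>
  <CONTAINS count="{ count($by-contains) }"/>
  <EQUALITY count="{ count($by-equality) }"/>
</RESULT>
```

Here the short value ACH99 is selected by the contains() predicate but not by the equality predicate. Nodes built inside a query like this are not backed by a database index, so the sketch only illustrates the correctness side; whether the equality form is actually rewritten to index access on collection('Sample') still needs to be checked in the query info output as described above.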


basex-talk@mailman.uni-konstanz.de