On Jun 24, 2013, at 10:02 PM, Christian Grün wrote:
I'm contemplating the construction of an interface for advanced or dedicated users of a database, with a text box in which they type their queries as XQuery modules. (Non-advanced and non-dedicated users will make do with a variety of pre-defined queries; this interface is intended to provide an open-ended query interface for the few users who will need it.)
If possible, xquery:eval() should be avoided for such operations (we may eventually rename it to evil()).
That would certainly catch the developer's attention!
The solution which you find on our homepage [1] is based on our REST interface and on a user whose permissions are restricted to reading the example databases. This way, queries like "file:list('.')" will be rejected.
Can you provide more information on how this is implemented on the BaseX site? I can see how the user's query string can be wrapped in a rest:query element (and even how I can set the context for them), and submitted to the server in the normal way. Is that what is happening in the BaseX demo interface? (Is the source code available somewhere? I don't see an obvious place to look for it in the GitHub repository.)
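For concreteness, the wrapping I have in mind would look roughly like the sketch below, written as an XQuery expression that builds the request body. The rest:query and rest:text element names and the namespace URI are the ones I understand the BaseX REST interface to expect; the $user-query variable is a hypothetical name standing for whatever the user typed in the text box.

```xquery
declare namespace rest = "http://basex.org/rest";

(: Hypothetical external variable: the query text typed by the user. :)
declare variable $user-query as xs:string external;

(: Build the body of the POST request sent to the REST interface.
   The enclosed expression inserts the query as escaped text content. :)
<rest:query>
  <rest:text>{ $user-query }</rest:text>
</rest:query>
```

Whatever sends this body would still have to authenticate as the restricted user, since nothing in the wrapper itself limits what the query may do.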
I had been envisaging something like
import module namespace cqi = "http://example.org/corpus-query-interface";
declare variable $cqi:query as xs:string external;
if (cqi:nanny-says-ok($cqi:query))
then xquery:eval($cqi:query)
else <error>That query goes too far!</error>
My idea was that the cqi:nanny-says-ok() function can do some simple vetting of the query to weed out constructs like calls to doc('file:///etc/passwd') but allow calls to doc() for documents on other Web sites. I was worried about the rest:query interface: I can make my PHP proxy do all the checking I would have done with cqi:nanny-says-ok(), but I can't prevent an adversary from sending an HTTP request directly to the BaseX server and bypassing the PHP proxy -- so I wanted to do my checking in XQuery.
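A minimal sketch of what such a vetting function might look like, assuming a plain deny-list on the query text. The body is purely illustrative and hypothetical: it rejects any query that mentions "file:" and lets everything else through.

```xquery
module namespace cqi = "http://example.org/corpus-query-interface";

(: Hypothetical sketch of the vetting function: a simple deny-list only. :)
declare function cqi:nanny-says-ok($query as xs:string) as xs:boolean {
  (: Reject any mention of "file:", which covers both file:/// URIs
     passed to doc() and calls to functions in the file: namespace prefix,
     such as file:list('.'). Calls like doc('http://...') still pass. :)
  not(contains(lower-case($query), "file:"))
};
```

A check this crude has obvious limits. It also rejects harmless names that happen to end in "file:" (a prefix like profile:, say), and it can be evaded by assembling the URI at run time, for example with concat('fi', 'le:///etc/passwd'). So at best it is a first filter, not a substitute for restricted permissions.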
It may be that having queries run under an appropriately restricted user provides all the security I need. A user with read-only access cannot modify the database (and also cannot use doc(), as I've just discovered), and that protects against the primary risk I am concerned with.
The query timeout (which doesn’t apply to admin queries [2]) has been set to 10 seconds. There is currently no way to restrict memory resources in this demo, because the query will run in the same virtual machine as the server instance. One solution could be to start a new BaseX (server) instance with limited memory (-Xmx).
This is very helpful; I should have remembered the TIMEOUT option, but didn't. It protects me against the second risk I'm worried about. On memory usage I will just take my chances for now.