Minutes for the RUS-WG Session
OGF 20, Manchester, Wednesday, May 9th, 2007, 6:00-7:30pm, Exchange 1

Summary of the discussions around the slides:

Slide 6: Should RUS support full XPath or should it restrict XPath?

Full XPath compliance would be good because clients would not mystically get
back failed requests without knowing why. 

Full XPath compliance also means that you have to be able to return parts of
UsageRecords if the client selected only a part of a record. This is currently
not allowed by the specification because it requires you to return whole URs
only. 

XPath queries can be quite slow to execute and destroy any performance. This 
was the reason why we have actually removed all XPath queries from our RUS and
only allow pre-canned queries via separate operations (extractByUser etc.).
For performance I would recommend to put the extractBy... methods back into
the specification.
Also XPath queries are hard to map to SQL queries and RDBs are needed for
performance. How is this handled by other people.

The RUSQueryTooComplexFault (or equivalent) would give a server the chance to
respond reasonably when it deems the query as being too complex or taking too
much time to execute.

Suggestion 2 (restrict XPath) would be covered by Suggestion 1 (support full
XPath, allow RUSQueryTooComplexFault) since a server could restrict the
allowed XPath queries by returning the fault.

A possible solution to long queries could be to allow for a reply of style "I
am too busy, come back later". This could also be used selectively to defer
complex queries until low system load allows there handling.
That could be hard to implement in case of many concurrent requests, and it
would not solve the problem of many complex queries.

One solution to the problem could be to require proper authorization to be
allowed to execute complex queries. (e.g. the authorization decision also
is based on the query).

One solution would also be to have a method to return a catalog of allowed
queries that the server is willing to process. A variant of this would be
to have a operation to check if a server is willing to execute a query.
This functionality could already be provided by the query operation, because 
it will tell you if the server rejected the query and return the result 
otherwise.
A idea in conjunction with this would be to have a minimum catalog of simple
queries that a instance has to execute to give clients a well known fall-back
mechanism.

In case of allowing partial URs to be returned, how can a client validate the
result against the UR schema.
This problem should also be tackled by the DAIS-WG since they also have to
return query results. Check how they did address that problem.


Notes to slide 9 (add a operation to query using a WS-Enumeration):

OGSA-DAI is doing a very similar thing by buffering the result set in memory
and allowing clients to return part of it at a time. This does however not
help performance since the same amount of data is returned in the end.
The main reason for this proposal was not performance but scalability, 
especially the size of the SOAP messages that are returned for large result
sets.

Some DBs allow you to use cursors to return results, WS-Enumeration would fit
in nicely with that. 

WS-Enumeration is part of the WS-Transfer stack, in WS-RF you would create
a new resource and use that to return the result in smaller packets.
Can WS-Enumeration be used outside the WS-Transfer stack?

A RUSReplyTooBig fault is a good thing to have, however it can be difficult to
use, most implementation just behave very rude when they run out of memory
(e.g. just go offline, or reply with authentication/authorization faults).
One caution is that this is a very dynamic fault, so even a query with only one
record could be a too big if another very large query is under processing.

The operation could be tied together with the query operation if you allow
for an extra parameter specifying the maximum size of the result and allowing
the server to either return the result or a WS-Enumeration in case the result
size exceeded the maximum.
The important point here is that the server cannot make this decision 
unilaterally, the client must also be able to specify its limitations.


Note on slide 10, audit information (1):

Adding audit information for delete operations is good.

Instead of recording the whole SOAP message and the cryptographic signature,
you could get away with the ID since the URs are always stored in a trusted
storage.
Usually you also do not expose audit information via the Web-Services interface.

A suggestion would be to put the audit information inside of the UsageRecord.
The current createdBy and timestamp elements in the UR are meant to document
the actual creation of the UsageRecord itself, not the insertion into a RUS.
Probably audit information elements could be added to the UR.
That would mix data and meta-data, which is not a good idea.

extractRecordHistory is feasible.
One should be aware of privacy issues associated with that operation. Not all
users may be allowed to see the whole audit trail.

It would be interesting to also consider reading as an audit operation.
Make it optional for RUS that do not want to record it.


Why do we need modification, what is the purpose?
One use case is that the cost of a job might be calculated after the UR
has been inserted into the RUS by a different pricing service.
Another use case could be a service that maps local user-names to global
user IDs after the URs have been generated on the local site, because that
mapping was not available at the time of generation.


Authorization issues (no slide):

The problem with the old RUS authorization structure was that only two roles
were defined (admin and regular user). This was too coarse grained. It would
also be interesting to restrict user access on certain fields.

This should be an implementation problem, the interface only needs you to be
able to do authorization.

In a GFD (see template) security should be addressed and for this application
access control is very important. 
Describe the set of problems, the section is informative, not normative.
Write about the problems in the security section, say that role based
access control should be supported.


Mandatory elements (slide 12+13):

Currently the listMandatoryElements operation returns a UsageRecord document
with empty elements as placeholder for the required ones. The proposal is to
replace this return type with a list of QNames of the required elements.

Instead of only returning a list of QNames, it would be interesting to also
allow restrictions on the value of elements.

One idea would be to return a XPath expression that needs to match for any
UR stored in the RUS. This would provide great flexibility.

If different constrains apply for different roles/users the operations may
return different results depending on the role/id of the requester. 
This could also with advantage be realized using different endpoints to the
same service.


Advanced RUS Requirements (slide 14):

For the inter-RUS synchronization/data-replication it would be interesting
to check with the info-D group if they already have addressed similar problems.

Another approach would be to offer a method to get all modifications since a
given time.

In this context, the federation of RUS under a top-level RUS would be an 
interesting use-case.