Minutes for the RUS-WG Session
OGF 20, Manchester, Wednesday, May 9th, 2007, 6:00-7:30pm, Exchange 1

Summary of the discussions around the slides:

Slide 6: Should RUS support full XPath or should it restrict XPath?

Full XPath compliance would be good because clients would not mysteriously get back failed requests without knowing why. Full XPath compliance also means that the server has to be able to return parts of UsageRecords if the client selected only part of a record. This is currently not allowed by the specification, which requires whole URs to be returned.

XPath queries can be quite slow to execute and can destroy performance. This was the reason why we actually removed all XPath queries from our RUS and only allow pre-canned queries via separate operations (extractByUser etc.). For performance I would recommend putting the extractBy... methods back into the specification. XPath queries are also hard to map to SQL queries, and RDBs are needed for performance. How is this handled by other people?

The RUSQueryTooComplexFault (or equivalent) would give a server the chance to respond reasonably when it deems a query too complex or too time-consuming to execute. Suggestion 2 (restrict XPath) would be covered by Suggestion 1 (support full XPath, allow RUSQueryTooComplexFault), since a server could restrict the allowed XPath queries by returning the fault.

A possible solution to long queries could be to allow for a reply in the style of "I am too busy, come back later". This could also be used selectively to defer complex queries until low system load allows their handling. That could be hard to implement in the case of many concurrent requests, and it would not solve the problem of many complex queries.

One solution to the problem could be to require proper authorization to be allowed to execute complex queries (e.g. the authorization decision is also based on the query).
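
To illustrate the RUSQueryTooComplexFault idea from the slide 6 discussion, here is a minimal Python sketch; it is not part of the specification. The complexity heuristic (counting predicates and descendant steps), the limits, and the simplified element names are all assumptions made for illustration. A real service would use the UR schema with namespaces and its own cost model. The sketch relies on the limited XPath subset of Python's xml.etree.ElementTree and always returns whole records, as the current specification requires.

```python
import xml.etree.ElementTree as ET


class RUSQueryTooComplexFault(Exception):
    """Hypothetical fault raised instead of executing a costly query."""


# Illustrative records; element names are simplified, not the real UR schema.
RECORDS = ET.fromstring(
    "<UsageRecords>"
    "<UsageRecord><RecordId>r1</RecordId><UserName>alice</UserName></UsageRecord>"
    "<UsageRecord><RecordId>r2</RecordId><UserName>bob</UserName></UsageRecord>"
    "</UsageRecords>"
)

# Assumed server policy: at most one predicate, no descendant ("//") steps.
MAX_PREDICATES = 1
MAX_DESCENDANT_STEPS = 0


def query(xpath):
    """Return the matching whole UsageRecord elements, or raise the fault."""
    predicates = xpath.count("[")
    descendant_steps = xpath.count("//")
    if predicates > MAX_PREDICATES or descendant_steps > MAX_DESCENDANT_STEPS:
        raise RUSQueryTooComplexFault(f"query rejected: {xpath}")
    return RECORDS.findall(xpath)


# A simple query is executed; a more complex one is rejected with the fault.
matches = query("UsageRecord[UserName='alice']")
print([record.findtext("RecordId") for record in matches])

try:
    query(".//UsageRecord[RecordId='r1'][UserName='alice']")
except RUSQueryTooComplexFault as fault:
    print(fault)
```

With a fault like this, a server that only wants to support a restricted set of queries can do so simply by tightening its policy, which is how Suggestion 1 covers Suggestion 2.
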
One solution would also be to have a method that returns a catalog of allowed queries that the server is willing to process. A variant of this would be to have an operation to check whether a server is willing to execute a query. This functionality could already be provided by the query operation, because it will tell you if the server rejected the query and return the result otherwise. An idea in conjunction with this would be to have a minimum catalog of simple queries that an instance has to execute, to give clients a well-known fall-back mechanism.

If partial URs are allowed to be returned, how can a client validate the result against the UR schema? This problem should also be tackled by the DAIS-WG, since they also have to return query results. Check how they addressed that problem.

Notes to slide 9 (add an operation to query using WS-Enumeration):

OGSA-DAI is doing a very similar thing by buffering the result set in memory and allowing clients to retrieve part of it at a time. This does not help performance, however, since the same amount of data is returned in the end. The main reason for this proposal was not performance but scalability, especially the size of the SOAP messages that are returned for large result sets.

Some DBs allow you to use cursors to return results; WS-Enumeration would fit in nicely with that. WS-Enumeration is part of the WS-Transfer stack; in WS-RF you would create a new resource and use that to return the result in smaller packets. Can WS-Enumeration be used outside the WS-Transfer stack?

A RUSReplyTooBig fault is a good thing to have; however, it can be difficult to use, because most implementations just behave very rudely when they run out of memory (e.g. just go offline, or reply with authentication/authorization faults). One caution is that this is a very dynamic fault, so even a query returning only one record could be too big if another very large query is being processed.
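
The idea of returning a large result set in smaller packets can be sketched in Python as follows. This is only an in-process imitation of the Enumerate/Pull pattern of WS-Enumeration, with no SOAP messages involved; the class name EnumerationServer and its method signatures are assumptions made for this illustration and are not taken from any RUS implementation.

```python
import itertools
import uuid


class EnumerationServer:
    """Hands out a result set in chunks, similar to a database cursor."""

    def __init__(self):
        self._contexts = {}  # enumeration context id -> iterator over records

    def enumerate(self, records):
        """Start an enumeration and return an opaque context id."""
        context = str(uuid.uuid4())
        self._contexts[context] = iter(records)
        return context

    def pull(self, context, max_elements):
        """Return up to max_elements records and an end-of-sequence flag."""
        iterator = self._contexts[context]
        chunk = list(itertools.islice(iterator, max_elements))
        # A short (possibly empty) chunk means the result set is exhausted.
        end_of_sequence = len(chunk) < max_elements
        if end_of_sequence:
            del self._contexts[context]  # release server-side state
        return chunk, end_of_sequence


# The client pulls five records, two at a time.
server = EnumerationServer()
ctx = server.enumerate(f"ur-{i}" for i in range(5))
received = []
while True:
    chunk, done = server.pull(ctx, max_elements=2)
    received.extend(chunk)
    if done:
        break
print(received)
```

As noted in the discussion, the total amount of data transferred is unchanged; the benefit is that no single reply has to carry the whole result set.
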
The operation could be tied together with the query operation by allowing an extra parameter that specifies the maximum size of the result, with the server returning either the result or a WS-Enumeration in case the result size exceeds the maximum. The important point here is that the server cannot make this decision unilaterally; the client must also be able to specify its limitations.

Note on slide 10, audit information (1):

Adding audit information for delete operations is good. Instead of recording the whole SOAP message and the cryptographic signature, you could get away with the ID, since the URs are always stored in trusted storage. Usually you also do not expose audit information via the Web Services interface.

A suggestion would be to put the audit information inside the UsageRecord. The current createdBy and timestamp elements in the UR are meant to document the actual creation of the UsageRecord itself, not its insertion into a RUS. Audit information elements could probably be added to the UR. That would mix data and metadata, which is not a good idea.

extractRecordHistory is feasible. One should be aware of the privacy issues associated with that operation: not all users may be allowed to see the whole audit trail. It would be interesting to also consider reading as an audited operation. Make it optional for RUSs that do not want to record it.

Why do we need modification, what is the purpose? One use case is that the cost of a job might be calculated by a different pricing service after the UR has been inserted into the RUS. Another use case could be a service that maps local user names to global user IDs after the URs have been generated on the local site, because that mapping was not available at the time of generation.

Authorization issues (no slide):

The problem with the old RUS authorization structure was that only two roles were defined (admin and regular user). This was too coarse-grained.
It would also be interesting to restrict user access to certain fields. This should be an implementation problem; the interface only needs to make it possible to do authorization. In a GFD (see template) security should be addressed, and for this application access control is very important. Describe the set of problems; the section is informative, not normative. Write about the problems in the security section and say that role-based access control should be supported.

Mandatory elements (slide 12+13):

Currently the listMandatoryElements operation returns a UsageRecord document with empty elements as placeholders for the required ones. The proposal is to replace this return type with a list of QNames of the required elements.

Instead of only returning a list of QNames, it would be interesting to also allow restrictions on the values of elements. One idea would be to return an XPath expression that needs to match for any UR stored in the RUS. This would provide great flexibility.

If different constraints apply to different roles/users, the operation may return different results depending on the role/ID of the requester. This could also be realized to advantage using different endpoints to the same service.

Advanced RUS Requirements (slide 14):

For the inter-RUS synchronization/data replication it would be interesting to check with the info-D group whether they have already addressed similar problems. Another approach would be to offer a method to get all modifications since a given time. In this context, the federation of RUSs under a top-level RUS would be an interesting use case.
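
The "get all modifications since a given time" approach could look roughly like the following Python sketch. Everything here is hypothetical: the class ChangeTrackingRUS, its method names, the use of plain integers as timestamps, and the dictionary records are assumptions chosen to keep the example small, not a proposal for the actual interface.

```python
class ChangeTrackingRUS:
    """Toy RUS that logs every change so that peers can replicate it."""

    def __init__(self):
        self.records = {}  # record id -> record content
        self.log = []      # (timestamp, operation, record id, record)

    def put(self, record_id, record, timestamp):
        """Insert or modify a record and log the change."""
        self.records[record_id] = record
        self.log.append((timestamp, "put", record_id, record))

    def delete(self, record_id, timestamp):
        """Delete a record (if present) and log the change."""
        self.records.pop(record_id, None)
        self.log.append((timestamp, "delete", record_id, None))

    def modifications_since(self, timestamp):
        """Return all logged changes strictly newer than the timestamp."""
        return [entry for entry in self.log if entry[0] > timestamp]

    def apply(self, modifications):
        """Replay changes obtained from another RUS."""
        for timestamp, operation, record_id, record in modifications:
            if operation == "put":
                self.put(record_id, record, timestamp)
            else:
                self.delete(record_id, timestamp)


# A site RUS is replicated into a top-level RUS in two synchronization rounds.
site = ChangeTrackingRUS()
top_level = ChangeTrackingRUS()

site.put("r1", {"user": "alice"}, timestamp=1)
site.put("r2", {"user": "bob"}, timestamp=2)
top_level.apply(site.modifications_since(0))  # first round: everything

site.delete("r1", timestamp=3)
top_level.apply(site.modifications_since(2))  # second round: only the delete

print(top_level.records == site.records)
```

In a federation, the top-level RUS would only need to remember the time of its last synchronization with each site to request the next batch of changes.
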