DAIS Session 1 - GGF12 - 21/09/04 ================================= Chair: : Dave Pearson Note Taker: Mario Antonioletti Agenda: 1. Introduction. Review of Activities to Date: Dave Pearson 2. Overview of the DAIS Specifications: Norman Paton 3. Discussion of issues with DAIS Specifications Logged in GridForge --- Dave Pearson opens the proceedings. See the slides at: http://forge.gridforum.org/projects/dais-wg/document/ggf12-dais-session1-intro/en/1 Gives a short overview of DAIS. DAIS started in GGF4. Promote standards for database services in Grid environments. Concentrating on DBMSs for now. Since GGF11 the specifications have been stabilising. Now focusing on issue resolution. Finalising DAIS mappings. Have had on going collaborations with various other groups. More points on the slides. Have two further sessions at this GGF also have a joint session with GSM. --- Norman Paton gives an overview of the DAIS specifications. See the slides at: http://forge.gridforum.org/projects/dais-wg/document/ggf-dais-session1-overview/en/1 Have had a stable membership of the core group for the past few GGFs. Not a closed group so you can join if you wish. Have a base spec that is specialised by various sub specs. WS-DAI will depend on several web services standards. The DAIS specs are going to try to be complementary to other Grid specifications. The WS-DAI spec is informed by the OGSA specification and a "Scenarios for Mapping DAIS Concepts". Between GGF11 and GGF12 have been trying to go through a number of issues. There has been an implementation activity done as IBM Hursley as part of the OGSA-DAI Activity. From a mappings perspective close to finalising the mappings document. Have been contributing to the OGSA Data Design Team. In terms of terminology we have Data Service, Data Resource, Consumer, Data Set, Data Description, Data Access, Data Factory, Data Management - actual definitions are in the slides. In terms of Data Description there is very little in the top level specs but both the relational and XML specs provide elements for the data description. Working with CGS to extend the CIM model for the relational data description. For Data Access the interfaces expose the functionalities presented by the underlying data resource. One of the thing in the top level spec attempts to capture the semantics of the operations through "behavioural properties". You may be able to set the value or query the properties of the interface. Various examples are included in the slides. We have a top level spec and the realisations - with a view to trying to have consistent message exchange patterns provided in the top level spec. This pattern is supposed to be followed by the realisations. The response types that a service can provide are given in a behavioural property specified in the service. The factory pattern satisfies some of access patterns described early on in the DAIS effort. Not all of these are satisfied by the current DAIS specs - some of these are supposed to be handled by other specs like the infod GGF WG. In the same way that we have a template for data access there is a similar template for the data factory pattern. The realisations implement this template. Data Management - we are not doing anything in this space. The themes for this however are managing the resource, service, service-resource relationship. Hope to use the WSDM activity being done in OASIS to do this. Mapping to WSDL - in the specs talk about the concepts in a decoupled manner from the underlying WSDL mapping. WSRF has not stabilised as yet and similarly there is on-going work trying to come up with an OGSA naming scheme which may affect the interfaces specified in the WSDL. Have produced an information document to discuss the ways that DAIS could be mapped to WSDL. Currently have a mapping to WSRF spec. The presentation finishes with a discussion of other relevant sessions to DAIS that are taking place at GGF. --- Simon Laws/Brian Collins/Amy Krause on outstanding issues on the DAIS specs. Simon: See the slides at: http://forge.gridforum.org/projects/dais-wg/document/ggf-dais-session1-overview/en/1 Have taken the issues currently on Grid Forge and have put them on the slides. This list is not exhaustive of the issues that are there. Have listed each issue for each of the specifications. Have only chosen a couple of these issues from the lists to discuss at this meeting. Going to talk about: - 1103: Top level default data access - 1099: Naming - 977: Disconnected Data Sets (the number refers to the artifact number in the WS-DAI tracker). On the first issue: Have a core specification that is extended by the realisations. In the top level spec there are no top level access operations. In a previous version of the spec we did have some basic operations get/put which were somewhat vague so we took them out. However, since then it has been pointed out that there should be some input/output operations that would be common across all the realisations. Do people feel that this would be a good or a bad thing? or something that could be specialised by the realisation specs? Guy Rixon: if you want to build a replicating system that copies data from one data resource to another it would be very handy to have that kind of abstraction. I would concentrate on replication rather than pure I/O. Suppose you want to copy a table in a database - having the ability to do that at both ends would be really useful. As a part of the collaboration of services to move data from end to the other? Is there a value for clients to access data in a general way? A get mechanism or some easily specialised mechanisms ... Owen Synge: Learning from the example of compute services if you provide the ability to distinguish between the different data sources then you pass some of the complexity of the underlying data formats to the clients. Is there some underlying XSLT or SQL to do this? We did not try to use a common query language .. Owen: I think it would be a nice thing to have as you would not have to pass on the complexity of the data formats. ... do not have a feeling as to what that format should be. Norman: whose responsibility is that? A common interface that wraps different data resources ... DAIS could do that but the top level would need to provide a veneer that supports one particular language and format. This is not something that would surface in the specs. We could write another specialisation that would do this. ??: is it not an engineering compromise as to whether you get the full expression and you take a performance penalty because of this? Should probably start with the simplest things and build on that. ... would these operations be synchronous/asynchronous and if you had an asynchronous get and put would you be guaranteed to get the responses in the right order? I think that a synchronous only version of these operations would be very bad because of the performance penalty that would incur. Would hope to be able to support both types of operations. Allow multiple consumer accesses. Scott Oster: without the concept of document identifiers a generic get and put is not that useful. You may not know the return type. If you could not identify what was returned then that would not be that useful. Dave Pearson: one of the reasons why we thought it was worth exploring this high level input/output operations is that in a commercial environment you do not get access to the internal queries ... if someone wants to make only some of their data available. I find it difficult to see how you could implement this using a DAIS based service which expects you to supply the query ... the data suppliers will want to deliver information on their own terms ... you can return information in a given format but I will not tell you how this information is stored in the database. The main motivation would be to make it more relevant to the commercial world. Guy: I can see how you would have a get where the query would be a named parameter but how would you deal with the return format? Dave: the format on which you got the data back would be dependent on how you got that data back ... you could pass that information as a parameter. Savas Parastatidis: this is very similar to a RESTful approach where you use an http GET and you don't know what the return type is. Microsoft has just released a specification called WS-Transfer that does this... [See: http://msdn.microsoft.com/library/en-us/dnglobspec/html/ws-transfer.pdf] We should investigate this. Guy: if you have a domain specific query - you could do your own realisation or you could pass that through one of your interfaces. Is this a good place to put that kind of customisation? Norman: sounds like a plausible way of doing this. A request query would usually have some additional metadata which the service would publish. You could associate a given type of query with a given type of response. You could describe the behavioural properties with the behaviour that you want to exhibit. You expect a bit of baggage that would allow this. There would be to be some additional properties ... would need to specify the allowed input formats. ... Savas: WS-Transfer is based on a particular RESTful approach called CRUDE. In a way we have said that these are both interesting approaches. It looks like both suggested approaches are valid. We will document the findings against this fault. We encourage to contribute there now. It's not something that we have a desperate need for now ... Norman: ... but we have had requests for a number of bodies for this so we should look at it ... Scott: you would still be able to use SQL/XML and other additional query languages? yes. Next issue we have is number 1099 on Naming. OGSA is introducing its new 3 level naming scheme. This is not sorted yet. There is still an on-going discussion. The intention is to use an existing naming scheme that satisfies the OGSA three level naming scheme. Up to this point though we have been working in the address level and have not said anything about the name other than a property of the name other than in the data description. When we thought about this we though of it in terms of it being akin to the abstract name but have not specified a format. Has anyone any thoughts? Savas: in the OGSA naming scheme do you have to provide all three? I think the answer is no. There is no reason why you should not only be able to provide a subset of entries for names. Savas: could imagine an architecture where you only have human and abstract names and the addresses come from a registry and you never see the addresses. yes. The argument is what is the most appropriate way to do this for DAIS. Norman: has there been any follow through as to how the OGSA based specs should operate? No, they have not got that far. Want to know what the requirements are in order to push them in the appropriate direction. They were talking about the abstract name in terms of it being an EPR which somewhat surprised me. I asked what happens if you use a journal name ... they have taken this on board. Now is the time to get requirements. ??: this is too abstract for my feeling? What purpose do they serve? That's my question too. Guy: the one concrete point is what do you do about hierarchical naming inside DAIS? you may want to use the human names and you may want the services to understand that name. Savas: for every specific naming scheme though you will have to come up with your own solution. Guy: you could impose your naming. Greg Riccardi: are we naming data objects or services? The name of objects inside services are already there. If I put DAIS on top of file system I will use the names that are already there. The person that passes the query has to be aware of the underlying scheme. If I issue a query and I get a service back then you would be naming the service... that's what I thought we were talking about ... Guy: supposed I recall the results of a query in a virtual file system ... I want to use a file name for a service to access that data .... Greg: don't want to use the service to expose the naming scheme ... you get the data out ... if you want to keep the data you give that to another service ... Guy: you need to know how to hold the handle ... Greg: is it not you that associates a name with this ... ??: I don't see why you would limit the capabilities of what you want with a service.... I don't see why you want to mix them ... Guy: perhaps they don't have to be ... you may be right ... there is still the case where you could have an internal scheme ... We are not trying to expose that in this discussion ... this could be included in your human or abstract ... Scott: would that not be something internal to the service ... Guy: what comes back is the name of a service ... then that is good... I expected people to come back with good ways to name data. ... Savas: another way of seeing this is whether it is important to DAIS as a means of accessing data ... a way of accessing data or whether this is a name given to the data.... Whether the names are outside of the service interfaces ... clearly important to know what DAIS are doing .... Given the time will let Brian Collins come in to say something about the relational issues. Brian: for relational we do not have any major issues. Will go through top level issues. If you want to raise an issue it would be helpful if you could also add the page number in the spec your issue relates to. Should the specs be self contained? Owen: I hate trawling glossaries. Should be just one. Simon: I don't think that we should replicate. Dave: there is an overhead in maintenance. What are the properties of informational properties? If you are interacting with a database and the schema changes should that be reflected in the informational property. Would you see it be identified? ... Norman: are the informational properties treated within a transaction management? The answer is yes. Savas: that is orthogonal to DAIS. Simon: are you implying that we need some more transactional properties or are you saying that the transactional properties describes the behaviour as well? Norman: yes. Greg: if the schema changes then you should get the updated information. Norman: and if you are in a separate transaction... in practise you get locked out. Savas: if you don't want the schema to change you use transactions.. Scott: that assumes capabilities of the underlying data resource supports transactions... The final version of JSR114 allows you to perform updates on SQL Rowsets ... is that something that we should support? Greg: suggests a lot of synchronous behaviour that you may have to support Dave: capabilities tend to be vendor specific. It depends on what the underlying capabilities are. Greg: for an SOA you do not expect to be tightly coupled environment which is not desirable. Savas: this is the concept of a disconnected data set ... Greg: but that is a different model ... it carries back all the information to do the update on the data resource ... in the MS model the data set carries the old and new values and how these updates take place ... We have not had a huge number of comments on the realisation spec ... if you feel that if there are any issues in the current spec then raise these as issues on Grid Forge or email members of the team. Brian passes on Amy Krause to talk about the XML spec. Just have two issues: 1086 - XMLCollectionAccess::AddSChema Have had some discussion on this. It would be useful to store the document with the corresponding XML schema but it is not clear how these schema would be stored and whether this is supported by the underlying data resource. Also, documents need to validated against a schema and if you add a schema do you have to validate ... it's costly. Scott: it's inconsistent to have some docs to be validated and not validated by a schema ... you should add a schema at the time of creation. Norman: what happens if you have a document in a calculation ... ... Beth Plale: would you require it? Sephen: no you would not... Scott: if you had no schema in the collection it would not add, ... Ally: can you remove schema? yes. Stephen Langella: that's a more complicated operation? 1085 - Support for XUpdate? The working draft from 14 Sep 2000. Scott: until there is viable alternative then you should leave it? Norman: what happens in other XML data bases? you can remove a document and add it again. --- Dave: we do have all the other issues that have not been discussed logged in Grid Forge so do look at them and add more if you see them. We will be having fortnightly telcons after GGF12 so please join them. ---