OGSA Data Services Telcon on 31/03/04
=====================================

Chair: Dave Berry. Notes by Mario Antonioletti.

Present:
Mario Antonioletti, EPCC.
Dave Berry, NeSC.
Andrew Grimshaw, University of Virginia.
Susan Malaika, IBM.
Malcolm Atkinson, NeSC.
Allen Luniewski, IBM.

Presentations at:
https://forge.gridforum.org/docman2/ViewCategory.php?group_id=42&category_id=708

0. Actions

All to read existing use cases or propose new ones.
Encourage development of existing straw men or writing of new ones.

1. Early discussion

Andrew suggests that a reminder about these meetings should be sent the day before the meeting and possibly a couple of hours before.
Mario volunteers to be the note taker.
Andrew queries what the agenda is going to be.
Dave: time-constrained discussion of the three documents on Grid Forge, ten minutes per document.
Susan wonders whether the Gap Analysis should be discussed in this telcon.
Dave said that it had been discussed at a previous reading, and invited Susan to suggest particular points for future meetings.
Andrew: as an OGSA group we should come up with the components without necessarily taking into account what is already out there. We want to make sure that we get things right. The gap analysis assumes things that are already there.

2. Straw men

2.1 Dave's straw man

Dave ran through his scenario. A process needs to access data while it is running and to store its results. Three patterns are discussed (contained in the document).
Malcolm mentions that option 3 can be done by DataCutter.
Andrew mentions that program execution is achieved by the data provided with the metadata provided.
Malcolm states that there are data services that might accept programs to run against a data source.
Andrew: this would be a specialised container.
Dave: the straw man illustrates that the architecture should not give preference to any of these options over any of the others.
Andrew: this seems to be very focused on the scientific computing paradigm, where you have data x and program y and this allows execution to happen. The Grid should cover a wider, richer set of patterns: data coming out of a program, being pulled out of a relational database, ...
Malcolm: the submission of operations to data is done by higher-level tools which do not know which of Dave's three options will be the final implementation model ... we have to have an architecture where we are either explicit about this or can virtualise over these patterns or the more complex patterns alluded to by Andrew.
Dave: that is all that I have to say ...

2.2 Allen's straw man

Allen: slide 2 presents the data virtualization provided by a data service. Data may come from different sources which are outside the Grid architecture. It may also come from other data services. These interact through the common data service APIs. Can have abstraction, federation, transformation ...
Malcolm: what does it exclude? If the data is generated by a simulation and does not pay attention to the boundary conditions ... the foil looks like a description of all computing ... you could model everything like that.
Allen: the attempt is to model any source of data ...
Malcolm: trying to tease out what in the picture is particular to data ... it does not have to be a data service ...
Allen: the interface is supposed to be a data interface and not a general process interface ... the refinement comes when you look at data ...
Foil 3 is intended to display the interaction between services and the environment. In this case I chose policy ... various QoX aspects. The data service will have an implementation that manages policy. This will allow interaction with other services and with the primitive data sources that it is interacting with ... not necessarily in the WS-Agreement definition of policy ...
Foil 4 deals with how the data service enforces policy ... in this case considering workload management ...
there may be a more general workload manager ... the workload manager may interact with various data services ... it may also want to interact with the primitive data sources.
Dave: is the policy manager on this slide the same as the policy manager on the previous slide?
Allen: that was the intent ... the high-level policy manager may be breaking this down into smaller policy chunks ... and ensures that the data service is complying with the agreed policy ... it deals with both the negotiation side and the enforcement side ...
Dave: is there anything specific to data services in this? Is policy management and enforcement not going to apply to every service?
Allen: I think so. The point that it is not different is a good point.
The fifth foil deals with something that has come out of recent meetings ... information dissemination ... There is nothing really deep in this foil.
The question of provisioning came up last week and the 6th foil demonstrates provisioning ... the blue bit (the new server): new facilities may be required while the service is running and these may be grabbed dynamically.
Malcolm: is there a notion of ... the relationship between data resources and data services ... can many data services represent the underlying data services and data resources?
Allen: yes.
Malcolm: is there a reasonable model of the overlap and of how updates propagate?
Allen: a given primitive data resource could have multiple services, and then the question you brought up comes into play as to how updates work ...
Malcolm: there is also something about the naming of data services ... it might be reasonable to ask a data service what data resources are behind it ...
Allen: that seems fair ...
Malcolm: good ... that seems to run counter to what some proposals seem to be stating.
Dave: OGSA is planning to come up with a global naming scheme.

2.3 Andrew's straw man

Andrew: posted slide at the beginning of the call ...
We have been doing data grids for a while and have tried to extract the main points ... everything is a service; each has an EPR, an interface and a type. There will be one or more types ... file types, directory types, streams, file containers ... users may not see all of these ... stored procedures, RDBMSs ... Similarly, you can have derived types that may not be exposed through the interface but through the metadata ... you need to understand the type. It is important to have multiple levels of abstraction: a high-level interface that may not be performant ... and low-level interfaces that will work much faster.
Propose a three-layer architecture:
- access
- integration
- provision
Lots of third-party integration functions need to go in the middle layer. A data catalog, we have found, is intensely useful. The provisioning layer could take files, databases, ... whatever is going to be presenting a data product. You give it a grid handle and assume access controls ... if you provide a different scheme, people may not be keen to let the data go. You need to be able to interact with the native authentication space. If you provide native interfaces then people will not have a problem with this. You may want to have layering of caches with consistency ... the user, though, should be able to drill down to the bottom layers. The access layers are syntactic sugar ...
Malcolm: access layers may be there for protection: integrity, authentication ... In the commercial sector folks may not delegate the authentication layer ... You want to take data from a data product and generate new data products with consistency and caching propagation ... changing the underlying data may require these changes to be propagated to the derived data products. We should have an architecture with three layers ...
Allen: on foil 3 ... there is a core data service layer ... what does that mean?
Andrew: the core is on the right-hand side; you may add in other types of transformations ...
the transforms were XSL transforms ... what I was saying is that you should be able to capture third-party transforms ...
Allen: where does QoX fit in?
Andrew: each data service will have metadata associated with it which will hold some of this information ... if you want high availability then you could have replicated servers at the back end ... you would not be that interested in the back-end implementation at this level of the interface.

3. Plans for future work

Dave: so, how do we take this forward for next week? Will you be able to do yours for next week?
Malcolm: I hope so ...
Dave: the best thing to do is to try and come up with more comments/refined cases relative to what has been considered here ...
Andrew: will not be able to make next week ...
Dave: I will not be here either ... Allen, could you lead next week ... can we speak next week or do the week after ...
*General agreement to skip next week and meet the week after.*
Dave: there are a number of use cases being talked about explicitly/implicitly; as well as having straw men it might be useful to have notes on use cases ... if people could note down a set of short use cases, these might be useful references to have around to later produce more substantive use cases ...
Andrew: we should still look at the use case doc and try to tease out data issues ...
Dave: if people can have a look at that document then and tease out data ...
Susan: which use case document are you talking about?
Andrew: there are two use case documents being developed by the OGSA WG. One has already gone out on the window ... and there is another one ... (the first is a tier one document and the second is tier two, which may or may not be available on Grid Forge ... Dave said that he would find out).

AOB

Malcolm asks Dave to send out URLs to the documents that have been discussed. Dave will do this and will try to arrange a meeting for two weeks' time.

DONM April 14th. Dave will e-mail details.