OGSA Data Services Telcon on 31/03/04
=====================================

Chair: Dave Berry.
Notes by Mario Antonioletti.

Present: Mario Antonioletti, EPCC.
         Dave Berry, NeSC.
         Andrew Grimshaw, University of Virginia.
         Susan Malaika, IBM.
         Malcolm Atkinson, NeSC.
         Allen Luniewski, IBM.

Presentations at:

https://forge.gridforum.org/docman2/ViewCategory.php?group_id=42&category_id=708

0. Actions

All to read existing use cases or propose new ones.

Encourage development of existing straw men or writing of new ones.


1. Early discussion

Andrew suggests that a reminder about these meetings should be sent the day before the meeting and possibly again a couple of hours before.

Mario volunteers to be the note taker.
Andrew queries what the agenda is going to be.

Dave: Time-constrained discussion of the three documents on Grid Forge. Ten minutes
      per document.

Susan wonders whether the Gap Analysis should be discussed in this telcon.
Dave said that it had been discussed at a previous reading, and invited Susan to suggest particular points for future meetings.

Andrew: as an OGSA group we should come up with the components without
        necessarily taking into account what is already out there. We
        want to make sure that we get things right. The gap analysis
        assumes things that are already there.

2. Straw men.

2.1 Dave's straw man

Dave ran through his scenario. 
Process needs to access data when it is running and store results.

There are three patterns discussed (contained in the document).

Malcolm mentions that option 3 can be done by data cutter.

Andrew mentions that program execution is achieved by the data provided with the metadata provided.

Malcolm states that there are data services that might accept programs to run against a data source.

Andrew: this would be a specialised container.

Dave: the straw man illustrates that the architecture should not give
      preference to any of these options over any of the others.

Andrew: seems to be very focused on scientific computing paradigm
        where you have data x and program y and this allows execution
        to happen. The Grid should cover a wider, richer set of
        patterns. Data coming out of a program, being pulled out of a
        relational database, ...

Malcolm: the submission of operations to data is done by higher level
         tools which do not know the final implementation model that
         is going to be applied relative to Dave's three options
         ... have to have an architecture where we are explicit about
         this or we can virtualise over patterns or the more complex
         patterns alluded to by Andrew.

Dave: that is all that I have to say ...


2.2 Allen's straw man

Allen: slide 2 presents the data virtualization provided by a data
       service. Data may come from different sources which are outside
       the Grid architecture. May also come from other data
       services. These interact through the common data service
       APIs. Can have abstraction, federation, transformation ...
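A rough illustration of the virtualization idea on slide 2, added for readers of these notes and not something shown on the call. It assumes a single common read interface; the class names (DataService, ListBackedService, FederatedService, TransformedService) are hypothetical and not taken from the slides.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Iterator, List


class DataService(ABC):
    """Hypothetical common data service interface (an assumption)."""

    @abstractmethod
    def read(self) -> Iterator[dict]:
        """Yield records from whatever sits behind this service."""


class ListBackedService(DataService):
    """Wraps a primitive source outside the Grid; here just an in-memory list."""

    def __init__(self, rows: List[dict]) -> None:
        self._rows = rows

    def read(self) -> Iterator[dict]:
        return iter(self._rows)


class FederatedService(DataService):
    """Presents several data services as one (federation)."""

    def __init__(self, members: Iterable[DataService]) -> None:
        self._members = list(members)

    def read(self) -> Iterator[dict]:
        for member in self._members:
            yield from member.read()


class TransformedService(DataService):
    """Applies a per-record function to another data service (transformation)."""

    def __init__(self, inner: DataService, fn: Callable[[dict], dict]) -> None:
        self._inner = inner
        self._fn = fn

    def read(self) -> Iterator[dict]:
        return (self._fn(row) for row in self._inner.read())
```

Because every wrapper is itself a DataService, a consumer cannot tell whether it is talking to a primitive source, a federation, or a transformation, which is the point Allen makes about data coming from other data services.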

Malcolm: what does it exclude? If the data is generated by simulation
         and does not pay attention to the boundary conditions ... the
         foil looks like a description of all computing ... could
         model everything like that.

Allen: the attempt to model any source of data ...

Malcolm: trying to tease out what in the picture is particular to data
         ... does not have to be a data service...

Allen: the interface is supposed to be a data interface and not a
       general process interface ... the refinement comes from when
       you look at data ...

Foil 3 is intended to display the interaction between services and the environment. In this case I chose policy ... various QoX aspects. The data service will have an implementation that manages policy. This will allow interaction with other services and the primitive data sources that it is interacting with ... not necessarily in the WS-Agreement definition of policy ...

Foil 4 deals with how the data service enforces policy ... in this case considering work load management ...  there may be a more general work load manager ... the work load manager may interact with various data services ... it may also want to interact with the primitive data sources.

Dave: the policy manager on this slide is the same as the policy
      manager in the previous slide?

Allen: that was the intent... the high level policy manager may be
       breaking down into smaller policy chunks ... and ensures that
       the data service is complying with the agreed policy ... deals
       with the negotiation side and the enforcement side ...

Dave: is there anything specific to data services in this? Is policy
      management and enforcement not going to apply to every service?

Allen: I think so. The point that it is not different is a good point.

The fifth foil deals with something that has come out of recent meetings ... information dissemination. ... There is nothing really deep in this foil. The question of provisioning came up last week and the sixth foil demonstrates provisioning ... the blue bit (the new server) - new facilities may be required as the service is running and these things may be grabbed dynamically.

Malcolm: is there a notion of ... the relationship between data
         resources and data services ... can many data services
         represent underlying data services and data resources?

Allen: yes

Malcolm: is there a reasonable model of the overlap and how updates
         propagate?

Allen: a given primitive data resource could have multiple services
       and then the question you brought up comes into play as to how
       updates work..

Malcolm: also there is something about the naming of data services ...
         it might be reasonable to ask a data service what data
         resources are behind it ....

Allen: that seems fair...

Malcolm: good ... seems to run counter to what some proposals seem to
         be stating

Dave: OGSA is planning to come up with a global naming scheme.
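
As an aside for readers of these notes, Malcolm's two questions above (which data resources sit behind a data service, and where services overlap on the same resource) can be sketched as follows. This is only an illustration under assumed names; DataServiceInfo, backing_resources and services_sharing are hypothetical and were not proposed on the call.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DataServiceInfo:
    """Hypothetical record of a data service and the resources behind it."""

    name: str
    resources: Set[str] = field(default_factory=set)

    def backing_resources(self) -> List[str]:
        """Answer the question 'what data resources are behind you?'."""
        return sorted(self.resources)


def services_sharing(services: List[DataServiceInfo]) -> Dict[str, List[str]]:
    """Map each resource fronted by more than one service to those services.

    These overlaps are where the update-propagation question arises.
    """
    by_resource: Dict[str, List[str]] = {}
    for svc in services:
        for res in svc.resources:
            by_resource.setdefault(res, []).append(svc.name)
    return {
        res: sorted(names)
        for res, names in by_resource.items()
        if len(names) > 1
    }
```

The sketch only identifies the overlap; how updates actually propagate between services sharing a resource was left open in the discussion.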


2.3 Andrew's straw man

Andrew: posted slide at the beginning of the call... 

We have been doing data grids for a while and have tried to extract the main points ... everything is a service, they all have an EPR and an interface and a type. There will be one or more types ... file types, directory types, streams, file container ...  users may not see all of these ... stored procedures, rdbms ...

Similarly can have derived types that may not be exposed through the interface but through the metadata ... need to understand the type.

It is important that you have multiple levels of abstraction. A high level interface that may not be performant ... and low level interfaces that will work much faster.

Propose a three layer architecture:

      - access
      - integration
      - provision

Need to put in the middle layer lots of third party integration functions.
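
A minimal sketch of the three layers listed above, added to these notes for illustration and not taken from Andrew's slide. It assumes the provision layer exposes a native data product under a handle, the integration layer holds third party transforms, and the access layer is a convenience wrapper; all class and method names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Transform = Callable[[List[str]], List[str]]


class Provision:
    """Bottom layer: exposes a native data product under a handle."""

    def __init__(self, handle: str, records: List[str]) -> None:
        self.handle = handle
        self._records = records

    def fetch(self) -> List[str]:
        return list(self._records)


class Integration:
    """Middle layer: registry of third party transforms over a source."""

    def __init__(self, source: Provision) -> None:
        self.source = source
        self._transforms: Dict[str, Transform] = {}

    def register(self, name: str, fn: Transform) -> None:
        self._transforms[name] = fn

    def apply(self, name: str) -> List[str]:
        return self._transforms[name](self.source.fetch())


class Access:
    """Top layer: convenience interface; lower layers stay reachable."""

    def __init__(self, integration: Integration) -> None:
        self.integration = integration

    def get(self, transform: Optional[str] = None) -> List[str]:
        if transform is None:
            return self.integration.source.fetch()
        return self.integration.apply(transform)
```

Keeping the lower layers reachable as plain attributes reflects the remark that a user should be able to drill down past the high level interface when performance matters.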

A data catalog, we have found, is intensely useful.

The provisioning layer could take files, databases, ... whatever is going to be presenting a data product. You give it a grid handle and assume access controls ... if you provide a different scheme people may not be keen to let the data go. Need to be able to interact with the native authentication space.

If you provide native interfaces then people will not have a problem with this.

You may want to have layering of caches with consistency ... the user, though, should be able to drill down to the bottom layers. The access layers are syntactic sugar ....

Malcolm: access layers may be there for protection: integrity,
         authentication...

In the commercial sector folks may not delegate the authentication layer ... 

You want to take data from the data product and generate new data products with consistency and caching propagation ...  changing the underlying data may require propagation of these actions to the derived data products.

Should have an architecture with three layers ...

Allen: on foil 3 ... there is a core data service layer ... what
       does that mean?

Andrew: The core is on the right hand side - you may add in other
        types of transformations ... the transforms were XSL
        transforms ...  what I was saying is that you should be able
        to capture third party transforms...

Allen: where does QoX fit in?

Andrew: Each data service will have metadata associated with it which
        will have some of this information ... if you want high
        availability then you could have replicated servers at the
        back end ... you would not be that interested in the back
        end implementation at this level of the interface.


3. Plans for future work

Dave: so, how do we take this forward for next week? Will you be able
      to do yours for next week?

Malcolm: I hope so ...

Dave: best thing to do is to try and come up with more comments/
      refined cases to what has been considered here ...

Andrew: will not be able to make next week ...

Dave: I will not be here either ... Allen could you lead next week...
      can we speak next week or do the week after ...

*general agreement to skip next week and meet the week after*

Dave: there are a number of use cases being talked about
      explicitly/implicitly - as well as having straw men it might be
      useful to have notes on use cases ... if people could note down
      a set of short use cases that might be useful references to have
      around to later produce more substantive use cases ...

Andrew: we should still look at the use case doc and try to tease out
        data issues ...

Dave: if people can have a look at that document then and tease out
      data...

Susan: which use case document are you talking about?

Andrew: there are two use case documents being developed by the OGSA WG. 
        One has already gone out on the window ... and there is another one
        ... (the first is a tier one document and the second is tier
        two which may or may not be available on Grid Forge ... Dave
        said that he would find out).

AOB

Malcolm asks Dave to send out URLs to the documents that have been discussed.

Dave will do this and will try to arrange a meeting for two weeks' time.

DONM

April 14th.  Dave will e-mail details.