OGSA Session 8 - Data Design Team ================================= Notes: Mario Antonioletti. Dave Berry leads the discussion. GGF Intellectual Property Policy is stated. Aim of session is to focus on data transport and go through the straw man data architecture that is being produced by the group. Infod will also give a presentation. PRELUDE: CONVERT TO WORKING GROUP? ---------------------------------- To start off Hiro Kishimoto gives a short presentation on the proposition to convert the OGSA design teams into a working groups. OGSA has various design teams. These work well and have a low governance overhead. At the beginning only intended to do informational documents. Recently trying to get OGSA better publicity in GGF at large. Need more OGSA-* WGs. Design teams do not receive a fitting reward for the work they do. Try to promote the group to a WG. Do not need a BOF. Need to submit "charter and 7Q&As". Try to minimise the delay. The question put to the group is whether the group wants to become a design team? Dave Berry continues: Some benefits: - Better visibility. - Also would have our own mailing list Discussion could be carried on in the mailing list (phone times not convenient for Japan). Malcolm Atkinson: have had complaints that design teams are difficult to join. This change would improve visibility. Telcons exclude people so mailing list would be useful. Would be good to clarify where this group would sit - in the OGSA area and talk to the Data area or vice versa? It appears that it would fit better in the data area but it's a question that should be answered by the GFSG. Malcolm and Hiro would both be happier if the new WG were in the data area. One criticism is to come up with a fixed charter, setting goals and deadlines. At least two potential co-chairs agreed (Dave + a.n.other) Have emailed the people that have been involved yesterday about becoming a WG. Have had 2 positive responses and one negative. WGs would still bind to OGSA through their charters. STRAW MAN DOCUMENT (1) ---------------------- Discussion moves on to data architecture straw man document. Focus on data transfer - an infrastructure for sending data between services or resources. Notion is that at some point you need to transfer data between services. While data is transferred it is not part of a data resource. It is very general part of the architecture as it considers services that are not generally considered to be data services. Things have been treated as very low level. Talk about transfer of data and not transfer protocols. Malcolm: unclear as to what the current level of abstraction is. Can think of various levels, as in low-level ByteIO to high level things where you specify that data must flow from A to B. Ultimately would like to grid programming to be at the highest level of abstraction that allows the system to carry out the necessary optimisation. One of the big issues which makes it hard to discuss is to chose the appropriate level of abstraction. Allen Luniewski: I agree Malcolm: aim high and then consider the lower level things when we have to. Section 2 of the document is a general overview. Items are addressed in more detail later on in the document. The wording of section 2.2 is discussed. Question as to how operations are managed. Simplest model is to allow the producer to manages it. There may be more complex management involved. Allen: not comfortable - need to look at 3 things: - managing the flow of data from the produce to the pipe - manage the flow from the pipe into the sink - then there is the bit in the middle, the pipe, has its own properties: bandwidth, cost, ... Looking it from one perspective seems wrong to me. Malcolm: will modelling the middle component make us fix too many things and not allow engineers to do the optimisation? Allen: the bit in the middle needs to be modelled Malcolm: can I have a lightweight, shared buffer, solution to do this? Do not want people to constrain what can be used. Allen: it may well be that it may be difficult to optimise the bit in the middle if you constrain it too much. The efficiency is the main issue of concern. If a program runs on machine A and a WS on machine B there are bits of infrastructure between A and B that are only noticed when they go wrong. Are you saying that you need a WS to manage the bit in the middle. Allen: yes, it would have to be .... Malcolm: can it be a resource. The sender or receiver has knowledge of the bits in the middle. Allen: but then the producer/consumers will require knowledge about the bits in the middle and that is something I am trying to avoid. Malcolm: I want to avoid that too. At one end you are pushing and at the other end you are pulling. What is the relationship between these and the bit in the middle? Do not want to go through an intermediary. Allen: not suggesting that any buffering is being done in the middle. It is what you manage - you do not use it to put and get bytes. The data transfer is different - if it is not optimised we are lost but I think we need the bit in the middle. Question as to whether the term "information" means the data with its structure, i.e. preserving the structure of the data during a transfer. Have no precise definition for "information". Abdeslem Djaoui: "Information" is probably not the right word to use. It will just cause confusion. INFOD PRESENTATION ------------------ InfoD is asked to present their work, to see how it relates to the straw man document. Steve Fisher goes over a set of slides that describe Infod. Infod is more driven by consumers rather than producers. Have a specification and a use case document. Originally was going to produce an informational document but going to promote to a recommendation. Infod start with the underlying states and state changes. These cause events that can be passed to the consumers. Do not want to flood the system with pieces of information that people are not interested in. Malcolm: if someone has caused data to be delivered then that will not flood the system. Trying to map this to data transfer. Try to distinguish ourselves from WSN. Jem Treadwell: so how are you different from WSN? The distinction is that infod allows consumers to talk about the events and the production of event. Infod allows filtering of these event to take place at all locations while WS-Eventing and WSN just deal with publishing. Infod provides various capabilities to control the subscription, consumption and publishers to be applied flexibly. Malcolm: does this work at any scale? Susan Malaika: consumer can express preferences to do this .... Steve: but would this allow movement of a whole file? Susan: if you are just going to pull it you might just use GridFTP Malcolm: so this is just a control and not a movement. Can this be decoupled into movement and a control part? Steve describes the flow of information: states-> events-> publishers-> messages -> dissemination -> messages -> consumers Malcolm: so Allen's bit is the middle bit between publishers and consumers. Dave: In the DAIS example the query result would be the publisher, the consumer would be the client, the disseminator would then manage the movement between them? A disseminator is not always needed. Can have more than one disseminator (which is termed as propagation). Disseminators will contain information between publishers and subscribers and they know about consumers. Dave: different from the thing in the middle that Allen talked about. Allen: the bit in the middle would be what is between disseminators. A registration manager knows about all the components. It is optional. If you have a disseminator you will require one. Infod requirements are specified in a slide. Dave: these are things that the infod is satisfying and not what is required of a consumer? What the architecture is trying to satisfy. ... Dave: how does infod support the simple use case where you have a database query being read by the third party. Susan: the requestor is the subscriber and the publisher is the.... Allen: but is this over kill? Susan: databases have these things ... they have not been externalised.. Malcolm: but a lot of our data sources are not databases ... Susan: we are trying to generalise the language .... Malcolm: as a publisher can I incrementally construct and as a consumer incrementally consume? Susan: have not got to the level of the interface as to whether you can do chunking. Malcolm: you do not specify chunking and efficiency as requirements Chris Kantarjiev: that is because some of us do not say that we move data Malcolm: so this is just the data control aspect? Susan: that is what we said earlier. ... Dave: so there is the possibility of using infod as the control structure for data movement. ... STRAW MAN DOCUMENT (2) ---------------------- Discussion moves back to discussing the straw man architecture document. The computational steering groups have produced materials that we could use. Need to support the nature of sessions but you may not do everything as a session. Describing sources and sinks - an outline of the metadata to describe the source and sinks is given. These will be resource properties. The metadata specified in the document is gone over. Question about order of data. Malcolm maintains if it is being transferred has an order. Malcolm: an interesting question is whether the structure can be specified in DFDL as opposed to DAIS which uses an ad hoc description such as relational and XML. Malcolm: need to see how data cutter deals with these things. Allen: transport mechanisms supported - is this something that becomes visible at the service interface or is it an implementation choice. Malcolm: most of these are explicit but not always in the OGSA-DAI context. They may want to identify the delivery in a separate context. At the moment you name your destination differently depending on the underlying protocol that you use. Would like a uniform naming scheme that can do this. At the moment the naming scheme is dependent on the protocol. Allen: the negotiation should take place between the end points. Malcolm: you cannot use the naming scheme independently of the protocol. Transformations from what to what - requires a language? Malcolm: would like to rule translation as being out of scope just now but make a note that it might creep in ... you might want to do translations from one byte ordering to another ... the ByteIO is doing the byte transferral ... should be dealing here in terms of information. Do we need a requirement at the sources and sinks to have metadata describing the data that it holds? Susan: Infod does. ... Malcolm: there are a bunch of resource control issues that need to be taken into account ... ... Talk about the encryption/decryption exposed in the metadata. Allen: you definitely want encryption at the source. Malcolm: you may have the source expressing this requirement, the producer or the client that set the transfer in motion. ... Question as to whether the "data produced" is a property of the source or something else. Logging was meant to be bound to the overall system and not the source or sink. Malcolm: we need to be able to state where our security call outs are going to be. Dave: does this map on to the things that infod is specifying? Susan: some of it does. Malcolm: need to get some more work done and then have a meeting with infod. Susan: have an infod face-to-face planned for the 26 & 27th of May. Dave draws the session to a close. Need to flesh out whether we can use infod for our control structure. Malcolm: need to scope out how much material can be in scope for the data architecture. Plan to have the Edinburgh folk do some more work on their data transfer ideas and take them to InfoD for comparison.