========================== GGF Data Area - Data Transport Discussion - Manchester University - 9th December 2003 ====================================================================================== Agenda: ------- Introduction [P.Clarke 5’] Views from Client Services: As seen by DAIS…… [D.Pearson 15’] As seen by GFS……[A.Jagatheesan 15’] As seen by OREP…..[M.Shailendra 15’] ???... Other information and viewpoints Reliable File Transport [B.Allcock 15’] GHPN [F.Travostino 15’] Discussion [ 30’] Summary & Next Steps if any [10’] Local Attendees --------------- Peter Clarke (PC) - Area Director - University College London David Martin (DM) - Area Director - IBM Amy Krause (AM) - EPCC - Part Time Vijay Dialani (VD) - Southampton University - Part Time Mario Antonioletti (MAA) - EPCC - Part Time Dave Pearson (DP) - Oracle Norman Paton (NP) - Manchester University Savas Parastatidis (SP) - Newcastle University Allen Luniewski (AL) - IBM Malcolm Atkinson (MA) - NeSC Simon Laws (SL) - IBM Remote Attendees ---------------- Susan Malaika (SM) - IBM Osamu Tatebe (OT) - AIST Arun Jagatheesan (AJ) - SDSC Franco Travostino (FT) - Nortel Networks Shailendra Mishra (SMi) - Oracle Dieter Gawlick (DG) - Oracle Martin Westhead (MW) - EPCC Ann Chervenak (AC) - ISI Introduction - PC ----------------- Goals 1. Requirements for data transport service specs from client point of view 2. Where should such services be developed New working group? Data Transport - DAIS viewpoint - DP ------------------------------------ Data transport out of scope Dependent on other groups to specify aspects of delivery Deliver data from one source to one or more targets over various channels >From small to very large data sets Suppliers and consumers could be very closer or remote Geography Format Transport may be deferred En route data may be stored Streaming and multi-casting are interesting Monitoring and notification of transport process is interesting Transport protocol independent Systematic encryption and compression over selected channels Data must be same at target as it was at source Possibly composition of access and delivery mechanism QOS Control What is moved Where from/to Units of transfer - logical and physical DM - in IETF this is application level MA - to us its the pipe work Dm - should the transport recognise the structure of the data DP - we may want to say - "get the next record" 8 generic use cases have been documented in the working group which contain delivery/distribution Overall GDD meets many of the requirements that DAIS has identified DM - If transformations used then this violates the "data is same at source and target" requirement MA - When handling large volumes of data important not to move it through multiple stages. Need interface directly into transport mechanism without copying bytes DM - what about network part of transport. Latency Finding closest location DP - Control over performance or cost of a movement is important. MA - Want to be able to predict what we get with a particular transport service PC - EDG has Network Cost Function but not often used DM - Accounting? DP - Would like to track and make information available DM - What granularity, bytes sent, records? DP - Depends on what you are moving PC - Service should have book keeping Data Transport - GDD Viewpoint - SMi ------------------------------------ Want to support any open commodity data transport protocols. Streaming is important Fault tolerant and reliable. Fail over capability Redundancy Connection pooling Low level requirement but important Zero copy Support required at O/S level Leverage existing underlying transport protocols Message oriented middleware Presentation Marshalling engine Don't want to invent a new one Bundled execution Piggy back calls on top of existing calls Do as much as possible in one round trip Data type and type evolution Good if transport handles user defined types. Mostly done at application level though NLS support and character set conversion Belongs in transport Security Open authentication protocols Encryption, Non repudiations QOS Audit and tracking Don't necessarily want to do fine grained monitoring (bits/bytes). To track and debug need it at a higher level DM - what does recovery mean? Recover the connection Oracle has real application cluster. Shared DB architecture. Redundancy through service name. Primary and secondary. If one goes down connection is directed to the other. PC - how does this overlap with GDD? GDD has publication, subscription, propagation, consumption Requirements discussed are all relevant to the propagation service Data Transport - GFS Viewpoint - AJ ----------------------------------- Grid File System Virtual/logical namespace Users consider it to be a single file system and do file IO Actual files may be in file system, database Files have meta data ACL, access times... GFS is a layer over DAIS Open, read, write, close... want read/write ahead striping and streaming of data Caching and replication behind the scenes Bulk operations Ship a collection of data with a single call Third party delivery Latency management Similar requirements to DAIS Additional bulk operations Control over data movement process MS - bulk operation lets you send header indicating much more data to come PC - no explicit presentation from applications usually want simple interface source identifier destination identifier Management elsewhere is the correct approach DM - say more about streaming Skipping through data until we get to where we want to be PC - not obvious that small time critical transport is the same as transport of large data sets Data Transport Service - GHPN Viewpoint - FT -------------------------------------------- Have bottom up view Need top down view Seen gap between business services and network Data Transport Services... Draft expected by GGF10 to chart the ecosystem Requirements for DTS - philosophy Requirements for DTS - basics high level SLAs - one app knows about low level SLAs - physical properties FTP already supports 3rd party transfer Requirements for DTS - help me to help you Requirements for DTS - feedback Requirements for DTS - scaling up Requirements for DTS - hi-touch scheduling atomic data delivery - identify failover points? anycasting - let network figure out the best source destination A new draft By GGF10 PC - requesting network resource for a bypass is a good idea 3.3.1 Data Transport Service overlaps with others E.g. we have a Data Transport research group that should be worrying about some of this We are worrying about the interface and about how the service can be introspected DM - people want network to be aware of structure of data (web proxies) Is DTS just about packets Data awareness in transport is not a good idea Doesn't scale Security - data has to be in the clear at some point Better just to transfer the bits DM - People in GGF are looking at application level data transport Need to agree on level/scope of DTS DM - may want to know that it is structured data that can be stored in an intermediary views need to merged. Data Transport - RFT Viewpoint - Not presented ---------------------------------------------- Understanding is that this overlaps with other areas but we don't know the details DM - real impact is the XIO which is new NP - RFT is a services Request document list of files + source + destination Uses GridFTP to transfer Uses persistent state to record where it has got to Monitoring info also provided Not clear that they have addressed putting this on top of protocols other than GridFTP PC - Not clear if it is tied to GridFTP. AC - It is dependent on GridFTP currently Discussion ---------- PC - Should there be another Data Transport Group Clearly there could be Lot of work to research all points coming from top and bottom DP - Need to partition and define interfaces at the overlap DAIS is at the top and wants to talk to GDD not transport GDD should then talk down to the next level Need model which defined boundaries and interface at those boudaries PC - DAIS - application level Information distribution model tied to a subset of applications? Is GDD specific to DAIS Agnostic transport level Network level DP - Is the scope of GDD the whole spectrum or just replicas FT - New category - Data Marshalling service Data transport service will be anonymous MS - where does connection pooling live FT - transport MS - Connection pools are more effective if they are pushed up the stack Useful from a performance point of view to have info about data resources FT - client aware session layer DM - Are there groups that can't live without a transport services NP - problem is we don't know where to stop If we can't delegate we start to do it DP - GDD is framed at the right level for us PC - draws a picture ---------------------- | DAIS | OREP | GFS | ---------------------- | GDD | RFT | ---------------------- | DTS | ---------------------- | Network | ---------------------- Can't get mind round the nature of the middle two layers MA - Looking for quality of delivery DM - starting to look like content distribution/management networks. Most gone out of business. Is there a critical mass of people who want to solve these problems FT - can we have detailed use cases Food chain from client to service DP - useful to take a use case and map it down through layers FT - secondary goal is to see if we are over engineering PC - But this is a lot of work A group focused on architecture for information distribution would seem like a good idea Might lead to requirements for services at different layers DP - DAIS has a good working relationship with GDD MA - Isn't this in scope of the gap analysis DM - Is any of this in scope of Data Area GGF is a good place to do this IETF/W3C both seem inappropriate PC - So we would like it to be done but who is going to do it How do we make the individuals working on this work together? Inappropriate to release a spec out of one are that misses most of the clients FT - Mike Back? (Logistical Network) may want to contribute OT - What is the relative position of data access interfaces w.r.t the access that the likes of RFT do PC - Data Access interfaces should be talked to by transport but in practice this doesn't work AJ - Connection management/pooling is critical to GFS also DM - no group is currently looking at content transport layer PC - Suggestion DAIS produces an information document on GDD All others with interest should contribute Layer picture would be instructive DM - could be that in future that a GDD group is formed SL - GDD is not just about databases PC - what is the scope of GDD MS - Next FTF Rules aspects Expressions and transformations Transport Interesting interplay as well as with data access Informational paper is a good idea DM - is GDD a general solution or focused on DAIS/Database MS - GDD is based on information sharing Theoretically GDD could be done independently of databases DM - Do people want to make it more general or more specific NP - Inderpal thinking more broadly than databases NP - Need a positioning/related work section in the GDD document Does this replace the requirement for a separate document? DP - if there is someone who wants to write another document we are happy for it go through DAIS MA - Data Area meeting will have web site and call. Could be place to collect know gaps and we may find someone to work on it Simon Laws IBM Hursley Services and Technology