OGSA-Data WG, OGF20 - 7th May 2007
==================================
Chair: Dave Berry (DB)
Note taker: Mario Antonioletti (MAA)

Reagan Moore (RM) asks whether the data that is being accessed is managed by the service or whether it is managed by something else. If it is managed by something else, then the service needs to know whether the data it provides access to is being modified underneath it - how do you detect that and maintain consistency?
Andrew Grimshaw (AG) gives a set of counter-examples.
RM: Who controls the data?
DB: This depends on the implementation, but in theory the service could control the data, or it might be done by the underlying object.
RM: When building a data management environment you make assertions as to how the data is being managed - who maintains those assertions?
DB: The application has the responsibility for maintaining these assertions.
Mark Morgan (MM): If you write a program that writes to an output stream, then you care about what that output interface has agreed to do for you. If you are writing to a disk then other dependencies come in and you care about those other things too.
AG: The basic interface says that this is what this service does, and everything else is out of scope. You can combine other interfaces to do this.
...
MM: ByteIO is for accessing files ... encourage phrasing such as "ByteIO is for accessing files in a file-like way, while DAIS is for accessing data in a database-like way" ...
...
RM: Who manages the name spaces across the services? Which data service do you have to go to?
DB: Implementation level.
RM: Can you implement a federation with one name space?
AG: In WS-Naming there are human-level names, where there could be collisions, and another layer where EPRs are used, where it is very unlikely that you will have collisions.
David Martin (DM): The document is kind of silent on this ...
DB: Yes - there is very little that is truly generic that can be said about the data federation interface.
...
DB: Have to look at the scenarios - need to examine the data federation scenario to see whether Reagan's question is addressed.
...
AG: Storage is distinct from RNS?
DB: Yes.
...
RM: Are you suggesting any collective operations onto storage, like a load leveller? ... If I have 5 storage systems and I want to distribute files across all of them ... you would have to specify it in policy - you don't need to reserve space, but you can check if there is aggregate space available.
...
MM: Have been talking about getting policies to services - creation of a resource that you want replicated happens indirectly, because you've done something else - trying to see how you carry information across a call session ...
DM: Want to plug in to a generic policy system ...
MM: There is other information that you need ... you may need to know in what RNS space you operate - there is an implicit context in which you operate ...
AG: If you make these things part of the arguments, then you may need to know different things as well ... there are also issues with talking to policy managers ... in CORBA you had contexts that were included in the messages, which you could get in touch with ...
MM: In Genesis II we pass this type of information in SOAP headers.
RM: Interested in the management of policies ... can apply policy at the services, but also over the data you manage. We had to implement a name space for policies, a name space for services, and another name space for the outcomes of the policies ... had to manage the service, the operations, and the outcome of the policy ...
...
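
As a rough illustration of the header-based approach MM mentions, the sketch below uses the standard SAAJ API (javax.xml.soap, bundled with Java SE 8) to carry an implicit calling context in a SOAP header block. The CallingContext element, its namespace URI, and the RNSSpace field are hypothetical placeholders for illustration, not taken from Genesis II.

    import javax.xml.namespace.QName;
    import javax.xml.soap.MessageFactory;
    import javax.xml.soap.SOAPException;
    import javax.xml.soap.SOAPHeaderElement;
    import javax.xml.soap.SOAPMessage;

    public class CallingContextHeader {
        // Hypothetical namespace and element names for an implicit calling
        // context; illustrative only, not taken from Genesis II.
        private static final String CTX_NS = "http://example.org/ogsa-data/context";

        // Attach the implicit context (e.g. the RNS space the caller operates
        // in) as a SOAP header block instead of an explicit operation argument.
        static void attachContext(SOAPMessage msg, String rnsSpace) throws SOAPException {
            SOAPHeaderElement ctx = msg.getSOAPHeader()
                    .addHeaderElement(new QName(CTX_NS, "CallingContext", "ctx"));
            ctx.addChildElement(new QName(CTX_NS, "RNSSpace", "ctx"))
               .addTextNode(rnsSpace);
        }

        public static void main(String[] args) throws SOAPException {
            SOAPMessage msg = MessageFactory.newInstance().createMessage();
            attachContext(msg, "/grids/example/home");
            // A service (or an intermediary such as a policy manager) can read
            // the header back out without the operation signature changing.
        }
    }

The point of the pattern, as discussed above, is that the context travels out-of-band: operation signatures stay unchanged, and intermediaries such as a policy manager can inspect the header without understanding the operation itself.
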
DM: Can you use WSRF to manage a session framework? Have a persistent service that acts as a session ...
DB: Could do ... doable ...
...
Moving on to the Scenarios document.
AG: Have built a grid file system using RNS and ByteIO ...
AG: Have implemented a replication service - the registry is a WS-Name resolver ... it can take an EPR which is unbound; the resolver will give you a replica to interact with ... have the same idea as in the scenario - there are different coherency protocols you may want to use - could use read-only, so you then have no coherency problem ... the replica that gets the write begins to propagate it straight away ... WS-Naming can be used pretty transparently ... if one fails, the client can rebind ...
DM: How does one know it fails?
MM: There are some conditions that a client can specify - a security exception could lead to a rebind - it's up to the client to decide that some condition may exist that makes them rebind - e.g. connection refused ...
DM: May need to deal with exceptions - if the replica fails you'll need to rebind ...
AG: Tried to capture all of the logic of when something goes wrong inside the resolver ...
RM: Someone has to notify the replication service that it was done.
AG: Can deduce this if it's not talking ...
MM: Want to respond to RM's point - the replication service may need to know of the failure - in WS-Naming, when the client goes to the resolver, the client can say "I've failed to communicate with the target and this is how I address the target", as the resolver may not know which target has failed - the resolver may have some management capabilities, or there may be nothing wrong with that replica - the problem may be at the client. The resolver can still redirect. The registry can poll or be notified by the client ...
RM: Where does this type of functionality lie in the architecture? ...
DB: Have no monitoring in this scenario ...
DM: Is there a part of OGSA that deals with notifications?
AG: Fault tolerance and notification ... there is a management section in OGSA but it needs to be fleshed out.
RM: Same problem when you put a file out - how do you know if it works?
AG: There is a fault model ...
DM: Or you do it at the transfer service ...
AG: The service will then get the fault ...
Chris Smith (CS): But most of the stuff is done asynchronously - you can register for a notification (of failure) and/or you can poll the service ...
...
Arun Jagatheesan (AJ): Do we need a replication for replication?
DB: Have some scenarios that take 2 approaches to replication ... there are more possibilities ... maybe it could be an informational document that fleshes out replication.
AJ: In the grid file system - the client can specify - the service should not be forced ...
DB: Should do a slide that has an information service talking to a replication management system directly - this scenario only shows that it can be done using OGSA.
...
AJ: The GFS scenario? Chris Jordan was going to send us a scenario but it hasn't happened as yet.
??: For file access, is there a concept of data sets? A collection of files that are somehow related ...
AG: Could use an RNS directory to do that - if all the data was flat-file data, create a directory; the files would have EPRs, and you would have RNS entries for those EPRs ...
??: For replicas you'd need to validate that these have the same content ... does the architecture allow for this, and allow the stuff to be put back in coherency?
DB: The data entries we deal with can be pretty much anything ... the text should really say "files or collections of files" - the essential thing is that it can be named ...
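
A minimal sketch of the client-side rebind pattern MM and AG describe earlier in this session: the client treats certain conditions (e.g. connection refused) as rebind-worthy, reports the EPR it failed to reach back to the resolver, and retries against the replacement. All types here (Epr, Resolver, Replica) are placeholders, not a concrete WS-Naming binding.

    import java.net.ConnectException;

    public class RebindingClient {
        // Placeholder types standing in for a WS-Naming EPR, a resolver
        // binding, and a replica endpoint; illustrative only.
        interface Epr {}
        interface Resolver {
            // The client hands back the EPR it failed to reach, since the
            // resolver may not otherwise know which target misbehaved.
            Epr resolveAvoiding(Epr failed);
        }
        interface Replica {
            byte[] read(Epr epr, long offset, int length) throws ConnectException;
        }

        private final Resolver resolver;
        private final Replica replica;
        private Epr current;

        RebindingClient(Resolver resolver, Replica replica, Epr initial) {
            this.resolver = resolver;
            this.replica = replica;
            this.current = initial;
        }

        byte[] readWithRebind(long offset, int length, int maxRebinds) throws ConnectException {
            for (int attempt = 0; ; attempt++) {
                try {
                    return replica.read(current, offset, length);
                } catch (ConnectException e) {
                    // The client decides this condition warrants a rebind; the
                    // resolver can still redirect even if the replica is fine
                    // (the fault may be on the client's side of the network).
                    if (attempt >= maxRebinds) throw e;
                    current = resolver.resolveAvoiding(current);
                }
            }
        }
    }

Note the division of labour from the discussion: the client only decides that a condition warrants going back to the resolver; any management reaction (marking a replica bad, polling it) stays on the resolver/registry side.
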
AG: Can do this in RNS in multiple ways: you can have RNS names point to EPRs, which have WS-Names, which can then go into a replica management service; or you can take a subset of a tree and use that ...
??: If you move a replica from the user space to the service space ... does this support versioning of files?
DB: Not explicitly - you would have to do that yourself. If you have a file system that can handle versioning you could plug that in. ...
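
A minimal sketch of the grouping AG describes: an RNS-style directory maps entry names to WS-Names (EPRs carrying globally unique endpoint identifiers), and the directory or a subtree of it can then be handed to a replica management service. The types and the urn:epi:... identifiers are illustrative placeholders (Java 16+ for the record type), not the actual RNS or WS-Naming schemas.

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class DataSetAsRnsDirectory {
        // Simplified stand-in for a WS-Name: an EPR address plus a globally
        // unique endpoint identifier, so replicas of the same logical file
        // can be matched up by a replica management service.
        record WsName(String endpointIdentifier, String address) {}

        // Simplified stand-in for an RNS directory: named entries -> EPRs.
        static class RnsDirectory {
            private final Map<String, WsName> entries = new LinkedHashMap<>();
            void add(String name, WsName target) { entries.put(name, target); }
            WsName lookup(String name) { return entries.get(name); }
            Map<String, WsName> list() { return Map.copyOf(entries); }
        }

        public static void main(String[] args) {
            // Group related flat files into one directory; the directory (or
            // a subtree of it) can then be handed to a replica manager.
            RnsDirectory dataset = new RnsDirectory();
            dataset.add("run-001.dat", new WsName("urn:epi:example-a1", "https://host1/byteio/17"));
            dataset.add("run-002.dat", new WsName("urn:epi:example-a2", "https://host1/byteio/18"));
            dataset.list().forEach((name, wsn) -> System.out.println(name + " -> " + wsn.address()));
        }
    }
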