OGSA Data Session Notes
1 February 2007, 09:00-11:30 EST
Allen Luniewski Leading

Allen presented a status report.  Progress was slow in the autumn but picked up over the winter.  Most of the work on the architecture document has been on how to add interface descriptions, following discussion at the August F2F.

The architecture document is substantially complete, although work is still continuing on details.  The scenarios document needs more work, with some scenarios still to be written.  Both documents are behind schedule.

The architecture document now has an appendix that maps sections of the architecture to actual specifications, including WS-DAI, ByteIO, SRM and DMI.

The DMI WG are progressing the specification of Data Transfer, fitting with the OGSA Data Architecture.

Arie Shoshani raised a question about the notion of primary replica.  The primary is in some sense the original data set (a.k.a. "the truth").  This idea is common within the business and database community; it also applies to particle physics data from CERN.  We distinguished the two concepts of ownership and authorisation; the first is who owns the data and the second who is allowed to access it.

Dave Snelling pointed out that CreateReplica can be targeted at an existing replica.  In this context, the existing replica becomes the primary for the new one.  This might be different from existing systems where all replicas must be known to the primary.  We need to flag this in the document.

We should add some sentences to clarify the distinction between a replica and a copy. 

We also need to note the role of data transfer in replication may require more than single-shot data transfers, e.g. setting up a session or stream between the primary and replicas. 

The GFS WG may be producing a spec for file metadata.

A question was asked about the relationship between the OGSA Data architecture and the OGF's technical strategy document.  It seems that the technical strategy document was produced without relation to our work.


For the last half hour of the meeting, we discussed the role of storage in the architecture.  Allen noted that the current section has concepts of storage, files, directories, file spaces & file systems, and raised the question of whether this is appropriate for the architecture given that we already have RNS for naming and directories etc.

Arie explained the history of SRM, which began with just the management of storage, but grew in the application domain to include managing the lifetime of things stored on the system, then describing files, and then directories for organising them.  SRM is not going to consider replica catalogs and logical file names.

Allen asked whether directories should be managed by the storage system.  Arie said that SRM could accept any approach that supports directories and access control lists.  Dave Berry pointed out that any system could provide an RNS interface for managing directories. Osamu explained that RNS should not be thought of as being implemented entirely outside native systems (e.g. storage) for efficiency reasons.

There was some detailed discussion.  This clarified the point for Allen, which is that users may see one directory name on the storage and another at the Grid level.  Arie suggested that this only applies when replication happens, as this is when the Grid level directory comes into existence.  Arie further explained that the system providing the storage must have the ability to decide how to arrange the storage separately from the Grid-level hierarchic namespace.