Use Cases and Requirements

Use Cases:

I would like to see the number of accesses to a given file/collection during a time window so that I can determine whether:
- A: The file/collection is hot and could be a candidate for replication.
- B: The file/collection is dead and as such is a candidate for deletion (if a replica) or archiving (if on disk and not on tape).
- All: for the moment the only part that should be accounted for is remote reading (as Amazon does), but that is network accounting, which is left out of scope for this first implementation of the UR.
- needs more discussion
I would like to be able to see where files/collections are being accessed from.
- - This would allow me to identify files/collections which would be candidates for replication.
- - This may also aid when I need to justify the existence of my storage resource (If it's heavy public access I can show my data is of wide interest etc)
- All: this seems to be more of a monitoring matter and might be left out of scope.
If I am a VO and I reserve a block of storage, then:
- that is viewed by the resource provider as used
- that is viewed as available for the VO to use
  - when a user places data in that storage block then it is used twice
    - once in the context of the user using VO's space
    - once in the context of the VO using the Resource's space
- need to be able to either describe this resource as a virtual resource or indicate the leaseholder of the physical resource
- All: if space is reserved it cannot be used by other users and should be marked as used, but the sensor should be careful not to double count resources already used (used and reserved resources should not overlap and be counted twice). Reserved space is also subtracted from the free resources.
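The accounting rule agreed above can be sketched as follows. This is only an illustration: the `SpaceReport` structure, its field names and the returned keys are hypothetical and are not part of the UR schema.

```python
from dataclasses import dataclass


@dataclass
class SpaceReport:
    """Hypothetical view of one storage resource, in bytes (not a UR type)."""
    capacity: int          # total size of the physical resource
    reserved: int          # space reserved by VOs
    used_in_reserved: int  # data written inside those reservations
    used_unreserved: int   # data written outside any reservation


def summarise(report: SpaceReport) -> dict:
    """Count reserved space as used once, without re-adding the data stored in it."""
    if report.used_in_reserved > report.reserved:
        raise ValueError("data inside reservations cannot exceed the reserved space")
    # Provider view: a reservation is unavailable to others, so it counts in full.
    provider_used = report.reserved + report.used_unreserved
    return {
        "provider_used": provider_used,
        # Reserved space is subtracted from the free resources as well.
        "free": report.capacity - provider_used,
        # VO view: what is still available inside its own reservation.
        "vo_available": report.reserved - report.used_in_reserved,
    }
```

With a capacity of 100, a reservation of 40 holding 10 of user data, and 20 used elsewhere, the provider sees 60 used and 40 free, while the VO still has 30 available.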
I would like to view Distributed Storage as a whole and also as distinct parts.
- Consider a system in which storage is distributed across several physical resources (locations), similar to the NorduGrid distributed dCache.
- A set of URs are generated that can be used to account for storage on each distinct resource.
- A single UR for the whole storage resource could be formed from an aggregate or summary record of these distinct resources, or could be formed as a separate, standalone UR.
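One way to picture the aggregate case is sketched below. The `SiteRecord` type and all of its fields are hypothetical stand-ins for per-resource URs, and summing usage over a shared time window is an assumption about how an aggregate might be formed, not an agreed rule.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SiteRecord:
    """Hypothetical per-resource record; field names are illustrative only."""
    storage_system: str  # the distributed system this resource belongs to
    site: str            # the physical location
    used_bytes: int
    start: str           # start of the measurement window (ISO 8601 text)
    end: str             # end of the measurement window (ISO 8601 text)


def aggregate(records: Iterable[SiteRecord]) -> dict:
    """Form one summary for the whole storage system from its per-site records."""
    records = list(records)
    if not records:
        raise ValueError("no records to aggregate")
    systems = {r.storage_system for r in records}
    windows = {(r.start, r.end) for r in records}
    # Only records for the same system and the same window are comparable.
    if len(systems) != 1 or len(windows) != 1:
        raise ValueError("records must share one storage system and one time window")
    start, end = windows.pop()
    return {
        "storage_system": systems.pop(),
        "sites": sorted(r.site for r in records),
        "used_bytes": sum(r.used_bytes for r in records),
        "start": start,
        "end": end,
    }
```

The per-site records remain usable on their own, so the same data serves both the "distinct parts" view and the "whole" view.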

ALL OK:

Requirements:

General

Each UR must have a unique identity.
- (This is true even if the same information is queried at a different point in time).
- All: yes (done)
Each UR must have a Time-stamp (creation).
- All: yes (done)
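As a small illustration of the two requirements above, a sensor could attach a fresh identity and creation time to every record it emits. The key names `record_id` and `create_time` are hypothetical, and using a random UUID is just one possible way to obtain uniqueness.

```python
import uuid
from datetime import datetime, timezone


def new_record_identity() -> dict:
    """Return a fresh identity and creation timestamp for one record."""
    return {
        # A random UUID differs on every call, so re-querying the same
        # information later still yields a distinct record identity.
        "record_id": str(uuid.uuid4()),
        # Creation time in UTC, formatted as ISO 8601.
        "create_time": datetime.now(timezone.utc).isoformat(),
    }
```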
The UR should provide a means to identify the system concerned (CREAM-pbs, CREAM-LSF, dCache, StoRM, etc.).
- All: yes
The UR should identify the XML author of the record, i.e. the sensor and/or aggregator that produced it (if the records are aggregated, system and sensor properties might be lost).
- All: yes
The UR should be able to be used in a global and/or a local context.
- All: yes
The UR should allow for the user/project to be identified in a global and/or local context.
- All: yes
The UR should allow for the subject identity to be defined with a level of granularity which reflects that of the user/project which wishes to consume/use the record.
- - This will require a good understanding of the storage system and AAI in place.
- This should be coverable using a profile; nothing stops you from creating a record per file.
  - <sr:GroupAttribute sr:attributeType="authority">
  - /O=Grid/OU=example.org/CN=host/auth.example.org
  - </sr:GroupAttribute>
  - With this definition we only need a GlobalGroupAttribute.
- All: yes.
We must define how Aggregate and Summary records can be produced from the URs.
- - These need to provide aggregates/Summaries across storage systems, users, groups...
- All: yes
The UR must allow for the system, on which the resources were consumed, to be identified (e.g.: hostname, URI).
- All: yes
The UR must allow for the software, on which the resources were consumed, to be identified (e.g.: batch system, storage system, etc. name and version).
- All: yes
The UR sensor or summariser should be identified (e.g.: sensor name and version).
- All: yes
The UR producer or summariser should be identified (e.g.: sensor instance URI)
- All: yes

Storage

The UR should be able to provide information about storage usage during a given time window (best estimate of the mean over time window).
- - As such a start-time and end-time would be required.
- All: yes. start-time, end-time and creation time cover this.
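The "best estimate of the mean over the time window" could be computed as a time-weighted mean of the sensor's samples. The sketch below assumes each sample stays valid until the next one is taken (a step function); that assumption, and the function name `mean_usage`, are illustrative rather than part of the UR definition.

```python
from datetime import datetime
from typing import Sequence, Tuple


def mean_usage(samples: Sequence[Tuple[datetime, int]],
               start: datetime, end: datetime) -> float:
    """Time-weighted mean of (timestamp, bytes) samples over [start, end]."""
    if end <= start:
        raise ValueError("end-time must be after start-time")
    ordered = sorted(samples)
    if not ordered or ordered[0][0] > start:
        raise ValueError("need a sample at or before the window start")
    total = 0.0
    for i, (taken_at, value) in enumerate(ordered):
        # Each sample holds until the next sample, or until the window ends.
        next_at = ordered[i + 1][0] if i + 1 < len(ordered) else end
        seg_start = max(taken_at, start)
        seg_end = min(next_at, end)
        if seg_end > seg_start:
            total += value * (seg_end - seg_start).total_seconds()
    return total / (end - start).total_seconds()
```

For example, 100 bytes held for the first day and 300 bytes for the second day of a two-day window gives a mean of 200 bytes.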
The UR should be able to provide information about storage groups.
- - Collections of storage resources which may be spread over multiple sites and which are grouped into a logical unit.
- JG: Not necessarily a record thing
- RMP: Can't see any obstacle in the current record for this
- All: yes
The UR should be able to provide information about Allocated resources, Quotas etc.
- All: yes
The UR should be able to provide information on the number of files (number of directories/collections) which correspond to the produced record.
- All: no. This is out of scope because the UR is aimed at space occupancy and not at the number of files present.
The UR should allow for file access to be reported.
- - number of times a file (set of files in a directory/collection) was accessed.
- - location of service/user accessing the file (equivalent in some senses to "submitHost" in Compute record).
- All: left out for the moment
The UR should provide information about both logical and physical storage usage.
- - Logical - Storage volume when just file/object size is considered.
- - Physical - Storage volume when all replicas etc are considered.
- All: yes
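The difference between the two figures can be shown with a short sketch. It assumes that physical usage is simply file size multiplied by the number of copies, ignoring file system overhead, and the input format is hypothetical.

```python
from typing import Iterable, Tuple


def logical_and_physical(files: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (logical, physical) bytes for (size_bytes, copies) pairs."""
    files = list(files)
    # Logical: each file counted once, whatever its number of replicas.
    logical = sum(size for size, _ in files)
    # Physical: every stored copy counted (assumption: no other overhead).
    physical = sum(size * copies for size, copies in files)
    return logical, physical
```

A 10-byte file with one copy and a 5-byte file with three copies give 15 logical bytes and 25 physical bytes.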
The UR should be able to provide information about the type/class of stored data.
- - precious, temporary, replica, pinned etc.
- All: yes
The UR should be able to provide information about the directory path/collection/data set.
- All: yes
The UR should, wherever possible, aim to be compatible with the Compute UR as we aim for a UR 2.0 solution.
- All: yes whenever possible
The UR should be able to provide point in time (snapshot) information about storage usage.
- All: it can do this using TimeDuration=0, though this might be a misuse because it is monitoring.
The UR should allow us to distinguish between different storage media.
- - Disk, Tape, Compound (Disk cache in front of Tape)
- All: already in (it is down to the profile to define the details).

Use Cases:

Gather storage usage information with a view to producing accounting/billing records.
- - This should be doable for a resource and/or project and/or user.
- - This would possibly require storage usage information like 4TB weeks.
- - w.r.t accounting/billing this may need a charge field (something equivalent to the charge field in the Compute accounting record)
- All: yes
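A figure such as "4TB weeks" is a product of stored volume and time. The sketch below assumes decimal terabytes (10^12 bytes) and a hypothetical flat rate per TB-week; neither the unit convention nor the charge model is fixed by this page.

```python
TB = 10 ** 12                    # assumption: decimal terabyte
WEEK_SECONDS = 7 * 24 * 3600


def tb_weeks(mean_bytes: float, duration_seconds: float) -> float:
    """Storage consumption as mean volume (TB) times duration (weeks)."""
    return (mean_bytes / TB) * (duration_seconds / WEEK_SECONDS)


def charge(mean_bytes: float, duration_seconds: float,
           rate_per_tb_week: float) -> float:
    """Hypothetical charge: a flat rate applied to the TB-weeks consumed."""
    return tb_weeks(mean_bytes, duration_seconds) * rate_per_tb_week
```

Under this definition, 4 TB held for one week and 2 TB held for two weeks both amount to 4 TB-weeks.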
Gather storage usage information and combine it with compute usage information with a view to producing accounting/billing records.
- - This should be doable for a resource and/or project and/or user.
- - This would possibly require storage usage information like 4TB weeks.
- - w.r.t accounting/billing this may need a charge field (something equivalent to the charge field in the Compute accounting record)
- All: yes
I would like to gather point in time storage usage information at several points in time that would then allow me to predict future usage.
- - This would allow me to plan for new storage purchases etc (or to delete old data)
- All: yes, the usage record might be used for that, but it should not be a requirement.
- AC: UR should be independent and include information over a period of time and not only a point in time
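As an illustration of how a consumer might use such snapshots, the sketch below fits a straight line through them and extrapolates. Linear growth is purely an assumption made for the example, and `predict_usage` is a hypothetical helper that lives outside the record itself.

```python
from typing import Sequence, Tuple


def predict_usage(snapshots: Sequence[Tuple[float, float]], when: float) -> float:
    """Least-squares line through (time, usage) snapshots, evaluated at `when`."""
    n = len(snapshots)
    if n < 2:
        raise ValueError("need at least two snapshots")
    mean_t = sum(t for t, _ in snapshots) / n
    mean_u = sum(u for _, u in snapshots) / n
    spread = sum((t - mean_t) ** 2 for t, _ in snapshots)
    if spread == 0:
        raise ValueError("snapshots must be taken at different times")
    slope = sum((t - mean_t) * (u - mean_u) for t, u in snapshots) / spread
    return mean_u + slope * (when - mean_t)
```

Times and usage can be in any consistent units, for example weeks and TB.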
As a project I would like to be able to view the used and unused storage space that I have on a storage resource.
- - Thus I can see how much headroom I have.
- All: yes
As a project I would like to be able to view the requested storage I have on a specific resource and the allocated/reserved resources I have on that resource.
- - Thus I can see that I asked for 100TB and currently have only 80TB at my disposal (of which I am using 50TB).
- All: yes
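The project view in the last two use cases reduces to simple arithmetic on three quantities. The function and key names below are hypothetical; the figures in the usage note are the ones from the example above.

```python
def project_view(requested_tb: float, allocated_tb: float, used_tb: float) -> dict:
    """Summarise a project's position on one storage resource (values in TB)."""
    return {
        # Headroom: allocated space that has not been used yet.
        "unused": allocated_tb - used_tb,
        # Shortfall: requested space that has not been allocated.
        "shortfall": requested_tb - allocated_tb,
    }
```

With 100TB requested, 80TB allocated and 50TB used, the project has 30TB of headroom and a 20TB shortfall against its request.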
