Storage Sensor Sampling
Storage Accounting with Sampling/Monitoring
There has been much discussion about sampling stored data quantities to obtain usage for UR markup. We start with these axioms:
- Data storage is a continuous usage i.e. it has a start time and an end time or duration.
- Two mechanisms for calculating this are being considered:
  - record all operations which write to or delete from the storage device
    e.g. a GridFTP server modified to record inbound I/O and delete operations
  - measure usage periodically
    e.g. for a disk on a Linux system, using du
- URs do not specify a time period
  - Any time period would make the record in itself inaccurate.
  - Resources instead publish status information: the total size of all files stored (total allocation was also discussed) to the accounting systems via URs
  - The UR consumer would then derive an average/max/min value over a period of time for accounting, using whatever algorithm they wished
- URs specify a time period
  - URs would contain a self-consistent usage value
  - URs would not reflect the dynamics of the usage
For details, see the email thread starting at http://www.ogf.org/pipermail/ur-wg/2012-February/000504.html
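The two mechanisms listed above can be sketched roughly as follows. This is only an illustration: the function names, the `(timestamp, kind, nbytes)` operation tuple, and the choice to sum apparent file sizes are assumptions made for the example, not part of any UR specification. Note that summing `st_size` is not identical to what du reports, since du counts allocated disk blocks by default.

```python
import os


def usage_from_operations(ops, initial_bytes=0):
    """Mechanism 1: replay recorded write/delete operations.

    ops is an iterable of (timestamp, kind, nbytes) tuples, where kind is
    "write" or "delete" (a hypothetical log format). Returns a list of
    (timestamp, stored_bytes) giving the stored total after each operation.
    """
    total = initial_bytes
    timeline = []
    for ts, kind, nbytes in sorted(ops, key=lambda op: op[0]):
        if kind == "write":
            total += nbytes
        elif kind == "delete":
            total -= nbytes
        else:
            raise ValueError(f"unknown operation kind: {kind!r}")
        timeline.append((ts, total))
    return timeline


def measure_tree_bytes(root):
    """Mechanism 2: one periodic measurement of a directory tree.

    Sums the apparent size of every file under root. Files that vanish
    or cannot be read during the walk are skipped.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat: a symlink counts as the link itself, not its target
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue
    return total
```

The first function yields the exact usage history but needs every operation to be recorded; the second yields only a snapshot, so its accuracy depends on how often it is called.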
Agreed (Skype meeting 28/02/2012):
- Storage UR will specify a start and end time
  - To be set by the sensor/Resource-Provider
  - and subject only to local policy decisions
  - (The resource provider or service software is in the best place to determine sampling rates)
- Storage UR will present a data size value: N (bytes or other similarly specified units, cf. UR v1)
  - i.e. not an integral value
  - UR will not mandate how this value is obtained, so long as the mechanism is reasonable and is described publicly for the user to consume
- Storage UR data size value will be interpreted by UR consumers as an average constant value across the time period
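As one possible way a sensor could produce a record of the agreed shape, the sketch below turns a series of samples into a single size value by time-weighting them over the UR period. The `StorageUR` class, its field names, and the assumption that each sample holds until the next one are all hypothetical; the agreement above deliberately does not mandate any particular mechanism.

```python
from dataclasses import dataclass


@dataclass
class StorageUR:
    # Hypothetical field names, not taken from the UR schema
    start_time: float   # seconds since epoch
    end_time: float     # seconds since epoch
    size_bytes: float   # average size over [start_time, end_time]


def build_storage_ur(samples, start_time, end_time):
    """Time-weighted average of (timestamp, bytes) samples over a period.

    Each sample is assumed to hold constant until the next sample, so a
    sample at or before start_time is required.
    """
    if end_time <= start_time:
        raise ValueError("end_time must be after start_time")
    samples = sorted(samples)
    if not samples or samples[0][0] > start_time:
        raise ValueError("need a sample at or before start_time")

    byte_seconds = 0.0
    for i, (ts, size) in enumerate(samples):
        next_ts = samples[i + 1][0] if i + 1 < len(samples) else end_time
        seg_start = max(ts, start_time)
        seg_end = min(next_ts, end_time)
        if seg_end > seg_start:  # ignore segments outside the period
            byte_seconds += size * (seg_end - seg_start)
    return StorageUR(start_time, end_time, byte_seconds / (end_time - start_time))


def consumed_byte_seconds(ur):
    """Consumer side: treat the value as constant across the period."""
    return ur.size_bytes * (ur.end_time - ur.start_time)
```

For example, samples of 100 bytes at t=0 and 200 bytes at t=10 over the period 0 to 20 give a `size_bytes` of 150, which a consumer would read as 3000 byte-seconds.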
Notes
- Allowing resource providers to determine their UR start/end times could in principle lead to a very large number of very short-period URs in the system.
  - Granularity will determine performance, so service providers should be asked to cut appropriately coarse-grained URs.
  - Sampling can still be done as often as accuracy requires, but the published UR should report average usage over a suitably long time baseline.
- Would it be sensible to provide metadata in the UR to indicate the sampling process used?
- Would it be sensible to provide metadata to indicate whether the value comes from a sampling mechanism?
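To illustrate the granularity note, the following sketch merges many short, contiguous URs into coarser ones while preserving the time-weighted average. It assumes URs are plain `(start, end, size_bytes)` tuples with comparable timestamps (for example integer epoch seconds); the function name and the `min_duration` parameter are hypothetical.

```python
def coalesce_urs(urs, min_duration):
    """Merge contiguous (start, end, size_bytes) URs into coarser URs.

    Short URs are accumulated until the merged period is at least
    min_duration long. Each merged UR carries the duration-weighted
    average size. A trailing remainder shorter than min_duration is
    emitted as is rather than dropped.
    """
    merged = []
    cur_start = None      # start of the UR currently being accumulated
    prev_end = None       # end of the last input UR seen
    byte_seconds = 0.0
    for start, end, size in sorted(urs):
        if end <= start:
            raise ValueError("UR end must be after its start")
        if prev_end is not None and start != prev_end:
            raise ValueError("URs must be contiguous (no gaps or overlaps)")
        if cur_start is None:
            cur_start = start
        byte_seconds += size * (end - start)
        prev_end = end
        if end - cur_start >= min_duration:
            merged.append((cur_start, end, byte_seconds / (end - cur_start)))
            cur_start = None
            byte_seconds = 0.0
    if cur_start is not None:
        merged.append((cur_start, prev_end, byte_seconds / (prev_end - cur_start)))
    return merged
```

Because the merge works on byte-seconds, a consumer that treats each value as constant over its period obtains the same total from the coarse URs as from the original short ones.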