This is a static archive of the previous Open Grid Forum Redmine content management system saved from host redmine.ogf.org file /projects/ur-wg/wiki/StorageSensorSampling at Thu, 03 Nov 2022 15:25:27 GMT StorageSensorSampling - UR WG - Open Grid Forum

Storage Sensor Sampling

Storage Accounting with Sampling/Monitoring

There has been much discussion about sampling stored data quantities to obtain usage values for Usage Record (UR) markup.
We start with these axioms:
  • Data storage is a continuous form of usage, i.e. it has a start time and either an end time or a duration.
  • Two mechanisms for calculating it are being considered:
    • record all operations that write to or delete from the storage device
      e.g. a GridFTP server modified to record inbound I/O and delete operations
    • measure usage periodically
      e.g. for a disk on a Linux system, using du
This led to the understanding that the sampling process would introduce inaccuracies, which resulted in two proposals:
  1. URs do not specify a time period
    • Any time period would make the record in itself inaccurate.
    • Resources instead publish status information: the total size of all files stored (total allocation was also discussed) to the accounting systems via URs
    • The UR consumer would then derive an average/max/min value over a period of time for accounting, using whatever algorithm it wished
  2. URs specify a time period
    • URs would contain a self-consistent usage value
    • URs would not reflect the dynamics of the usage
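
As an illustration of proposal 1, the sketch below shows one way a UR consumer might derive average/max/min values from a series of published status readings. It is only one of many possible algorithms: it assumes each reading holds until the next one is taken (a step function), and the names StatusSample, summarise and period_end are hypothetical, not part of any UR schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StatusSample:
    """Hypothetical status reading: total size of all stored files at one instant."""
    taken_at: datetime
    total_bytes: int


def summarise(samples, period_end):
    """Return (time-weighted mean, min, max) of the readings.

    Assumption: each reading stays valid until the next reading, and the
    last reading stays valid until period_end.
    """
    ordered = sorted(samples, key=lambda s: s.taken_at)
    if not ordered or period_end <= ordered[-1].taken_at:
        raise ValueError("need at least one sample taken before period_end")
    # Each sample's interval ends where the next sample begins.
    boundaries = [s.taken_at for s in ordered[1:]] + [period_end]
    byte_seconds = sum(
        s.total_bytes * (end - s.taken_at).total_seconds()
        for s, end in zip(ordered, boundaries)
    )
    span = (period_end - ordered[0].taken_at).total_seconds()
    sizes = [s.total_bytes for s in ordered]
    return byte_seconds / span, min(sizes), max(sizes)
```

A different consumer could equally use linear interpolation between readings or a plain arithmetic mean; the point of proposal 1 is that this choice is left to the consumer.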

For details, see the email thread starting at http://www.ogf.org/pipermail/ur-wg/2012-February/000504.html

Agreed (Skype meeting 28/02/2012):

  • Storage UR will specify a start and end time
    • To be set by the sensor/Resource-Provider
    • and subject only to local policy decisions
    • (Resource provider or service software is in best place to determine sampling rates)
  • Storage UR will present a data size value: N (bytes or other similar specified units c.f. UR v1)
    • i.e. not a value integrated over time (such as byte-seconds)
    • The UR will not mandate how this value is obtained, so long as the mechanism is reasonable and is described publicly for users to consult
  • Storage UR data size value will be interpreted by UR consumers as a constant average value across the time period
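
A minimal sketch of how a consumer might apply the agreed interpretation, reading each UR's data size as constant between its start and end times. The names StorageUR, byte_seconds and average_over are hypothetical, and counting time not covered by any UR as zero usage is an assumption of this sketch, not something the agreement states.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StorageUR:
    """Hypothetical in-memory form of a storage UR."""
    start: datetime     # set by the sensor / resource provider
    end: datetime
    size_bytes: float   # data size N, read as constant over [start, end)


def byte_seconds(record, window_start, window_end):
    """Usage contributed by one UR inside the window, in byte-seconds."""
    overlap_start = max(record.start, window_start)
    overlap_end = min(record.end, window_end)
    # No overlap yields a negative duration, which is clamped to zero.
    overlap = max((overlap_end - overlap_start).total_seconds(), 0.0)
    return record.size_bytes * overlap


def average_over(records, window_start, window_end):
    """Average stored bytes across the window (uncovered time counts as zero)."""
    length = (window_end - window_start).total_seconds()
    if length <= 0:
        raise ValueError("window must have positive length")
    total = sum(byte_seconds(r, window_start, window_end) for r in records)
    return total / length
```

For example, given two consecutive two-hour URs of 100 and 300 bytes, a two-hour window covering the last hour of the first and the first hour of the second averages to 200 bytes.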

Notes

  • Allowing resource providers to determine their UR start/end time could in principle lead to very many very short period URs in the system.
    • Granularity determines performance, so service providers should be asked to cut appropriately coarse-grained URs.
    • Sampling can still be done as often as accuracy requires, but the published UR should report average usage over a suitably long time baseline.
  • Would it be sensible to provide metadata in the UR to indicate the sampling process used?
  • Would it be sensible to provide metadata to indicate whether the value comes from a sampling mechanism?
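
The first note above can be sketched from the sensor side: sample frequently, but publish one UR per fixed period. The sketch assumes evenly spaced samples, so a plain arithmetic mean stands in for the period average. The names CoarseUR and cut_records, the one-day default period, and the sample_count field are all hypothetical; sample_count merely illustrates the kind of metadata the last two questions ask about and is not defined by any agreement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CoarseUR:
    start: datetime
    end: datetime
    size_bytes: float   # mean of the samples that fell in [start, end)
    sample_count: int   # hypothetical metadata about the sampling process


def cut_records(samples, origin, period=timedelta(days=1)):
    """Group (taken_at, size_bytes) samples into fixed periods counted from origin.

    Emits one CoarseUR per period that contains at least one sample.
    """
    buckets = {}
    for taken_at, size in samples:
        index = (taken_at - origin) // period  # whole periods since origin
        buckets.setdefault(index, []).append(size)
    return [
        CoarseUR(
            start=origin + index * period,
            end=origin + (index + 1) * period,
            size_bytes=sum(sizes) / len(sizes),
            sample_count=len(sizes),
        )
        for index, sizes in sorted(buckets.items())
    ]
```

Lengthening the period reduces the number of published URs without changing how often the sensor samples.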