StorageSensorSampling
Version 1 (Jon Kerr Nilsen, 08/08/2012 04:10 AM)
| 1 | 1 | Jon Kerr Nilsen | h1. Storage Sensor Sampling |
|---|---|---|---|
| 2 | 1 | Jon Kerr Nilsen | |
| 3 | 1 | Jon Kerr Nilsen | |
| 4 | 1 | Jon Kerr Nilsen | h2. Storage Accounting with Sampling/Monitoring |
| 5 | 1 | Jon Kerr Nilsen | |
| 6 | 1 | Jon Kerr Nilsen | There has been much discussion about the sampling of stored data quantities to obtain usage for UR markup. |
| 7 | 1 | Jon Kerr Nilsen | We start with these axioms: |
| 8 | 1 | Jon Kerr Nilsen | * Data storage is a continuous usage, i.e. it has a start time and either an end time or a duration. |
| 9 | 1 | Jon Kerr Nilsen | * Two mechanisms for calculating this usage are being considered: |
| 10 | 1 | Jon Kerr Nilsen | ** record all operations that write to or delete from the storage device |
| 11 | 1 | Jon Kerr Nilsen | e.g. a gridftp server modified to record I/O inbound and delete operations |
| 12 | 1 | Jon Kerr Nilsen | ** measure usage periodically |
| 13 | 1 | Jon Kerr Nilsen | e.g. for a disk on a Linux system, using du |
| 14 | 1 | Jon Kerr Nilsen | |
| 15 | 1 | Jon Kerr Nilsen | This led to the understanding that the sampling process would introduce inaccuracies, resulting in two proposals: |
| 16 | 1 | Jon Kerr Nilsen | # URs do not specify a time period |
| 17 | 1 | Jon Kerr Nilsen | ** Any time period would make the record in itself inaccurate. |
| 18 | 1 | Jon Kerr Nilsen | ** Resources instead publish status information: the total size of all files stored (total allocation was also discussed) to the accounting systems via URs |
| 19 | 1 | Jon Kerr Nilsen | ** The UR consumer would then derive an average/max/min value over a period of time for accounting, using whatever algorithm they wished |
| 20 | 1 | Jon Kerr Nilsen | # URs specify a time period |
| 21 | 1 | Jon Kerr Nilsen | ** URs would contain a self-consistent usage value |
| 22 | 1 | Jon Kerr Nilsen | ** URs would not reflect the dynamics of the usage |
| 23 | 1 | Jon Kerr Nilsen | |
| 24 | 1 | Jon Kerr Nilsen | For details, see the email thread starting at http://www.ogf.org/pipermail/ur-wg/2012-February/000504.html |
| 25 | 1 | Jon Kerr Nilsen | |
| 26 | 1 | Jon Kerr Nilsen | h2. Agreed (Skype meeting 28/02/2012): |
| 27 | 1 | Jon Kerr Nilsen | |
| 28 | 1 | Jon Kerr Nilsen | * Storage UR will specify a start and end time |
| 29 | 1 | Jon Kerr Nilsen | ** To be set by the sensor/Resource-Provider |
| 30 | 1 | Jon Kerr Nilsen | ** and subject only to local policy decisions |
| 31 | 1 | Jon Kerr Nilsen | ** (The resource provider or service software is in the best place to determine sampling rates) |
| 32 | 1 | Jon Kerr Nilsen | * Storage UR will present a data size value: N (bytes or other similar specified units, cf. UR v1) |
| 33 | 1 | Jon Kerr Nilsen | ** i.e. not an integral value |
| 34 | 1 | Jon Kerr Nilsen | ** UR will not mandate how this value is obtained, so long as the mechanism is reasonable and is described publicly for the user to consume |
| 35 | 1 | Jon Kerr Nilsen | * Storage UR data size value will be interpreted by UR consumers as a constant average value across the time period |
| 36 | 1 | Jon Kerr Nilsen | |
| 37 | 1 | Jon Kerr Nilsen | ---- |
| 38 | 1 | Jon Kerr Nilsen | |
| 39 | 1 | Jon Kerr Nilsen | h2. Notes |
| 40 | 1 | Jon Kerr Nilsen | |
| 41 | 1 | Jon Kerr Nilsen | * Allowing resource providers to determine their UR start/end time could in principle lead to a very large number of URs covering very short periods. |
| 42 | 1 | Jon Kerr Nilsen | ** Granularity will determine performance, so we need to ask service providers to cut appropriately coarse-grained URs. |
| 43 | 1 | Jon Kerr Nilsen | ** Sampling can still be done as often as necessary for accuracy, but the published UR should report average usage over a suitably long time baseline. |
| 44 | 1 | Jon Kerr Nilsen | * Would it be sensible to provide metadata in the UR to indicate the sampling process used? |
| 45 | 1 | Jon Kerr Nilsen | * Would it be sensible to provide metadata to indicate whether the value comes from a sampling mechanism? |
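
The agreement and notes above can be illustrated with a minimal sketch: a sensor samples stored bytes as often as it likes, then publishes one coarse record carrying a start time, an end time and an average size. Nothing here comes from a UR schema. The names @StorageRecord@, @average_usage@, @byte_seconds@ and the field @sample_count@ are hypothetical, and the sketch assumes each sample's value holds until the next sample is taken (a step function), which is only one of several reasonable mechanisms.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

# (time of measurement, bytes stored at that time)
Sample = Tuple[datetime, int]


@dataclass
class StorageRecord:
    # Hypothetical field names, not taken from any UR specification.
    start_time: datetime
    end_time: datetime
    average_bytes: float
    sample_count: int  # illustrative metadata: samples that contributed


def average_usage(samples: List[Sample], start: datetime, end: datetime) -> StorageRecord:
    """Time-weighted average over [start, end).

    Assumption: usage stays at the sampled value until the next sample.
    """
    if end <= start:
        raise ValueError("end must be after start")
    ordered = sorted((s for s in samples if s[0] < end), key=lambda s: s[0])
    if not ordered or ordered[0][0] > start:
        raise ValueError("need a sample at or before the start of the period")

    byte_seconds_total = 0.0
    used = 0
    for i, (taken_at, size) in enumerate(ordered):
        next_at = ordered[i + 1][0] if i + 1 < len(ordered) else end
        lo = max(taken_at, start)   # clip the sample's interval to the period
        hi = min(next_at, end)
        if hi > lo:
            byte_seconds_total += size * (hi - lo).total_seconds()
            used += 1

    period = (end - start).total_seconds()
    return StorageRecord(start, end, byte_seconds_total / period, used)


def byte_seconds(record: StorageRecord) -> float:
    """Consumer view: treat the size as constant across the whole period."""
    duration = (record.end_time - record.start_time).total_seconds()
    return record.average_bytes * duration


if __name__ == "__main__":
    day_start = datetime(2012, 2, 28, 0, 0)
    day_end = datetime(2012, 2, 29, 0, 0)
    samples = [
        (datetime(2012, 2, 28, 0, 0), 100),
        (datetime(2012, 2, 28, 6, 0), 200),
        (datetime(2012, 2, 28, 18, 0), 400),
    ]
    record = average_usage(samples, day_start, day_end)
    print(record.average_bytes, record.sample_count, byte_seconds(record))
```

In this example three samples collapse into a single one-day record, so the sampling rate and the record granularity are chosen independently. Carrying something like @sample_count@ in the record is one possible answer to the metadata questions raised in the notes, not an agreed design.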