StorageSensorSampling
Version 1 (Jon Kerr Nilsen, 08/08/2012 04:10 AM)
1 | 1 | Jon Kerr Nilsen | h1. Storage Sensor Sampling |
---|---|---|---|
2 | 1 | Jon Kerr Nilsen | |
3 | 1 | Jon Kerr Nilsen | |
4 | 1 | Jon Kerr Nilsen | h2. Storage Accounting with Sampling/Monitoring |
5 | 1 | Jon Kerr Nilsen | |
6 | 1 | Jon Kerr Nilsen | There has been much discussion about how to sample stored data quantities to obtain usage values for usage records (URs). |
7 | 1 | Jon Kerr Nilsen | We start with these axioms: |
8 | 1 | Jon Kerr Nilsen | * Data storage is a continuous usage, i.e. it has a start time and either an end time or a duration. |
9 | 1 | Jon Kerr Nilsen | * Two mechanisms for calculating this usage are being considered: |
10 | 1 | Jon Kerr Nilsen | ** record all operations that write to or delete from the storage device |
11 | 1 | Jon Kerr Nilsen | e.g. a GridFTP server modified to record inbound I/O and delete operations |
12 | 1 | Jon Kerr Nilsen | ** measure usage periodically |
13 | 1 | Jon Kerr Nilsen | e.g. by running du on a disk on a Linux system |
14 | 1 | Jon Kerr Nilsen | |
15 | 1 | Jon Kerr Nilsen | This led to the understanding that any sampling process would introduce inaccuracies, which resulted in two proposals: |
16 | 1 | Jon Kerr Nilsen | # URs do not specify a time period |
17 | 1 | Jon Kerr Nilsen | ** Any time period would make the record in itself inaccurate. |
18 | 1 | Jon Kerr Nilsen | ** Resources instead publish status information: the total size of all files stored (total allocation was also discussed) to the accounting systems via URs |
19 | 1 | Jon Kerr Nilsen | ** The UR consumer would then derive an average/max/min value over a period of time for accounting, using whatever algorithm it wished |
20 | 1 | Jon Kerr Nilsen | # URs specify a time period |
21 | 1 | Jon Kerr Nilsen | ** URs would contain a self-consistent usage value |
22 | 1 | Jon Kerr Nilsen | ** URs would not reflect the dynamics of the usage |
23 | 1 | Jon Kerr Nilsen | |
24 | 1 | Jon Kerr Nilsen | For details, see the email thread starting at http://www.ogf.org/pipermail/ur-wg/2012-February/000504.html |
25 | 1 | Jon Kerr Nilsen | |
26 | 1 | Jon Kerr Nilsen | h2. Agreed (Skype meeting 28/02/2012): |
27 | 1 | Jon Kerr Nilsen | |
28 | 1 | Jon Kerr Nilsen | * The storage UR will specify a start time and an end time |
29 | 1 | Jon Kerr Nilsen | ** To be set by the sensor or resource provider |
30 | 1 | Jon Kerr Nilsen | ** and subject only to local policy decisions |
31 | 1 | Jon Kerr Nilsen | ** (the resource provider or service software is best placed to determine sampling rates) |
32 | 1 | Jon Kerr Nilsen | * The storage UR will present a data size value: N (in bytes or other similarly specified units, cf. UR v1) |
33 | 1 | Jon Kerr Nilsen | ** i.e. a size, not an integral of size over time |
34 | 1 | Jon Kerr Nilsen | ** The UR will not mandate how this value is obtained, so long as the mechanism is reasonable and is described publicly for users |
35 | 1 | Jon Kerr Nilsen | * The storage UR data size value will be interpreted by UR consumers as a constant average value across the time period |
36 | 1 | Jon Kerr Nilsen | |
37 | 1 | Jon Kerr Nilsen | ---- |
38 | 1 | Jon Kerr Nilsen | |
39 | 1 | Jon Kerr Nilsen | h2. Notes |
40 | 1 | Jon Kerr Nilsen | |
41 | 1 | Jon Kerr Nilsen | * Allowing resource providers to determine their own UR start/end times could in principle lead to a very large number of very short-period URs in the system. |
42 | 1 | Jon Kerr Nilsen | ** Granularity will determine performance, so we need to ask service providers to cut appropriately coarse-grained URs. |
43 | 1 | Jon Kerr Nilsen | ** Sampling can still be done as often as accuracy requires, but the published UR should report the average usage over a suitably long time baseline. |
44 | 1 | Jon Kerr Nilsen | * Would it be sensible to provide metadata in the UR to indicate the sampling process used? |
45 | 1 | Jon Kerr Nilsen | * Would it be sensible to provide metadata to indicate whether the value comes from a sampling mechanism? |
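
As an illustration of the agreement and notes above, the following is a minimal sketch of how a sensor might sample frequently but publish one coarse-grained record. It is not part of the UR specification. The names @StorageRecord@, @average_record@ and the field names are hypothetical. The sketch assumes that each sampled size holds until the next sample is taken, which is only one of several reasonable mechanisms a provider could choose and describe publicly.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

# (time the measurement was taken, bytes stored at that time)
Sample = Tuple[datetime, int]


@dataclass
class StorageRecord:
    """Hypothetical stand-in for a storage UR; field names are illustrative."""
    start_time: datetime
    end_time: datetime
    size_bytes: float  # average size across [start_time, end_time]


def average_record(samples: List[Sample], end_time: datetime) -> StorageRecord:
    """Collapse periodic samples into one record using a time-weighted mean.

    Assumption: usage is a step function, so each sample holds until the
    next sample, and the last sample holds until end_time.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    start_time = ordered[0][0]
    if end_time <= start_time:
        raise ValueError("end_time must be after the first sample")

    # Each interval ends where the next sample begins; the last ends at end_time.
    boundaries = [t for t, _ in ordered[1:]] + [end_time]
    byte_seconds = 0.0
    for (t, size), t_next in zip(ordered, boundaries):
        if t_next < t:
            raise ValueError("sample taken after end_time")
        byte_seconds += size * (t_next - t).total_seconds()

    period = (end_time - start_time).total_seconds()
    return StorageRecord(start_time, end_time, byte_seconds / period)


# Three samples over one day, published as a single daily record.
samples = [
    (datetime(2012, 2, 28, 0, 0), 100),
    (datetime(2012, 2, 28, 6, 0), 200),
    (datetime(2012, 2, 28, 18, 0), 400),
]
record = average_record(samples, end_time=datetime(2012, 2, 29, 0, 0))
print(record.size_bytes)  # 225.0
```

A consumer reading this record would treat 225 bytes as the constant value for the whole day. The dynamics within the day (100, then 200, then 400 bytes) are not visible in the published record, which is the trade-off noted for proposal 2.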