UR WG - Open Grid Forum

Jon Kerr Nilsen

h1. Storage Sensor Sampling

h2. Storage Accounting with Sampling/Monitoring

There has been much discussion about how to sample stored data quantities to obtain usage figures for Usage Record (UR) markup.

We start with these axioms:

* Data storage is a continuous usage, i.e. it has a start time and either an end time or a duration.
* Two mechanisms for calculating this usage are being considered:
** Record all operations that write to or delete from the storage device, e.g. a GridFTP server modified to record inbound I/O and delete operations.
** Measure usage periodically, e.g. with du for a disk on a Linux system.

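To make the difference between the two mechanisms concrete, here is a minimal Python sketch rather than a definitive implementation. The names Event, average_from_events, size_at and average_from_samples, the time units and the example numbers are all assumptions made for illustration; none of them comes from the UR specification or from any existing sensor.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float  # seconds since the start of the accounting window
    delta: int   # bytes written (positive) or deleted (negative)


def average_from_events(events, start, end, initial=0):
    """Time-weighted average size from a complete write/delete log.

    Assumes every event lies inside [start, end].
    """
    size, last, byte_seconds = initial, start, 0.0
    for ev in sorted(events, key=lambda e: e.time):
        byte_seconds += size * (ev.time - last)  # usage held since the last change
        size += ev.delta
        last = ev.time
    byte_seconds += size * (end - last)          # tail up to the end of the window
    return byte_seconds / (end - start)


def size_at(events, t, initial=0):
    """Size that a periodic probe (such as du) would see at time t."""
    return initial + sum(ev.delta for ev in events if ev.time <= t)


def average_from_samples(samples):
    """Plain mean of periodically measured sizes."""
    return sum(samples) / len(samples)


# Hypothetical window of 100 s: 1000 bytes written at t=10, 400 deleted at t=60.
events = [Event(10, 1000), Event(60, -400)]
exact = average_from_events(events, start=0, end=100)
sampled = average_from_samples([size_at(events, t) for t in (0, 25, 50, 75)])
print(exact, sampled)
```

With these example numbers the operation log gives an average of 740 bytes, while four probes taken 25 s apart give 650 bytes. The gap between the two is the kind of sampling inaccuracy discussed here.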

This led to the understanding that the sampling process would introduce inaccuracies, which resulted in two proposals:

# URs do not specify a time period
#* Any time period would in itself make the record inaccurate.
#* Resources instead publish status information to the accounting systems via URs: the total size of all files stored (total allocation was also discussed).
#* The UR consumer would then derive an average/max/min value over a period of time for accounting, using whatever algorithm it chose.
# URs specify a time period
#* URs would contain a self-consistent usage value.
#* URs would not reflect the dynamics of the usage.

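Under the first proposal the consumer is free to choose its own algorithm. The sketch below shows one possible choice, assuming each published snapshot is a (timestamp in seconds, total bytes) pair and that a reported size holds until the next snapshot arrives. The function name summarise and the snapshot layout are hypothetical and are not taken from any UR schema.

```python
def summarise(snapshots, end):
    """Derive min, max and time-weighted average from status snapshots.

    snapshots: list of (timestamp_seconds, total_bytes) pairs
    end:       timestamp closing the accounting period
    Assumes each size stays constant until the next snapshot.
    """
    snapshots = sorted(snapshots)
    start = snapshots[0][0]
    sizes = [size for _, size in snapshots]

    # Pair every snapshot with the time of the following one (or the period end).
    next_times = [t for t, _ in snapshots[1:]] + [end]
    byte_seconds = sum(
        size * (t_next - t) for (t, size), t_next in zip(snapshots, next_times)
    )
    return {
        "min": min(sizes),
        "max": max(sizes),
        "average": byte_seconds / (end - start),
    }


# Hypothetical hourly snapshots over a three-hour period.
summary = summarise([(0, 100), (3600, 300), (7200, 200)], end=10800)
print(summary)
```

A different consumer could interpolate linearly between snapshots, or take a plain mean, and arrive at a different figure from the same data. That is the flexibility of this proposal, and also the reason two consumers may disagree.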

For details, see the email thread starting at http://www.ogf.org/pipermail/ur-wg/2012-February/000504.html

h2. Agreed (Skype meeting 28/02/2012):

* The storage UR will specify a start and end time
** to be set by the sensor/resource provider
** and subject only to local policy decisions
** (the resource provider or service software is best placed to determine sampling rates).
* The storage UR will present a data size value N (in bytes or other similarly specified units, cf. UR v1)
** i.e. a size, not a value integrated over time.
** The UR will not mandate how this value is obtained, so long as the mechanism is reasonable and is described publicly for the user to consume.
* The storage UR data size value will be interpreted by UR consumers as a constant average value across the time period.

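The agreed interpretation can be sketched as follows. This is an illustration under stated assumptions, not the UR schema: the class StorageRecord, its field names and the helper total_byte_seconds are hypothetical, and bytes and seconds are chosen only as example units. Because N is treated as constant over the period, a consumer can recover time-integrated usage by multiplying N by the period length.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StorageRecord:
    start: datetime   # set by the sensor or resource provider
    end: datetime     # set by the sensor or resource provider
    size_bytes: int   # the data size value N

    def byte_seconds(self) -> float:
        """N treated as a constant average across [start, end]."""
        return self.size_bytes * (self.end - self.start).total_seconds()


def total_byte_seconds(records) -> float:
    """Consumer-side sum over consecutive records."""
    return sum(r.byte_seconds() for r in records)


# Two hypothetical half-day records covering one day.
day = [
    StorageRecord(
        datetime(2012, 2, 28, 0, tzinfo=timezone.utc),
        datetime(2012, 2, 28, 12, tzinfo=timezone.utc),
        2000,
    ),
    StorageRecord(
        datetime(2012, 2, 28, 12, tzinfo=timezone.utc),
        datetime(2012, 2, 29, 0, tzinfo=timezone.utc),
        4000,
    ),
]
usage = total_byte_seconds(day)
daily_average = usage / 86400
print(usage, daily_average)
```

Here the two records combine to an average of 3000 bytes over the day. How the provider arrived at 2000 and 4000 is left open, in line with the agreement that the UR does not mandate the mechanism.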

----

h2. Notes

* Allowing resource providers to determine their own UR start/end times could in principle lead to a very large number of very short-period URs in the system.
** Granularity will determine performance, so we need to ask service providers to cut appropriately coarse-grained URs.
** Sampling can still be done as often as necessary for accuracy, but the published UR should report average usage over a suitably long time baseline.
* Would it be sensible to provide metadata in the UR to indicate the sampling process used?
* Would it be sensible to provide metadata to indicate whether the value comes from a sampling mechanism?

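The first note, on sampling often but publishing coarse-grained URs, could be realised along the following lines. This is only a sketch: the function coarse_records, the (start, end, average) tuple layout and the one-hour period are assumptions for illustration, and a real provider would choose the period by local policy.

```python
def coarse_records(samples, period):
    """Group (time_seconds, size_bytes) samples into fixed-length periods.

    Returns one (start, end, average_size) tuple per period that has samples,
    so many fine-grained samples collapse into a few published records.
    """
    buckets = {}
    for t, size in samples:
        index = int(t // period)            # which period the sample falls in
        buckets.setdefault(index, []).append(size)
    return [
        (index * period, (index + 1) * period, sum(sizes) / len(sizes))
        for index, sizes in sorted(buckets.items())
    ]


# Hypothetical probe every 10 minutes for two hours, published hourly.
samples = [(i * 600, 1000 + 100 * i) for i in range(12)]
records = coarse_records(samples, period=3600)
print(records)
```

Twelve samples become two records in this example, so sampling frequency can be tuned for accuracy without increasing the number of URs the accounting system has to handle.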
