Subsetting and Supersetting - Use Cases for DAIS Context: A Grid-enabled data center contains several large collections of files. Some collections have large files (>50 MB per file); others contain very small files (<1 KB per file). Owing to performance issues, the files may be stored on disk or tape, but are too voluminous to fit within a database. Overall, the data center contains several petabytes of data. Each file collection in the data center has a Logical Name Space for its file, organized as a hierarchical classification schema or tree structure. Each level in the tree has particular kinds of metadata that distinguish the elements at the next level in the tree. In addition, users have a multi-faceted search capability, so that there are other paths users can take to search for a file and for data of interest to them. After the files and their metadata were created, users of the collection ran several data-mining programs that identified particular kinds of features, such as hurricanes, dust storms, and volcanic plumes. Each kind of feature created entries in a "feature catalog" that provides not only the files in which an object appears, but also references (or has pointers to) the data elements the mining program identified as belonging to the object. In addition, there are similar object catalogs in other data centers that also reference particular files and particular data elements in the files in the data center of interest. Search Scenario #1, Subsetting A user at a remote data center chooses a catalog of objects that references objects in files the data center of interest. Based on his interests, he selects two of the objects. The remote data center sends a request to the local one to find the files containing the objects, extract (subset) the objects identified as belonging to these objects, and distribute the results as a file back to the original user using an ftp push to his site. Search Scenario #2, Supersetting (or Aggregating) A user interacts with the data access interface of the archive and identifies 10,000 small files as of interest. Her request creates a file list, a format into which the data in these files is to be inserted, and provides an e-mail address for notification when the aggregated data is ready for ftp pull. Scenario Variants It is also possible to consider variants of the scenarios above, in which the user requests additional services, such as reformatting, transformation, and attachment of metadata and provenance information. In still more complex workflows, the user requesting the data might be an agent of a production process. The agent wants to receive the extracted data, combine it with other data sets to produce new kinds of data, and then transport the data to a third site.