The following describes the demands that the IRAC-IST (SSC staff) and the IRAC-IT (SAO staff) are likely to make on the SIRTF archive system during the IOC period. I describe the layout of the computers, typical data volumes, and the expected data flow.
Configuration
First, we assume the existence of several groups of computers:
Expected Use by IRAC-IT
The IRAC-IT expects almost all of the data processing critical for IOC to be carried out by its own staff, either on their own Linux machines or at SAO, where the data will be shipped for further analysis using their own data pipeline. As such, they are likely to want all of the raw IRAC data delivered to them, an expected volume of roughly 2-4 GB/day. Ideally this would be accomplished by a method that pushes the raw data onto their Linux workstation after it has arrived from MIPL and been deposited into the SSC archive. A delivery latency of a few hours (8 at most) could be tolerated. They will then set up an automated method to push this data back to SAO. The data delivery method described by Lee Bennett, in which data is pushed after it has been registered within the SSC archive, should be well suited to this task.
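To give a feel for what this delivery requirement implies for the network, the following is a rough sketch of the average transfer rate needed. It assumes, as a worst case, that a full day of raw data must be moved within the tolerated latency window, and it assumes decimal gigabytes; the function name and both assumptions are mine and are not part of any delivery specification.

```python
# Rough estimate of the sustained rate needed to push raw IRAC data.
# Assumptions (not from any specification): 1 GB = 1e9 bytes, and the
# whole daily volume must be moved within the given time window.

BYTES_PER_GB = 1e9


def required_rate_mbit_s(volume_gb, window_hours):
    """Average rate in Mbit/s needed to move volume_gb within window_hours."""
    total_bits = volume_gb * BYTES_PER_GB * 8
    seconds = window_hours * 3600
    return total_bits / seconds / 1e6


if __name__ == "__main__":
    # Raw volume range quoted above (GB/day), and two windows:
    # the 8 hour maximum latency and a full 24 hour day.
    for volume_gb in (2, 4):
        for window_hours in (8, 24):
            rate = required_rate_mbit_s(volume_gb, window_hours)
            print(f"{volume_gb} GB in {window_hours} h: {rate:.2f} Mbit/s")
```

Even the worst case here (4 GB within 8 hours) works out to an average of only about 1 Mbit/s, so the raw data push should be limited by latency in the archive registration step rather than by bandwidth.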
The focus analysis task requires extremely fast turnaround. This would be accomplished by giving the IRAC-IT Linux machine a direct subscription to MIPL, so that it receives the focus data directly from them. The IT will then perform their pipeline processing and transfer the data to the focus analysis PC. The total data volume to be delivered by the SSC to the IT is very small.
Expected Use by IRAC-IST
The 6.3 delivery of the pipeline is expected to be too immature for adequate processing of the data. Debugging of the pipeline will occur on the 6.3+ development network, which is where the "offline" versions of the pipeline can be run. Additionally, for many tasks we anticipate hand-reduction of the data during IOC. This will be carried out on the science network workstations (6 above).
The IRAC-IST expects to access the sandbox and archive primarily to request datasets for specific AORs and IERs, in most cases after the 6.3 pipeline has processed them (since it adds important information such as pointing, without which further reduction is difficult). It is unlikely that the IST will request all of the data. However, the pipeline expansion factors are very large: at a minimum a factor of 10, and more like 50 for a full delivery of all intermediate pipeline products for IRAC. In principle a full delivery could demand 50-200 GB/day of data transfer.
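The arithmetic behind these volumes can be sketched as follows, taking the 2-4 GB/day raw volume and the expansion factors of 10 and 50 quoted in this note. The names are illustrative only.

```python
# Pipeline product volume = raw volume x expansion factor.
# Raw volume and factors are the figures quoted in this note;
# the variable and function names are illustrative.

RAW_GB_PER_DAY = (2, 4)                      # (low, high) raw IRAC volume
EXPANSION = {"minimum": 10, "full delivery": 50}


def expanded_volume(raw_range, factor):
    """Return the (low, high) product volume in GB/day for a given factor."""
    low, high = raw_range
    return (low * factor, high * factor)


if __name__ == "__main__":
    for label, factor in EXPANSION.items():
        low, high = expanded_volume(RAW_GB_PER_DAY, factor)
        print(f"{label} (x{factor}): {low}-{high} GB/day")
```

Straight multiplication gives 20-40 GB/day at a factor of 10 and 100-200 GB/day at a factor of 50. The 50-200 GB/day range quoted above is a somewhat wider envelope, so these numbers should be read as rough bounds only.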
The IRAC-IST will need a tool to identify specified datasets and transfer them to the science network and to the 6.3+ development network, as needed. Presumably this would be a tool that runs on the IST OPS computer and allows one to transfer data through the firewall via a proxy. Because the archive stores data based on request key (which is assigned by uplink), the IST cannot trivially identify datasets within the archive directory structure. Therefore, the tool needs to be able to make database queries to find the desired data. The tool needs to be able to identify data in several ways:
Ideally one should be able to enter search parameters, receive a list of all matching datasets, and then select which datasets should be delivered through the firewall. This is fundamentally very similar to what the eventual archive system for world data delivery will do. Additionally, because of the severe human-accessibility problems encountered with the archive directory structure, the directory tree should be flattened prior to delivery using the script already written by Russ.
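The search-then-flatten flow described above might look roughly like the sketch below. Everything in it is hypothetical: the table layout, the column names, the sample rows, and the flattening rule are invented for illustration and do not reflect the actual SSC archive schema or the existing script by Russ. An in-memory SQLite database stands in for the archive database.

```python
import sqlite3
from pathlib import PurePosixPath

# Hypothetical searchable columns; the real archive schema may differ.
SEARCHABLE = ("request_key", "label")


def make_demo_db():
    """Build an in-memory stand-in for the archive database (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datasets "
        "(request_key INTEGER, label TEXT, archive_path TEXT)"
    )
    conn.executemany(
        "INSERT INTO datasets VALUES (?, ?, ?)",
        [
            (1001, "focus_check", "irac/1001/raw/frame_0001.fits"),
            (1001, "focus_check", "irac/1001/bcd/frame_0001.fits"),
            (1002, "dark_cal", "irac/1002/raw/frame_0001.fits"),
        ],
    )
    return conn


def find_datasets(conn, **criteria):
    """Return archive paths matching every column=value pair in criteria."""
    unknown = set(criteria) - set(SEARCHABLE)
    if unknown:
        raise ValueError(f"unsupported search parameters: {sorted(unknown)}")
    sql = "SELECT archive_path FROM datasets"
    if criteria:
        # Column names come from the whitelist above; values are bound.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in criteria)
    sql += " ORDER BY archive_path"
    return [row[0] for row in conn.execute(sql, tuple(criteria.values()))]


def flatten_name(archive_path):
    """Collapse a relative archive path into a single flat file name."""
    return "_".join(PurePosixPath(archive_path).parts)


if __name__ == "__main__":
    conn = make_demo_db()
    for path in find_datasets(conn, label="focus_check"):
        print(path, "->", flatten_name(path))
```

Encoding the directory components into the flat file name keeps names unique when files in different subdirectories share a base name, which is one simple way to flatten a tree without collisions. The user would pick from the returned list before anything is sent through the firewall.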