The performance of computational geophysical data processing and forward modelling relies on both computation and data. The community has made significant efforts to develop new data formats and libraries, such as IRIS/PASSCAL and ASDF for data, and programs and utilities such as ObsPy and SPECFEM. The National Computational Infrastructure hosts a nationally significant geophysical data collection that is co-located with a high-performance computing facility, which provides an opportunity to investigate how to improve data formats from both a data management and a performance point of view. This paper investigates how to enhance data usability from several perspectives: 1) proposing a convention for the seismic (both active and passive) community to improve data accessibility and interoperability; 2) recommending the convention to be used in the HDF container when data are made available in PH5 or ASDF format; 3) providing tools to convert between various seismic data formats; 4) providing performance benchmark cases using the ObsPy library and SPECFEM3D to demonstrate how data organization, in terms of chunk size and compression, affects performance, comparing new data formats such as PH5 and ASDF with traditional formats such as SEGY, SEED and SAC. In this work we apply our knowledge and experience of data standards and conventions from the climate community, such as CF and ACDD, to the seismology community. The generic global attributes widely used in the climate community are combined with existing conventions in the seismology community, such as CMT, QuakeML, StationXML and the SEGY header convention. We also extend the convention to include provenance and benchmarking records, so that the user can learn the footprint of the data together with its baseline performance. In practice, we convert example wide-angle reflection seismic data from SEGY to PH5 or ASDF using the ObsPy and pyasdf libraries.
This conversion quantitatively demonstrates how accessibility improves when seismic data are stored in the HDF container.
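
As a rough illustration of the kind of convention described above, the following minimal Python sketch assembles a flat set of global attributes that combines ACDD-style discovery metadata with provenance and benchmarking records. It is a sketch under stated assumptions, not the convention proposed in the paper: `title`, `summary`, `license`, `creator_name`, `date_created` and `history` are attribute names taken from ACDD, whereas `seismic_metadata_standards`, `benchmark_record`, the function `build_global_attrs` and all example values are hypothetical. The benchmark timing is deliberately left empty because no measured value is assumed here.

```python
import json
from datetime import datetime, timezone

# ACDD-style discovery attributes. The attribute names come from ACDD;
# the values are placeholders and do not describe a real dataset.
ACDD_ATTRS = {
    "title": "Example wide-angle reflection survey",
    "summary": "Illustrative record only; not a real dataset.",
    "license": "CC-BY-4.0",
    "creator_name": "Example Author",
}


def build_global_attrs(acdd, source_format, target_format, tool, benchmark):
    """Merge discovery metadata with provenance and benchmark records.

    Every value in the returned dict is a string, so the dict could be
    written as flat attributes on the root group of an HDF container.
    """
    attrs = dict(acdd)
    attrs["date_created"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # ACDD 'history' is a free-text audit trail of processing steps.
    attrs["history"] = f"converted {source_format} -> {target_format} using {tool}"
    # Hypothetical attribute: seismology metadata standards referenced by the file.
    attrs["seismic_metadata_standards"] = "QuakeML, StationXML"
    # Hypothetical attribute: the nested benchmark record is serialised to JSON
    # so that it fits into a single string-valued attribute.
    attrs["benchmark_record"] = json.dumps(benchmark, sort_keys=True)
    return attrs


attrs = build_global_attrs(
    ACDD_ATTRS,
    source_format="SEGY",
    target_format="ASDF",
    tool="ObsPy + pyasdf",
    # Placeholder layout settings; read_seconds is None because nothing was measured.
    benchmark={"chunk_shape": [1, 4096], "compression": "gzip", "read_seconds": None},
)

for name, value in sorted(attrs.items()):
    print(f"{name}: {value}")
```

Keeping every value as a string is one possible design choice: it lets the same record be attached to a PH5 or ASDF file without depending on a particular library's support for nested attribute types.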