StandardizationCriterion
From QMC
On this page, we discuss criteria that we feel are important for standardized data formats, and we give a specific example of a format that fulfills (most of) these criteria: the one we use in our group in the code PIMC++.
We feel that the following are two key requirements for standard data formats:
1) They naturally allow for hierarchy (i.e., there exist sections, and there can be sections inside sections and variables inside sections).
2) There exists a natural way to build an extremely simple interface that reads/writes this data and can be built into a library. An agreement on the interface to these libraries can be seen as a key step. If the library reads and writes the data sanely, then there is a sense in which how the data is stored is unimportant (in fact, one group could store it in xml, another in txt, another in hdf5, etc.). We have in mind a library that has the ability to open sections in the hierarchy, close them, read variables, write variables, and append to variables. Ideally, this library reads and writes hdf5/ascii/xml all in the same way.
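To make requirement 2 concrete, here is a minimal sketch of what such a storage-independent interface could look like. It is not the PIMC++ API: the class names HierarchicalIO and MemoryIO, the restriction to arrays of doubles, and the section and variable names "Observables" and "Energy" are all assumptions made for illustration. The in-memory backend only stands in for a real hdf5, ascii, or xml backend.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface (illustrative names, not the PIMC++ API).
// Every call returns true on success and false otherwise.
class HierarchicalIO {
public:
    virtual ~HierarchicalIO() = default;
    virtual bool NewSection(const std::string& name) = 0;
    virtual bool OpenSection(const std::string& name) = 0;
    virtual bool CloseSection() = 0;
    virtual bool ReadVar(const std::string& name, std::vector<double>& out) const = 0;
    virtual bool WriteVar(const std::string& name, const std::vector<double>& value) = 0;
    virtual bool AppendVar(const std::string& name, double value) = 0;
};

// Stand-in backend that keeps the hierarchy in memory. A file-based backend
// would implement the same six calls against hdf5, ascii, or xml.
class MemoryIO : public HierarchicalIO {
    struct Section {
        Section* parent = nullptr;
        std::map<std::string, std::unique_ptr<Section>> children;
        std::map<std::string, std::vector<double>> vars;
    };
    Section root_;
    Section* current_ = &root_;

public:
    MemoryIO() = default;
    MemoryIO(const MemoryIO&) = delete;             // current_ points into root_
    MemoryIO& operator=(const MemoryIO&) = delete;

    bool NewSection(const std::string& name) override {
        if (current_->children.count(name)) return false;  // already exists
        auto child = std::make_unique<Section>();
        child->parent = current_;
        Section* raw = child.get();
        current_->children[name] = std::move(child);
        current_ = raw;                                     // descend into it
        return true;
    }
    bool OpenSection(const std::string& name) override {
        auto it = current_->children.find(name);
        if (it == current_->children.end()) return false;
        current_ = it->second.get();
        return true;
    }
    bool CloseSection() override {
        if (current_->parent == nullptr) return false;      // already at top level
        current_ = current_->parent;
        return true;
    }
    bool ReadVar(const std::string& name, std::vector<double>& out) const override {
        auto it = current_->vars.find(name);
        if (it == current_->vars.end()) return false;
        out = it->second;
        return true;
    }
    bool WriteVar(const std::string& name, const std::vector<double>& value) override {
        current_->vars[name] = value;
        return true;
    }
    bool AppendVar(const std::string& name, double value) override {
        auto it = current_->vars.find(name);
        if (it == current_->vars.end()) return false;       // nothing to append to
        it->second.push_back(value);
        return true;
    }
};

// Client code sees only the interface, so it does not care how data is stored.
bool roundTrip(HierarchicalIO& io) {
    std::vector<double> energies;
    return io.NewSection("Observables")
        && io.WriteVar("Energy", {-1.5})
        && io.AppendVar("Energy", -1.4)
        && io.ReadVar("Energy", energies)
        && io.CloseSection()
        && energies.size() == 2;
}
```

The point of the sketch is that roundTrip is written once against the interface; swapping the storage format would mean swapping the backend object, not the calling code.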
There are a number of other attributes that we think would be convenient for a standard data format to have:
1) The input/output file is well-typed (input/output in hdf5 gives this for free). In other words, a piece of code that is reading the input file is able to identify the type of anything it reads. Ideally, it is able to do this from the information in the file, without needing an external mapping in the code between variable names and types (i.e., it doesn't have to know that the variable named tau is always a double). We feel this is important for a number of reasons. To begin with, it allows a generic program with no knowledge of the variables to convert quickly between files of different types (e.g., hdf5 and ascii). It also allows for a sanity check on the input.
2) The input is human readable. This is the major complaint that some of us have against xml: we find that editing xml by hand and understanding it can be confusing and annoying. Of course, there can always be a front-end input that converts into a back-end xml, but the advantage of this is unclear.
3) The input and output files are constructed in the same way. This allows the output of one program to be used as the input of another.
4) ...
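As an illustration of the well-typed property in item 1, the following sketch stores each value together with its type, so a reader can identify the type from the stored data alone. This is an assumption about how one might model it in C++17 using std::variant; the names TypedValue, typeName, and readAs are hypothetical and say nothing about how PIMC++ or hdf5 actually records types.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// A stored value carries its own type tag (hypothetical set of supported types).
using TypedValue = std::variant<int, double, bool, std::string, std::vector<double>>;

// A generic tool (e.g., a format converter) can ask for the type without
// knowing anything about the variable's name or meaning.
std::string typeName(const TypedValue& v) {
    struct Namer {
        std::string operator()(int) const { return "int"; }
        std::string operator()(double) const { return "double"; }
        std::string operator()(bool) const { return "bool"; }
        std::string operator()(const std::string&) const { return "string"; }
        std::string operator()(const std::vector<double>&) const { return "array<double>"; }
    };
    return std::visit(Namer{}, v);
}

// Reading succeeds only if the caller's variable has the stored type,
// which is the sanity check on the input mentioned above.
template <typename T>
bool readAs(const TypedValue& stored, T& out) {
    if (const T* p = std::get_if<T>(&stored)) {
        out = *p;
        return true;
    }
    return false;
}

void demoTypeCheck() {
    TypedValue tau = 0.05;  // stored as a double
    double asDouble = 0.0;
    int asInt = 0;
    assert(typeName(tau) == "double");
    assert(readAs(tau, asDouble));   // types agree
    assert(!readAs(tau, asInt));     // mismatch is reported, not silently converted
}
```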
A note (by some of us) on xml: [Ken, feel free to comment as desired]
The input/output format used by PIMC++:
Here is an example: PIMC++ InputExample. The output, although typically stored in hdf5, would also "look" like this if it were written in ASCII (which would simply involve changing the output file name from ".hdf5" to ".txt").
All input/output files consist of "Sections" (which can be nested) and variables inside these sections.
The library to access these input/output files works as follows:
Let us say you define an IOSectionClass object IO.
All functions return true if successful and false otherwise.
For reading data (in whatever format, hdf5/txt/xml/etc., the data is stored in):
To open the file you call: IO.OpenFile("FileName")
To open a section you call: IO.OpenSection("SectionName")
To close a section you call: IO.CloseSection()
To read a variable you call: IO.ReadVar("VariableNameInFile",variable). Because the file is well typed, this will only succeed if variable is of the correct type.
Scoping works in the following way. Variables are read from the current section. If the variable is not in the current section, reading is attempted from the section that contains the current section (and so on, ad infinitum).
To close a file you call: IO.CloseFile()
There exists some functionality for doing "include" files, but we won't talk about that right this second.
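The reading calls and the scoping rule described above can be sketched with a small in-memory mock. This is not the real IOSectionClass: MockReader, Section, and addChild are hypothetical stand-ins, the section and variable names "System" and "NumParticles" are made up, and the choice to fail (rather than keep searching outward) when a variable is found with the wrong type is an assumption.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <variant>

using Value = std::variant<int, double, std::string>;

// One node of the hierarchy: variables plus nested sections.
struct Section {
    Section* parent = nullptr;
    std::map<std::string, Value> vars;
    std::map<std::string, std::unique_ptr<Section>> children;

    Section& addChild(const std::string& name) {
        auto& slot = children[name];
        slot = std::make_unique<Section>();
        slot->parent = this;
        return *slot;
    }
};

// Mock of the reading side; mirrors the call names from the text only.
class MockReader {
    Section* current_;

public:
    explicit MockReader(Section& root) : current_(&root) {}

    bool OpenSection(const std::string& name) {
        auto it = current_->children.find(name);
        if (it == current_->children.end()) return false;
        current_ = it->second.get();
        return true;
    }
    bool CloseSection() {
        if (current_->parent == nullptr) return false;
        current_ = current_->parent;
        return true;
    }
    // Look in the current section first, then in each enclosing section.
    template <typename T>
    bool ReadVar(const std::string& name, T& out) const {
        for (const Section* s = current_; s != nullptr; s = s->parent) {
            auto it = s->vars.find(name);
            if (it == s->vars.end()) continue;  // not here: try the enclosing section
            if (const T* p = std::get_if<T>(&it->second)) {
                out = *p;
                return true;
            }
            return false;                       // found, but the stored type differs
        }
        return false;                           // not found in any enclosing section
    }
};

bool demoScopedRead() {
    Section root;
    root.vars["tau"] = 0.05;                    // top-level double
    Section& system = root.addChild("System");
    system.vars["NumParticles"] = 16;           // int inside the nested section

    MockReader IO(root);
    double tau = 0.0;
    int numParticles = 0;
    int wrongType = 0;
    bool ok = IO.OpenSection("System")
        && IO.ReadVar("NumParticles", numParticles)  // found in the current section
        && IO.ReadVar("tau", tau)                    // found in the enclosing section
        && !IO.ReadVar("tau", wrongType)             // tau is a double, not an int
        && IO.CloseSection();
    return ok && numParticles == 16 && tau == 0.05;
}
```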
For writing data (in whatever format, hdf5/txt/xml/etc., the data is stored in):
To create a new file you call: IO.NewFile("FileName")
To create a new section you call: IO.NewSection("SectionName")
To close a section you call: IO.CloseSection()
To write a variable you call: IO.WriteVar("VariableNameInFile",variable).
To append to a variable (i.e., you want to add more data to an array, etc.) you must "get" the variable you want to append to. You do this by doing IOVarClass IOVar=GetVar("VariableNameInFile"). Then, to actually append, call IOVar.AppendVar(newVarData). In our code, we have abstracted this away so that calls to a higher level WriteVar will write the variable if it's not there or otherwise append to it.
To close a file you call: IO.CloseFile() There exists some functionality for doing "include" files, but we won't talk about that right this second.
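The "write if absent, otherwise append" behavior described for the higher level WriteVar can be sketched as follows. MockWriter and MockVar are hypothetical stand-ins for IOSectionClass and IOVarClass, restricted to arrays of doubles; IsValid, Peek, and writeOrAppend are invented for this illustration, and calling GetVar as a member of the writer object is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Handle to an existing variable (stand-in for IOVarClass).
class MockVar {
    std::vector<double>* data_;

public:
    explicit MockVar(std::vector<double>* data) : data_(data) {}
    bool IsValid() const { return data_ != nullptr; }
    bool AppendVar(double value) {
        if (data_ == nullptr) return false;
        data_->push_back(value);
        return true;
    }
};

// Flat, in-memory stand-in for the writing side of the library.
class MockWriter {
    std::map<std::string, std::vector<double>> vars_;

public:
    // Creates the variable; fails if it already exists.
    bool WriteVar(const std::string& name, const std::vector<double>& value) {
        return vars_.emplace(name, value).second;
    }
    // Returns an invalid handle if the variable does not exist.
    MockVar GetVar(const std::string& name) {
        auto it = vars_.find(name);
        return MockVar(it == vars_.end() ? nullptr : &it->second);
    }
    // Inspection helper for the demo only.
    const std::vector<double>* Peek(const std::string& name) const {
        auto it = vars_.find(name);
        return it == vars_.end() ? nullptr : &it->second;
    }
};

// Higher-level call: append if the variable is there, otherwise create it.
bool writeOrAppend(MockWriter& io, const std::string& name, double sample) {
    MockVar var = io.GetVar(name);
    if (var.IsValid()) return var.AppendVar(sample);
    return io.WriteVar(name, {sample});
}

bool demoAppend() {
    MockWriter io;
    bool ok = writeOrAppend(io, "Energy", -1.5)   // first call creates the array
           && writeOrAppend(io, "Energy", -1.4);  // second call appends to it
    const std::vector<double>* energy = io.Peek("Energy");
    return ok && energy != nullptr && energy->size() == 2 && (*energy)[1] == -1.4;
}
```

A wrapper like this keeps the calling code the same on the first and on every later output step, which is convenient when a quantity is accumulated over the course of a run.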
