HDF5 format for wave function data
From QMC
Other related formats
- Nanoquanta and ETSF specifications for NetCDF-based I/O in plane-wave (PW) codes
- Q5Cost for Gaussian-based quantum chemistry codes
Basic strategy
- The same format is used for serial and parallel executions.
- Each H5Group (HDF5 group) represents a physical entity or concept.
- Multiple instances of a data type are represented by separate, enumerated H5Groups, e.g., state_0, ..., state_99 for 100 states.
- Avoid high-dimensional arrays.
  - Dimensions that arise from physical concepts, such as spin, band, and k-point, should be handled by HDF5 groups.
  - Different codes use different index schemes (band index, spin index, k-point, etc.), but the fast index means the same thing in any language and any reasonable code: a 3D grid with three indices, or plane-wave vectors with one index.
  - Repacking into a higher-dimensional array is inefficient in both CPU time and memory.
  - Tools and libraries can handle any indexing scheme without requiring modifications to application codes.
  - Visualization tools can display the data easily if an array is mapped onto a regular grid.
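A minimal C sketch of the two conventions above: building and parsing enumerated group names, and locating a grid point when the 3D grid is the fast (contiguous) index of a C row-major array. It assumes a hypothetical hierarchy electrons/spin_<i>/kpoint_<k>/state_<s>; only the state_<n> enumeration comes from this draft, and make_state_path, parse_enumerated, and grid_offset are illustrative names, not part of any existing library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Path of one enumerated state group. The hierarchy
 * electrons/spin_<i>/kpoint_<k>/state_<s> is an assumed example;
 * only the state_<n> enumeration comes from the draft.
 * Returns 0 on success, -1 if buf is too small. */
int make_state_path(char *buf, size_t len, int spin, int kpoint, int state)
{
    int n = snprintf(buf, len, "electrons/spin_%d/kpoint_%d/state_%d",
                     spin, kpoint, state);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Recover the index from an enumerated group name, e.g.
 * parse_enumerated("state_12", "state_") == 12.
 * Returns -1 if name is not of the form <prefix><digits>. */
long parse_enumerated(const char *name, const char *prefix)
{
    size_t plen = strlen(prefix);
    if (strncmp(name, prefix, plen) != 0)
        return -1;
    const char *digits = name + plen;
    if (*digits < '0' || *digits > '9')
        return -1;
    char *end;
    long value = strtol(digits, &end, 10);
    return (*end == '\0') ? value : -1;
}

/* Offset of grid point (ix, iy, iz) in a C (row-major) nx x ny x nz
 * array: iz varies fastest, so each state is one contiguous block. */
size_t grid_offset(int ix, int iy, int iz, int nx, int ny, int nz)
{
    assert(ix >= 0 && ix < nx);
    assert(iy >= 0 && iy < ny);
    assert(iz >= 0 && iz < nz);
    return ((size_t)ix * ny + iy) * nz + iz;
}
```

For example, make_state_path(buf, sizeof buf, 0, 3, 12) writes "electrons/spin_0/kpoint_3/state_12". Because spin, k-point, and state live in the group path rather than in extra array dimensions, each dataset stays a plain nx*ny*nz grid.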
HDF5 storage format
How to edit the format using MediaWiki templates:
- Add a group with Template:h5group:
{{h5group|group name}}
- Add a dataset with Template:h5ds:
{{h5ds|dataset name|data type|dimensions|O(ptional)/M(andatory)|comments}}
We use the names adopted by the NQ (Nanoquanta) NetCDF files wherever a one-to-one mapping can be identified. Since a single HDF5 file can hold the entire data set, the my_ and max_ prefixes are not used. Redundant prefixes, e.g., atom_ for the data in the atoms group, are removed.
Some names with multiple definitions are simplified; e.g., reduced_gvectors, of integer type, is replaced by gvectors.
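The renaming rules above can be sketched as a small C helper. This is an illustration under assumptions: nq_to_h5_name is a hypothetical function, the group prefix to strip is supplied by the caller, and the input names in the usage note are examples rather than a checked list of NQ variable names.

```c
#include <stdio.h>
#include <string.h>

/* Return s advanced past prefix if s starts with it, else s unchanged. */
static const char *strip_prefix(const char *s, const char *prefix)
{
    size_t n = strlen(prefix);
    return strncmp(s, prefix, n) == 0 ? s + n : s;
}

/* Hypothetical mapping from a NetCDF-style name to the HDF5 name:
 *   - special case: "reduced_gvectors" -> "gvectors";
 *   - drop "my_" and then "max_" (one file holds the whole data set);
 *   - drop the redundant group prefix, e.g. "atom_" in the atoms group.
 * group_prefix may be NULL. Returns 0 on success, -1 if out is too small. */
int nq_to_h5_name(const char *nq, const char *group_prefix,
                  char *out, size_t len)
{
    if (strcmp(nq, "reduced_gvectors") == 0)
        nq = "gvectors";
    nq = strip_prefix(nq, "my_");
    nq = strip_prefix(nq, "max_");
    if (group_prefix != NULL)
        nq = strip_prefix(nq, group_prefix);
    int n = snprintf(out, len, "%s", nq);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With these rules, "my_max_number_of_states" becomes "number_of_states", and "atom_species" with group prefix "atom_" becomes "species".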
Draft
Definitions
Missing
- Units: bohr, Rydberg/Hartree ...
- The number of up or down spins
  - Or the spin state: singlet, triplet?
- Wannier and localized orbitals in general
Comments
- number_of_something could be shortened to num_something or size_something.
- spins or components?
- A dataset for the number of something is not necessary, since all the dimensions are available from the array datasets themselves.
- NQ structure: a spin owns states, whereas in the draft a state owns spins.
- gvectors is defined per state.
- reduced_gvectors is replaced by gvectors.
How to implement the formats
- Applications can use the HDF5 API directly. This is quite simple for applications that consume the file, such as QMC codes.
- Alternatively, a library provides APIs that hide the low-level HDF5 functions from developers.
  - This is the model adopted by Q5Cost.
  - Written in C, with C and Fortran 77 bindings.
  - Does not manage the caller's memory. Temporary allocations needed to pack/unpack data are made and released within each function.
  - Provides both atomic and high-level APIs, but the main effort should go into making the atomic calls as complete as possible.
  - Defines a naming convention for the APIs so that missing data can be added. The convention can also be used to generate function names automatically, like get/set methods in Java.
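One way to read "does not manage memory" is sketched below in C: the caller owns the output buffer, while the function allocates and frees its own scratch space for unpacking. The function name, the Fortran (column-major) storage order of the stored grid, and the memcpy standing in for an HDF5 read are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical library routine: return a grid stored in Fortran
 * (column-major) order in the caller's C (row-major) buffer out,
 * which must hold nx*ny*nz doubles. The caller allocates and frees
 * out; the routine's scratch buffer never escapes.
 * Returns 0 on success, -1 if the scratch allocation fails. */
int unpack_grid_to_c_order(const double *stored, double *out,
                           int nx, int ny, int nz)
{
    size_t n = (size_t)nx * ny * nz;
    double *scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL)
        return -1;

    /* Stands in for reading the dataset from the file into scratch. */
    memcpy(scratch, stored, n * sizeof *scratch);

    /* Fortran offset ix + nx*(iy + ny*iz) -> C offset (ix*ny + iy)*nz + iz */
    for (int ix = 0; ix < nx; ++ix)
        for (int iy = 0; iy < ny; ++iy)
            for (int iz = 0; iz < nz; ++iz)
                out[((size_t)ix * ny + iy) * nz + iz] =
                    scratch[((size_t)iz * ny + iy) * nx + ix];

    free(scratch); /* temporary memory is released before returning */
    return 0;
}
```

The design choice is that no pointer allocated inside the library is ever handed back, so the application never has to call a library-specific free function.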
Example: reading the wave functions for all spins and k-points, first with atomic calls and then with a high-level call.

Using atomic calls:

    esh5_open_electrons(number_of_electrons);
    for(int i=0; i<number_of_spins; ++i)
    {
        esh5_open_spin(i,number_of_kpoints);
        for(int k=0; k<number_of_kpoints; ++k)
        {
            esh5_open_kpoint(k);
            /* data defined per k-point */
            esh5_get_eigenvalues(eigenvalues);
            esh5_get_occupations(occupations);
            /* one call per state: read the 3D grid of state s */
            for(int s=0; s<number_of_states; ++s)
            {
                esh5_get_psi_r(s,psi_r[s],nx,ny,nz);
            }
            esh5_close_kpoint();
        }
        esh5_close_spin();
    }
    esh5_close_electrons();

Using high-level calls:

    esh5_open_electrons(number_of_electrons);
    for(int i=0; i<number_of_spins; ++i)
    {
        esh5_open_spin(i,number_of_kpoints);
        for(int k=0; k<number_of_kpoints; ++k)
        {
            /* a single call reads, for k-point k, the eigenvalues,
               occupations and all number_of_states grids */
            esh5_get_states_qmc(eigenvalues,occupations,psi_r_all[k],
                                k,number_of_states, nx, ny, nz);
        }
        esh5_close_spin();
    }
    esh5_close_electrons();
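The accessor names in the esh5_ example follow a visible pattern (esh5_open_<group>, esh5_get_<dataset>). Assuming the convention is esh5_<verb>_<name>, the sketch below generates getter and setter prototypes mechanically from a dataset name and its C type, in the spirit of Java get/set methods. The int return type, the set_ variants, and the generator esh5_emit_accessors itself are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical code generator: write the C prototypes of one dataset's
 * getter and setter into buf, following an assumed convention
 *   int esh5_get_<name>(<type> *<name>);
 *   int esh5_set_<name>(const <type> *<name>);
 * Returns the number of characters written, or -1 on empty input,
 * error, or truncation. */
int esh5_emit_accessors(char *buf, size_t len,
                        const char *ctype, const char *name)
{
    if (strlen(ctype) == 0 || strlen(name) == 0)
        return -1;
    int n = snprintf(buf, len,
                     "int esh5_get_%s(%s *%s);\n"
                     "int esh5_set_%s(const %s *%s);\n",
                     name, ctype, name, name, ctype, name);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Run over a list of (type, name) pairs taken from the format definition, such a generator would add the accessors for a newly defined dataset without hand-written boilerplate.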



