HDF5 format for wave function data

From QMC

Why HDF?

Other related formats

Basic strategy

  • The format is the same for serial and parallel executions.
  • An H5Group represents a physical entity or concept.
  • Multiple instances of a datatype are represented by enumerated H5Groups, e.g., state_0, ..., state_99 for 100 states.
  • Avoid using high-dimensional arrays.
    • High dimensionality due to physical concepts, such as spin, band and k-point, should be handled by HDF5 groups.
    • Different codes use different index schemes (band index, spin index, k-point, etc.), but the fastest-varying indexes mean the same thing in any language and any reasonable code: a 3D grid with three indexes or plane-wave vectors with one index.
    • Repacking to a higher-dimensional array is inefficient in both CPU time and memory.
    • Tools and libraries can handle any indexing scheme without requiring modification in application codes.
    • Visualization tools can display data easily if an array is mapped on a regular grid.
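
The two conventions above (enumerated group names and a plain 3D grid as the fastest-varying part of the data) can be sketched in a few lines of C. This is only an illustration: the helper names enumerated_name and grid_offset are hypothetical, and the grid is assumed to be stored in C (row-major) order with iz varying fastest.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: build an enumerated group name such as "state_7".
 * Returns 0 on success, -1 if index is negative or the buffer is too small. */
int enumerated_name(char *buf, size_t size, const char *base, int index)
{
    if (index < 0)
        return -1;
    int n = snprintf(buf, size, "%s_%d", base, index);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Hypothetical helper: offset of grid point (ix, iy, iz) in a flat array of
 * nx*ny*nz values. Assumes C (row-major) order, so iz is the fastest index. */
size_t grid_offset(size_t ix, size_t iy, size_t iz, size_t ny, size_t nz)
{
    return (ix * ny + iy) * nz + iz;
}
```

With helpers of this kind, spin, k-point and state never become array dimensions: they only select which group to open, and each dataset stays a simple 3D grid.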

HDF5 storage format

How to edit the format using MediaWiki Template

{{h5group|group name}}
{{h5ds|dataset name|data type | dimensions| O(optional)/M(andatory)|comments}}

We use the names adopted by NQ NetCDF files wherever a one-to-one mapping can be identified. Since a single HDF5 file can hold the entire data set, the my_ and max_ prefixes are not used. Redundant prefixes, e.g., atom_ for the data in the atoms group, are removed.

Some names with multiple definitions are simplified, e.g., reduced_gvectors of an integer type is replaced by gvectors.

Draft

Definitions

Missing

  • Units: bohr, Rydberg/Hartree ...
  • The number of up or down spins
  • Or, spin-state, singlet, triplet?
  • Wannier and localized orbitals in general

Comments

  • number_of_something could be shortened to num_something or size_something.
  • spins or components?
  • A dataset for the number of something is not necessary, since all the dimensions are available from the array-type datasets themselves.
  • In the NQ structure, spin owns states, whereas in the draft, state owns spins.
    • gvectors is defined per state.
  • reduced_gvectors is replaced by gvectors.

How to implement the formats

  • Applications can use the HDF5 API directly. This is quite simple for applications that use the file, like QMC codes.
  • Alternatively, a library can provide APIs that hide the low-level HDF5 functions from developers.
    • This is the model adopted by Q5cost.
    • Written in C, with C and Fortran77 bindings.
    • Does not manage memory. Temporary memory allocations needed to pack/unpack data are handled inside the function that needs them.
    • Provides both atomic and high-level APIs, but the main effort should go into making the atomic calls as complete as possible.
    • Define a naming convention for the APIs so that missing data can be added. The convention can be used to generate function names automatically, like get/set functions in Java.
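
One way to read the naming-convention point above is that a single macro can stamp out a matched get/set pair for each dataset name. The following is a minimal sketch under stated assumptions, not the library's actual implementation: the in-memory struct esh5_mock_kpoint stands in for an open HDF5 group, the macro ESH5_DEFINE_ACCESSORS is hypothetical, and the extra size argument n is an assumption added so the sketch can check bounds.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the currently open k-point group. */
#define ESH5_MAX_VALUES 64

typedef struct {
    double eigenvalues[ESH5_MAX_VALUES];
    double occupations[ESH5_MAX_VALUES];
} esh5_mock_kpoint;

static esh5_mock_kpoint esh5_current;

/* Generate esh5_set_<name> and esh5_get_<name> for one array of doubles.
 * Both return 0 on success and -1 if n exceeds the mock storage. */
#define ESH5_DEFINE_ACCESSORS(name)                              \
    int esh5_set_##name(const double *src, size_t n)             \
    {                                                            \
        if (n > ESH5_MAX_VALUES) return -1;                      \
        memcpy(esh5_current.name, src, n * sizeof(double));      \
        return 0;                                                \
    }                                                            \
    int esh5_get_##name(double *dst, size_t n)                   \
    {                                                            \
        if (n > ESH5_MAX_VALUES) return -1;                      \
        memcpy(dst, esh5_current.name, n * sizeof(double));      \
        return 0;                                                \
    }

/* Adding a new dataset means adding a struct member and one line here. */
ESH5_DEFINE_ACCESSORS(eigenvalues)
ESH5_DEFINE_ACCESSORS(occupations)
```

Because every accessor follows the same esh5_get_/esh5_set_ pattern, a caller can guess the function name for a newly added dataset without consulting the library source.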
Using atomic calls:

esh5_open_electrons(number_of_electrons);
for(int i=0; i<number_of_spins; ++i)
{
  esh5_open_spin(i,number_of_kpoints);
  for(int k=0; k<number_of_kpoints; ++k)
  {
    esh5_open_kpoint(k);
    /* one call per dataset; psi_r is read state by state */
    esh5_get_eigenvalues(eigenvalues);
    esh5_get_occupations(occupations);
    for(int s=0; s<number_of_states; ++s)
    {
      esh5_get_psi_r(s,psi_r[s],nx,ny,nz);
    }
    esh5_close_kpoint();
  }
  esh5_close_spin();
}
esh5_close_electrons();

Using high-level calls:

esh5_open_electrons(number_of_electrons);
for(int i=0; i<number_of_spins; ++i)
{
  esh5_open_spin(i,number_of_kpoints);
  for(int k=0; k<number_of_kpoints; ++k)
  {
    /* a single call replaces the open/get/close sequence for k-point k */
    esh5_get_states_qmc(eigenvalues,occupations,psi_r_all[k],
        k,number_of_states, nx, ny, nz);
  }
  esh5_close_spin();
}
esh5_close_electrons();

  1. The prefix esh5_ is used to avoid name conflicts with existing functions.
  2. Function arguments follow the conventions of other common libraries such as BLAS and MPI.
  3. The library APIs build the tree and assign the names.
  4. High-level APIs may have many variants, which would burden the library. These should be left to the developers of user applications.
  5. A postfix can be used to distinguish different high-level APIs for a similar function. Each code can add its own API that calls the existing APIs.
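
Points 4 and 5 suggest that a high-level call can live in the application and simply wrap the atomic calls. The sketch below illustrates that layering under stated assumptions: the mock_ functions are hypothetical stand-ins for the library's atomic calls (they fill deterministic values instead of reading a file), and app_get_states_qmc is a hypothetical application-side function whose _qmc postfix marks its packing convention (all states of one k-point stored contiguously).

```c
#include <stddef.h>

/* Hypothetical stand-ins for atomic library calls. A real library would
 * read from the open k-point group; these only fill deterministic values. */
enum { MOCK_NSTATES = 2 };
static int mock_kpoint = -1;

static void mock_open_kpoint(int k) { mock_kpoint = k; }
static void mock_close_kpoint(void) { mock_kpoint = -1; }

static int mock_get_eigenvalues(double *eig)
{
    for (int s = 0; s < MOCK_NSTATES; ++s)
        eig[s] = mock_kpoint + 0.5 * s;
    return 0;
}

static int mock_get_occupations(double *occ)
{
    for (int s = 0; s < MOCK_NSTATES; ++s)
        occ[s] = 1.0;
    return 0;
}

static int mock_get_psi_r(int s, double *psi, int nx, int ny, int nz)
{
    if (s < 0 || s >= MOCK_NSTATES)
        return -1;
    size_t n = (size_t)nx * ny * nz;
    for (size_t i = 0; i < n; ++i)
        psi[i] = (double)s;
    return 0;
}

/* Application-side high-level call built only from the atomic calls.
 * psi_all must hold nstates*nx*ny*nz values; state s starts at s*nx*ny*nz.
 * Returns 0 on success, -1 on failure. */
int app_get_states_qmc(double *eig, double *occ, double *psi_all,
                       int k, int nstates, int nx, int ny, int nz)
{
    if (nstates < 0 || nstates > MOCK_NSTATES)
        return -1;
    size_t n = (size_t)nx * ny * nz;
    int status = 0;

    mock_open_kpoint(k);
    if (mock_get_eigenvalues(eig) != 0 || mock_get_occupations(occ) != 0)
        status = -1;
    for (int s = 0; status == 0 && s < nstates; ++s)
        status = mock_get_psi_r(s, psi_all + (size_t)s * n, nx, ny, nz);
    mock_close_kpoint();
    return status;
}
```

Keeping such wrappers in the application means the library only has to guarantee that the atomic calls are complete; each code remains free to choose its own packing.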