The NOMAD Archive stores, in a code-independent format, calculations performed with the most important and widely used electronic-structure and force-field codes.
Summary statistics of the Archive content, updated June 1st, 2017:
- 3752 Zip Archives for parsing: 8 TB of data (compressed)
- Data extracted with parsing: 2.7 TB of HDF5 files (compressed)
- Data classified using 434 public metadata entries of the NOMAD Meta Info and 1736 code-specific metadata entries
Below, we show a breakdown of the global quantities by some of the codes supported by NOMAD, with the number of different compositions and the average number of different geometries per composition (updated Feb 14th, 2017).
Another useful statistic is shown below, with the same vertical axes as above, but breaking down the Archive content by the level of theory used in the stored calculations (updated Feb 14th, 2017).
(MM = Molecular Mechanics, DFT = Density-Functional Theory, DFT+U = DFT with an additional Hubbard-like term to treat the strong on-site Coulomb interaction of localized electrons, TDDFT = Time-Dependent DFT, MP2 = Møller-Plesset second-order perturbation theory, CC = Coupled-Cluster family of methods, GW = family of methods for the approximation of the self-energy in terms of the single-particle Green's function G and the screened Coulomb interaction W, MR = Multi-Reference family of methods)
The code-independent data is described using the NOMAD Meta Info, an open, flexible, and hierarchical metadata classification system that we developed and to which anybody can contribute. The NOMAD Meta Info aims at defining a conceptual model to store the values connected to atomistic or ab initio calculations. A clear and usable metadata definition is a prerequisite for preparing the data for analysis.
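To make the idea of a hierarchical metadata classification more concrete, here is a minimal Python sketch. It is not the NOMAD Meta Info implementation or its file format: the `MetaEntry` class, the helper functions, and the entry names (`run`, `system`, `atom_positions`, `method`) are all hypothetical and chosen only for illustration. It assumes each metadata entry has a name, a description, and at most one parent, so that any entry can be located by its path from a root.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class MetaEntry:
    """One hypothetical metadata definition in a hierarchy."""
    name: str
    description: str
    parent: Optional[str] = None  # None marks a root entry


def build_registry(entries: Iterable[MetaEntry]) -> Dict[str, MetaEntry]:
    """Index entries by name and check that every parent is defined."""
    registry: Dict[str, MetaEntry] = {}
    for entry in entries:
        if entry.name in registry:
            raise ValueError(f"duplicate metadata name: {entry.name}")
        registry[entry.name] = entry
    for entry in registry.values():
        if entry.parent is not None and entry.parent not in registry:
            raise KeyError(f"{entry.name} refers to unknown parent {entry.parent}")
    return registry


def path_to(registry: Dict[str, MetaEntry], name: str) -> List[str]:
    """Return the chain of names from the root down to `name`."""
    path: List[str] = []
    current: Optional[str] = name
    while current is not None:
        if current in path:
            raise ValueError(f"cycle detected at {current}")
        path.append(current)
        current = registry[current].parent
    return list(reversed(path))


# Illustrative entries only; these are not actual NOMAD Meta Info names.
EXAMPLE_ENTRIES = [
    MetaEntry("run", "One execution of a simulation code"),
    MetaEntry("system", "Geometry and composition of the simulated system", "run"),
    MetaEntry("atom_positions", "Cartesian coordinates of the atoms", "system"),
    MetaEntry("method", "Level of theory and its parameters", "run"),
]

REGISTRY = build_registry(EXAMPLE_ENTRIES)
print(path_to(REGISTRY, "atom_positions"))
```

In a scheme like this, adding a code-specific quantity amounts to registering one more entry under an existing parent, which is one way an open, extensible classification could be organized.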
In collaboration with the Berlin Big Data Center (BBDC), we use the Apache Flink infrastructure, which supports and goes beyond the standard MapReduce model, to enable rapid and complex queries.
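As a rough illustration of the MapReduce pattern mentioned above, the following sketch computes, per code, the number of distinct compositions and the average number of distinct geometries per composition. It is plain Python and does not use Apache Flink or any NOMAD API. The record layout (`code`, `composition`, `geometry_id`) and the sample data are assumptions made up for this example.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical records; real Archive entries are structured differently.
RECORDS = [
    {"code": "code_a", "composition": "H2O", "geometry_id": 1},
    {"code": "code_a", "composition": "H2O", "geometry_id": 2},
    {"code": "code_a", "composition": "H2O", "geometry_id": 2},  # duplicate geometry
    {"code": "code_a", "composition": "CO2", "geometry_id": 1},
    {"code": "code_b", "composition": "H2O", "geometry_id": 1},
]


def map_phase(record):
    """Emit a ((code, composition), {geometry_id}) pair for one record."""
    key = (record["code"], record["composition"])
    return key, {record["geometry_id"]}


def reduce_phase(grouped, pair):
    """Merge the geometry sets that share the same (code, composition) key."""
    key, geometries = pair
    grouped[key] |= geometries
    return grouped


def summarize(records):
    """Return per-code composition counts and average geometries per composition."""
    grouped = reduce(reduce_phase, map(map_phase, records), defaultdict(set))
    per_code = defaultdict(list)
    for (code, _composition), geometries in grouped.items():
        per_code[code].append(len(geometries))
    return {
        code: {
            "compositions": len(counts),
            "avg_geometries": sum(counts) / len(counts),
        }
        for code, counts in per_code.items()
    }


print(summarize(RECORDS))
```

A distributed engine would run the map and reduce steps in parallel across partitions of the data; the single-process version here only shows the shape of the computation.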
Contact concerning general aspects of the CoE: Kylie O'Brien