The NOMAD Meta Info stores descriptive and structured information about materials-science data (called metadata) contained in the NOMAD Archive. The NOMAD metadata structure is formatted to be independent of the electronic-structure theory or molecular-simulation code and can be browsed at the NOMAD Meta Info page.
Typically, metadata are generated only for a predefined, specific scientific field, application, or code. In contrast, the NOMAD Meta Info considers all pertinent information in the input and output files of the supported electronic-structure theory, quantum chemistry, and molecular-dynamics (force-field) codes. This ensures complete coverage of all material and molecule properties, even though some properties might not be as important as others or are missing from the input/output files of some electronic-structure programs.
The metadata are described using the notion of multiple inheritance, i.e., an object or class can inherit characteristics and features from more than one parent object or parent class. Metadata can have attributes. For example, the name is an attribute that is used to identify the metadata. The metadata type is another important attribute. Concrete values (scalars, strings, or multidimensional lists) that parsers extract from input and output files are associated with Concrete Value-type metadata (see the short metadata info guide for detailed information about attributes and types of metadata). These values are organized in hierarchical groups that are associated with Section-type metadata (see Fig. 1). Sections can refer to other sections either through their nesting or through explicit references (denoted by the dashed arrows in Fig. 1); e.g., the section section_single_configuration_calculation refers to section_method and section_system. In this way, the model is equivalent to a relational model in which Sections are tables and Concrete Values are columns.
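The data model described above can be illustrated with a minimal Python sketch. It is not the NOMAD implementation: the class MetaInfo, its field names (kind, parents, dtype, references), and the entries section_run, section_calculation, energy_type, energy_total, and the two *_ref values are hypothetical names chosen for illustration. Only section_method and section_system are taken from the text. The sketch shows multiple inheritance (a definition with more than one parent) and the relational view in which sections act as tables and concrete values as columns.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetaInfo:
    """One metadata definition (hypothetical layout, for illustration only)."""
    name: str                                           # attribute that identifies the metadata
    kind: str                                           # "section", "concrete_value", or "abstract"
    parents: List[str] = field(default_factory=list)    # more than one parent is allowed
    dtype: Optional[str] = None                         # only meaningful for concrete values
    references: Optional[str] = None                    # section targeted by an explicit reference


def ancestors(name, registry):
    """Collect all direct and indirect parents of a definition, breadth first."""
    seen = []
    queue = list(registry[name].parents)
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(registry[parent].parents)
    return seen


def as_tables(registry):
    """Relational view: each section is a table, each concrete value a column."""
    tables = {n: [] for n, m in registry.items() if m.kind == "section"}
    for meta in registry.values():
        if meta.kind == "concrete_value":
            for parent in meta.parents:
                if parent in tables:
                    tables[parent].append(meta.name)
    return tables


# All names except section_method and section_system are assumptions.
definitions = [
    MetaInfo("section_run", "section"),
    MetaInfo("section_system", "section", parents=["section_run"]),
    MetaInfo("section_method", "section", parents=["section_run"]),
    MetaInfo("section_calculation", "section", parents=["section_run"]),
    MetaInfo("energy_type", "abstract"),
    # Two parents: the containing section and an abstract type.
    MetaInfo("energy_total", "concrete_value",
             parents=["section_calculation", "energy_type"], dtype="float"),
    # Explicit references (the dashed arrows in the figure).
    MetaInfo("calculation_to_method_ref", "concrete_value",
             parents=["section_calculation"], dtype="reference",
             references="section_method"),
    MetaInfo("calculation_to_system_ref", "concrete_value",
             parents=["section_calculation"], dtype="reference",
             references="section_system"),
]
registry = {m.name: m for m in definitions}

print(ancestors("energy_total", registry))
print(as_tables(registry)["section_calculation"])
```

In this sketch, an explicit reference is simply a concrete value whose references field names the target section, which plays the role of a foreign key in the relational picture.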
NOMAD Meta Info is kept independent of the actual storage format and is not bound to any specific storage method. In our practical implementation, we support JSON and HDF5 file formats. JSON is a language-independent human-readable data format, whereas HDF5 is a binary format that can efficiently store large arrays and high-dimensional objects. There is ongoing work to support CIF as a serialization format in the near future.
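As a rough illustration of keeping the data structure separate from the storage format, the following sketch writes a nested section structure to JSON and reads it back using only the Python standard library. The nesting layout, the key names (program_name, atom_labels, section_calculation, method_ref, system_ref), and all values are placeholders assumed for this example and do not reproduce the actual NOMAD Archive layout. An HDF5 backend would need a third-party library and is omitted here.

```python
import json

# Hypothetical nested data: sections are lists of dicts, concrete values are
# plain scalars or lists. All names and values are placeholders.
archive = {
    "section_run": [
        {
            "program_name": "example_code",
            "section_system": [{"atom_labels": ["H", "H", "O"]}],
            "section_method": [{"method_name": "placeholder"}],
            "section_calculation": [
                # References are stored as indices into the sibling sections.
                {"energy_total": -1.5, "method_ref": 0, "system_ref": 0}
            ],
        }
    ]
}


def to_json(data):
    """Serialize the in-memory structure to human-readable JSON text."""
    return json.dumps(data, indent=2, sort_keys=True)


def from_json(text):
    """Rebuild the in-memory structure from JSON text."""
    return json.loads(text)


text = to_json(archive)
restored = from_json(text)

# The structure survives the round trip unchanged, so code that consumes it
# does not need to know which storage format was used.
print(restored == archive)
```

Because only to_json and from_json touch the format, a second pair of functions for another format could be swapped in without changing the code that builds or reads the nested structure.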
The NOMAD Meta Info started within the NOMAD Laboratory. It was discussed at the CECAM workshop Towards a Common Format for Computational Materials Science Data and is open to external contributions and extensions.
Towards a Common Format for Computational Materials Science Data (Psi-K 2016 Highlight) describes in detail how to establish code-independent formats and presents the challenges and practical strategies for achieving a common format for the representation of computational materials-science data.
The Novel Materials Discovery Laboratory - Data formats and compression, D1.1 outlines possible data formats, concepts, and compression techniques used to build a homogeneous (code-independent) data archive, called the NOMAD Archive.