Figure 1: NOMAD goals: Develop tools to enable researchers in basic science and engineering to advance materials science, identify new physical phenomena, and help industry to improve existing products as well as develop novel products and technologies.
Data is a crucial raw material of the 21st century.
Surprisingly, extreme-scale aspects of Big-Data are very much under-explored in materials science and engineering, one reason being that ‘towards exascale’ computing initiatives typically focus on standard hardware and software challenges. Clearly, much of the value of high-throughput calculations is wasted without deeper Big-Data driven analysis of the results. This is the extreme-scale computing challenge addressed by NOMAD, the Novel Materials Discovery Laboratory.
NOMAD creates, collects, stores, and cleanses computational materials science data, computed by the most important materials-science codes available today. Most importantly, the NOMAD Laboratory CoE develops tools for mining this data in order to find structure, correlations, and novel information that could not be discovered from studying smaller data sets. Together, the large volume of data and innovative tools will enable researchers in basic science and engineering to advance materials science, identify new physical phenomena, and help industry to improve existing products and develop novel products and technologies (Fig. 1).
The Repository developed and maintained by NOMAD is now the largest repository for computational materials science worldwide, containing the input and output files from several million high-quality calculations. The volume of files made available through the NOMAD Repository is steadily increasing. In fact, the computational materials-science community uses millions of CPU hours every day in high-performance-computing (HPC) centers worldwide. Importantly, the NOMAD Repository contains data from researchers all over the world, and from other databases, e.g. AFLOWlib and OQMD. Unlike other repositories, the NOMAD Repository is not restricted to selected computer codes but serves the entire community by supporting all important codes currently used in computational materials science. The NOMAD Repository also helps researchers to host and organize their data, and to make it available to others in a highly efficient way: see Fig. 2 and watch the 2-minute YouTube movie at https://youtu.be/UcnHGokl2Nc.
Figure 2: Features of the NOMAD Repository:
• Upload interfaces: Curl, FTP, Python
• Support for the most important codes in computational materials science
• Structure calculations in data sets (folders)
• Share privately with collaborators
• Share anonymously during peer review
• Open Access Sharing:
• DOI support, to link from publication to data
• DOI support, to link from data to publication
• Guaranteed storage for 10 years
As the NOMAD Repository data is generated by many different computer codes, it is heterogeneous and therefore hard to integrate and to use directly for data analytics and extensive comparisons. NOMAD researchers developed ways to convert the existing open-access data of the NOMAD Repository into a common, code-independent format, developing numerous parsers and creating the NOMAD Archive. In this way, NOMAD stands out, compared to other materials-genomics initiatives. Our code-independent Archive enables a leap forward in computational materials science by providing a basis for analysis and thus deeper insight. In this context, NOMAD has also contributed to data organization by defining metadata to describe computed quantities and, hence, unambiguously label them.
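The conversion idea described above can be illustrated with a minimal sketch. It is not the actual NOMAD parser code or metadata schema: the two input formats ("code_a", "code_b"), their keys, and the `ArchiveEntry` fields are hypothetical and chosen only to show how code-specific output could be mapped onto one code-independent record with unambiguous labels and units.

```python
from dataclasses import dataclass

# Conversion factors to electronvolts (rounded CODATA values).
HARTREE_TO_EV = 27.211386
RYDBERG_TO_EV = 13.605693


@dataclass(frozen=True)
class ArchiveEntry:
    """Code-independent record; the field names are illustrative only."""
    source_code: str
    formula: str
    total_energy_ev: float


def parse_code_a(raw: dict) -> ArchiveEntry:
    # Hypothetical "code A": energy in Hartree under the key "etot".
    return ArchiveEntry("code_a", raw["formula"], raw["etot"] * HARTREE_TO_EV)


def parse_code_b(raw: dict) -> ArchiveEntry:
    # Hypothetical "code B": energy in Rydberg under the key "E_total_Ry".
    return ArchiveEntry("code_b", raw["system"], raw["E_total_Ry"] * RYDBERG_TO_EV)


# One parser per supported code; new codes are added by registering a function.
PARSERS = {"code_a": parse_code_a, "code_b": parse_code_b}


def to_archive(code_name: str, raw: dict) -> ArchiveEntry:
    """Convert code-specific output into the common record."""
    try:
        parser = PARSERS[code_name]
    except KeyError:
        raise ValueError(f"no parser registered for {code_name!r}") from None
    return parser(raw)


if __name__ == "__main__":
    a = to_archive("code_a", {"formula": "Si2", "etot": -1.0})
    b = to_archive("code_b", {"system": "Si2", "E_total_Ry": -2.0})
    # Both entries now carry the same quantity in the same unit.
    print(a.total_energy_ev, b.total_energy_ev)
```

Once every record shares the same field names and units, results from different codes can be compared directly, which is the property that makes the Archive usable for analytics.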
In parallel with the creation of the Archive, we started to develop tools to exploit the extensive Archive data, resulting in the NOMAD Encyclopedia, Big-Data Analytics, and Advanced Graphics (Fig. 3).
The NOMAD Encyclopedia represents a user-friendly, public access point to the extensive knowledge contained in the NOMAD Archive. For the first time, we are able to see, compare, explore, and comprehend computations from international researchers. This helps us to understand structural, mechanical, and thermal behaviors of a large variety of materials, their electronic properties, responses to external excitations, and more.
The NOMAD Big-Data Analytics Toolkit helps NOMAD users to identify correlations and structure in the Big-Data of the Archive. This enables scientists and engineers to select the materials that will be most useful for specific applications, or to predict and identify promising new materials with specific sets of desirable properties that are worth further exploration.
Seeing helps understanding. Consequently, NOMAD is developing an infrastructure for remote visualization of the multi-dimensional NOMAD data. Our virtual-reality environment enables interactive data exploration, and enhances training and dissemination. The remote visualization system will allow users to have access to data and tools using standard devices (laptops, smartphones), independent of their location. NOMAD Laboratory CoE users are able to use our virtual reality software to collaboratively study complex n-dimensional systems in an intuitive way, and pave the way for visual analytics.
High-performance computing expertise and hardware enable the NOMAD Laboratory CoE to meet the demands of the Encyclopedia, the Big-Data Analytics Toolkit, and the visualization tools by design. They also cover the operation of the underlying computing platform for the NOMAD services, as well as application support for HPC, Big-Data analytics, and the corresponding workflows. Through the NOMAD Laboratory CoE, academic and industrial users alike will be able to leverage European HPC capabilities by gaining access to meaningful, useful presentations of computational materials science data already computed by HPC centers and by using the HPC resources that support delivery of NOMAD tools and services.
Figure 3: The NOMAD Archive contains the open-access, code-independent data and is the base for novel Big-Data Analytics tools, the NOMAD Encyclopedia, and Advanced Graphics. All this is enabled by the experience and hardware of our High-Performance Computing Centers.
NOMAD also performs high-quality calculations for materials where important information is missing in the Archive. We are carefully listening to suggestions from our industrial colleagues about the most needed calculations. For example, as requested by Siemens, a novel thermal-conductivity calculation approach has been developed, which for the first time enables accurate calculations for materials from very low to very high thermal conductivity. Systematic calculations of heat-transport tensors for many materials will be started soon. I-deals, a company coordinating the methanol fuel from CO2 (MefCO2) project, is interested in the catalytic activation of CO2, which is presently being studied by the NOMAD team to examine various potential catalyst materials, e.g. carbides and oxides. We will also perform studies to develop thin coating films to protect novel hybrid perovskite solar cells from degradation in moist environments, perform systematic high-throughput screening of potential transparent oxide semiconductors, and more.
The data and tools of the NOMAD Laboratory CoE will be made freely available to anyone wishing to use them. The Materials Encyclopedia web interface and API are available through the project website, and a number of Big-Data Analytics tutorials are already available. Videos showcasing our virtual reality environment are also available now on the website. In addition, we make available APIs to facilitate data downloading.
To ensure that the NOMAD Laboratory CoE achieves maximum impact and benefit, we are conducting extensive outreach to industrial and academic end-users. We have hosted Academic Workshops, a summer school (NOMAD Summer) and Industry Meetings, with more events planned in the coming months, including NOMAD Summer 2018. We have also conducted numerous Industry Interviews and will continue to seek industry feedback on our tools and services.
NOMAD is a European Centre of Excellence (CoE) that was established in fall 2015. Eight complementary research groups of the highest scientific standing in computational materials science, along with four high-performance computing (HPC) centers, form the synergetic core of this CoE (see THE TEAM). This also reflects that the CoE is part of the Psi-k, CECAM, and ETSF communities.
The initial funding was provided by the EU as a research grant from the European Union's Horizon 2020 research and innovation program, covering the period from November 2015 until October 2018. Since September 2018, some components of NOMAD (mainly the Repository, the Archive, the Metadata, and the Encyclopedia) are part of the association "FAIR Data Infrastructure for Physics, Chemistry, Materials Science, and Astronomy e.V." (FAIR-DI e.V., https://fairdi.eu), a non-profit association based in Germany.
contact concerning general aspects of the CoE: Jessica Pietsch
contact concerning the NOMAD Encyclopedia: Georg Huhs
contact concerning Big-Data Analytics: Luca Ghiringhelli
contact concerning Advanced Visualization: Rubén García Hernández
contact concerning HPC Infrastructure: Atte Sillanpää
contact concerning Outreach: Jessica Pietsch