Figure 1: The NOMAD goals: Develop tools to enable researchers in basic science and engineering to advance materials science, identify new physical phenomena, and help industry to improve existing and develop novel products and technologies.

Data is the raw material for the 21st century.

When NOMAD started, “towards exascale” computing initiatives focused on hardware and software challenges, while extreme-scale aspects of Big-Data remained under-explored, in particular for materials science and engineering. Clearly, much of the value of high-throughput calculations is wasted without deeper analytics of the results. This is where NOMAD came in.

NOMAD, the Novel Materials Discovery Laboratory, is a European Centre of Excellence (CoE) which was established in fall 2015. Eight complementary research groups of highest scientific standing in computational materials science along with four high-performance computing (HPC) centers form the synergetic core of this CoE (see THE TEAM). This composition also reflects that the CoE is part of the Psi-k, CECAM, and ETSF communities.

NOMAD creates, collects, stores, and cleanses computational materials science data, derived from the most important materials science codes available today. In addition, the NOMAD Laboratory CoE develops tools for mining this data in order to find structure, correlations, and novel information that could not be discovered from studying smaller data sets. Together, the large volume of data and innovative tools will enable researchers in basic science and engineering to advance materials science, identify new physical phenomena, and help industry to improve existing and develop novel products and technologies (Fig. 1).

The Repository developed and maintained by NOMAD is now the largest repository for input and output files of computational materials science worldwide, containing the files from several million high-quality calculations. The volume of files made available through the NOMAD Repository is steadily increasing, as the computational materials-science community uses millions of CPU hours every day in high-performance-computing (HPC) centers worldwide. Importantly, the NOMAD Repository contains data from researchers all over the world, and from other databases, e.g. AFLOWlib and OQMD. Unlike other repositories, the NOMAD Repository is not restricted to selected computer codes but serves all important codes currently used in computational materials science. The Repository also helps the computational materials science community to host and organize its data, and to make it available to others in a highly efficient way (see Fig. 2 and watch the 2-minute YouTube movie at https://www.youtube.com/watch?v=L-nmRSH4NQM).

Figure 2: The NOMAD Repository:

• Upload interfaces: Curl, FTP, Python
• Support the most important codes in computational materials science
• Structure calculations in data sets (folders)
• Share privately with collaborators
• Share anonymously during peer review
• Open Access Sharing:
• DOI support, to link from publication to data
• DOI support, to link from data to publication
• Guaranteed storage for 10 years

As the NOMAD Repository data is generated by many different computer codes, it is heterogeneous and therefore hard to integrate and to use directly for data analytics and extensive comparisons. NOMAD researchers developed ways to convert the existing open-access data of the NOMAD Repository into a common, code-independent format, developing numerous parsers and creating the NOMAD Archive. In this way, NOMAD stands out, compared to other materials-genomics initiatives. Our code-independent Archive enables a leap forward in computational materials science by providing a basis for deeper analytics. In this context, NOMAD has also contributed to data organization by defining metadata to unambiguously label key quantities in the field.
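The conversion idea described above can be illustrated with a minimal Python sketch. It is not the NOMAD parser code or its metadata definition: the two source codes ("code_a", "code_b"), their output keys, and the `ArchiveEntry` fields are hypothetical names chosen for illustration only. The sketch assumes that each code reports the same quantity under a different label and in a different unit, and that one small parser per code maps it onto a common, unambiguously labeled record.

```python
from dataclasses import dataclass

# Unit conversion factors to electronvolts (rounded CODATA values).
HARTREE_TO_EV = 27.211386
RYDBERG_TO_EV = 13.605693


@dataclass(frozen=True)
class ArchiveEntry:
    """Hypothetical code-independent record; field names are illustrative."""
    source_code: str
    total_energy_ev: float
    n_atoms: int


def parse_code_a(raw: dict) -> ArchiveEntry:
    # Assumption: "code A" stores the energy in Hartree under the key "etot"
    # and lists the atoms explicitly.
    return ArchiveEntry("code_a", raw["etot"] * HARTREE_TO_EV, len(raw["atoms"]))


def parse_code_b(raw: dict) -> ArchiveEntry:
    # Assumption: "code B" stores the energy in Rydberg under "TotalEnergy"
    # and reports only the atom count.
    return ArchiveEntry("code_b", raw["TotalEnergy"] * RYDBERG_TO_EV, raw["natoms"])


# One parser per supported code; supporting a new code means adding an entry.
PARSERS = {"code_a": parse_code_a, "code_b": parse_code_b}


def to_archive(code_name: str, raw: dict) -> ArchiveEntry:
    """Convert code-specific output into the common record."""
    try:
        parser = PARSERS[code_name]
    except KeyError:
        raise ValueError(f"no parser registered for {code_name!r}") from None
    return parser(raw)


# Made-up inputs: the same energy expressed in two different units.
entry_a = to_archive("code_a", {"etot": -1.0, "atoms": ["H", "H"]})
entry_b = to_archive("code_b", {"TotalEnergy": -2.0, "natoms": 2})
```

Once both toy records are in the common format, their energies and atom counts can be compared directly, which is the property that makes cross-code analytics possible.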

In parallel with the creation of the Archive, we started developing tools to exploit the extensive Archive data, including the NOMAD Encyclopedia, Big-Data Analytics, and Advanced Graphics (Fig. 3).

The NOMAD Encyclopedia represents a user-friendly, public access point to the extensive knowledge contained in the NOMAD Archive. For the first time, we will be able to see, compare, explore, and comprehend computations from international researchers that will help us to understand structural, mechanical, and thermal behaviors of a large variety of materials, their electronic properties, responses to external excitations, and more.

The NOMAD Big-Data Analytics Toolkit will help NOMAD users to identify correlations and structure in the Big-Data of the Archive. This will help scientists and engineers to select which materials will be most useful for specific applications or predict and identify promising new materials with specific sets of desirable properties, worth further exploration.

Seeing helps understanding. Consequently, NOMAD is developing an infrastructure for remote visualization of the multi-dimensional NOMAD data. Our virtual-reality environment will enable interactive data exploration, as well as enhanced training and dissemination. The remote visualization system will allow users to have access to data and tools using standard devices (laptops, smartphones), independent of their location. NOMAD Laboratory CoE users will be able to use our virtual reality software to collaboratively study complex n-dimensional systems in an intuitive way, and pave the way for visual analytics.

High-Performance Computing Expertise and Hardware enable the NOMAD Laboratory CoE to meet the demands of the Encyclopedia, Big-Data-Analytics Toolkit, and Visualization tools by design and operation of the underlying computing platform for the NOMAD services, as well as application support for both HPC and Big-Data analytics and corresponding workflows. Through the NOMAD Laboratory CoE, academic and industrial users alike will be able to leverage European HPC capabilities by gaining access to meaningful, useful presentations of computational materials science data already computed by HPC centers and by using the HPC resources that support delivery of NOMAD tools and services.

Figure 3: The Archive contains the open-access, code-independent data and is the basis for novel big-data analytics tools, an extensive Encyclopedia, and advanced graphics. All this is enabled by the experience and hardware of our high-performance computing centers.

NOMAD also performs high-quality calculations for materials where important information is missing in the Archive. We are carefully listening to suggestions from our industrial colleagues about the most needed calculations. For example, as requested by Siemens, a novel thermal-conductivity calculation approach has been developed, which for the first time enables accurate calculations for materials from very low to very high thermal conductivity. Systematic calculations of heat-transport tensors for many materials will be started soon. I-deals, a company coordinating the Methanol fuel from CO2 (MefCO2) project, is interested in the catalytic activation of CO2, which is presently being studied by the NOMAD team to examine various potential catalyst materials, e.g. carbides and oxides. We will also perform studies to develop thin coating films to protect novel hybrid perovskite solar cells from degradation in moist environments, perform systematic high-throughput screening of potential transparent oxide semiconductors, and more.

The data and tools of the NOMAD Laboratory CoE will be made freely available to anyone wishing to use them. The Materials Encyclopedia web interface and API will soon be available through the project website, and a number of Big-Data Analytics tutorials are already available. Videos showcasing our virtual reality environment are also available now on the website. In addition, we will make available APIs to facilitate data downloading.

To ensure that the NOMAD Laboratory CoE achieves maximum impact and benefit, we are conducting extensive outreach to industrial and academic end-users. We have hosted an Academic Workshop and two Industry Meetings, with a third Industry Meeting planned for 05 - 06 Feb 2018. We have also conducted numerous Industry Interviews and will continue to seek industry feedback on our tools and services. In 2017, we are organizing an Academic Workshop and a Summer School, open to both industry and academia, in collaboration with the Psi-k and CECAM networks.

 

contact concerning general aspects of the CoE: Kylie O'Brien

contact concerning the NOMAD Encyclopedia: Georg Huhs

contact concerning Big-Data Analytics: Luca Ghiringhelli

contact concerning Advanced Visualization: Rubén García Hernández

contact concerning HPC Infrastructure: Atte Sillanpää

contact concerning Outreach: Kylie O'Brien