Pillar 2 - Exascale Workflows
Bringing workflows to exascale
While the exascale-ready libraries and codes that are the focus of Pillar 1 will be an important step forward, their efficient and effective use in high-throughput computations will require sophisticated workflows that are also capable of managing and taking full advantage of exascale resources. Today's workflows were developed to execute a few thousand ‘standard’ density-functional theory (DFT) computations with one specific code. In contrast, the workflows of tomorrow must handle the transition to exascale computing and support advanced methodology, which requires extensive advancement of the current workflow software.
To bring HTC in aiCMS to the exascale, we will:
- develop an exascale workflow engine that supports and embraces all important aiCMS codes. Building on the developments of Pillar 3, this engine will also integrate high-performance artificial intelligence tools.
- identify and resolve bottlenecks (e.g. in I/O and communication) and scale up the HTC workflows to meet the new exascale challenges.
- develop workflows for more complex simulations (e.g. finite-temperature molecular dynamics with large unit cells for finding new thermoelectric materials, together with WP9) and for the beyond-DFT methods of Pillar 1, which offer significantly increased accuracy.
The NOMAD CoE team is deeply involved in ASE and FireWorks, and will link these approaches and bring them to exascale performance. The exascale workflow engine developed in this project will be compatible with all the commonly used ab initio codes, so that the same workflows can be shared and reused by the entire aiCMS community. Our exascale workflows will manage complex materials, ab initio molecular dynamics, and advanced beyond-DFT calculations, exploiting the libraries and methods developed in Pillar 1. They will also form the basis for the use-case demonstrations on computational discovery of novel materials for thermoelectric energy conversion and sustainable hydrogen production, and will support the use of AI for guiding the selection of materials to be explored. Such high-throughput studies are challenging with petascale computers but will become feasible with exascale technology, enabling a revolution in the accuracy and predictive power of computational materials design. As with all NOMAD developments, the workflow engine will be open and inclusive and will be coordinated with other activities.
Brief description of the WPs involved in Pillar 2
Work Package 4: Exascale High-throughput Workflows
This WP will develop an exascale workflow environment for HTC. This includes the development of a library of HTC workflows, LibFlow-X, that can perform various types of ab initio tasks using any of the aiCMS codes. The developments will be based on the widely used workflow management tools FireWorks and ASE, which will be brought together and extended to meet the challenges posed by exascale computing. This WP will also form the basis for the beyond-DFT workflows developed in WP5 and the use cases in WP9, whose feedback will in turn be used to refine the work of this WP. Although the focus of WP4 lies on ab initio calculations, the developed libraries will be general enough to be useful for second-principles frameworks as well, e.g. for classical molecular-dynamics simulations, as is already the case for ASE.
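The idea of one workflow that runs unchanged on any supported code can be illustrated with a minimal sketch. It is not the LibFlow-X design or the FireWorks or ASE API: the `Task` schema, the `BACKENDS` registry, and the two toy backends are all hypothetical names introduced here, and the "energies" are placeholder numbers rather than physical results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Task:
    """Code-independent description of one calculation (hypothetical schema)."""
    structure: str  # structure identifier, e.g. a chemical formula
    kind: str       # task type, e.g. "single_point"


# Hypothetical registry: backend name -> function that executes a Task.
# In a real engine each entry would wrap one ab initio code.
BACKENDS: Dict[str, Callable[[Task], dict]] = {}


def register(name: str):
    """Decorator that adds a backend runner to the registry."""
    def decorator(func: Callable[[Task], dict]) -> Callable[[Task], dict]:
        BACKENDS[name] = func
        return func
    return decorator


@register("toy_code_a")
def run_with_code_a(task: Task) -> dict:
    # Placeholder result; no physics is computed here.
    return {"backend": "toy_code_a", "structure": task.structure,
            "energy": -1.0 * len(task.structure)}


@register("toy_code_b")
def run_with_code_b(task: Task) -> dict:
    # A second toy backend returning the same result schema.
    return {"backend": "toy_code_b", "structure": task.structure,
            "energy": -1.5 * len(task.structure)}


def run_workflow(tasks: List[Task], backend: str) -> List[dict]:
    """Run the same task list with whichever backend is requested."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    runner = BACKENDS[backend]
    return [runner(task) for task in tasks]


if __name__ == "__main__":
    tasks = [Task("Si", "single_point"), Task("GaAs", "single_point")]
    for name in sorted(BACKENDS):
        print(name, run_workflow(tasks, name))
```

Because the workflow only sees the `Task` description and a common result schema, swapping the backend name is the only change needed to rerun the same study with a different code.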
Work Package 5: Beyond-DFT Workflows
The main goal of this work package is to develop and validate workflows for beyond-DFT calculations, so that such calculations can take full advantage of exascale computers and distribute the many tasks required to obtain converged results across many thousands of cores. The methodologies covered will include the random-phase approximation (RPA), Møller–Plesset perturbation theory (MP2), the coupled cluster (CC) approach, and the GW approximation, i.e. the same methods that are the focus of WP2-3. Since beyond-DFT calculations often require much more memory than standard DFT calculations, special care will be taken to detect memory bottlenecks automatically before jobs are submitted. Although similar problems can occur in DFT calculations, beyond-DFT calculations are much more demanding in terms of compute requirements, and the likelihood of failures during execution due to architectural bottlenecks (memory, communication-limited time-to-solution) is orders of magnitude larger than for DFT. This WP will rely on the LibFlow-X library (WP4) for setting up and post-processing the computations and for managing the workflow aspects (including convergence loops and restarting failed computations).
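The three workflow aspects named above (a memory check before submission, restarting failed computations, and convergence loops) can be sketched in a few lines. This is an illustration under stated assumptions, not the WP5 implementation: all function names are hypothetical, the memory estimate is assumed to be supplied by the caller, and the check assumes that memory is distributed evenly across nodes.

```python
import math
from typing import Callable, Optional, Tuple


class JobFailed(RuntimeError):
    """Raised by a job that crashed and may be restarted (hypothetical)."""


def nodes_needed(estimated_memory_gb: float, memory_per_node_gb: float,
                 max_nodes: int) -> Optional[int]:
    """Smallest node count whose total memory holds the job, else None.

    Assumes the memory footprint is spread evenly over the nodes.
    """
    if estimated_memory_gb <= 0 or memory_per_node_gb <= 0:
        raise ValueError("memory values must be positive")
    nodes = math.ceil(estimated_memory_gb / memory_per_node_gb)
    return nodes if nodes <= max_nodes else None


def run_with_restarts(job: Callable[[int], float],
                      max_attempts: int = 3) -> float:
    """Call job(attempt) until it succeeds or the attempts are exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job(attempt)
        except JobFailed:
            if attempt == max_attempts:
                raise  # give up and surface the last failure
    raise AssertionError("unreachable")


def converge(compute: Callable[[int], float], start: int, step: int,
             tol: float, max_iter: int = 20) -> Tuple[int, float]:
    """Increase a parameter until successive results differ by less than tol."""
    param = start
    previous = compute(param)
    for _ in range(max_iter):
        param += step
        current = compute(param)
        if abs(current - previous) < tol:
            return param, current
        previous = current
    raise RuntimeError("not converged within max_iter steps")


if __name__ == "__main__":
    # Pre-submission check: does the job fit on the available nodes?
    print(nodes_needed(900.0, 256.0, max_nodes=8))

    # Toy job that fails on its first attempt and then succeeds.
    def flaky(attempt: int) -> float:
        if attempt == 1:
            raise JobFailed("simulated crash")
        return 42.0

    print(run_with_restarts(flaky))

    # Toy convergence loop on a model quantity that decays as 1/n.
    print(converge(lambda n: 1.0 / n, start=1, step=1, tol=0.06))
```

Returning `None` from the memory check, rather than raising, lets the calling workflow decide whether to skip the job, reduce the problem size, or request a different partition before any compute time is spent.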