FAIR-DI e.V.
FAIRmat
NOMAD Laboratory
NOMAD CoE

Exercise 1: NOMAD Repository and Archive - Sharing, Publishing, and Managing Computational Materials Science Data with NOMAD

Time: November 12, 2020
NOMAD Developer: Markus Scheidgen

Description

The main focus of this first exercise is the FAIR sharing of materials science data and how to do it with NOMAD.
We will cover the publication of new data as well as the exploration and download of NOMAD’s existing data, both through our browser-based interface and through the API.

In this exercise, you will learn how to publish data. This includes:

  • preparing data uploads,
  • performing uploads,
  • supervising how NOMAD processes your data,
  • adding metadata and curating datasets,
  • acquiring a DOI for your data.

Furthermore, we will

  • explore NOMAD’s extensive metadata and use it to create complex searches with our graphical interface, powerful interactive search bar, and the OPTIMADE filter language;
  • explore how NOMAD presents data in its raw form (Repository) and in its processed, normalized form (Archive);
  • look at various options to use NOMAD’s API to automate data exploration and download;
  • introduce NOMAD’s Python package to programmatically access NOMAD’s data or to use NOMAD’s data processing locally.

Videos and Exercises

Introduction

Before you watch the other videos and do the exercises, we recommend watching this introduction.
It gives you a brief overview of NOMAD, its various parts and functionality, and some nomenclature.

The following four videos and exercises cover certain aspects of the NOMAD Repository and Archive. You should do them in order, as later videos rely on information given in earlier ones. You can watch all videos at any time and do the exercises at your leisure. We also have a playlist with all videos on YouTube.
On November 12, I will be available on Zoom for questions, discussions, or feedback. You can also write an email at any time to markus.scheidgen@physik.hu-berlin.de.

Search

We will begin with NOMAD's search interface. This video will demonstrate NOMAD's graphical web-based user interface.

This is the link to the search page of the official NOMAD: https://nomad-lab.eu/prod/rae/gui/search

Google's Dataset Search: https://datasetsearch.research.google.com/

Exercises

  1. Explore the various tabs for elements, system, method, properties, and upload metadata! For example, you could filter for titanium oxide calculations that were performed with a specific code like VASP and that contain density of states (DOS) data. Observe how the number of remaining entries changes when you set criteria!
  2. Use visualisations and various metadata tabs to create a search for GaAs! What are the most commonly computed symmetries and the most commonly used codes?
  3. After you set a few criteria, explore the various options on the list of results!
  4. Click on results to see details on the metadata! Use the ... button to go to the entry page. Look at the raw files; preview or download files! Explore the other tabs on the entry page to go to the Archive!
  5. Try to change the columns on the results list!
  6. Select a few entries and download them!
  7. Use the search bar to set search criteria! Try to use atoms= to filter for certain elements! Other quantities you should try are xc_functional_name=, spacegroup=, or code_name=.
  8. Use the OPTIMADE filter language! An example would be: elements HAS ALL "Ti", "O" AND nelements > 4
  9. You can vary the elements clause with the operator HAS ANY.
  10. Try to combine clauses with AND, OR, and NOT!
  11. Use OPTIMADE to find all four-element compounds that contain carbon and hydrogen, but not oxygen!
  12. Modify the statistics by choosing different metrics!
  13. Show the available datasets with data that fit your search! Go to a dataset and see that you can further search with the same interface within a dataset!
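
When you experiment with several OPTIMADE filters, it can help to compose them programmatically before pasting them into the search interface. The following is a minimal sketch: the helper name optimade_filter and its parameters are hypothetical and not part of NOMAD. It only uses the operators HAS ALL, HAS, NOT, and AND mentioned above, and it assumes that NOMAD accepts the resulting combination.

```python
def optimade_filter(all_of=(), none_of=(), nelements=None):
    """Compose an OPTIMADE filter string from simple element constraints.

    Hypothetical helper for illustration only; it is not part of NOMAD.
    """
    clauses = []
    if all_of:
        # elements HAS ALL "A", "B" requires every listed element.
        quoted = ", ".join(f'"{element}"' for element in all_of)
        clauses.append(f"elements HAS ALL {quoted}")
    for element in none_of:
        # NOT negates a single clause; one clause per excluded element.
        clauses.append(f'NOT elements HAS "{element}"')
    if nelements is not None:
        clauses.append(f"nelements = {nelements}")
    # All clauses must hold, so they are joined with AND.
    return " AND ".join(clauses)


print(optimade_filter(all_of=["Ti", "O"], none_of=["H"], nelements=3))
```

You can paste the printed string into the OPTIMADE search field and adapt the arguments for your own searches.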

Upload

Here we learn how to upload data to NOMAD and how to publish the uploaded data. We cover five steps:

  1. preparing files
  2. uploading
  3. reviewing the upload
  4. adding user metadata
  5. publishing and creating a DOI

If you want to publish data and create DOIs, use our test installation of NOMAD. Everything you do there will be deleted later: https://nomad-lab.eu/prod/rae/test/gui/uploads

You can use the official NOMAD as long as you do not publish data or create a DOI (don't forget to delete your data again): https://nomad-lab.eu/prod/rae/gui/uploads

This is some example data you can use: https://www.dropbox.com/s/iadh4pxfepl9b5h/tutorial_files.zip?dl=0

Exercises

  1. Create a NOMAD account. You can use any NOMAD link; they are all connected to the same user management.
  2. Upload the example data! Compare the files with the entries that were created.
  3. Delete the upload!
  4. Upload the example using the command line and curl!
  5. Select an entry, look at the raw files, archive, and logs tab!
  6. Select a few entries and edit them! You can set comments, references, and co-authors.
  7. Use this method to create a dataset!
  8. Go to "Your data" and use the search interface to explore the uploaded data! Do you see your dataset?
  9. Publish the upload! Only do this on the test installation.
  10. Find the dataset that you created on the "Your data" page and assign a DOI! Only do this on the test installation.
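
The first of the five steps, preparing files, usually means packing your calculation directory into a single archive. The following is a minimal sketch using only the Python standard library. It assumes that a .zip archive (the format of the example data above) is what you want to upload; the function name pack_upload and the directory names are hypothetical.

```python
import zipfile
from pathlib import Path


def pack_upload(source_dir, archive_path):
    """Pack all files below source_dir into a .zip archive.

    Hypothetical helper for illustration; paths inside the archive are
    stored relative to source_dir so the directory structure is preserved.
    """
    source = Path(source_dir)
    target = Path(archive_path)
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            # Skip directories and the archive itself, in case it is
            # created inside the directory that is being packed.
            if path.is_file() and path.resolve() != target.resolve():
                archive.write(path, path.relative_to(source).as_posix())
    return target
```

For example, pack_upload("my_calculations", "upload.zip") would create upload.zip, which you can then drop into the upload page or send with curl.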

API

Here we learn how to use NOMAD's API. In the video, I go through a few scenarios and touch on some of the options for using the API.

This is the interactive API dashboard: https://nomad-lab.eu/prod/rae/api/

Here are more examples and tutorials on how to use the API as part of our documentation: https://nomad-lab.eu/prod/rae/docs/api_tutorial.html

The NOMAD Python package on the Python Package Index (PyPI): https://pypi.org/project/nomad-lab/

Exercises

  1. Use the /repo/ endpoint directly in your browser and perform a search for AsGa VASP calculations!
  2. Use the /repo/<upload_id>/<calc_id> endpoint to directly access an individual calculation!
  3. Authenticate in the API dashboard and perform a search with owner=user to see your own data! If you used the NOMAD test installation, the base URL of the API (and dashboard) is https://nomad-lab.eu/prod/rae/test/api/.
  4. Use curl to download the dataset "NOMAD webinar"!
  5. Write a small Python script to access the API with requests! See also this documentation!
  6. Install the NOMAD Python package and use it to access the Archive! See also this documentation!
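
As a starting point for the first and fifth exercise, the following sketch builds a search URL for the /repo/ endpoint and fetches it with requests. It is written under assumptions: that the endpoint accepts the same quantities as the search bar (atoms, code_name) as query parameters, and that several elements are passed by repeating the atoms parameter. The function names repo_search_url and search are hypothetical. Check the API dashboard for the parameters that are actually supported.

```python
from urllib.parse import urlencode

# Base URL of the API as given above; use the test installation's URL
# if you work with your own test data.
API_BASE = "https://nomad-lab.eu/prod/rae/api"


def repo_search_url(base=API_BASE, **criteria):
    """Build a /repo/ search URL from keyword criteria (assumed parameter names)."""
    # doseq=True repeats the key for list values, e.g. atoms=Ga&atoms=As.
    return f"{base}/repo/?{urlencode(criteria, doseq=True)}"


def search(**criteria):
    """Send the search request and return the decoded JSON response."""
    import requests  # third-party package: pip install requests

    response = requests.get(repo_search_url(**criteria), timeout=30)
    response.raise_for_status()
    return response.json()


print(repo_search_url(atoms=["Ga", "As"], code_name="VASP"))
```

The printed URL can also be opened directly in your browser. Calling search(atoms=["Ga", "As"], code_name="VASP") sends the same request from Python; inspect the returned dictionary to see how the results are structured.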

Local processing

Here we learn how to install the NOMAD Python package and how to parse and process data locally on your computer. This allows you to use NOMAD's parsers without a NOMAD installation. It also enables you to improve the parsers and contribute to NOMAD.

Link to the GitLab of the NOMAD infrastructure source code: https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR

Link to the GitLab of NOMAD's VASP parser: https://gitlab.mpcdf.mpg.de/nomad-lab/parser-vasp

The NOMAD Python package on the Python Package Index (PyPI): https://pypi.org/project/nomad-lab/

Some example data you can use: https://www.dropbox.com/s/iadh4pxfepl9b5h/tutorial_files.zip?dl=0

Exercises

  1. Install the NOMAD Python package with the parsing extension!
  2. Parse some of the example files on your computer!
  3. Clone the source code of the VASP parser! Modify it and see the results by parsing a VASP calculation.

Getting Help

If you have any questions, remarks, or topics that you want to discuss, you can post them in the NOMAD forum on matsci.org.