HLRS will help build a national research data infrastructure for catalysis research

October 12, 2020 – A consortium that includes the High-Performance Computing Center Stuttgart (HLRS) received a grant of more than 10 million euros from the Deutsche Forschungsgemeinschaft (DFG) to create a national research data infrastructure for catalysis-related sciences (NFDI4Cat). The consortium, led by the nonprofit society DECHEMA (Gesellschaft für Chemische Technik und Biotechnologie e.V.) and involving representatives from 15 additional partner institutions, will develop infrastructure, software and data management standards to enable the next generation of chemical engineering research. NFDI4Cat is one of nine new consortia that will help build a German national research data infrastructure.

As one of four members of the NFDI4Cat coordination group, HLRS will create and host a data repository for catalysis-related research, including a portal for sharing and accessing data stored in multiple locations. Additionally, HLRS will play a major role in an effort to establish standardized metadata and ontologies for catalysis research that will ensure compatibility between different datasets, increasing their usability and amplifying their potential impact for scientific advancement.

“We are very pleased that HLRS is participating in the development of a national research data infrastructure,” said HLRS Director Prof. Dr.-Ing. Michael Resch. “Together with partners in the catalysis research community, this project should offer exceptional opportunities to accelerate research in an area that is not only of great economic importance, but also holds keys to addressing some of our greatest global challenges.”

Turning Catalysis into Computational Science

Catalysis and chemical engineering are essential disciplines for producing many of the materials we use in our daily lives. They also offer the possibility of solving some of humanity’s most pressing problems. Developing new technologies to reduce CO2 emissions, avoid plastic waste or produce sustainable fertilizers to meet the nutritional needs of a growing world population, for example, are all areas where catalysis and chemical engineering have important roles to play.

Catalysis research, like many scientific fields, increasingly relies on computational methods that support a continuous dialogue between theory, simulation and experiment. Based on rapidly growing collections of high-throughput experimental data, data science methods can be used to predict the relationships between the chemical structures of catalysts and their activities. At the same time, simulation can provide valuable insights for optimizing the design of reactors and chemical processes. As data accumulates, the ability to integrate knowledge from different disciplines, examining all levels of catalytic reactions – from the physical chemistry of individual molecules to process engineering – would also provide fundamental knowledge of great use to researchers across the field.

NFDI4Cat provides a distributed data infrastructure using a concept of shared metadata that allows access to data repositories from a central portal. (Image courtesy of NFDI4Cat)

However, to harness the full potential of this trend, new types of computing infrastructure and methods are needed. Although the use of data science in catalysis research has increased, scientists too often work in relative isolation from one another. As a result, data is generally collected in proprietary formats, is not organized using a standardized metadata description, is not saved in places where it is accessible to other researchers, and is not linked to related publications and published datasets. The NFDI4Cat project aims to solve these problems by creating a common, comprehensive framework for sharing and managing catalysis research data.

Establishing metadata standards for catalysis

Between 2017 and 2019, HLRS contributed significantly to a research project called DIPL-ING, which developed a research data management system for computational engineering. The result was a metadata model called EngMeta, which HLRS and the University of Stuttgart Library now use. Additionally, the researchers developed a method called ExtractIng that can automatically extract metadata from research datasets and convert it into the EngMeta format. This approach relieves investigators of often tedious and time-consuming work, an objective that will be important for the success of the NFDI4Cat project.

In this new project, HLRS will build on EngMeta to create an ontology – a set of categories for organizing the knowledge contained in all relevant datasets – for data management in catalysis research. Such an ontology would include descriptive metadata that characterizes the data in general terms, technical metadata about the data objects contained in the dataset, process metadata describing the methods and the experimental or computational hardware used to generate the data, and domain-specific metadata related to the particular area of research in which the data was generated.
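To make the four categories concrete, the following minimal Python sketch models a dataset record with one section per category. It is an illustration only: the class and field names (DatasetRecord, catalyst, reaction and so on) are hypothetical and do not reflect the actual EngMeta schema or the NFDI4Cat ontology.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class DescriptiveMetadata:
    # General description of the dataset (hypothetical fields).
    title: str
    creators: List[str]
    keywords: List[str] = field(default_factory=list)


@dataclass
class TechnicalMetadata:
    # One entry per data object contained in the dataset.
    file_name: str
    file_format: str
    size_bytes: int


@dataclass
class ProcessMetadata:
    # How the data was generated and on what equipment.
    method: str
    environment: str


@dataclass
class DomainMetadata:
    # Catalysis-specific information (assumed for illustration).
    catalyst: str
    reaction: str


@dataclass
class DatasetRecord:
    descriptive: DescriptiveMetadata
    technical: List[TechnicalMetadata]
    process: ProcessMetadata
    domain: DomainMetadata

    def missing_fields(self) -> List[str]:
        """Return dotted names of empty fields, e.g. 'domain.reaction'."""
        missing = []
        for section, values in asdict(self).items():
            if isinstance(values, dict):
                for name, value in values.items():
                    if value in ("", [], None):
                        missing.append(f"{section}.{name}")
            elif not values:
                # A list-valued section (technical) with no entries at all.
                missing.append(section)
        return missing
```

A record whose reaction field is left blank would report 'domain.reaction' from missing_fields(), which is the kind of gap that a metadata check could flag before a dataset is shared.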

The researchers of the NFDI4Cat consortium plan to integrate the ontology into two other complementary software platforms: Piveau, an open source data management ecosystem developed at FOKUS (Fraunhofer Institute for Open Communication Systems), and CaRMen, software developed at the Karlsruhe Institute of Technology for the analysis of physical and chemical models in relation to experimental data.

Ultimately, by standardizing metadata frameworks, this project aims to align catalysis research data with the so-called FAIR principles for scientific data management (findability, accessibility, interoperability and reusability). NFDI4Cat will also implement strict quality assurance methods, including prompts to follow best practices when ingesting data, to ensure that all data entering the repository is of high quality and properly labeled.

HLRS will develop and host the NFDI4Cat data repository

In addition to developing the conceptual framework for organizing catalysis research data, HLRS will also play a central role in developing the data hosting and sharing infrastructure.

NFDI4Cat will be based on a distributed repository infrastructure. (See figure above.) Data will be stored on a variety of servers, allowing institutions that want to share data with the community to participate, even if policies prevent them from storing it on an external server. A middleware layer at the heart of the service will connect the various repositories, and a graphical user interface will provide a portal for users to access data over the network.
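The idea of a middleware layer that connects independent repositories can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the NFDI4Cat design: the names Repository, Middleware and MetadataEntry are hypothetical, the repositories are in-memory stand-ins for remote servers, and only metadata (never the data itself) passes through the middleware.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class MetadataEntry:
    # Shared metadata description; the underlying data stays in its home repository.
    dataset_id: str
    title: str
    keywords: Tuple[str, ...]


class Repository:
    """Stand-in for one institution's server (hypothetical, in-memory)."""

    def __init__(self, name: str, entries: Iterable[MetadataEntry]):
        self.name = name
        self._entries = list(entries)

    def search(self, keyword: str) -> List[MetadataEntry]:
        # Case-insensitive exact match against each entry's keywords.
        wanted = keyword.lower()
        return [e for e in self._entries
                if wanted in (k.lower() for k in e.keywords)]


class Middleware:
    """Forwards a portal query to every registered repository."""

    def __init__(self) -> None:
        self._repos: Dict[str, Repository] = {}

    def register(self, repo: Repository) -> None:
        self._repos[repo.name] = repo

    def search(self, keyword: str) -> List[Tuple[str, MetadataEntry]]:
        # Each hit records which repository holds the data,
        # so a portal could direct the user to the right location.
        hits: List[Tuple[str, MetadataEntry]] = []
        for name, repo in self._repos.items():
            hits.extend((name, entry) for entry in repo.search(keyword))
        return hits
```

In this sketch, an institution that cannot place its data on an external server still participates by registering its own repository; a portal would call Middleware.search and display where each matching dataset resides.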

To ensure that the catalysis community adopts the repository and its user interface, HLRS, in collaboration with FOKUS, will organize meetings with representatives from academia and industry to gather information on their needs. Discussions will focus on the state of the art in data management hardware and software technologies, how best to facilitate the integration of data and metadata into the system, and legal requirements concerning intellectual property protection and data access. Once the project requirements have been clarified, HLRS will work with FOKUS to develop the hardware and software backbone of the catalysis data repository.

In addition, HLRS will provide two server systems for the repository, along with approximately 100 TB of hard disk storage and up to 1 PB of tape back-end storage.

Networking to harness the power of data

In addition to maintaining a close dialogue with the catalysis research community, NFDI4Cat will also engage with other NFDI consortia on issues of common interest to ensure that its data is useful in other fields. These include consortia focused on developing national research data infrastructures for chemistry (NFDI4Chem), engineering (NFDI4Ing) and photon and neutron experiments (DAPHNE), whose research concerns overlap with those of catalysis. NFDI4Cat will also coordinate with other relevant research institutes, programs and data resources outside of the NFDI network to identify potential synergies.

Developing a more robust infrastructure for sharing and organizing data offers a near-term opportunity to transform catalysis research. By moving to a more open data structure, scientists could gain new computational power to understand reaction mechanisms and kinetics, pursue more efficient and rational approaches to catalyst design, and gain new kinds of insight from interdisciplinary research. In this way, NFDI4Cat should itself serve as a catalyst for new kinds of discoveries.

Source: Christopher Williams, HLRS

Sean N. Ayres