3 minute read


  • TinNet
  • The Alloy-Theoretic Automated Toolkit (ATAT)
  • PyTorch (v1.3.1)
  • Pymatgen
  • Python3
  • Numpy


Surface properties of materials are important for catalysis applications. Recent advances in computing power make density functional theory (DFT) to be possible to gain insights into the reactions happening on the catalyst surfaces. However, the surface environments are way more complex and adsorbates on surfaces can have many configurations. One possible way to solve this issue is by employing the cluster expansion to consider the adsorbate-adsorbate interactions.

Cluster expansion

Cluster expansion is developed based on the Ising model. With infinite terms of interactions considered, the exact energy of the system is obtained. The Hamiltonian describing the energy of the system can be espressed as

\begin{align} \begin{split} E = \pi * V \end{split} \end{align} \)

π is the correlation matrix which is obtained through ATAT package and V is the interaction energy which can be fitted by using DFT true values. Although it works well with pure metal surfaces[], the complexity increase dramatically when considering a alloy surface with various adsorbates. So a nonlinear CNN fitting model is employed to consider not only the ads-ads interactions on the surface layer, but the local environment of the metal layer, which helps improve the accuracy of predicting the formation energy of adsorbates.

DFT database pool

DFT database is provided by Bajpai et al. including pure Pt (111) with NO and O adsorbates.There are 648 images in total. The energy to train in this field is usually the formation energy per site, \(E_F(\sigma)\).

\begin{align} \begin{split} Pt + \frac{\theta_O}{2}O_2 + \theta_{NO}NO \rightarrow PtO_{\theta_O}NO_{\theta_{NO}} \> \> \> (\sigma) \end{split} \end{align} \)

\begin{align} \begin{split} E_F(\sigma) = \frac{E_0(\sigma)-E_0(0)}{N} - \frac{\theta_O}{2}E_{0,O_2} - \theta_{NO}E_{0,NO} \end{split} \end{align} \)

Preparing Input for CE-TinNet

ASE db file or any other file format. Containing the input images and energy to train

Image format is pretty flexible as long as it can be recognized by Voronoi. The energy

Preparing the dataset structure format

Pure Pt(111) with NO and O adsorbates

From LCNN there are preprocessed data samples to be used. In structure

python lcnn_to_str.py

Cu dopant in Pt(111) with NO and O adsorbates

In this work, for convenience the VASP POSCAR format is used since it can be easily converted to other formats.

python Grab_POS.py

Also there is a function Cvrt_Cart_Frac.py which can convert the coordinates from Cartisan to fractional (Direct) coordinate for the convenience of next step since ATAT use fractional coordinate.

Preparing the correlation matrix using the Alloy-Theoretic Automated Toolkit (ATAT)

CM.txt file containing the needed correlation matrix for each image.

Prepare the dataset into the str.in format which can be recognized by the ATAT package.

python POS_to_str.py

The format of str.in can be simply obtained from the ATAT manual.

Once all str.in format structures are ready, one can use the Str_to_arr.py to generate the needed CM.txt file. The specific code to implement is called corrdump which can read in the lat.in and str.in, and use criterion as the cut off to choose how many clusters to include.

Can choose to use clusters from literature or generate them from the ATAT package directly.

python Str_to_arr.py

corrdump requires several input options:

  • reads the lattice file (-l option)
  • find all symmetrically distinct clusters satisfying the conditions specified by two ways:
    • distance between nodes: -2 through -6 (-2=2.1 means pairs shorter than 2.1 units is selected.)
    • read in clusters directly from other sources: -c -cf=filename(usually clusters.out)
  • reads the structure file (-s option)

With all inputs, it determines the corresponding correlation factor for each cluster for this specific structure read in. The output is in one line with the same order as the clusters.out. While for a dataset with multiple images/strctures, multiple lines of correlation factors are generated and a m*n array can be generated where m is the number of images and n is the number of clusters included

Preparing the data images and energies

For convenience, the ASE db format is used to be the dataset which can have the images in various struture format and have the according energy for each image.

python Gen_formation_db.py

Formation energy per site which has been mentioned before is used as the training target.

For Cu1Pt(111) system, it was calculated by…

Training and Evaluating

python training.py

Hyperparameter optimization

ray tune is employed to tune the hyperparameters in the model.

Tianyou Mou

Tianyou Mou

Chemical Engineer and Data Scientist


  Write a comment ...