Training a Topological Surrogate Energy Model
=============================================

In this document, we walk through the steps to train a surrogate energy model for nanoparticles using topological descriptors.

First, we import the modules needed to create nanoparticles, compute their energies, extract features, and perform Bayesian ridge regression.

.. code-block:: python

    from npl.core import Nanoparticle
    from npl.calculators import EMTCalculator
    from npl.descriptors.global_feature_classifier import testTopologicalFeatureClassifier
    from npl.calculators import BayesianRRCalculator
    from npl.utils.utils import plot_learning_curves
    import numpy as np
    import matplotlib.pyplot as plt
    import pickle

The following function initializes an EMTCalculator and creates a list of nanoparticles based on the given parameters. Each nanoparticle's energy is computed and stored with the particle, which is then added to the training set.

.. code-block:: python

    def create_octahedron_training_set(n_particles, height, trunc, stoichiometry):
        emt_calculator = EMTCalculator(fmax=0.2, steps=1000)
        training_set = []
        for i in range(n_particles):
            # Initialize a new Nanoparticle instance
            p = Nanoparticle()
            # Create a truncated octahedron nanoparticle with specified parameters
            p.truncated_octahedron(height, trunc, stoichiometry)
            # Compute the energy of the nanoparticle using the EMT calculator
            emt_calculator.compute_energy(p)
            # Add the nanoparticle to the training set
            training_set.append(p)
        return training_set

Here, we define the stoichiometry for our nanoparticles and generate a training set containing 40 nanoparticles.

.. code-block:: python

    stoichiometry = {'Pt': 55, 'Au': 24}
    training_set = create_octahedron_training_set(40, 5, 1, stoichiometry)

We initialize the TopologicalFeatureClassifier and use it to compute the topological features of each nanoparticle in the training set.

.. code-block:: python

    classifier = testTopologicalFeatureClassifier(list(stoichiometry.keys()))
    for p in training_set:
        classifier.compute_feature_vector(p)

We then create an instance of the Bayesian Ridge Regression calculator and fit the model to the training set, with 10% of the data reserved for validation.

.. code-block:: python

    calculator = BayesianRRCalculator(classifier.get_feature_key())
    calculator.fit(training_set, 'EMT', validation_set=0.1)

Next, we plot the learning curve for the model, which shows the training and test performance as the training set size increases.

.. code-block:: python

    X = [p.get_feature_vector('TFC') for p in training_set]
    y = [p.get_energy('EMT') for p in training_set]
    n_atoms = training_set[0].get_n_atoms()
    plot_learning_curves(X, y, n_atoms, calculator.ridge, n_splits=10,
                         train_sizes=range(4, 30, 2), y_lim=(0, 2))

.. figure:: ../images/learning_curve.png
    :alt: Learning curve showing model performance across training sizes.
    :align: center
    :figwidth: 100%

We plot the coefficient values to visualize the importance of each feature in the model.

.. code-block:: python

    coefficients = calculator.get_coefficients()
    feature_names = classifier.get_feature_labels()

    plt.figure(figsize=(10, 6))
    plt.bar(range(len(coefficients)), coefficients)
    plt.hlines(0, 0, len(coefficients), linestyles='dashed')
    plt.xticks(range(len(coefficients)), feature_names, rotation=90)
    plt.xlabel('Coefficient Index')
    plt.ylabel('Coefficient Value')
    plt.title('Fitting Coefficients')
    plt.show()

.. figure:: ../images/coefficients.png
    :alt: Bar plot of the fitted coefficient value for each feature.
    :align: center
    :figwidth: 100%

Finally, we save the trained model to a file so that it can be reused later without retraining.

.. code-block:: python

    calculator.save('bayesian_rr_calculator.pkl')
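
For intuition about what the fitted coefficients represent: a ridge-type regression is a linear model, so a predicted energy is a weighted sum of the feature-vector entries, possibly plus an intercept. The sketch below illustrates this in plain Python. The ``predict_energy`` helper, the feature labels, and all numerical values are hypothetical and are not part of the npl API; in practice the coefficients and labels would come from ``calculator.get_coefficients()`` and ``classifier.get_feature_labels()``.

```python
def predict_energy(coefficients, feature_vector, intercept=0.0):
    """Linear surrogate: energy = intercept + sum_i coefficient_i * feature_i.

    Hypothetical helper for illustration; not an npl function.
    """
    if len(coefficients) != len(feature_vector):
        raise ValueError("coefficients and feature_vector must have the same length")
    return intercept + sum(c * x for c, x in zip(coefficients, feature_vector))


# Made-up labels and values, chosen only to make the arithmetic easy to follow.
feature_labels = ["Au-Au bonds", "Au-Pt bonds", "Pt-Pt bonds"]
coefficients = [-0.25, -0.5, -0.75]   # assumed energy contribution per bond
feature_vector = [40, 100, 200]       # assumed bond counts for one particle

# Per-feature contribution to the total predicted energy.
contributions = {
    label: c * x
    for label, c, x in zip(feature_labels, coefficients, feature_vector)
}
energy = predict_energy(coefficients, feature_vector)

print(contributions)
print(energy)
```

Read this way, each bar in the coefficient plot is the energy change the model assigns to one additional unit of that feature, which is why the sign and magnitude of the bars are informative.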