Enzymes catalyze one or more specific reactions. Mapping the relationships between an enzyme and its possible substrates can be useful, e.g., in drug research. However, determining these relationships experimentally can be time-consuming and costly. Automated computational methods, for example ones based on machine learning, could help here. However, existing machine-learning approaches for predicting enzyme substrates can be limited to small enzyme families with enough good data to train the model, or they may only predict a general class of enzymes for a given substrate rather than an individual enzyme.
Martin J. Lercher, Heinrich Heine University, Düsseldorf, Germany, and colleagues have developed a general machine-learning model for predicting enzyme-substrate pairs, called ESP ("Enzyme Substrate Prediction"). The team used deep learning to represent the information about enzymes and substrates that is relevant for the prediction in numerical form. The representations of around 18,000 experimentally confirmed enzyme-substrate pairs were then used to train the prediction model.
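To make the idea of numerically represented enzyme-substrate pairs more concrete, here is a minimal Python sketch. It is not the ESP code: the two-dimensional vectors, the names `enzyme_A`, `substrate_x`, and so on, and the four labeled pairs are hypothetical, and a plain logistic regression stands in for whatever classifier the authors actually trained. Each enzyme-substrate pair is turned into a single input vector, and the classifier learns to separate true pairs from non-pairs.

```python
import math

# Hypothetical toy representations. In ESP such vectors are learned with
# deep learning; here they are hand-written one-hot vectors.
ENZYMES = {"enzyme_A": [1.0, 0.0], "enzyme_B": [0.0, 1.0]}
SUBSTRATES = {"substrate_x": [1.0, 0.0], "substrate_y": [0.0, 1.0]}

# Hypothetical labeled pairs: 1 = enzyme acts on substrate, 0 = it does not.
PAIRS = [
    ("enzyme_A", "substrate_x", 1),
    ("enzyme_B", "substrate_y", 1),
    ("enzyme_A", "substrate_y", 0),
    ("enzyme_B", "substrate_x", 0),
]


def pair_features(enzyme_vec, substrate_vec):
    """Join enzyme and substrate vectors into one input vector.

    The element-wise products let a linear model see enzyme-substrate
    interactions, which plain concatenation alone cannot express.
    """
    products = [e * s for e, s in zip(enzyme_vec, substrate_vec)]
    return enzyme_vec + substrate_vec + products


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(w, b, x):
    """Predicted probability that x is a true enzyme-substrate pair."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic-regression classifier with batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi  # gradient of the log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y, threshold=0.5):
    """Fraction of pairs whose predicted label matches the known label."""
    correct = sum((score(w, b, xi) >= threshold) == bool(yi) for xi, yi in zip(X, y))
    return correct / len(y)


X = [pair_features(ENZYMES[e], SUBSTRATES[s]) for e, s, _ in PAIRS]
y = [label for _, _, label in PAIRS]
w, b = train_logistic(X, y)
acc = accuracy(w, b, X, y)
print(f"accuracy on the toy pairs: {acc:.2f}")
```

For brevity, this toy measures accuracy on the same four pairs it was trained on; the accuracies reported for ESP come from a separate test set of known enzyme-substrate pairs.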
After training, the researchers applied the model to a test set of known enzyme-substrate pairs. The model correctly predicted which substrates match which enzymes with an accuracy of 91.5 %. Even for "unseen" enzymes whose amino acid sequences differ greatly from those in the training set, it achieved an accuracy of 89 %. However, the model performs less well for small molecules that are not present in the training set, with accuracies dropping as low as 71 %. The team has also set up a web server that makes the ESP model easy to use without programming skills.
- A general model to predict small molecule substrates of enzymes based on machine and deep learning,
Alexander Kroll, Sahasra Ranjan, Martin K. M. Engqvist, Martin J. Lercher,
Nat. Commun. 2023.
https://doi.org/10.1038/s41467-023-38347-2