Can Large Language Models Predict Inorganic Syntheses?

Author: ChemistryViews

General-purpose large language models (LLMs), such as those used in ChatGPT, can be useful in fields such as writing, coding, and scientific research. LLMs have been used, for example, for data extraction or synthesis planning in organic chemistry. While developing entirely new chemistry- or subfield-specific LLMs could be useful, doing so requires considerable effort and large amounts of training data. As an alternative, general models can be fine-tuned on relatively small amounts of chemical data to improve their performance for applications in chemistry.

Yousung Jung, Seoul National University, Republic of Korea, Joshua Schrier, Fordham University, New York, USA, and colleagues have investigated whether such fine-tuned models can be effective for planning syntheses in inorganic chemistry, i.e., predicting whether a compound can be synthesized and selecting suitable precursors if it can. The team compared the commonly used GPT-3.5 and GPT-4 models with existing specialized machine-learning models. The LLMs were fine-tuned on existing data from databases such as the Inorganic Crystal Structure Database (ICSD).
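As a rough illustration of what such fine-tuning data might look like, the sketch below builds prompt–completion pairs for the two tasks described (synthesizability prediction and precursor selection) in the JSONL layout used by OpenAI's fine-tuning API. The prompts, labels, and the BaTiO3 example are illustrative assumptions, not taken from the paper's dataset.

```python
import json

# Hypothetical training examples for the two tasks discussed:
# (1) predicting synthesizability from a chemical formula,
# (2) suggesting precursors for a synthesizable compound.
# The formula and labels here are illustrative only.
examples = [
    {
        "messages": [
            {"role": "user", "content": "Can the inorganic compound BaTiO3 be synthesized?"},
            {"role": "assistant", "content": "Yes"},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Suggest solid-state precursors for BaTiO3."},
            {"role": "assistant", "content": "BaCO3, TiO2"},
        ]
    },
]

# One JSON object per line: the JSONL format expected by the
# OpenAI fine-tuning endpoint for chat models.
with open("finetune_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

A file like this would then be uploaded and referenced when launching a fine-tuning job; the point is that only a modest, task-specific dataset is needed, rather than training a model from scratch.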

The team found that fine-tuned LLMs can perform comparably to, or even better than, current specialized machine-learning models on the two tasks they examined, i.e., predicting synthesizability and precursors from an inorganic chemical formula. Fine-tuning can be easier and less costly than developing entirely new models, and the team suggests that such fine-tuned models could serve as a baseline for future specialized ones. However, they caution that the reliance of LLM training on statistical patterns in existing data may hamper extrapolation to entirely new or rare reactions.


 
