The search for scientific information usually starts with free internet search engines. Google Scholar, for example, searches through the scientific literature. Wikipedia, probably the most extensive encyclopedia, provides quick information and a good starting point for a new topic.
However, an ordinary search engine only detects a small part of the contents of the Internet. A complete literature review, therefore, requires additional search methods and skills.
The catchphrase “big data” is fairly new, although the concept is not. Collections of large and growing sets of chemical data have existed for years. What is new, however, is how much “bigger” that data is and how fast it is growing, and also how complicated the information is. Therefore, tools are needed to collect and analyze the data. And collecting and analyzing data is becoming increasingly valuable to improve processes and performance.
Altogether, these are good reasons to familiarize yourself with databases. Below, you can find a collection of some useful resources.
- BindingDB
Database of measured binding affinities for interactions between drug-like molecules and proteins considered to be drug targets.
Powered by ChemAxon
- BIOSIS Previews
Index to life sciences and biomedical research from journals, meetings, books, and patents.
Provided by EBSCO Industries
- Cambridge Structural Database (CSD)
Repository for small-molecule organic and metal-organic crystal structures; contains the results of over half a million X-ray and neutron diffraction analyses.
Provided by The Cambridge Crystallographic Data Centre (CCDC), UK
- ChEMBL
Open data resource of binding, functional, and ADMET bioactivity data.
Provided by the European Bioinformatics Institute (EMBL-EBI), UK
- Chemical Entities of Biological Interest (ChEBI)
Focuses on small chemical compounds.
Provided by the European Bioinformatics Institute (EMBL-EBI), UK, as part of the Open Biomedical Ontologies (OBO Foundry) effort
- Chemicalize.org
Identifies chemical structures on webpages and other text; some features free of charge.
Provided by ChemAxon, Hungary
- ChemInform RxnFinder
Search engine for the ChemInform Reaction Library (CIRX), which contains over 2 million reactions and covers data from 1990 to the present from ca. 100 journals; a tool for the synthetic organic chemist.
Provided by Wiley
- DOZN™ 2.0
Tool to determine the relative greenness of chemicals and chemical processes against the 12 Principles of Green Chemistry.
Provided by Merck
- ChemPlanner (integrated with SciFinder-n)
Helps chemists design viable synthetic routes to their target molecules by predicting synthetic strategies and exposing a wide spectrum of relevant synthetic methods and available building blocks.
Provided in collaboration with Wiley
- ChemSpider
Chemical structure database; access to over 43 million structures, properties, and associated information.
Provided by the Royal Society of Chemistry (RSC)
- ChemSub Online
Privately organized database; information on chemical substances.
Provided by Robert Charles Knight and Aniruddha Warakoutikha
- ChemSynthesis
Chemicals with synthesis references and physical properties.
Provided by the ChemSynthesis team
- Computer Aided Material Preselection by Uniform Standards (CAMPUS)
Datasheets for resins from participating material producers; only database that offers comparable material information, restricted to uniform standards.
Provided by Chemie Wirtschaftsförderungs-GmbH, Germany
- CrystalWorks
Wide range of crystallographic structure data made available by the Chemical Database Service.
Provided by the Science and Technology Facilities Council, UK
- Derwent World Patent Index
Patent information.
Provided by Clarivate
- DETHERM
Thermophysical data for pure substances and mixtures.
Provided by DECHEMA e.V., Germany
- DrugBank
Bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information.
Provided by the Canadian Institutes of Health Research, Alberta Innovates – Health Solutions, and by the Metabolomics Innovation Centre (TMIC)
- eChemPortal
Physical chemical properties and information on ecotoxicity, environmental fate and behavior, and toxicity.
Provided by the Organisation for Economic Co-operation and Development (OECD)
- eMolecules
Supplier information for chemicals.
Provided by eMolecules, Inc.
- Electronic Encyclopedia of Reagents for Organic Synthesis (eROS)
Information on reagents and catalysts for all chemists planning or working on organic syntheses.
Provided by Wiley
- GenBank
An annotated collection of all publicly available DNA sequences.
Provided by the National Center for Biotechnology Information (NCBI) of the National Institutes of Health (NIH)
- GitHub
Code-hosting platform for open-source software development, collaboration, and version control.
Powered by Microsoft
- Google Scholar
Broad search for scholarly literature.
Provided by Google
- HMDB – The Human Metabolome Database
Detailed information about small molecule metabolites found in the human body.
Provided by The Metabolomics Innovation Centre (TMIC), Canada
- Inorganic Crystal Structure Database (ICSD)
Completely identified inorganic crystal structures; contains about 177,000 peer-reviewed data entries.
Provided by FIZ Karlsruhe – Leibniz-Institut für Informationsinfrastruktur GmbH, Germany
- International Patent Documentation Center Database (INPADOC)
International patent collection; contains patent families and legal status information; updated weekly.
Provided by the European Patent Office (EPO)
- IUPAC Standards Online
Database of IUPAC’s standards and recommendations, extracted from the journal Pure and Applied Chemistry (PAC).
Provided by Walter de Gruyter GmbH
- KEGG: Kyoto Encyclopedia of Genes and Genomes
Database resource for understanding high-level functions and utilities of biological systems from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.
Provided by Kanehisa Laboratories
- Merck Index
Encyclopedia of chemicals, drugs, and biologicals with over 10,000 monographs on single substances or groups of related compounds; includes an appendix with monographs on organic named reactions.
Provided by the Royal Society of Chemistry (RSC)
- MMsINC Search
Lists commercially available substances for virtual screening and chemoinformatic utilization.
Provided by the University of Padova, Italy
- NFDI4Chem
The German National Research Data Infrastructure (NFDI) association aims to make data systematically accessible and sustainably available.
Initiative to build an open and FAIR infrastructure for research data management in chemistry.
Supported by the German Chemical Society (GDCh), German Bunsen Society for Physical Chemistry (DBG), and German Pharmaceutical Society (DPhG) and led by the applicant institution, Friedrich Schiller University Jena, Germany
- NFDI4Cat
A community-driven and user-oriented initiative to secure the digital future of catalysis.
Consists of 16 partners from the fields of homogeneous catalysis, heterogeneous catalysis, photocatalysis, biocatalysis, and electrocatalysis and is coordinated by DECHEMA e.V., Frankfurt, Germany
- NIST Chemistry WebBook
Chemical and physical property data on over 40,000 compounds.
Provided by the National Institute of Standards and Technology (NIST), USA
- NMRShiftDB
Nuclear magnetic resonance (NMR) spectra of organic structures; allows for spectrum prediction (13C, 1H, and other nuclei) and for searching spectra, structures, and other properties.
Provided by the NMRShiftDB project
- OpenFoodTox
Access to summaries of toxicological information used by the European Food Safety Authority (EFSA) in its risk assessments; covers the toxicity of chemicals found in the food and feed chain; updated yearly.
Provided by EFSA
- OrgSyn
Detailed procedures for the synthesis of organic compounds.
Provided by Organic Syntheses, Inc.
- PubChem
Information on the biological activities of small molecules.
Provided by the National Center for Biotechnology Information (NCBI), USA
- PubMed
More than 24 million citations for biomedical literature from MEDLINE, life science journals, and online books; citations may include links to full-text content from PubMed Central and publisher web sites.
Provided by the National Center for Biotechnology Information (NCBI), US National Library of Medicine, and National Institutes of Health (NIH), USA
- Reaxys
Data and citations from the Beilstein and Gmelin databases; includes chemical patents.
Provided by Elsevier
- Science of Synthesis
Critical review of synthetic methodology developed from the early 1800s to today in the fields of organic and organometallic chemistry.
Provided by Thieme
- SciFinder
Research discovery application that provides access to a comprehensive and authoritative source of references, substances, and reactions in chemistry and related sciences.
Provided by Chemical Abstracts Service (CAS)
- Scopus
Abstract and citation database of peer-reviewed literature: scientific journals, books, and conference proceedings; features tools to track, analyze, and visualize research.
Provided by Elsevier
- Spectral Databases
Spectral data system that uses empirical spectral data (over 2 million spectra) and advanced software to help chemists, toxicologists, and life scientists identify chemical substances.
Provided by Wiley
- Spectral Database for Organic Compounds (SDBS)
Integrated spectral database system for organic compounds; includes six different types of spectra (EI-MS, 1H-NMR, 13C-NMR, FT-IR, Raman, ESR) for ca. 34,600 compounds.
Provided by the National Institute of Advanced Industrial Science and Technology (AIST), Japan
- Springer Materials
Numerical and graphical data on more than 3000 physical and chemical properties of over 250,000 materials and chemical systems.
Provided by Springer
- TOXNET
Group of databases covering chemicals and drugs, diseases and the environment, environmental health, occupational safety and health, poisoning, risk assessment and regulations, and toxicology.
Provided by the Toxicology and Environmental Health Information Program (TEHIP), USA
- Web of Science
Connects publications and researchers through citations and controlled indexing in curated databases spanning all scientific disciplines.
Provided by Clarivate Analytics
- World Wide Protein Data Bank (wwPDB)
Information about the 3D structures of proteins, nucleic acids, and complex assemblies.
Provided by Worldwide Protein Data Bank Foundation
- ZINC
Collection of commercially available chemical compounds prepared especially for virtual screening; contains over 35 million compounds in ready-to-dock 3D formats.
Provided by the Irwin and Shoichet Laboratories at the University of California, San Francisco, USA
FAIRsharing, which focuses on data standards for FAIR data, and many other services may also be useful.
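Several of the resources listed here can also be queried programmatically, which helps when data has to be collected and analyzed at scale. As a minimal sketch, the following Python snippet builds a request against PubChem's PUG REST interface and reads the returned property table. The function names (`build_property_url`, `parse_properties`, `fetch_properties`) are hypothetical helpers introduced for illustration, and the chosen properties are just an example; please consult the PubChem documentation for the authoritative list of options and usage limits.

```python
import json
import urllib.parse
import urllib.request

# Base address of PubChem's PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_property_url(compound_name, properties=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL asking for properties of a compound given by name.

    Hypothetical helper: the name and the default properties are illustrative.
    """
    # Percent-encode the name so that spaces and special characters are safe in a URL.
    quoted_name = urllib.parse.quote(compound_name, safe="")
    property_list = ",".join(properties)
    return f"{PUG_REST_BASE}/compound/name/{quoted_name}/property/{property_list}/JSON"


def parse_properties(payload):
    """Return the list of property records from a decoded JSON response."""
    return payload["PropertyTable"]["Properties"]


def fetch_properties(compound_name, timeout=10):
    """Send the request and return the property records (needs internet access)."""
    url = build_property_url(compound_name)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_properties(json.load(response))
```

With network access, calling `fetch_properties("aspirin")` should return a list of dictionaries, one per matching compound record. Keeping URL construction and response parsing in separate functions makes both parts easy to check without contacting the server.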
If you would like to see databases added to the list, please let us know at [email protected].
See also:
- How to Collaborate on Writing: Possibilities of working together on a text online and in real-time