Quantitative structure–activity relationship models (QSAR models) are regression or classification models used in the chemical and biological sciences and engineering. Like other regression models, QSAR regression models relate a set of "predictor" variables (X) to the potency of the response variable (Y), while classification QSAR models relate the predictor variables to a categorical value of the response variable.
In QSAR modeling, the predictors consist of physico-chemical properties or theoretical molecular descriptors of chemicals; the QSAR response-variable could be a biological activity of the chemicals. QSAR models first summarize a supposed relationship between chemical structures and biological activity in a data-set of chemicals. Second, QSAR models predict the activities of new chemicals. Related terms include quantitative structure–property relationships (QSPR) when
A computational model that is used to describe the forces between atoms (or collections of atoms) within molecules or between molecules as well as in crystals. Force fields are a variety of interatomic potentials. More precisely, the force field refers to the functional form and parameter sets used to calculate the potential energy of a system on the atomistic level. Force fields are usually used in molecular dynamics or Monte Carlo simulations. The parameters for
A finite number of chemicals, so care must be taken to avoid overfitting: the generation of hypotheses that fit training data very closely but perform poorly when applied to new data. The SAR paradox refers to the fact that it is not the case that all similar molecules have similar activities. Analogously, the "partition coefficient", a measurement of differential solubility and itself
A gradient of the potential energy with respect to the particle coordinates. A large number of different force field types exist today (e.g. for organic molecules, ions, polymers, minerals, and metals). Depending on the material, different functional forms are usually chosen for the force fields since different types of atomistic interactions dominate the material behavior. There are various criteria that can be used for categorizing force field parametrization strategies. An important differentiation
A central embarrassment of molecular mechanics, namely that energy minimization or molecular dynamics generally leads to a model that is less like the experimental structure". Force fields have been applied successfully for protein structure refinement in different X-ray crystallography and NMR spectroscopy applications, especially using the program XPLOR. However, the refinement is driven mainly by
A chemical property is modeled as the response variable. "Different properties or behaviors of chemical molecules have been investigated in the field of QSPR. Some examples are quantitative structure–reactivity relationships (QSRRs), quantitative structure–chromatography relationships (QSCRs) and, quantitative structure–toxicity relationships (QSTRs), quantitative structure–electrochemistry relationships (QSERs), and quantitative structure–biodegradability relationships (QSBRs)." As an example, biological activity can be expressed quantitatively as
A chosen energy function may be derived from classical laboratory experiment data, calculations in quantum mechanics, or both. Force fields utilize the same concept as force fields in classical physics, with the main difference being that the force field parameters in chemistry describe the energy landscape on the atomistic level. From a force field, the acting forces on every particle are derived as
A combination of an energy similar to heat of fusion (energy absorbed during melting of molecular crystals), a conformational entropy contribution, and solvation free energy. The heat of fusion is significantly smaller than enthalpy of sublimation. Hence, the potentials describing protein folding or ligand binding need more consistent parameterization protocols, e.g., as described for IFF. Indeed,
A component of QSAR predictions, can be predicted either by atomic methods (known as "XLogP" or "ALogP") or by chemical fragment methods (known as "CLogP" and other variations). It has been shown that the logP of a compound can be determined by the sum of its fragments; fragment-based methods are generally accepted as better predictors than atomic-based methods. Fragmentary values have been determined statistically, based on empirical data for known logP values. This method gives mixed results and
A comprehensive list of force fields. As it is rare for bonds to deviate significantly from their equilibrium values, the most simplistic approaches utilize a Hooke's law formula: $E_{\text{bond}} = \frac{k_{ij}}{2}(l_{ij}-l_{0,ij})^{2},$ where $k_{ij}$
A covalent bond at higher stretching is provided by the more expensive Morse potential. The functional form for dihedral energy is variable from one force field to another. Additional, "improper torsional" terms may be added to enforce the planarity of aromatic rings and other conjugated systems, and "cross-terms" that describe the coupling of different internal variables, such as angles and bond lengths. Some force fields also include explicit terms for hydrogen bonds. The nonbonded terms are computationally most intensive. A popular choice
A finite amount of data is available (see also MVUE). In general, all QSAR problems can be divided into coding and learning. (Q)SAR models have been used for risk management. QSARs are suggested by regulatory authorities; in the European Union, QSARs are suggested by the REACH regulation, where "REACH" abbreviates "Registration, Evaluation, Authorisation and Restriction of Chemicals". Regulatory application of QSAR methods includes in silico toxicological assessment of genotoxic impurities. Commonly used QSAR assessment software such as DEREK or CASE Ultra (MultiCASE)
A good quality QSAR model depends on many factors, such as the quality of input data, the choice of descriptors and statistical methods for modeling and for validation. Any QSAR modeling should ultimately lead to statistically robust and predictive models capable of making accurate and reliable predictions of the modeled response of new compounds. For validation of QSAR models, usually various strategies are adopted: The success of any QSAR model depends on accuracy of
A high accuracy of the force field is the aim. For crystal systems with covalent bonding, bond order potentials are usually used, e.g. Tersoff potentials. For metal systems, usually embedded atom potentials are used. For metals, also so-called Drude model potentials have been developed, which describe a form of attachment of electrons to nuclei. In addition to the functional form of
A human); by data mining; or by molecule mining. A typical data mining based prediction uses e.g. support vector machines, decision trees, or artificial neural networks for inducing a predictive learning model. Molecule mining approaches, a special case of structured data mining approaches, apply a similarity matrix based prediction or an automatic fragmentation scheme into molecular substructures. Furthermore, there exist also approaches using maximum common subgraph searches or graph kernels. Typically QSAR models derived from nonlinear machine learning
A molecule are computed and used to develop a QSAR. This approach is different from the fragment (or group contribution) approach in that the descriptors are computed for the system as a whole rather than from the properties of individual fragments. This approach is different from the 3D-QSAR approach in that the descriptors are computed from scalar quantities (e.g., energies, geometric parameters) rather than from 3D fields. An example of this approach
A preference for partial least squares (PLS) methods, since it applies the feature extraction and induction in one step. Computer SAR models typically calculate a relatively large number of features. Because those lack structural interpretation ability, the preprocessing steps face a feature selection problem (i.e., which structural features should be interpreted to determine the structure-activity relationship). Feature selection can be accomplished by visual inspection (qualitative selection by
A set of experimental constraints and the interatomic potentials serve mainly to remove interatomic hindrances. The results of calculations were practically the same with rigid sphere potentials implemented in program DYANA (calculations from NMR data), or with programs for crystallographic refinement that use no energy functions at all. These shortcomings are related to interatomic potentials and to
A useful number or the result of some standardized experiment." By this definition, the molecular descriptors are divided into two main categories: experimental measurements, such as log P, molar refractivity, dipole moment, polarizability, and, in general, additive physico-chemical properties, and theoretical molecular descriptors, which are derived from a symbolic representation of
A vacuum. A more general theory of van der Waals forces in condensed media was developed by A. D. McLachlan in 1963 and included the original London's approach as a special case. The McLachlan theory predicts that van der Waals attractions in media are weaker than in vacuum and follow the like dissolves like rule, which means that different types of atoms interact more weakly than identical types of atoms. This
Is 'component-specific' and 'transferable'. For a component-specific parametrization, the considered force field is developed solely for describing a single given substance (e.g. water). For a transferable force field, all or some parameters are designed as building blocks and become transferable/applicable for different substances (e.g. methyl groups in alkane transferable force fields). A different important differentiation addresses
Is a clear trend in the increase of boiling point with an increase in the number of carbons, and this serves as a means for predicting the boiling points of higher alkanes. Still very interesting applications are the Hammett equation, the Taft equation and pKa prediction methods. The biological activity of molecules is usually measured in assays to establish the level of inhibition of particular signal transduction or metabolic pathways. Drug discovery often involves
Is a sum over all pairwise combinations of atoms and usually excludes 1,2-bonded atoms, 1,3-bonded atoms, as well as 1,4-bonded atoms. Atomic charges can make dominant contributions to the potential energy, especially for polar molecules and ionic compounds, and are critical to simulate the geometry, interaction energy, and the reactivity. The assignment of charges usually uses some heuristic approach, with different possible solutions. Atomistic interactions in crystal systems significantly deviate from those in molecular systems, e.g. of organic molecules. For crystal systems, in particular multi-body interactions are important and cannot be neglected if
Is also important. One of the first historical QSAR applications was to predict boiling points. It is well known, for instance, that within a particular family of chemical compounds, especially in organic chemistry, there are strong correlations between structure and observed properties. A simple example is the relationship between the number of carbons in alkanes and their boiling points. There
Is an emerging paradigm. In this context FB-QSAR proves to be a promising strategy for fragment library design and in fragment-to-lead identification endeavours. An advanced approach to fragment or group-based QSAR based on the concept of pharmacophore similarity has been developed. This method, pharmacophore-similarity-based QSAR (PS-QSAR), uses topological pharmacophoric descriptors to develop QSAR models. This activity prediction may assist
Is assigned to each set corresponding to the activity of the molecule, which is assumed to be determined by at least one instance in the set (i.e. some conformation of the molecule). On June 18, 2011, the Comparative Molecular Field Analysis (CoMFA) patent has dropped any restriction on the use of GRID and partial least-squares (PLS) technologies. In this approach, descriptors quantifying various electronic, geometric, or steric properties of
Is assumed as a minimal basic requirement for any descriptor. Two other important invariance properties, translational invariance and rotational invariance, are the invariance of a descriptor value to any translation or rotation of the molecules in the chosen reference frame. These last invariance properties are required for the 3D-descriptors. This property refers to the ability of a descriptor to avoid equal values for different molecules. In this sense, descriptors can show no degeneracy at all, low, intermediate, or high degeneracy. For example,
Is at times differently defined or taken at different thermodynamic conditions. The bond stretching constant $k_{ij}$ can be determined from the experimental infrared spectrum, Raman spectrum, or high-level quantum-mechanical calculations. The constant $k_{ij}$ determines vibrational frequencies in molecular dynamics simulations. The stronger
Is difficult to determine whether the selection of training and test sets was manipulated to maximize the predictive capacity of the model being published. Different aspects of validation of QSAR models that need attention include methods of selection of training set compounds, setting training set size and impact of variable selection for training set models for determining the quality of prediction. Development of novel validation parameters for judging quality of QSAR models
Is generally not trusted to have accuracy of more than ±0.1 units. Group or fragment-based QSAR is also known as GQSAR. GQSAR allows flexibility to study various molecular fragments of interest in relation to the variation in biological response. The molecular fragments could be substituents at various substitution sites in a congeneric set of molecules or could be on the basis of pre-defined chemical rules in the case of non-congeneric sets. GQSAR also considers cross-terms fragment descriptors, which could be helpful in identification of key fragment interactions in determining variation of activity. Lead discovery using fragnomics
Is in contrast to combinatorial rules or Slater-Kirkwood equation applied for development of the classical force fields. The combinatorial rules state that the interaction energy of two dissimilar atoms (e.g., C...N) is an average of the interaction energies of corresponding identical atom pairs (i.e., C...C and N...N). According to McLachlan's theory, the interactions of particles in media can even be fully repulsive, as observed for liquid helium, however,
Quantitative structure–activity relationship - Misplaced Pages Continue
Is likely to increase inconsistencies at the level of atomic charges, for the assignment of remaining parameters, and likely to dilute the interpretability and performance of parameters. A large number of force fields have been published in the past decades, mostly in scientific publications. In recent years, some databases have attempted to collect, categorize and make force fields digitally available. Therein, different databases focus on different types of force fields. For example,
Is seen as a "black box", which fails to guide medicinal chemists. Recently there is a relatively new concept of matched molecular pair analysis or prediction driven MMPA which is coupled with QSAR model in order to identify activity cliffs. QSAR modeling produces predictive models derived from application of statistical tools correlating biological activity (including desirable therapeutic effect and undesirable side effects) or physico-chemical properties in QSPR models of chemicals (drugs/toxicants/environmental pollutants) with descriptors representative of molecular structure or properties. QSARs are being applied in many disciplines, for example: risk assessment, toxicity prediction, and regulatory decisions in addition to drug discovery and lead optimization. Obtaining
Is the Coulomb law: $E_{\text{Coulomb}} = \frac{1}{4\pi\varepsilon_{0}}\frac{q_{i}q_{j}}{r_{ij}},$ where $r_{ij}$ is the distance between two atoms $i$ and $j$. The total Coulomb energy
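As a rough illustration of the Coulomb term, the following Python sketch evaluates the pair energy in SI units and sums it over unique atom pairs. The function names, the `excluded` argument and the example charges and positions are assumptions made for this illustration; they are not taken from any particular force field or simulation package.

```python
import math
from itertools import combinations

EPSILON_0 = 8.8541878128e-12   # vacuum permittivity in F/m
E_CHARGE = 1.602176634e-19     # elementary charge in C


def coulomb_energy(q_i, q_j, r_ij):
    """Pair energy E = q_i * q_j / (4 * pi * eps0 * r_ij), in joules."""
    return q_i * q_j / (4.0 * math.pi * EPSILON_0 * r_ij)


def total_coulomb_energy(charges, positions, excluded=frozenset()):
    """Sum the pair energy over all unique pairs (i < j).

    `excluded` is a hypothetical set of (i, j) index pairs to skip,
    for example atoms that are directly bonded to each other.
    """
    total = 0.0
    for i, j in combinations(range(len(charges)), 2):
        if (i, j) in excluded:
            continue
        r_ij = math.dist(positions[i], positions[j])
        total += coulomb_energy(charges[i], charges[j], r_ij)
    return total


# Illustrative input: two opposite elementary charges 1 angstrom apart.
charges = [E_CHARGE, -E_CHARGE]
positions = [(0.0, 0.0, 0.0), (1.0e-10, 0.0, 0.0)]
energy = total_coulomb_energy(charges, positions)
```

Production codes usually work in units such as kcal/mol and treat long-range electrostatics with more elaborate schemes; the plain double loop here is only meant to make the formula concrete.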
Is the QSARs developed for olefin polymerization by half sandwich compounds. It has been shown that activity prediction is even possible based purely on the SMILES string. Similarly to string-based methods, the molecular graph can directly be used as input for QSAR models, but this usually yields inferior performance compared to descriptor-based QSAR models. In the literature it can often be found that chemists have
Is the force constant, $l_{ij}$ is the bond length, and $l_{0,ij}$ is the value for the bond length between atoms $i$ and $j$ when all other terms in the force field are set to 0. The term $l_{0,ij}$
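The harmonic (Hooke's law) bond term that these symbols belong to, E_bond = (k_ij / 2)(l_ij − l_0,ij)², can be evaluated in a few lines. The sketch below is only an illustration: the function name and the numerical values of the force constant and bond lengths are hypothetical and are not parameters of any published force field.

```python
def harmonic_bond_energy(k_ij, l_ij, l0_ij):
    """Hooke's law bond term: E_bond = (k_ij / 2) * (l_ij - l0_ij)**2."""
    return 0.5 * k_ij * (l_ij - l0_ij) ** 2


# Hypothetical parameters, chosen only for illustration.
k_ij = 600.0    # force constant, e.g. in kcal/(mol * angstrom**2)
l0_ij = 1.53    # reference bond length in angstrom
l_ij = 1.58     # current bond length in angstrom

e_bond = harmonic_bond_energy(k_ij, l_ij, l0_ij)   # about 0.75 in the chosen units
```

Because the energy grows quadratically in both directions, this form cannot describe bond breaking, which is consistent with its use only near the reference bond length.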
Is then usually reduced by a following feature extraction (see also dimensionality reduction). The following learning method can be any of the already mentioned machine learning methods, e.g. support vector machines. An alternative approach uses multiple-instance learning by encoding molecules as sets of data instances, each of which represents a possible molecular conformation. A label or response
Is therefore how to define a small difference on a molecular level, since each kind of activity, e.g. reaction ability, biotransformation ability, solubility, target activity, and so on, might depend on another difference. Examples were given in the bioisosterism reviews by Patanie/LaVoie and Brown. In general, one is more interested in finding strong trends. Created hypotheses usually rely on
Is to limit interactions to pairwise energies. The van der Waals term is usually computed with a Lennard-Jones potential or the Mie potential and the electrostatic term with Coulomb's law. However, both can be buffered or scaled by a constant factor to account for electronic polarizability. A large number of force fields based on this or similar energy expressions have been proposed in the past decades for modeling different types of materials such as molecular substances, metals, glasses etc. - see below for
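For the van der Waals term, a minimal Python sketch of the commonly used 12-6 Lennard-Jones form, E(r) = 4ε[(σ/r)¹² − (σ/r)⁶], is given below. The 12-6 exponents are one common choice among several, and the function name and the parameter values (roughly argon-like) are assumptions made for this illustration only.

```python
def lennard_jones_12_6(r, epsilon, sigma):
    """12-6 Lennard-Jones pair energy: 4 * eps * ((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Illustrative parameters (assumed values, not taken from a specific force field).
epsilon = 0.238   # well depth, e.g. in kcal/mol
sigma = 3.4       # distance at which the energy crosses zero, in angstrom

r_min = 2.0 ** (1.0 / 6.0) * sigma          # position of the energy minimum
e_min = lennard_jones_12_6(r_min, epsilon, sigma)   # equals -epsilon up to rounding
```

The energy is zero at r = σ, reaches its minimum of −ε at r = 2^(1/6)σ, and rises steeply at shorter distances. A constant scaling factor, as mentioned in the text, would simply multiply the returned value.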
Is used to assess the genotoxicity of impurities according to ICH M7. The chemical descriptor space whose convex hull is generated by a particular training set of chemicals is called the training set's applicability domain. Prediction of properties of novel chemicals that are located outside the applicability domain uses extrapolation, and so is less reliable (on average) than prediction within the applicability domain. The assessment of
The Lennard-Jones potential, rather than experimental constants and is concerned with the overall molecule rather than a single substituent. The first 3-D QSAR was named Comparative Molecular Field Analysis (CoMFA) by Cramer et al. It examined the steric fields (shape of the molecule) and the electrostatic fields which were correlated by means of partial least squares regression (PLS). The created data space
The like dissolves like rule, as predicted by McLachlan theory. Different force fields are designed for different purposes: Several force fields explicitly capture polarizability, where a particle's effective charge can be influenced by electrostatic interactions with its neighbors. Core-shell models are common, which consist of a positively charged core particle, representing the polarizable atom, and
The OpenKIM database focuses on interatomic functions describing the individual interactions between specific elements. The TraPPE database focuses on transferable force fields of organic molecules (developed by the Siepmann group). The MolMod database focuses on molecular and ionic force fields (both component-specific and transferable). Functional forms and parameter sets have been defined by
The bond is between atoms, the higher is the value of the force constant, and the higher the wavenumber (energy) in the IR/Raman spectrum. Though the formula of Hooke's law provides a reasonable level of accuracy at bond lengths near the equilibrium distance, it is less accurate as one moves away. In order to model the Morse curve better one could employ cubic and higher powers. However, for most practical applications these differences are negligible, and inaccuracies in predictions of bond lengths are on
The concentration of a substance required to give a certain biological response. Additionally, when physicochemical properties or structures are expressed by numbers, one can find a mathematical relationship, or quantitative structure-activity relationship, between the two. The mathematical expression, if carefully validated, can then be used to predict the modeled response of other chemical structures. A QSAR has
The condensed phase relative to the gas phase and reproduced once the parameters for all phases are validated to reproduce chemical bonding, density, and cohesive/surface energy. Limitations have been strongly felt in protein structure refinement. The major underlying challenge is the huge conformation space of polymeric molecules, which grows beyond current computational feasibility when containing more than ~20 monomers. Participants in Critical Assessment of protein Structure Prediction (CASP) did not try to refine their models to avoid "
The contrary, would require many additional assumptions and may not be possible. In many cases, force fields can be straightforwardly combined. Yet, often, additional specifications and assumptions are required. All interatomic potentials are based on approximations and experimental data, therefore often termed empirical. The performance varies from higher accuracy than density functional theory (DFT) calculations, with access to a million times larger systems and time scales, to random guesses depending on
The contribution of certain pharmacophore features encoded by respective fragments toward activity improvement and/or detrimental effects. The acronym 3D-QSAR or 3-D QSAR refers to the application of force field calculations requiring three-dimensional structures of a given set of small molecules with known activities (training set). The training set needs to be superimposed (aligned) by either experimental data (e.g. based on ligand-protein crystallography) or molecule superimposition software. It uses computed potentials, e.g.
The covalent and noncovalent contributions are given by the following summations: $E_{\text{bonded}} = E_{\text{bond}} + E_{\text{angle}} + E_{\text{dihedral}}$ and $E_{\text{nonbonded}} = E_{\text{electrostatic}} + E_{\text{van der Waals}}$. The bond and angle terms are usually modeled by quadratic energy functions that do not allow bond breaking. A more realistic description of
The developers of interatomic potentials and feature variable degrees of self-consistency and transferability. When functional forms of the potential terms vary or are mixed, the parameters from one interatomic potential function can typically not be used together with another interatomic potential function. In some cases, modifications can be made with minor effort, for example, between 9-6 Lennard-Jones potentials to 12-6 Lennard-Jones potentials. Transfers from Buckingham potentials to harmonic potentials, or from Embedded Atom Models to harmonic potentials, on
The effective spring constant for each potential. Heuristic force field parametrization procedures have been very successful for many years, but have recently been criticized, since they are usually not fully automated and therefore subject to some subjectivity of the developers, which also brings problems regarding the reproducibility of the parametrization procedure. Efforts to provide open source codes and methods include OpenMM and OpenMD. The use of semi-automation or full automation, without input from chemical knowledge,
The electrostatic potential around molecules, which works less well for anisotropic charge distributions. The remedy is that point charges have a clear interpretation and virtual electrons can be added to capture essential features of the electronic structure, such as additional polarizability in metallic systems to describe the image potential, internal multipole moments in π-conjugated systems, and lone pairs in water. Electronic polarization of
The energies of H-bonds in proteins are ~ -1.5 kcal/mol when estimated from protein engineering or alpha helix to coil transition data, but the same energies estimated from sublimation enthalpy of molecular crystals were -4 to -6 kcal/mol, which is related to re-forming existing hydrogen bonds and not forming hydrogen bonds from scratch. The depths of modified Lennard-Jones potentials derived from protein engineering data were also smaller than in typical potential parameters and followed
The environment may be better included by using polarizable force fields or using a macroscopic dielectric constant. However, application of one value of dielectric constant is a coarse approximation in the highly heterogeneous environments of proteins, biological membranes, minerals, or electrolytes. All types of van der Waals forces are also strongly environment-dependent because these forces originate from interactions of induced and "instantaneous" dipoles (see Intermolecular force). The original Fritz London theory of these forces applies only in
The force field. The use of accurate representations of chemical bonding, combined with reproducible experimental data and validation, can lead to lasting interatomic potentials of high quality with much fewer parameters and assumptions in comparison to DFT-level quantum methods. Possible limitations include atomic charges, also called point charges. Most force fields rely on point charges to reproduce
The form of a mathematical model: The error includes model error (bias) and observational variability, that is, the variability in observations even on a correct model. The principal steps of QSAR/QSPR include: The basic assumption for all molecule-based hypotheses is that similar molecules have similar activities. This principle is also called Structure–Activity Relationship (SAR). The underlying problem
The gas phase are used for parametrizing intramolecular interactions and parametrizing intermolecular dispersive interactions by using macroscopic properties such as liquid densities. The assignment of atomic charges often follows quantum mechanical protocols with some heuristics, which can lead to significant deviation in representing specific properties. A large number of workflows and parametrization procedures have been employed in
The inability to sample the conformation space of large molecules effectively. Thereby also the development of parameters to tackle such large-scale problems requires new approaches. A specific problem area is homology modeling of proteins. Meanwhile, alternative empirical scoring functions have been developed for ligand docking, protein folding, homology model refinement, computational protein design, and modeling of proteins in membranes. It
The input data, selection of appropriate descriptors and statistical tools, and most importantly validation of the developed model. Validation is the process by which the reliability and relevance of a procedure are established for a specific purpose; for QSAR models validation must be mainly for robustness, prediction performance and applicability domain (AD) of the models. Some validation methodologies can be problematic. For example, leave-one-out cross-validation generally leads to an overestimation of predictive capacity. Even with external validation, it
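To make the leave-one-out procedure concrete, the sketch below computes a cross-validated coefficient q² = 1 − PRESS/SS for a one-descriptor linear model, where PRESS is the sum of squared leave-one-out prediction errors and SS is the total sum of squares about the mean response. This is one common definition of q²; the function names and the data are made up for illustration, and the example does not by itself demonstrate the overestimation mentioned above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out q2: refit without compound i, then predict compound i."""
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Made-up descriptor values and responses, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [3.1, 4.8, 7.2, 8.9, 11.1]
q2 = loo_q2(descriptor, response)
```

In practice, q² would be reported alongside statistics from an external test set that played no part in model building, since a high leave-one-out value alone is not sufficient evidence of predictive ability.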
The interactions of a family of molecules with an enzyme or receptor binding site, QSAR can also be used to study the interactions between the structural domains of proteins. Protein-protein interactions can be quantitatively analyzed for structural variations resulting from site-directed mutagenesis. It is part of the machine learning method to reduce the risk for a SAR paradox, especially taking into account that only
The lack of vaporization and presence of a freezing point contradicts a theory of purely repulsive interactions. Measurements of attractive forces between different materials (Hamaker constant) have been explained by Jacob Israelachvili. For example, "the interaction between hydrocarbons across water is about 10% of that across vacuum". Such effects are represented in molecular dynamics through pairwise interactions that are spatially more dense in
The long-range electrostatic and van der Waals forces. The specific decomposition of the terms depends on the force field, but a general form for the total energy in an additive force field can be written as $E_{\text{total}} = E_{\text{bonded}} + E_{\text{nonbonded}}$ where the components of
The molecule and can be further classified according to the different types of molecular representation. The main classes of theoretical molecular descriptors are: 1) 0D-descriptors (i.e. constitutional descriptors, count descriptors), 2) 1D-descriptors (i.e. list of structural fragments, fingerprints), 3) 2D-descriptors (i.e. graph invariants), 4) 3D-descriptors (such as, for example, 3D-MoRSE descriptors, WHIM descriptors, GETAWAY descriptors, quantum-chemical descriptors, size, steric, surface and volume descriptors), 5) 4D-descriptors (such as those derived from GRID or CoMFA methods, Volsurf). The spread of artificial intelligence and machine learning to computational chemistry has also led to various attempts to uncover new descriptors or to find
The most predictive ones among some sort of candidates. The invariance properties of molecular descriptors can be defined as the ability of the algorithm for their calculation to give a descriptor value that is independent of the particular characteristics of the molecular representation, such as atom numbering or labeling, spatial reference frame, molecular conformations, etc. Invariance to molecular numbering or labeling
The number of molecule atoms and the molecular weights are high degeneracy descriptors, while, usually, 3D-descriptors show low or no degeneracy at all. Here there is a list of a selection of commercial and free descriptor calculation tools. Force field (chemistry) In the context of chemistry, molecular physics, physical chemistry, and molecular modelling, a force field is
The order of the thousandth of an angstrom, which is also the limit of reliability for common force fields. A Morse potential can be employed instead to enable bond breaking and higher accuracy, even though it is less efficient to compute. For reactive force fields, bond breaking and bond orders are additionally considered. Electrostatic interactions are represented by a Coulomb energy, which utilizes atomic charges $q_{i}$ to represent chemical bonding ranging from covalent to polar covalent and ionic bonding. The typical formula
5092-401: The parametrization, either using data or information from the atomistic level, e.g. from quantum mechanical calculations or spectroscopic data, or using data from macroscopic properties, e.g. the hardness or compressibility of a given material. Often a combination of these routes is used. Hence, one way or the other, the force field parameters are always determined in an empirical way. Nevertheless,
5168-532: The past decades using different data and optimization strategies for determining the force field parameters. They differ significantly, partly because the individual developments had different focuses. The parameters for molecular simulations of biological macromolecules such as proteins, DNA, and RNA were often derived or transferred from observations for small organic molecules, which are more accessible for experimental studies and quantum calculations. Atom types are defined for different elements as well as for
5244-782: The physical structure of the models: All-atom force fields provide parameters for every type of atom in a system, including hydrogen, while united-atom interatomic potentials treat the hydrogen and carbon atoms in methyl groups and methylene bridges as one interaction center. Coarse-grained potentials, which are often used in long-time simulations of macromolecules such as proteins, nucleic acids, and multi-component complexes, sacrifice chemical details for higher computing efficiency. The basic functional form of potential energy for modeling molecular systems includes intramolecular interaction terms for interactions of atoms that are linked by covalent bonds and intermolecular (i.e. nonbonded, also termed noncovalent) terms that describe
5320-545: The potentials, a force field consists of the parameters of these functions. Together, they specify the interactions on the atomistic level. The parametrization, i.e. determining the parameter values, is crucial for the accuracy and reliability of the force field. Different parametrization procedures have been developed for different substances, e.g. metals, ions, and molecules, and different material types usually call for different parametrization strategies. In general, two main types can be distinguished for
5396-449: The reliability of QSAR predictions remains a research topic. The QSAR equations can be used to predict biological activities of newer molecules before their synthesis. Examples of machine learning tools for QSAR modeling include: Molecular descriptor Molecular descriptors play a fundamental role in chemistry, pharmaceutical sciences, environmental protection policy, and health research, as well as in quality control, being
5472-539: The same elements in sufficiently different chemical environments. For example, oxygen atoms in water and oxygen atoms in a carbonyl functional group are classified as different force field types. Typical molecular force field parameter sets include values for atomic mass, atomic charge, Lennard-Jones parameters for every atom type, as well as equilibrium values of bond lengths, bond angles, and dihedral angles. The bonded terms refer to pairs, triplets, and quadruplets of bonded atoms, and include values for
5548-441: The term 'empirical' is often used in the context of force field parameters when macroscopic material property data was used for the fitting. Experimental data (microscopic and macroscopic) used for the fit include, for example, the enthalpy of vaporization, enthalpy of sublimation, dipole moments, and various spectroscopic properties such as vibrational frequencies. Often, for molecular systems, quantum mechanical calculations in
5624-415: The use of QSAR to identify chemical structures that could have good inhibitory effects on specific targets and have low toxicity (non-specific activity). Of special interest is the prediction of the partition coefficient log P, which is an important measure used in identifying "druglikeness" according to Lipinski's Rule of Five. While many quantitative structure–activity relationship analyses involve
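The role of log P in a druglikeness screen can be sketched as follows. This is a minimal illustration that assumes the commonly quoted Rule of Five thresholds (molecular weight at most 500 Da, log P at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) and the usual convention that at most one violation is tolerated; the function names and the input values are hypothetical, and in practice the properties would come from measurement or a QSAR/QSPR prediction.

```python
def lipinski_violations(mol_weight, log_p, h_bond_donors, h_bond_acceptors):
    """Count how many of the four Rule of Five criteria are violated."""
    checks = [
        mol_weight > 500.0,     # molecular weight above 500 Da
        log_p > 5.0,            # octanol-water partition coefficient above 5
        h_bond_donors > 5,      # more than 5 hydrogen-bond donors
        h_bond_acceptors > 10,  # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


def passes_rule_of_five(mol_weight, log_p, h_bond_donors, h_bond_acceptors):
    """Assumed convention: a compound passes with at most one violation."""
    violations = lipinski_violations(
        mol_weight, log_p, h_bond_donors, h_bond_acceptors
    )
    return violations <= 1


# Hypothetical property values, not data for real compounds.
small_polar = passes_rule_of_five(350.0, 2.1, 2, 5)
large_lipophilic = passes_rule_of_five(650.0, 6.2, 2, 5)
```

The check is only as reliable as the property values fed into it, which is why an accurate log P prediction matters for this kind of filter.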
5700-400: The way molecules, thought of as real bodies, are transformed into numbers, allowing some mathematical treatment of the chemical information contained in the molecule. This was defined by Todeschini and Consonni as: "The molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into
5776-564: Was also argued that some protein force fields operate with energies that are irrelevant to protein folding or ligand binding. The parameters of protein force fields reproduce the enthalpy of sublimation, i.e., the energy of evaporation of molecular crystals. However, protein folding and ligand binding are thermodynamically closer to crystallization, or liquid-solid transitions, as these processes represent freezing of mobile molecules in condensed media. Thus, free energy changes during protein folding or ligand binding are expected to represent