In information science, an ontology encompasses a representation, formal naming, and definitions of the categories, properties, and relations between the concepts, data, or entities that pertain to one, many, or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of terms and relational expressions that represent the entities in that subject area. The field which studies ontologies so conceived is sometimes referred to as applied ontology.
BioNumerics is a bioinformatics desktop software application that manages microbiological data. It is developed by Applied Maths NV, a bioMérieux company. BioNumerics was first released in 1998. PulseNet, a network run by the Centers for Disease Control and Prevention (CDC), uses BioNumerics to compare pulsed field gel electrophoresis (PFGE) patterns and whole genome sequences from different bacterial strains. CaliciNet, an outbreak surveillance network for noroviruses,
A common upper ontology is a largely manual process and therefore time-consuming and expensive. Domain ontologies that use the same upper ontology to provide a set of basic elements with which to specify the meanings of the domain ontology entities can be merged with less effort. There are studies on generalized techniques for merging ontologies, but this area of research is still ongoing, and it
A comprehensive picture of these activities. Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data. This also includes nucleotide and amino acid sequences, protein domains, and protein structures. Important sub-disciplines within bioinformatics and computational biology include: The primary goal of bioinformatics
A critical area of bioinformatics research. In genomics, annotation refers to the process of marking the stop and start regions of genes and other biological features in a sequenced DNA sequence. Many genomes are too large to be annotated by hand. As the rate of sequencing exceeds the rate of genome annotation, genome annotation has become the new bottleneck in bioinformatics. Genome annotation can be classified into three levels:
A field parallel to biochemistry (the study of chemical processes in biological systems). Bioinformatics and computational biology involved the analysis of biological data, particularly DNA, RNA, and protein sequences. The field of bioinformatics experienced explosive growth starting in the mid-1990s, driven largely by the Human Genome Project and by rapid advances in DNA sequencing technology. Analyzing biological data to produce meaningful information involves writing and running software programs that use algorithms from graph theory, artificial intelligence, soft computing, data mining, image processing, and computer simulation. The algorithms in turn depend on theoretical foundations such as discrete mathematics, control theory, system theory, information theory, and statistics. There has been
A linguistic tool for learning domain ontologies. The Gellish ontology is an example of a combination of an upper and a domain ontology. A survey of ontology visualization methods is presented by Katifori et al. An updated survey of ontology visualization methods and tools was published by Dudás et al. The most established ontology visualization methods, namely indented tree and graph visualization, are evaluated by Fu et al. A visual language for ontologies represented in OWL
A particular population of cancer cells. Protein microarrays and high throughput (HT) mass spectrometry (MS) can provide a snapshot of the proteins present in a biological sample. The former approach faces problems similar to those of microarrays targeted at mRNA; the latter involves the problem of matching large amounts of mass data against predicted masses from protein sequence databases, and
A pioneer in the field, compiled one of the first protein sequence databases, initially published as books, and developed methods of sequence alignment and molecular evolution. Another early contributor to bioinformatics was Elvin A. Kabat, who pioneered biological sequence analysis in 1970 with his comprehensive volumes of antibody sequences released online with Tai Te Wu between 1980 and 1991. In
A preface to the proceedings. Some researchers, drawing inspiration from philosophical ontologies, viewed computational ontology as a kind of applied philosophy. In 1993, the widely cited web page and paper "Toward Principles for the Design of Ontologies Used for Knowledge Sharing" by Tom Gruber used ontology as a technical term in computer science closely related to the earlier ideas of semantic networks and taxonomies. Gruber introduced
A protein in its native environment. An exception is the misfolded protein involved in bovine spongiform encephalopathy. This structure is linked to the function of the protein. Additional structural information includes the secondary, tertiary and quaternary structure. A viable general solution to the prediction of the function of a protein remains an open problem. Most efforts have so far been directed towards heuristics that work most of
A range of fields, including biomedical informatics and industry. Such efforts often use ontology editing tools such as Protégé. Ontology is a branch of philosophy and intersects areas such as metaphysics, epistemology, and philosophy of language, as it considers how knowledge, language, and perception relate to the nature of reality. Metaphysics deals with questions like "what exists?" and "what
A realm of the world, such as biology or politics. Each domain ontology typically models domain-specific definitions of terms. For example, the word card has many different meanings. An ontology about the domain of poker would model the "playing card" meaning of the word, while an ontology about the domain of computer hardware would model the "punched card" and "video card" meanings. Since domain ontologies are written by different people, they represent concepts in very specific and unique ways, and are often incompatible within
A spectrum of algorithmic, statistical and mathematical techniques, ranging from exact, heuristics, fixed parameter and approximation algorithms for problems based on parsimony models to Markov chain Monte Carlo algorithms for Bayesian analysis of problems based on probabilistic models. Many of these studies are based on the detection of sequence homology to assign sequences to protein families. Pan genomics
A theory of a modeled world and a component of knowledge-based systems. In particular, David Powers introduced the word ontology to AI to refer to real world or robotic grounding, publishing in 1990 literature reviews emphasizing grounded ontology in association with the call for papers for an AAAI Summer Symposium on Machine Learning of Natural Language and Ontology, with an expanded version published in SIGART Bulletin and included as
A tremendous advance in speed and cost reduction since the completion of the Human Genome Project, with some labs able to sequence over 100,000 billion bases each year, and a full genome can be sequenced for $1,000 or less. Computers became essential in molecular biology when protein sequences became available after Frederick Sanger determined the sequence of insulin in the early 1950s. Comparing multiple sequences manually turned out to be impractical. Margaret Oakley Dayhoff,
A tremendous amount of information related to molecular biology. Bioinformatics is the name given to these mathematical and computing approaches used to glean understanding of biological processes. Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, aligning DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures. Since
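One of the activities just mentioned, pairwise sequence alignment, can be sketched with a short dynamic-programming routine. The following is a minimal Needleman-Wunsch global alignment in Python with illustrative match/mismatch/gap scores, not the scoring scheme of any particular tool:

```python
# Minimal Needleman-Wunsch global alignment (sketch; scores are illustrative).
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Traceback to recover one optimal alignment.
    ai, bi, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            ai.append(a[i - 1]); bi.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ai.append(a[i - 1]); bi.append("-"); i -= 1
        else:
            ai.append("-"); bi.append(b[j - 1]); j -= 1
    return "".join(reversed(ai)), "".join(reversed(bi)), score[n][m]

print(needleman_wunsch("GATTACA", "GCATGCU"))
```

Real aligners add affine gap penalties and substitution matrices such as BLOSUM, but the dynamic-programming core is the same.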
Is a collaborative data collection of the functional elements of the human genome that uses next-generation DNA-sequencing technologies and genomic tiling arrays, technologies able to automatically generate large amounts of data at a dramatically reduced per-base cost but with the same accuracy (base call error) and fidelity (assembly error). While genome annotation is primarily based on sequence similarity (and thus homology), other properties of sequences can be used to predict
Is a concept introduced in 2005 by Tettelin and Medini. The pan genome is the complete gene repertoire of a particular monophyletic taxonomic group. Although initially applied to closely related strains of a species, it can be applied to a larger context like genus, phylum, etc. It is divided into two parts: the Core genome, a set of genes common to all the genomes under study (often housekeeping genes vital for survival), and
Is a formal, explicit specification of a shared conceptualization that is characterized by high semantic expressiveness required for increased complexity." Contemporary ontologies share many structural similarities, regardless of the language in which they are expressed. Most ontologies describe individuals (instances), classes (concepts), attributes and relations. A domain ontology (or domain-specific ontology) represents concepts which belong to
Is a recent event to see the issue sidestepped by having multiple domain ontologies using the same upper ontology like the OBO Foundry. An upper ontology (or foundation ontology) is a model of the commonly shared relations and objects that are generally applicable across a wide range of domain ontologies. It usually employs a core glossary that overarches the terms and associated object descriptions as they are used in various relevant domain ontologies. Standardized upper ontologies available for use include BFO, BORO method, Dublin Core, GFO, Cyc, SUMO, UMBEL, and DOLCE. WordNet has been considered an upper ontology by some and has been used as
Is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, computer programming, information engineering, mathematics and statistics to analyze and interpret biological data. The process of analyzing and interpreting data can sometimes be referred to as computational biology; however, this distinction between
Is an open competition in which research groups worldwide submit predicted protein structures for blind evaluation against experimental structures that have not yet been released. The linear amino acid sequence of a protein is called the primary structure. The primary structure can be easily determined from the sequence of codons on the DNA gene that codes for it. In most proteins, the primary structure uniquely determines the 3-dimensional structure of
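Going from a coding DNA sequence to the primary structure is a codon-table lookup. A minimal sketch using Biopython's standard translation table (this assumes Biopython is installed; the sequence is a made-up example):

```python
# Translate a coding DNA sequence into its primary amino acid sequence (sketch).
from Bio.Seq import Seq

coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")  # hypothetical ORF
protein = coding_dna.translate(to_stop=True)  # translate up to the first stop codon
print(protein)  # MAIVMGR
```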
Is another example of a network which uses BioNumerics to submit norovirus sequences and basic epidemiologic information to a central database. The basis of BioNumerics is a database consisting of entries. The entries correspond to the individual organisms or samples under study and are characterized by a unique key and by a number of user-defined information fields. Each entry in a database may be characterized by one or more experiments that can be linked easily to
Is called protein function prediction. For instance, if a protein is found in the nucleus it may be involved in gene regulation or splicing. By contrast, if a protein is found in mitochondria, it may be involved in respiration or other metabolic processes. There are well-developed protein subcellular localization prediction resources available, including protein subcellular location databases and prediction tools. Data from high-throughput chromosome conformation capture experiments, such as Hi-C and ChIA-PET, can provide information on
Is common for ontology editors to use one or more ontology languages. Aspects of ontology editors include: visual navigation possibilities within the knowledge model, inference engines and information extraction; support for modules; the import and export of foreign knowledge representation languages for ontology matching; and the support of meta-ontologies such as OWL-S, Dublin Core, etc. Ontology learning
Is considered by some as a successor to prior work in philosophy. However, many current efforts are more concerned with establishing controlled vocabularies of narrow domains than with philosophical first principles, or with questions such as the mode of existence of fixed essences or whether enduring objects (e.g., perdurantism and endurantism) may be ontologically more primary than processes. Artificial intelligence has retained considerable attention regarding applied ontology in subfields like natural language processing within machine translation and knowledge representation, but ontology editors are often used in
Is often found to contain considerable variability, or noise, and thus hidden Markov model and change-point analysis methods are being developed to infer real copy number changes. Two important principles can be used to identify cancer by mutations in the exome. First, cancer is a disease of accumulated somatic mutations in genes. Second, cancer contains driver mutations which need to be distinguished from passengers. Further improvements in bioinformatics could allow for classifying types of cancer by analysis of cancer driver mutations in
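Production pipelines infer copy-number changes with hidden Markov models or segmentation methods; the underlying change-point idea can be illustrated with a deliberately naive sketch that scans every split of a noisy probe signal and keeps the one that best separates the segment means (purely illustrative, not a published method):

```python
# Naive single change-point detection on a copy-number-like signal (illustrative sketch).
def best_split(signal, min_len=3):
    best_i, best_delta = None, 0.0
    for i in range(min_len, len(signal) - min_len):
        left, right = signal[:i], signal[i:]
        delta = abs(sum(left) / len(left) - sum(right) / len(right))
        if delta > best_delta:
            best_i, best_delta = i, delta
    return best_i, best_delta

# Simulated log2 ratios: copy-neutral noise followed by a gained segment.
probe_ratios = [0.05, -0.1, 0.0, 0.1, -0.05, 0.02, 0.55, 0.6, 0.5, 0.65, 0.58]
print(best_split(probe_ratios))  # split index falls at the start of the gain
```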
Is specified by the Visual Notation for OWL Ontologies (VOWL). Ontology engineering (also called ontology building) is a set of tasks related to the development of ontologies for a particular domain. It is a subfield of knowledge engineering that studies the ontology development process, the ontology life cycle, the methods and methodologies for building ontologies, and the tools and languages that support them. Ontology engineering aims to make explicit
Is the attempt to represent entities, including both objects and events, with all their interdependent properties and relations, according to a system of categories. In both fields, there is considerable work on problems of ontology engineering (e.g., Quine and Kripke in philosophy, Sowa and Guarino in information science), and debates concerning to what extent normative ontology is possible (e.g., foundationalism and coherentism in philosophy, BFO and Cyc in artificial intelligence). Applied ontology
Is the automatic or semi-automatic creation of ontologies, including extracting a domain's terms from natural language text. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process. Information extraction and text mining have been explored to automatically link ontologies to documents, for example in the context of the BioCreative challenges. Epistemological assumptions, which in research ask "What do you know?" or "How do you know it?", create
Is the nature of reality?". One of five traditional branches of philosophy, metaphysics is concerned with exploring existence through properties, entities and relations such as those between particulars and universals, intrinsic and extrinsic properties, or essence and existence. Metaphysics has been an ongoing topic of discussion since recorded history. The compound word ontology combines onto-, from
Is the study of the origin and descent of species, as well as their change over time. Informatics has assisted evolutionary biologists by enabling researchers to: Future work endeavours to reconstruct the now more complex tree of life. The core of comparative genome analysis is the establishment of the correspondence between genes (orthology analysis) or other genomic features in different organisms. Intergenomic maps are made to trace
Is to assign function to the protein products of the genome. Databases of protein sequences and functional domains and motifs are used for this type of annotation. About half of the predicted proteins in a new genome sequence tend to have no obvious function. Understanding the function of genes and their products in the context of cellular and organismal physiology is the goal of process-level annotation. An obstacle of process-level annotation has been
Is to increase the understanding of biological processes. What sets it apart from other approaches is its focus on developing and applying computationally intensive techniques to achieve this goal. Examples include: pattern recognition, data mining, machine learning algorithms, and visualization. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein–protein interactions, genome-wide association studies,
Is transcribed into mRNA. Enhancer elements far away from the promoter can also regulate gene expression, through three-dimensional looping interactions. These interactions can be determined by bioinformatic analysis of chromosome conformation capture experiments. Expression data can be used to infer gene regulation: one might compare microarray data from a wide variety of states of an organism to form hypotheses about
Is used to predict the structure of an unknown protein from existing homologous proteins. One example of this is hemoglobin in humans and the hemoglobin in legumes (leghemoglobin), which are distant relatives from the same protein superfamily. Both serve the same purpose of transporting oxygen in the organism. Although both of these proteins have completely different amino acid sequences, their protein structures are virtually identical, which reflects their near identical purposes and shared ancestor.

Ontologies

Every academic discipline or field, in creating its terminology, thereby lays
The Greek ὄν, on (gen. ὄντος, ontos), i.e. "being; that which is", which is the present participle of the verb εἰμί, eimí, i.e. "to be, I am", and -λογία, -logia, i.e. "logical discourse"; see classical compounds for this type of word formation. While the etymology is Greek, the oldest extant record of the word itself, the Neo-Latin form ontologia, appeared in 1606 in
The Online Mendelian Inheritance in Man database, but complex diseases are more difficult. Association studies have found many individual genetic regions that individually are weakly associated with complex diseases (such as infertility, breast cancer and Alzheimer's disease), rather than a single cause. There are currently many challenges to using genes for diagnosis and treatment, such as not knowing which genes are important, or how stable
The definition and ontology of economics is a primary concern in Marxist economics, but also in other subfields of economics. An example of economics relying on information science occurs in cases where a simulation or model is intended to enable economic decisions, such as determining what capital assets are at risk and by how much (see risk management). What ontologies in both information science and philosophy have in common
The nucleotide, protein, and process levels. Gene finding is a chief aspect of nucleotide-level annotation. For complex genomes, a combination of ab initio gene prediction and sequence comparison with expressed sequence databases and other organisms can be successful. Nucleotide-level annotation also allows the integration of genome sequence with other genetic and physical maps of the genome. The principal aim of protein-level annotation
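Real ab initio gene finders rely on trained statistical models, but the nucleotide-level starting point — locating candidate open reading frames — can be sketched as a simple scan for a start codon followed by an in-frame stop (a toy illustration, not a usable gene predictor):

```python
# Toy open reading frame (ORF) scan on the forward strand (illustrative only).
STOP = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP:
                    j += 3
                if j + 3 <= len(seq) and (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))  # from the ATG to the end of the stop codon
                i = j
            i += 3
    return orfs

print(find_orfs("CCATGAAATTTGGGTAGGCATGTTTCCCTAAGG"))
```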
The subsumption relation, but ontologies need not be limited to these forms. Ontologies are also not limited to conservative definitions – that is, definitions in the traditional logic sense that only introduce terminology and do not add any knowledge about the world. To specify a conceptualization, one needs to state axioms that do constrain the possible interpretations for the defined terms. As a refinement of Gruber's definition, Feilmayr and Wöß (2016) stated: "An ontology
The 1970s, new techniques for sequencing DNA were applied to bacteriophage MS2 and øX174, and the extended nucleotide sequences were then parsed with informational and statistical algorithms. These studies illustrated that well-known features, such as the coding segments and the triplet code, are revealed in straightforward statistical analyses and were the proof of the concept that bioinformatics would be insightful. In order to study how normal cellular activities are altered in different disease states, raw biological data must be combined to form
The Dispensable/Flexible genome: a set of genes not present in all but one or some genomes under study. A bioinformatics tool, BPGA, can be used to characterize the Pan Genome of bacterial species. As of 2013, the existence of efficient high-throughput next-generation sequencing technology allows for the identification of the causes of many different human disorders. Simple Mendelian inheritance has been observed for over 3,000 disorders that have been identified at
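In the simplest view, the core/dispensable partition described above reduces to set operations over the gene content of each genome. A minimal sketch with invented gene-family names (real tools such as BPGA operate on clustered orthologous gene families rather than raw names):

```python
# Toy pan-genome partition: core = genes in every genome, dispensable = the rest.
genomes = {  # hypothetical gene-family content of three strains
    "strain_A": {"dnaA", "gyrB", "rpoB", "blaTEM"},
    "strain_B": {"dnaA", "gyrB", "rpoB", "mexA"},
    "strain_C": {"dnaA", "gyrB", "rpoB"},
}

pan_genome = set.union(*genomes.values())
core_genome = set.intersection(*genomes.values())
dispensable_genome = pan_genome - core_genome

print("pan:", sorted(pan_genome))
print("core:", sorted(core_genome))          # housekeeping-like genes shared by all
print("dispensable:", sorted(dispensable_genome))
```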
The activity of one or more proteins. Bioinformatics techniques have been applied to explore various steps in this process. For example, gene expression can be regulated by nearby elements in the genome. Promoter analysis involves the identification and study of sequence motifs in the DNA surrounding the protein-coding region of a gene. These motifs influence the extent to which that region
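A very simple form of promoter analysis is scanning an upstream region for approximate occurrences of a known consensus motif. The sketch below searches for a TATA-box-like consensus allowing a limited number of mismatches; the sequence, consensus and mismatch threshold are illustrative only:

```python
# Naive consensus-motif scan over a promoter region (illustrative sketch).
def motif_hits(sequence, consensus, max_mismatches=1):
    hits = []
    k = len(consensus)
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        mismatches = sum(1 for a, b in zip(window, consensus) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits

upstream = "GGCCTATAAAGGCGCGTACAAATGCCTATATAGGC"  # hypothetical promoter region
print(motif_hits(upstream, "TATAAA"))
```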
The bacteriophage Φ-X174 was sequenced in 1977, the DNA sequences of thousands of organisms have been decoded and stored in databases. This sequence information is analyzed to determine genes that encode proteins, RNA genes, regulatory sequences, structural motifs, and repetitive sequences. A comparison of genes within a species or between different species can show similarities between protein functions, or relations between species (the use of molecular systematics to construct phylogenetic trees). With
The biological measurement, and a major research area in computational biology involves developing statistical tools to separate signal from noise in high-throughput gene expression studies. Such studies are often used to determine the genes implicated in a disorder: one might compare microarray data from cancerous epithelial cells to data from non-cancerous cells to determine the transcripts that are up-regulated and down-regulated in
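A minimal version of such a signal-versus-noise analysis is a per-gene two-sample test followed by a multiple-testing correction. The sketch below applies a t-test and a simplified Benjamini-Hochberg adjustment to made-up expression values; real studies use moderated statistics and many more samples:

```python
# Per-gene t-test between two conditions with a simplified Benjamini-Hochberg correction (toy sketch).
from scipy.stats import ttest_ind

expression = {  # hypothetical log-expression values: 3 tumour vs 3 normal samples
    "GENE1": ([8.1, 8.3, 8.0], [5.1, 5.4, 5.0]),
    "GENE2": ([6.0, 6.2, 5.9], [6.1, 5.8, 6.0]),
    "GENE3": ([3.2, 3.0, 3.1], [7.9, 8.2, 8.0]),
}

pvals = {g: ttest_ind(t, n).pvalue for g, (t, n) in expression.items()}

# Benjamini-Hochberg style adjustment (the monotonicity step of full BH is omitted for brevity).
ranked = sorted(pvals.items(), key=lambda kv: kv[1])
m = len(ranked)
adjusted = {g: min(p * m / (rank + 1), 1.0) for rank, (g, p) in enumerate(ranked)}

for gene in expression:
    print(gene, round(pvals[gene], 4), round(adjusted[gene], 4))
```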
The biological pathways and networks that are an important part of systems biology. In structural biology, it aids in the simulation and modeling of DNA, RNA, and proteins, as well as biomolecular interactions. The first definition of the term bioinformatics was coined by Paulien Hogeweg and Ben Hesper in 1970, to refer to the study of information processes in biotic systems. This definition placed bioinformatics as
The choices an algorithm provides. Genome-wide association studies have successfully identified thousands of common genetic variants for complex diseases and traits; however, these common variants only explain a small fraction of heritability. Rare variants may account for some of the missing heritability. Large-scale whole genome sequencing studies have rapidly sequenced millions of whole genomes, and such studies have identified hundreds of millions of rare variants. Functional annotations predict
The complicated statistical analysis of samples when multiple incomplete peptides from each protein are detected. Cellular protein localization in a tissue context can be achieved through affinity proteomics displayed as spatial data based on immunohistochemistry and tissue microarrays. Gene regulation is a complex process where a signal, such as an extracellular hormone, eventually leads to an increase or decrease in
The development of biological and gene ontologies to organize and query biological data. It also plays a role in the analysis of gene and protein expression and regulation. Bioinformatics tools aid in comparing, analyzing and interpreting genetic and genomic data and more generally in the understanding of evolutionary aspects of molecular biology. At a more integrative level, it helps analyze and catalogue
The effect or function of a genetic variant and help to prioritize rare functional variants, and incorporating these annotations can effectively boost the power of rare-variant association analysis in whole genome sequencing studies. Some tools have been developed to provide all-in-one rare variant association analysis for whole-genome sequencing data, including integration of genotype data and their functional annotations, association analysis, result summary and visualization. Meta-analysis of whole genome sequencing studies provides an attractive solution to
The entry. In BioNumerics, experiments are divided into seven classes: fingerprints, spectra, characters, sequences, sequence read sets, trend data and matrices. Examples of BioNumerics applications are whole genome Multi Locus Sequence Typing (wgMLST), whole genome Single Nucleotide Polymorphisms (wgSNP), genome comparison, identification based on MALDI-TOF Mass Spectrometry, PFGE typing, Amplified Fragment Length Polymorphism (AFLP) typing, sequence-based typing of viruses, antibiotic resistance profiling and functional genotyping.

Bioinformatics

Bioinformatics (/ˌbaɪ.oʊˌɪnfərˈmætɪks/)
The evolutionary processes responsible for the divergence of two genomes. A multitude of evolutionary events acting at various organizational levels shape genome evolution. At the lowest level, point mutations affect individual nucleotides. At a higher level, large chromosomal segments undergo duplication, lateral transfer, inversion, transposition, deletion and insertion. Entire genomes are involved in processes of hybridization, polyploidization and endosymbiosis that lead to rapid speciation. The complexity of genome evolution poses many exciting challenges to developers of mathematical models and algorithms, who have recourse to
The field of artificial intelligence (AI) have recognized that knowledge engineering is the key to building large and powerful AI systems. AI researchers argued that they could create new ontologies as computational models that enable certain kinds of automated reasoning, which was only marginally successful. In the 1980s, the AI community began to use the term ontology to refer to both
The first bacterial genome, Haemophilus influenzae) generates the sequences of many thousands of small DNA fragments (ranging from 35 to 900 nucleotides long, depending on the sequencing technology). The ends of these fragments overlap and, when aligned properly by a genome assembly program, can be used to reconstruct the complete genome. Shotgun sequencing yields sequence data quickly, but the task of assembling
The foundation researchers use when approaching a certain topic or area for potential research. As epistemology is directly linked to knowledge and how we come to accept certain truths, individuals conducting academic research must understand what allows them to begin theory building. Simply put, epistemological assumptions force researchers to question how they arrive at the knowledge they have. An ontology language
The fragments can be quite complicated for larger genomes. For a genome as large as the human genome, it may take many days of CPU time on large-memory, multiprocessor computers to assemble the fragments, and the resulting assembly usually contains numerous gaps that must be filled in later. Shotgun sequencing is the method of choice for virtually all genomes sequenced (rather than chain-termination or chemical degradation methods), and genome assembly algorithms are
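The essence of joining overlapping fragments can be shown with a tiny greedy merger that repeatedly concatenates the two reads sharing the longest suffix-prefix overlap. Real assemblers use overlap or de Bruijn graphs and must handle sequencing errors and repeats; this is only a toy sketch on short, error-free reads:

```python
# Toy greedy assembly of error-free overlapping reads (illustrative; real assemblers use graphs).
def overlap(a, b, min_len=3):
    # Longest suffix of a that equals a prefix of b, at least min_len long.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads, min_len=3):
    reads = list(reads)
    while len(reads) > 1:
        best = (0, 0, 1)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    ov = overlap(a, b, min_len)
                    if ov > best[0]:
                        best = (ov, i, j)
        ov, i, j = best
        if ov == 0:
            break  # no sufficient overlap left; remaining contigs stay separate
        merged = reads[i] + reads[j][ov:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]))
```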
The function of genes. In fact, most gene function prediction methods focus on protein sequences as they are more informative and more feature-rich. For instance, the distribution of hydrophobic amino acids predicts transmembrane segments in proteins. However, protein function prediction can also use external information such as gene (or protein) expression data, protein structure, or protein-protein interactions. Evolutionary biology
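The point about hydrophobic residues can be illustrated with the classic Kyte-Doolittle sliding-window hydropathy scan: windows whose mean hydropathy exceeds a threshold (commonly about 1.6 for a 19-residue window) are candidate transmembrane stretches. The example protein sequence below is invented:

```python
# Kyte-Doolittle sliding-window hydropathy scan (sketch; the example sequence is hypothetical).
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5}

def hydropathy_windows(protein, window=19, threshold=1.6):
    candidates = []
    for i in range(len(protein) - window + 1):
        mean = sum(KD[aa] for aa in protein[i:i + window]) / window
        if mean >= threshold:
            candidates.append((i, round(mean, 2)))  # window start and mean hydropathy
    return candidates

seq = "MKTAYIAKQRLLLLLVVVVIIIILLLAVAFLLSGGDDEEKKR"  # made-up sequence with a hydrophobic core
print(hydropathy_windows(seq))
```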
The genes encoding all proteins, transfer RNAs, and ribosomal RNAs, in order to make initial functional assignments. The GeneMark program trained to find protein-coding genes in Haemophilus influenzae is constantly changing and improving. Following the goals that the Human Genome Project left to achieve after its closure in 2003, the ENCODE project was developed by the National Human Genome Research Institute. This project
The genes involved in each state. In a single-cell organism, one might compare stages of the cell cycle, along with various stress conditions (heat shock, starvation, etc.). Clustering algorithms can then be applied to expression data to determine which genes are co-expressed. For example, the upstream regions (promoters) of co-expressed genes can be searched for over-represented regulatory elements. Examples of clustering algorithms applied in gene clustering are k-means clustering, self-organizing maps (SOMs), hierarchical clustering, and consensus clustering methods. Several approaches have been developed to analyze
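As a toy illustration of the clustering step, the sketch below groups genes by their expression profiles with k-means (assuming NumPy and scikit-learn are available; the profiles and the choice of k are made up):

```python
# Clustering gene expression profiles with k-means (toy sketch using scikit-learn).
import numpy as np
from sklearn.cluster import KMeans

genes = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
profiles = np.array([  # hypothetical expression across 4 conditions
    [1.0, 1.2, 5.0, 5.1],
    [0.9, 1.1, 4.8, 5.3],
    [4.9, 5.2, 1.0, 0.8],
    [5.1, 5.0, 1.2, 0.9],
    [2.5, 2.4, 2.6, 2.5],
    [2.6, 2.5, 2.4, 2.6],
])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(profiles)
for gene, label in zip(genes, labels):
    print(gene, "-> cluster", label)  # co-expressed genes end up in the same cluster
```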
The genetic basis of disease, unique adaptations, desirable properties (esp. in agricultural species), or differences between populations. Bioinformatics also includes proteomics, which tries to understand the organizational principles within nucleic acid and protein sequences. Image and signal processing allow extraction of useful results from large amounts of raw data. In the field of genetics, it aids in sequencing and annotating genomes and their observed mutations. Bioinformatics includes text mining of biological literature and
The genome. Furthermore, tracking of patients while the disease progresses may be possible in the future with the sequence of cancer samples. Another type of data that requires novel informatics development is the analysis of lesions found to be recurrent among many tumors. The expression of many genes can be determined by measuring mRNA levels with multiple techniques including microarrays, expressed cDNA sequence tag (EST) sequencing, serial analysis of gene expression (SAGE) tag sequencing, massively parallel signature sequencing (MPSS), RNA-Seq, also known as "Whole Transcriptome Shotgun Sequencing" (WTSS), or various applications of multiplexed in-situ hybridization. All of these techniques are extremely noise-prone and/or subject to bias in
The groundwork for an ontology. Each uses ontological assumptions to frame explicit theories, research and applications. Improved ontologies may improve problem solving within that domain, interoperability of data systems, and discoverability of data. Translating research papers within every field is a problem made easier when experts from different countries maintain a controlled vocabulary of jargon between each of their languages. For instance,
The growing amount of data, it long ago became impractical to analyze DNA sequences manually. Computer programs such as BLAST are used routinely to search sequences—as of 2008, from more than 260,000 organisms, containing over 190 billion nucleotides. Before sequences can be analyzed, they are obtained from a data storage bank, such as GenBank. DNA sequencing is still a non-trivial problem as
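As an example of how such a search is commonly scripted, the sketch below submits a nucleotide query to NCBI BLAST through Biopython's web interface (this assumes Biopython is installed and network access to NCBI; the query sequence is arbitrary and a remote search can take minutes):

```python
# Submitting a nucleotide query to NCBI BLAST via Biopython (sketch; requires network access).
from Bio.Blast import NCBIWWW, NCBIXML

query = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA"  # arbitrary example sequence
result_handle = NCBIWWW.qblast("blastn", "nt", query)   # blastn against the nt database
record = NCBIXML.read(result_handle)

for alignment in record.alignments[:5]:       # report the top few hits
    best_hsp = alignment.hsps[0]
    print(alignment.title[:60], "E-value:", best_hsp.expect)
```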
The inconsistency of terms used by different model systems. The Gene Ontology Consortium is helping to solve this problem. The first description of a comprehensive annotation system was published in 1995 by The Institute for Genomic Research, which performed the first complete sequencing and analysis of the genome of a free-living (non-symbiotic) organism, the bacterium Haemophilus influenzae. The system identifies
The knowledge contained in software applications, and organizational procedures for a particular domain. Ontology engineering offers a direction for overcoming semantic obstacles, such as those related to the definitions of business terms and software classes. Known challenges with ontology engineering include: Ontology editors are applications designed to assist in the creation or manipulation of ontologies. It
The location of organelles, genes, proteins, and other components within cells. A gene ontology category, cellular component, has been devised to capture subcellular localization in many biological databases. Microscopic pictures allow for the location of organelles as well as molecules, which may be the source of abnormalities in diseases. Finding the location of proteins allows us to predict what they do. This
The modeling of evolution and cell division/mitosis. Bioinformatics entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data. Over the past few decades, rapid developments in genomic and other molecular research technologies and developments in information technologies have combined to produce
The problem of collecting large sample sizes for discovering rare variants associated with complex phenotypes. In cancer, the genomes of affected cells are rearranged in complex or unpredictable ways. In addition to single-nucleotide polymorphism arrays identifying point mutations that cause cancer, oligonucleotide microarrays can be used to identify chromosomal gains and losses (called comparative genomic hybridization). These detection methods generate terabytes of data per experiment. The data
The raw data may be noisy or affected by weak signals. Algorithms have been developed for base calling for the various experimental approaches to DNA sequencing. Most DNA sequencing techniques produce short fragments of sequence that need to be assembled to obtain complete gene or genome sequences. The shotgun sequencing technique (used by The Institute for Genomic Research (TIGR) to sequence
The same project. As systems that rely on domain ontologies expand, they often need to merge domain ontologies by hand-tuning each entity or using a combination of software merging and hand-tuning. This presents a challenge to the ontology designer. Different ontologies in the same domain arise due to different languages, different intended usage of the ontologies, and different perceptions of the domain (based on cultural background, education, ideology, etc.). At present, merging ontologies that are not developed from
The term as a specification of a conceptualization: An ontology is a description (like a formal specification of a program) of the concepts and relationships that can formally exist for an agent or a community of agents. This definition is consistent with the usage of ontology as set of concept definitions, but more general. And it is a different sense of the word than its use in philosophy. Attempting to distance ontologies from taxonomies and similar efforts in knowledge modeling that rely on classes and inheritance, Gruber stated (1993): Ontologies are often equated with taxonomic hierarchies of classes, class definitions, and
The three-dimensional structure and nuclear organization of chromatin. Bioinformatic challenges in this field include partitioning the genome into domains, such as Topologically Associating Domains (TADs), that are organised together in three-dimensional space. Finding the structure of proteins is an important application of bioinformatics. The Critical Assessment of Protein Structure Prediction (CASP)
The time. In the genomic branch of bioinformatics, homology is used to predict the function of a gene: if the sequence of gene A, whose function is known, is homologous to the sequence of gene B, whose function is unknown, one could infer that B may share A's function. In structural bioinformatics, homology is used to determine which parts of a protein are important in structure formation and interaction with other proteins. Homology modeling
The two terms is often disputed. To some, the term computational biology refers to building and using models of biological systems. Computational, statistical, and computer programming techniques have been used for computer simulation analyses of biological queries. They include reused specific analysis "pipelines", particularly in the field of genomics, such as the identification of genes and single nucleotide polymorphisms (SNPs). These pipelines are used to better understand
The work Ogdoas Scholastica by Jacob Lorhard (Lorhardus) and in 1613 in the Lexicon philosophicum by Rudolf Göckel (Goclenius). The first occurrence in English of ontology as recorded by the OED (Oxford English Dictionary, online edition, 2008) came in Archeologia Philosophica Nova or New Principles of Philosophy by Gideon Harvey. Since the mid-1970s, researchers in