Yeastract - Misplaced Pages

Biological databases are libraries of biological sciences, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), clinical effects of mutations as well as similarities of biological sequences and structures.

YEASTRACT (Yeast Search for Transcriptional Regulators And Consensus Tracking) is a curated repository of more than 48000 regulatory associations between transcription factors (TF) and target genes in Saccharomyces cerevisiae, based on more than 1200 bibliographic references. It also includes the description of about 300 specific DNA binding sites for more than a hundred characterized TFs. Further information about each yeast gene has been extracted from

A host of biological phenomena from the structure of biomolecules and their interaction, to the whole metabolism of organisms and to understanding the evolution of species. This knowledge helps facilitate the fight against diseases and assists in the development of medications, in predicting certain genetic diseases, and in discovering basic relationships among species in the history of life. Relational database concepts of computer science and information retrieval concepts of digital libraries are important for understanding biological databases. Biological database design, development, and long-term management

A large number of different formats. This situation makes searching these data and performing the analysis necessary for the extraction of new knowledge from the complete set of available data very difficult. Integrative bioinformatics attempts to tackle this problem by providing unified access to life science data. In the Semantic Web approach, data from multiple websites or databases is searched via metadata. Metadata

A novel framework ontology of generic organs. For example, results from a search of ‘heart’ in this ontology would return the heart plans for each of the vertebrate species whose ontologies were included. The stated goal of the project is to facilitate comparative and evolutionary studies. In the data warehousing strategy, the data from different sources are extracted and integrated in a single database. For example, various 'omics' datasets may be integrated to provide insights into biological systems. Examples include data from genomics, transcriptomics, proteomics, interactomics, and metabolomics. Ideally, changes in these sources are regularly synchronized to

Is machine-readable code, which defines the contents of the page for the program so that the comparisons between the data and the search terms are more accurate. This serves to decrease the number of results that are irrelevant or unhelpful. Some metadata exists as definitions called ontologies, which can be tagged by either users or programs; these serve to facilitate searches by using key terms or phrases to find and return

Is a core area of the discipline of bioinformatics. Data contents include gene sequences, textual descriptions, attributes and ontology classifications, citations, and tabular data. These are often described as semi-structured data, and can be represented as tables, key-delimited records, and XML structures. Most biological databases are available through web sites that organise data such that users can browse through

Is a discipline of bioinformatics that focuses on problems of data integration for the life sciences. With the rise of high-throughput (HTP) technologies in the life sciences, particularly in molecular biology, the amount of collected data has grown in an exponential fashion. Furthermore, the data are scattered over a plethora of both public and private repositories, and are stored using

Is a special yearly issue of the journal Nucleic Acids Research (NAR). The Database Issue of NAR is freely available, and categorizes many of the public biological databases. A companion database to the issue called the Online Molecular Biology Database Collection lists 1,380 online databases. Other collections of databases exist such as MetaBase and the Bioinformatics Links Collection. Integrative bioinformatics Integrative bioinformatics

Is an E. coli database. Other popular model organism databases include Mouse Genome Informatics for the laboratory mouse, Mus musculus, the Rat Genome Database for Rattus, ZFIN for Danio rerio (zebrafish), PomBase for the fission yeast Schizosaccharomyces pombe, FlyBase for Drosophila, WormBase for the nematodes Caenorhabditis elegans and Caenorhabditis briggsae, and Xenbase for Xenopus tropicalis and Xenopus laevis frogs. Numerous databases attempt to document

Is how biological databases cross-reference to other databases with accession numbers to link their related knowledge together (e.g. so that the accession number stays the same even if a species name changes). Redundancy is another problem, as many databases must store the same information, e.g. protein structure databases also contain the sequences of the proteins they cover and their bibliographic information. Species-specific databases are available for some species, mainly those that are often used in research (model organisms). For example, EcoCyc

Is that it is costly to compile such a warehouse. Standardized formats for different types of data (e.g. protein data) are now emerging due to the influence of groups like the Proteomics Standards Initiative (PSI). Some data warehousing projects even require the submission of data in one of these new formats. Data mining uses statistical methods to search for patterns in existing data. This method generally returns many patterns, of which some are spurious and some are significant, but all of

Is that the terms used in tagging and searching can sometimes be ambiguous and may cause confusion among the results. In addition, the semantic web approach is still considered an emerging technology and is not in wide-scale use at this time. One of the current applications of ontology-based search in the biomedical sciences is GoPubMed, which searches the PubMed database of scientific literature. Another use of ontologies

Is within databases such as SwissProt, Ensembl and TrEMBL, which use this technology to search through the stores of human proteome-related data for tags related to the search term. Some of the research in this field has focused on creating new and specific ontologies. Other researchers have worked on verifying the results of existing ontologies. In a specific example, the goal of Verschelde, et al.

The Saccharomyces Genome Database (SGD). For each gene the associated Gene Ontology (GO) terms and their hierarchy in GO were obtained from the GO consortium. Currently, YEASTRACT maintains more than 7100 terms from GO. The nucleotide sequences of the promoter and coding regions for yeast genes were obtained from Regulatory Sequence Analysis Tools (RSAT). All the information in YEASTRACT is updated regularly to match

The Catalogue of Life draws from 165 databases as of May 2022. Operational costs of the Catalogue of Life are paid for by the Global Biodiversity Information Facility, the Illinois Natural History Survey, the Naturalis Biodiversity Center, and the Smithsonian Institution. Some biological databases also document geographical distribution of different species. Shuang Dai et al. created a new multi-source database to document spatial/geographical distribution of 1,371 bird species in China, as existing databases had been severely lacking in spatial distribution data for many species. Sources for this new database included books, literature, GPS tracking, and online webpage data. The new database displayed taxonomy, distribution, species information, and data sources for each species. After completion of

The YEASTRACT database. Facilities are also provided to enable the exploitation of the gathered data when solving a number of biological questions, as exemplified in the Tutorial. YEASTRACT allows the identification of documented or potential transcription regulators of a given gene and of documented or potential regulons for each transcription factor. It also makes possible the comparison between DNA motifs and

The bird spatial distribution database, it was discovered that 61% of known species in China were distributed in regions beyond where they were previously known. Medical databases are a special case of biomedical data resource and can range from bibliographies, such as PubMed, to image databases for the development of AI-based diagnostic software. For instance, one such image database

The data online. In addition, the underlying data is usually available for download in a variety of formats. Biological data comes in many formats. These formats include text, sequence data, protein structure and links. Each of these can be found from certain sources, for example: Biological knowledge is distributed among countless databases. This sometimes makes it difficult to ensure the consistency of information, e.g. when different names are used for

The data. Advantages of this approach include the generally higher quality of the data returned in searches and, with proper tagging, the ability of ontologies to find entries that may not explicitly state the search term but are still relevant. One disadvantage of this approach is that the results that are returned come in the format of the database of their origin and as such, direct comparisons may be difficult. Another problem
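The way an ontology can surface entries that never mention the search term can be sketched as follows. This is a minimal illustration, not the method of any particular database: the `ONTOLOGY_CHILDREN` mapping, the entry names, and the functions `expand_term` and `search` are all hypothetical.

```python
# Hypothetical toy ontology: each term maps to its more specific child terms.
ONTOLOGY_CHILDREN = {
    "organ": ["heart", "liver"],
    "heart": ["ventricle", "atrium"],
}


def expand_term(term, children):
    """Return the term together with all of its descendants in the ontology."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children.get(current, []))
    return seen


def search(term, tagged_entries, children):
    """Return entries tagged with the term or with any of its descendants."""
    terms = expand_term(term, children)
    return sorted(
        entry for entry, tags in tagged_entries.items() if terms & set(tags)
    )


# Hypothetical tagged entries; "entry1" never mentions "heart" explicitly.
ENTRIES = {
    "entry1": ["ventricle"],
    "entry2": ["liver"],
    "entry3": ["kidney"],
}

print(search("heart", ENTRIES, ONTOLOGY_CHILDREN))
```

Here a search for "heart" still returns the entry tagged only with "ventricle", because the ontology records that a ventricle is part of the heart term's subtree.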

The diversity of life on Earth. A prominent example is the Catalogue of Life, first created in 2001 by Species 2000 and the Integrated Taxonomic Information System. The Catalogue of Life is a collaborative project that aims to document taxonomic categorization of all currently accepted species in the world. The Catalogue of Life provides a consolidated and consistent database for researchers and policymakers to reference. The Catalogue of Life curates up-to-date datasets from other sources such as Conifer Database, ICTV MSL (for viruses), and LepIndex (for butterflies and moths). In total,

The gathered data and to predict transcription regulation networks in yeast from data emerging from gene-by-gene analysis or global approaches. Biological database Biological databases can be classified by the kind of data they collect (see below). Broadly, there are molecular databases (for sequences, molecules, etc.), functional databases (for physiology, enzyme activities, phenotypes, ecology, etc.), taxonomic databases (for species and other taxonomic ranks), images and other media, or specimens (for museum collections, etc.). Databases are important tools in assisting scientists to analyze and explain

The integrated database. The data is presented to the users in a common format. Many programs that aid in the creation of such warehouses are designed to be extremely versatile so that they can be implemented in diverse research projects. One advantage of this approach is that data is available for analysis at a single site, using a uniform schema. Some disadvantages are that the datasets are often huge and difficult to keep up to date. Another problem with this method
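The warehousing idea described above, extracting records from differently structured sources and storing them under one uniform schema, can be sketched in a few lines. Everything here is an assumption made for illustration: the two toy sources, their field names, the `FIELD_MAPS` table, and the functions `to_common` and `build_warehouse` do not come from any real warehouse project.

```python
# Two hypothetical sources that describe the same genes with different field names.
SOURCE_A = [{"gene_id": "G1", "expr": 2.5}]
SOURCE_B = [
    {"id": "G1", "protein_abundance": 140},
    {"id": "G2", "protein_abundance": 7},
]

# Per-source mapping from native field names to the warehouse's common schema.
FIELD_MAPS = {
    "A": {"gene_id": "gene", "expr": "expression"},
    "B": {"id": "gene", "protein_abundance": "abundance"},
}


def to_common(record, field_map):
    """Rename a source record's fields to the common schema, dropping unmapped ones."""
    return {field_map[key]: value for key, value in record.items() if key in field_map}


def build_warehouse(sources, field_maps):
    """Merge all sources into one table keyed by the common 'gene' field."""
    warehouse = {}
    for name, records in sources.items():
        for record in records:
            row = to_common(record, field_maps[name])
            warehouse.setdefault(row["gene"], {"gene": row["gene"]}).update(row)
    return warehouse


WAREHOUSE = build_warehouse({"A": SOURCE_A, "B": SOURCE_B}, FIELD_MAPS)
print(WAREHOUSE["G1"])
```

The sketch also hints at the maintenance cost mentioned in the text: whenever a source changes, the warehouse has to be rebuilt or patched to stay current.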

The latest data from SGD, GO consortium, RSA Tools and recent literature on yeast regulatory networks. YEASTRACT includes DISCOVERER, a set of tools that can be used to identify complex motifs found to be over-represented in the promoter regions of co-regulated genes. DISCOVERER is based on the MUSA algorithm. These algorithms take as input a list of genes and identify over-represented motifs, which can then be compared with transcription factor binding sites described in
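To make the notion of an over-represented motif concrete, the sketch below ranks short k-mers by how much more often they occur in a set of promoter sequences than in a background set. This is a deliberately naive substitute and is not the MUSA algorithm or DISCOVERER; the sequences, the function names, the `min_ratio` threshold, and the pseudo-frequency used to avoid division by zero are all assumptions.

```python
from collections import Counter


def kmers_present(seq, k):
    """Return the set of distinct k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_fractions(seqs, k):
    """Fraction of sequences in which each k-mer occurs at least once."""
    counts = Counter()
    for seq in seqs:
        counts.update(kmers_present(seq, k))
    return {kmer: count / len(seqs) for kmer, count in counts.items()}


def overrepresented_kmers(promoters, background, k, min_ratio=2.0):
    """Rank k-mers by promoter frequency relative to background frequency."""
    fg = kmer_fractions(promoters, k)
    bg = kmer_fractions(background, k)
    # Assumed floor for k-mers absent from the background set.
    floor = 1 / (len(background) + 1)
    scored = [(kmer, freq / max(bg.get(kmer, 0.0), floor)) for kmer, freq in fg.items()]
    kept = [item for item in scored if item[1] >= min_ratio]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical toy sequences: every promoter contains "CGT".
PROMOTERS = ["AACGTT", "GGCGTA", "TTCGTC"]
BACKGROUND = ["AAAAAA", "CCCCCC", "GGGGGG", "TTTTTT"]

for kmer, ratio in overrepresented_kmers(PROMOTERS, BACKGROUND, k=3):
    print(kmer, round(ratio, 2))
```

A production tool would use a proper statistical test and handle degenerate or gapped motifs; the ratio here only conveys the general idea of comparing a co-regulated set against a background.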

The patterns the program finds must be evaluated individually. Currently, some research is focused on combining existing data mining techniques with novel pattern analysis methods that, instead of requiring time to be spent going over each pattern found by the initial program, return a few results with a high likelihood of relevance. One drawback of this approach is that it does not integrate multiple databases, which means that comparisons across databases are not possible. The major advantage to this approach

The same species or different data formats. As a consequence, interoperability is a constant challenge for information exchange. For instance, if a DNA sequence database stores the DNA sequence along with the name of a species, a name change of that species may break the links to other databases which may use a different name. Integrative bioinformatics is one field attempting to tackle this problem by providing unified access. One solution
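The broken-link problem described above, and the common remedy of linking on a stable accession number rather than on a species name, can be illustrated with a small sketch. The accession `ACC0001`, the species names, and both toy databases are hypothetical placeholders rather than records from any real resource.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    accession: str      # stable identifier, assumed never to change
    species_name: str   # human-readable label that may be revised
    sequence: str


# Hypothetical sequence database and a second database that cross-references it.
SEQUENCE_DB = {"ACC0001": SequenceRecord("ACC0001", "Old species name", "ATGC")}
ANNOTATION_DB = {"ACC0001": "hypothetical annotation"}


def rename_species(db, accession, new_name):
    """Update the species label; the accession key is left untouched."""
    db[accession].species_name = new_name


def lookup_annotation(record, annotations):
    """Follow the cross-reference using the accession, not the species name."""
    return annotations.get(record.accession)


rename_species(SEQUENCE_DB, "ACC0001", "New species name")
print(lookup_annotation(SEQUENCE_DB["ACC0001"], ANNOTATION_DB))
```

Had the second database been keyed on the species name instead, the rename would have left the annotation unreachable.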

The transcription factor binding sites described in the literature. The system also provides a useful mechanism for grouping a list of genes (for instance a set of genes with similar expression profiles as revealed by microarray analysis) based on their regulatory associations with known transcription factors. YEASTRACT provides a set of queries to search and retrieve important biological information from
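Grouping a gene list by regulatory association, as described above, amounts to inverting a table of (transcription factor, target gene) pairs. The sketch below shows the idea on invented data; the TF and gene names are placeholders, and `group_by_regulator` and `unregulated` are hypothetical helpers, not part of YEASTRACT's query interface.

```python
from collections import defaultdict

# Hypothetical (transcription factor, target gene) pairs; not taken from YEASTRACT.
ASSOCIATIONS = [
    ("TF_A", "GENE1"),
    ("TF_A", "GENE2"),
    ("TF_B", "GENE2"),
    ("TF_B", "GENE3"),
]


def group_by_regulator(genes, associations):
    """Map each TF to the sorted subset of the input genes it is documented to regulate."""
    wanted = set(genes)
    groups = defaultdict(set)
    for tf, target in associations:
        if target in wanted:
            groups[tf].add(target)
    return {tf: sorted(targets) for tf, targets in groups.items()}


def unregulated(genes, associations):
    """Input genes that have no documented regulator in the association table."""
    targets = {target for _, target in associations}
    return sorted(set(genes) - targets)


QUERY = ["GENE2", "GENE3", "GENE9"]
print(group_by_regulator(QUERY, ASSOCIATIONS))
print(unregulated(QUERY, ASSOCIATIONS))
```

A gene can appear under several regulators, which is why the groups are sets per transcription factor rather than a partition of the input list.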

Was developed with the goal of aiding in the development of wound monitoring algorithms. Over 188 multi-modal image sets were curated from 79 patient visits, consisting of photographs, thermal images, and 3D mesh depth maps. Wound outlines were manually drawn and added to the photo datasets. The database was made publicly available in the form of a program called WoundsDB, downloadable from the Chronic Wound Database website. An important resource for finding biological databases

Was the integration of several different ontology libraries into a larger one that contained more definitions of different subspecialties (medical, molecular biological, etc.) and was able to distinguish between ambiguous tags; the result was a data-warehouse-like effect, with easy access to multiple databases through the use of ontologies. In a separate project, Bertens, et al. constructed a latticework of three ontologies (for anatomy and development of model organisms) on
