The Cancer Genome Anatomy Project (CGAP), created by the National Cancer Institute (NCI) in 1997 and introduced by Al Gore, is an online database on normal, pre-cancerous and cancerous genomes. It also provides tools for viewing and analysis of the data, allowing for identification of genes involved in various aspects of tumor progression. The goal of CGAP is to characterize cancer at a molecular level by providing a platform with readily accessible, updated data and a set of tools such that researchers can easily relate their findings to existing knowledge. There is also a focus on development of software tools that improve the usage of large and complex datasets. The project is directed by Daniela S. Gerhard and includes sub-projects or initiatives, notably the Cancer Chromosome Aberration Project (CCAP) and the Genetic Annotation Initiative (GAI). CGAP contributes to many databases, and organisations such as the NCBI contribute to CGAP's databases.
45-553: CGAP may refer to: Cancer Genome Anatomy Project, with the goal of documenting sequences of RNA transcripts; Consultative Group to Assist the Poor, a partnership of organizations at the World Bank.
90-472: A SNP allele that is common in one geographical or ethnic group may be much rarer in another. However, this pattern of variation is relatively rare; in a global sample of 67.3 million SNPs, the Human Genome Diversity Project "found no such private variants that are fixed in a given continent or major region. The highest frequencies are reached by a few tens of variants present at >70% (and
135-539: A common consensus. The rs### standard is that which has been adopted by dbSNP and uses the prefix "rs", for "reference SNP", followed by a unique and arbitrary number. SNPs are frequently referred to by their dbSNP rs number, as in the examples above. The Human Genome Variation Society (HGVS) uses a standard which conveys more information about the SNP. Examples are: SNPs can be easily assayed due to only containing two possible alleles and three possible genotypes involving
180-476: A comparison is made between normal and cancer tissues. Statistical significance is determined by DGED using a combination of Bayesian statistics and a sequence odds ratio to calculate a probability. cDNA DGED relies on the UniGene relational database while the cDNA xProfiler uses a flat file database that is not available online. CGAP is now a centralised location for several genomics tools and genetic databases and
225-406: A correlation between a particular cancer's progression and its therapeutic outcome, improved evaluation of treatment and development of novel techniques for prevention, detection and treatment. This is achieved by characterisation of biological tissue mRNA products. The fundamental cause of cancer is the inability of a cell to regulate its gene expression. To characterise a specific type of cancer,
270-762: A few thousands at >50%) in Africa, the Americas, and Oceania. By contrast, the highest frequency variants private to Europe, East Asia, the Middle East, or Central and South Asia reach just 10 to 30%." Within a population, SNPs can be assigned a minor allele frequency: the lowest allele frequency at a locus that is observed in a particular population. This is simply the lesser of the two allele frequencies for single-nucleotide polymorphisms. With this knowledge scientists have developed new methods for analyzing population structures in less-studied species. By using pooling techniques
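The minor allele frequency defined in this passage can be computed directly from observed allele calls. The following is a minimal Python sketch; the function name `minor_allele_frequency` and the sample data are illustrative assumptions and do not correspond to any CGAP or dbSNP tool.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Return the minor allele frequency for a biallelic SNP.

    `alleles` is a sequence of observed allele calls (e.g. "G" or "A"),
    one per chromosome sampled from the population.
    """
    counts = Counter(alleles)
    if len(counts) != 2:
        raise ValueError("expected exactly two distinct alleles")
    total = sum(counts.values())
    # The MAF is the lesser of the two allele frequencies.
    return min(counts.values()) / total


# Hypothetical sample: 7 chromosomes carry G and 3 carry A.
sample = ["G"] * 7 + ["A"] * 3
print(minor_allele_frequency(sample))
```

Because the frequency is population-specific, the same function would be applied separately to the allele calls from each population being compared.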
315-798: A gene or a list of genes based on specified search criteria and provides links to different NCI and NCBI databases. A gene can be searched for specifically using a unique identifier such as a gene symbol or Entrez gene number, as well as generally by function, tissue or keyword. Other gene tools accessible through the CGAP web interface include the Gene Ontology Browser (GO) and the Nucleotide BLAST tool. The cDNA xProfiler and the cDNA Digital Gene Expression Displayer (DGED) are used together to find statistically significant genes of interest that are differentially expressed between two pools of cDNA libraries, typically
360-409: A good probability of a match. This can additionally be applied to increase the accuracy of facial reconstructions by providing information that may otherwise be unknown, and this information can be used to help identify suspects even without a STR DNA profile match. One drawback of using SNPs rather than STRs is that SNPs yield less information than STRs, and therefore more SNPs are needed for analysis before
405-523: A powerful tool to map genomic regions or genes that are involved in disease pathogenesis. Recently, preliminary results reported SNPs as important components of the epigenetic program in organisms. Moreover, cosmopolitan studies in European and South Asian populations have revealed the influence of SNPs on the methylation of specific CpG sites. In addition, meQTL enrichment analysis using a GWAS database demonstrated that those associations are important toward
450-751: A profile of a suspect can be created. Additionally, SNPs heavily rely on the presence of a database for comparative analysis of samples. However, in instances with degraded or small volume samples, SNP techniques are an excellent alternative to STR methods. SNPs (as opposed to STRs) offer an abundance of potential markers, can be fully automated, and may reduce the required fragment length to less than 100 bp.[26] Pharmacogenetics focuses on identifying genetic variations, including SNPs, associated with differential responses to treatment. Many drug-metabolizing enzymes, drug targets, or target pathways can be influenced by SNPs. The SNPs involved in drug-metabolizing enzyme activities can change drug pharmacokinetics, while
495-569: A sequence alignment and assembly overview with context to sequences from which they were predicted. SNPs are annotated, and integrated genetic/physical maps are often determined. Genomic instability is a common feature of cancer; therefore understanding structural and chromosomal abnormalities can give insight into the progression of disease. The Cancer Chromosome Aberration Project (CCAP) is a CGAP-supported initiative used for defining chromosome structure and characterizing rearrangements that are associated with malignant transformation. It incorporates
540-585: A wide range of diseases across a population. For example, a common SNP in the CFH gene is associated with increased risk of age-related macular degeneration. Differences in the severity of an illness or response to treatments may also be manifestations of genetic variations caused by SNPs. For example, two common SNPs in the APOE gene, rs429358 and rs7412, lead to three major APO-E alleles with different associated risks for development of Alzheimer's disease and age at onset of
585-426: Is a hypothesis-driven approach. Since only a limited number of SNPs are tested, a relatively small sample size is sufficient to detect the association. The candidate gene association approach is also commonly used to confirm findings from GWAS in independent samples. Genome-wide SNP data can be used for homozygosity mapping. Homozygosity mapping is a method used to identify homozygous autosomal recessive loci, which can be
630-622: Is a possibility of combining the advantages of SNPs with microsatellite markers. However, information such as linkage disequilibrium and zygosity is lost in the process. Variations in the DNA sequences of humans can affect how humans develop diseases and respond to pathogens, chemicals, drugs, vaccines, and other agents. SNPs are also critical for personalized medicine. Examples include biomedical research, forensics, pharmacogenetics, and disease causation, as outlined below. One of the main contributions of SNPs in clinical research
675-514: Is also used. This method identifies, for each cDNA transcript molecule produced from a cell's gene expression, regions only 10-14 bases long anywhere along the read sequence, sufficient to uniquely identify that cDNA transcript. These bases are cut out and linked together, then incorporated into bacterial plasmids as mentioned above. SAGE libraries have better read quality and generate a larger amount of data when sequenced, and since transcripts are compared in absolute rather than relative levels, SAGE has
720-487: Is differentiating sequencing errors from actual polymorphisms. SNPs that are found undergo statistical analysis using the CGAP SNP pipeline to calculate the probability that the variant is in fact a polymorphism. High-probability SNPs are validated, and tools are available that predict whether function is altered. To make the data easily accessible, CGAP-GAI has a number of tools which can display both
765-481: Is digital differential display (DDD), which uses the Fisher exact test to compare libraries against each other, in order to find a significant difference between populations. CGAP ensured that DDD was able to compare between all cDNA libraries in dbEST, and not just those which were generated by CGAP. The MGC provides researchers with full-length protein information from cDNA, unlike EST or SAGE databases which only provide
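To illustrate the kind of comparison that a Fisher exact test performs between two libraries, the sketch below computes a two-sided p-value for a 2x2 table of counts. It is a generic textbook formulation written with the standard library only, not CGAP's DDD implementation; the function names and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table at least as unlikely as the observed one.
    p_value = sum(
        p for p in map(prob, range(lo, hi + 1)) if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def compare_libraries(gene_count_a, total_a, gene_count_b, total_b):
    """Test whether a gene's share of sequences differs between two libraries."""
    return fisher_exact_two_sided(
        gene_count_a, total_a - gene_count_a,
        gene_count_b, total_b - gene_count_b,
    )


# Hypothetical counts: the gene accounts for 8 of 20 sequences in library A
# and 1 of 20 sequences in library B.
print(f"p = {compare_libraries(8, 20, 1, 20):.4f}")
```

An exact test suits this setting because per-gene counts in a library are often small, where large-sample approximations are unreliable.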
810-573: Is employed widely in cancer and molecular biology research. The databases established by CGAP continue to contribute to knowledge of cancers in terms of their pathways and progression. The transcriptome databases can also be used in non-cancer-related research, as they contain information that can be used to quickly and easily identify particular sequenced genes. The data also have clinical impact, as cDNAs can be used to create microarrays for diagnosis and treatment comparison purposes. CGAP has been used in many studies, with examples including: In addition,
855-523: Is genome-wide association study (GWAS). Genome-wide genetic data can be generated by multiple technologies, including SNP array and whole genome sequencing. GWAS has been commonly used in identifying SNPs associated with diseases or clinical phenotypes or traits. Since GWAS is a genome-wide assessment, a large sample size is required to obtain sufficient statistical power to detect all possible associations. Some SNPs have relatively small effect on diseases or clinical phenotypes or traits. To estimate study power,
900-679: Is not homogenous; SNPs occur in non-coding regions more frequently than in coding regions or, in general, where natural selection is acting and "fixing" the allele (eliminating other variants) of the SNP that constitutes the most favorable genetic adaptation. Other factors, like genetic recombination and mutation rate, can also determine SNP density. SNP density can be predicted by the presence of microsatellites: AT microsatellites in particular are potent predictors of SNP density, with long (AT)(n) repeat tracts tending to be found in regions of significantly reduced SNP density and low GC content. There are variations between human populations, so
945-559: The intergenic regions (regions between genes). SNPs within a coding sequence do not necessarily change the amino acid sequence of the protein that is produced, due to degeneracy of the genetic code. SNPs in the coding region are of two types: synonymous SNPs and nonsynonymous SNPs. Synonymous SNPs do not affect the protein sequence, while nonsynonymous SNPs change the amino acid sequence of the protein. SNPs that are not in protein-coding regions may still affect gene splicing, transcription factor binding, messenger RNA degradation, or
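The distinction between synonymous and nonsynonymous coding SNPs can be made concrete by translating the reference and variant codons with the standard genetic code. The following Python sketch is illustrative only; the names `CODON_TABLE` and `classify_coding_snp` are assumptions, and the sketch treats a change to or from a stop codon as nonsynonymous.

```python
from itertools import product

# Standard genetic code in the conventional compact form: codons are
# enumerated with bases in the order T, C, A, G and "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip(("".join(codon) for codon in product(BASES, repeat=3)), AMINO_ACIDS)
)


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-base codon change as synonymous or nonsynonymous."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    differences = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if differences != 1:
        raise ValueError("a SNP changes exactly one base of the codon")
    # Degeneracy of the code: different codons may encode the same amino acid.
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


print(classify_coding_snp("CTT", "CTC"))  # both codons encode leucine
print(classify_coding_snp("CTT", "GTT"))  # leucine replaced by valine
```

Most synonymous changes fall at the third codon position, which is where the degeneracy of the code is concentrated.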
990-514: The CGAP-GAI are either found as a result of resequencing genes of interest in different individuals or looking through existing human EST databases and making comparisons. It examines transcripts from healthy individuals, individuals with disease, tumour tissue and cell lines from a large set of individuals; therefore the database is more likely to include rare disease mutations in addition to high frequency variants. A common challenge with SNP detection
1035-660: The Cancer Genome Anatomy Project Genetic Annotation Initiative (CGAP-GAI) is to discover and catalogue single nucleotide polymorphisms (SNPs) that correlate with cancer initiation and progression. CGAP-GAI has created a variety of tools for the discovery, analysis and display of SNPs. SNPs are valuable in cancer research as they can be used in several different genetic studies, commonly to track transmission, identify alternate forms of genes and analyze complex molecular pathways that regulate cell metabolism, growth, or differentiation. SNPs in
1080-498: The SNPs involved in a drug target or its pathway can change drug pharmacodynamics. Therefore, SNPs are potential genetic markers that can be used to predict drug exposure or effectiveness of the treatment. The genome-wide study of pharmacogenetics is called pharmacogenomics. Pharmacogenetics and pharmacogenomics are important in the development of precision medicine, especially for life-threatening diseases such as cancers. Only a small number of SNPs in
1125-468: The SNPs with relatively small effect on diseases. For common and complex diseases, such as type-2 diabetes, rheumatoid arthritis, and Alzheimer's disease, multiple genetic factors are involved in disease etiology. In addition, gene-gene interaction and gene-environment interaction also play an important role in disease initiation and progression. As there are for genes, bioinformatics databases exist for SNPs. The International SNP Map working group mapped
1170-446: The advantage of requiring no normalisation of data via comparison with a reference. Following sequencing and establishment of libraries, CGAP incorporates the data along with existing data sources and provides various databases and tools for analysis. A detailed description of tools and databases created or used by CGAP can be found on NCI's CGAP website. Below are some of the initiatives or research tools provided by CGAP. The goal of
1215-410: The cost of the analysis is significantly lowered. These techniques are based on sequencing a population in a pooled sample instead of sequencing every individual within the population by itself. With new bioinformatics tools there is a possibility of investigating population structure, gene flow and gene migration by observing the allele frequencies within the entire population. With these protocols there
1260-581: The disease. Single nucleotide substitutions with an allele frequency of less than 1% are sometimes called single-nucleotide variants (SNVs). "Variant" may also be used as a general term for any single nucleotide change in a DNA sequence, encompassing both common SNPs and rare mutations, whether germline or somatic. The term SNV has therefore been used to refer to point mutations found in cancer cells. DNA variants must also commonly be taken into consideration in molecular diagnostics applications such as designing PCR primers to detect viruses, in which
1305-550: The entire mRNA transcript that generated it. Practically, only part of the sequence is required to uniquely identify the associated mRNA or protein. The resultant part of the sequence was termed the expressed sequence tag (EST) and is always at the end of the sequence close to the poly A tail. EST data are stored in a database called dbEST. ESTs only need to be around 400 bases long, but with NGS techniques this will still produce low quality reads. Therefore, an improved method called serial analysis of gene expression (SAGE)
1350-483: The genetic model for disease needs to be considered, such as dominant, recessive, or additive effects. Due to genetic heterogeneity, GWAS analysis must be adjusted for race. Candidate gene association studies were commonly used in genetic research before the invention of high-throughput genotyping or sequencing technologies. A candidate gene association study investigates a limited number of pre-specified SNPs for association with diseases or clinical phenotypes or traits. So this
1395-529: The human genome may have impact on human diseases. Large-scale GWAS have been done for the most important human diseases, including heart diseases, metabolic diseases, autoimmune diseases, and neurodegenerative and psychiatric disorders. Most of the SNPs with relatively large effects on these diseases have been identified. These findings have significantly improved understanding of disease pathogenesis and molecular pathways, and facilitated development of better treatment. Further GWAS with larger sample sizes will reveal
CGAP - Misplaced Pages Continue
1440-429: The human genome that are physically available through a network of distributors. The CCAP Clone maps have been mapped cytogenetically using FISH at a resolution of 1-2Mb across the human genome, and physically mapped using sequence-tagged sites (STS). The data for BAC clones are also available through CGAP and NCBI databases. Listed below are some other resources available through CGAP. An early technique used by CGAP
1485-477: The identifying tag. The project includes human and mouse genes, and later cow cDNAs generated by Genome Canada were added. SAGEmap is the database used to store SAGE libraries. Over 3.4 million SAGE tags exist as of 2001. Tools can be used to map SAGE tags to UniGene clusters, a database that stores transcriptomes. This allows for easier identification of a SAGE tag's corresponding sequence. In addition, there are tools associated with SAGEmaps: The CGAP locates
1530-426: The link to point directly to the intended article. Retrieved from " https://en.wikipedia.org/w/index.php?title=CGAP&oldid=1246683234 " Category : Disambiguation pages Hidden categories: Short description is different from Wikidata All article disambiguation pages All disambiguation pages Cancer Genome Anatomy Project The eventual outcomes of CGAP include establishing
1575-461: The online version of Mitelman's database, created by Felix Mitelman, Bertil Johansson and Fredrik Mertens prior to the creation of CGAP, another compilation of known chromosomal rearrangements. The CCAP has several goals: There is cytogenetic information from over 64,000 patient cases, including more than 2000 gene fusions, contained in the database. As part of this project there is a repository of physically and cytogenetically mapped BAC clones for
1620-411: The prediction of biological traits. SNPs have historically been used to match a forensic DNA sample to a suspect, but this use has been made obsolete by advancing STR-based DNA fingerprinting techniques. However, the development of next-generation sequencing (NGS) technology may allow for more opportunities for the use of SNPs in phenotypic clues such as ethnicity, hair color, and eye color with
1665-415: The proteins that are produced from the altered gene expression or the mRNA precursor to the protein can be examined. CGAP works to associate a particular cell's expression profile, molecular signature or transcriptome, which is essentially the cell's fingerprint, with the cell's phenotype. Therefore, expression profiles exist with consideration to cancer type and stage of progression. CGAP's initial goal
1710-567: The sequence flanking each SNP by alignment to the genomic sequence of large-insert clones in GenBank. These alignments were converted to chromosomal coordinates that are shown in Table 1. This list has greatly increased since, with, for instance, the Kaviar database now listing 162 million single nucleotide variants (SNVs). The nomenclature for SNPs includes several variations for an individual SNP, while lacking
1755-455: The sequence of noncoding RNA. Gene expression affected by this type of SNP is referred to as an eSNP (expression SNP) and may be upstream or downstream from the gene. More than 600 million SNPs have been identified across the human genome in the world's population. A typical genome differs from the reference human genome at 4 to 5 million sites, most of which (more than 99.9%) consist of SNPs and short indels. The genomic distribution of SNPs
1800-448: The substitution to be present in a sufficiently large fraction of the population (e.g. 1% or more), many publications do not apply such a frequency threshold. For example, a G nucleotide present at a specific location in a reference genome may be replaced by an A in a minority of individuals. The two possible nucleotide variations of this SNP – G or A – are called alleles. SNPs can help explain differences in susceptibility to
1845-1089: The two alleles: homozygous A, homozygous B and heterozygous AB, leading to many possible techniques for analysis. Some include: DNA sequencing; capillary electrophoresis; mass spectrometry; single-strand conformation polymorphism (SSCP); single base extension; electrochemical analysis; denaturing HPLC and gel electrophoresis; restriction fragment length polymorphism; and hybridization analysis. An important group of SNPs are those that correspond to missense mutations causing an amino acid change at the protein level. A point mutation of a particular residue can have different effects on protein function (from no effect to complete disruption of its function). Usually, a change to an amino acid with similar size and physico-chemical properties (e.g. substitution from leucine to valine) has a mild effect, and vice versa. Similarly, if a SNP disrupts secondary structure elements (e.g. substitution to proline in an alpha helix region), the mutation may affect the whole protein structure and function. Using those simple and many other machine learning derived rules
1890-449: The vast amount of data generated by CGAP has prompted improvement of data analysis and mining techniques, with examples including: Single nucleotide polymorphisms In genetics and bioinformatics, a single-nucleotide polymorphism (SNP /snɪp/; plural SNPs /snɪps/) is a germline substitution of a single nucleotide at a specific position in the genome. Although certain definitions require
1935-491: The viral RNA or DNA sample may contain SNVs. However, this nomenclature uses arbitrary distinctions (such as an allele frequency of 1%) and is not used consistently across all fields; the resulting disagreement has prompted calls for a more consistent framework for naming differences in DNA sequences between two samples. Single-nucleotide polymorphisms may fall within coding sequences of genes, non-coding regions of genes, or in
1980-564: The whole tissue sample and produced bulk tissue cDNA libraries. This cellular heterogeneity made gene expression information in terms of cancer biology less accurate. An example is prostate cancer tissue, where epithelial cells, which have been shown to be the only cell type to give rise to cancer, make up only 10% of the cell count. This led to development of laser capture microdissection (LCM), a technique that can isolate individual cell types or individual cells, which gave rise to cDNA libraries of specific cell types. The sequencing of cDNA will produce
2025-564: Was to establish a Tumor Gene Index (TGI) to store the expression profiles. This would have contributions to both new and existing databases. This contributed to two types of libraries, the dbEST and later dbSAGE. This was performed in a series of steps: The TGI focused on prostate, breast, ovarian, lung and colon cancers at first, and CGAP extended to other cancers in its research. Practically, issues arose which CGAP accounted for as new technologies became available. Many cancers occur in tissues with multiple cell types. Traditional techniques took