3PBH , 1CSB , 1GMY , 1HUC , 1PBH , 2IPP , 2PBH , 3AI8 , 3CBJ , 3CBK , 3K9M
30-448: 1508 13030 ENSG00000164733 ENSG00000285132 ENSMUSG00000021939 P07858 P10605 NM_001317237 NM_007798 NP_680093 NP_031824 Cathepsin B belongs to a family of lysosomal cysteine proteases known as the cysteine cathepsins and plays an important role in intracellular proteolysis. In humans, cathepsin B is encoded by the CTSB gene . Cathepsin B
60-588: A population , or a species ( extinct or extant ). Clades are nested, one in another, as each branch in turn splits into smaller branches. These splits reflect evolutionary history as populations diverged and evolved independently. Clades are termed monophyletic (Greek: "one clan") groups. Over the last few decades, the cladistic approach has revolutionized biological classification and revealed surprising evolutionary relationships among organisms. Increasingly, taxonomists try to avoid naming taxa that are not clades; that is, taxa that are not monophyletic . Some of
90-479: A "ladder", with supposedly more "advanced" organisms at the top. Taxonomists have increasingly worked to make the taxonomic system reflect evolution. When it comes to naming , this principle is not always compatible with the traditional rank-based nomenclature (in which only taxa associated with a rank can be named) because not enough ranks exist to name a long series of nested clades. For these and other reasons, phylogenetic nomenclature has been developed; it
120-446: A branch of mammals that split off after the end of the period when the clade Dinosauria stopped being the dominant terrestrial vertebrates 66 million years ago. The original population and all its descendants are a clade. The rodent clade corresponds to the order Rodentia, and insects to the class Insecta. These clades include smaller clades, such as chipmunk or ant , each of which consists of even smaller clades. The clade "rodent"
150-623: A clade can be described based on two different reference points, crown age and stem age. The crown age of a clade refers to the age of the most recent common ancestor of all of the species in the clade. The stem age of a clade refers to the time that the ancestral lineage of the clade diverged from its sister clade. A clade's stem age is either the same as or older than its crown age. Ages of clades cannot be directly observed. They are inferred, either from stratigraphy of fossils , or from molecular clock estimates. Viruses , and particularly RNA viruses form clades. These are useful in tracking
180-469: A corresponding protein with a 1:1 relationship. The term "protein family" should not be confused with family as it is used in taxonomy. Proteins in a family descend from a common ancestor and typically have similar three-dimensional structures , functions, and significant sequence similarity . Sequence similarity (usually amino-acid sequence) is one of the most common indicators of homology, or common evolutionary ancestry. Some frameworks for evaluating
210-464: A hierarchical terminology is in use. At the highest level of classification are protein superfamilies , which group distantly related proteins, often based on their structural similarity. Next are protein families, which refer to proteins with a shared evolutionary origin exhibited by significant sequence similarity . Subfamilies can be defined within families to denote closely related proteins that have similar or identical functions. For example,
240-434: A large scale are based on a notion of similarity. Many biological databases catalog protein families and allow users to match query sequences to known families. These include: Similarly, many database-searching algorithms exist, for example: Clades In biological phylogenetics , a clade (from Ancient Greek κλάδος (kládos) 'branch'), also known as a monophyletic group or natural group ,
270-635: A large surface with constraints on the hydrophobicity or polarity of the amino-acid residues. Functionally constrained regions of proteins evolve more slowly than unconstrained regions such as surface loops, giving rise to blocks of conserved sequence when the sequences of a protein family are compared (see multiple sequence alignment ). These blocks are most commonly referred to as motifs, although many other terms are used (blocks, signatures, fingerprints, etc.). Several online resources are devoted to identifying and cataloging protein motifs. According to current consensus, protein families arise in two ways. First,
300-407: A protein have evolved independently. This has led to a focus on families of protein domains. Several online resources are devoted to identifying and cataloging these domains. Different regions of a protein have differing functional constraints. For example, the active site of an enzyme requires certain amino-acid residues to be precisely oriented. A protein–protein binding interface may consist of
330-422: A revised taxonomy based on a concept strongly resembling clades, although the term clade itself would not be coined until 1957 by his grandson, Julian Huxley . German biologist Emil Hans Willi Hennig (1913–1976) is considered to be the founder of cladistics . He proposed a classification system that represented repeated branchings of the family tree, as opposed to the previous systems, which put organisms on
SECTION 10
#1733085427925360-429: A suffix added should be e.g. "dracohortian". A clade is by definition monophyletic , meaning that it contains one ancestor which can be an organism, a population, or a species and all its descendants. The ancestor can be known or unknown; any and all members of a clade can be extant or extinct. The science that tries to reconstruct phylogenetic trees and thus discover clades is called phylogenetics or cladistics ,
390-552: A superfamily like the PA clan of proteases has less sequence conservation than the C04 family within it. Protein families were first recognised when most proteins that were structurally understood were small, single-domain proteins such as myoglobin , hemoglobin , and cytochrome c . Since then, many proteins have been found with multiple independent structural and functional units called domains . Due to evolutionary shuffling, different domains in
420-537: Is a grouping of organisms that are monophyletic – that is, composed of a common ancestor and all its lineal descendants – on a phylogenetic tree . In the taxonomical literature, sometimes the Latin form cladus (plural cladi ) is used rather than the English form. Clades are the fundamental unit of cladistics , a modern approach to taxonomy adopted by most biological fields. The common ancestor may be an individual,
450-422: Is critical to phylogenetic analysis, functional annotation, and the exploration of the diversity of protein function in a given phylogenetic branch. The Enzyme Function Initiative uses protein families and superfamilies as the basis for development of a sequence/structure-based strategy for large scale functional assignment of enzymes of unknown function. The algorithmic means for establishing protein families on
480-471: Is in turn included in the mammal, vertebrate and animal clades. The idea of a clade did not exist in pre- Darwinian Linnaean taxonomy , which was based by necessity only on internal or external morphological similarities between organisms. Many of the better known animal groups in Linnaeus's original Systema Naturae (mostly vertebrate groups) do represent clades. The phenomenon of convergent evolution
510-471: Is no identifiable sequence homology. Currently, over 60,000 protein families have been defined, although ambiguity in the definition of "protein family" leads different researchers to highly varying numbers. The term protein family has broad usage and can be applied to large groups of proteins with barely detectable sequence similarity as well as narrow groups of proteins with near identical sequence, function, and structure. To distinguish between these cases,
540-515: Is responsible for many cases of misleading similarities in the morphology of groups that evolved from different lineages. With the increasing realization in the first half of the 19th century that species had changed and split through the ages, classification increasingly came to be seen as branches on the evolutionary tree of life . The publication of Darwin's theory of evolution in 1859 gave this view increasing weight. In 1876 Thomas Henry Huxley , an early advocate of evolutionary theory, proposed
570-489: Is still controversial. As an example, see the full current classification of Anas platyrhynchos (the mallard duck) with 40 clades from Eukaryota down by following this Wikispecies link and clicking on "Expand". The name of a clade is conventionally a plural, where the singular refers to each member individually. A unique exception is the reptile clade Dracohors , which was made by haplology from Latin "draco" and "cohors", i.e. "the dragon cohort "; its form with
600-635: Is synthesized on the rough endoplasmic reticulum as a preproenzyme of 339 amino acids with a signal peptide of 17 amino acids. Procathepsin B of 43/46 kDa is then transported to the Golgi apparatus , where cathepsin B is formed. Mature cathepsin B is composed of a heavy chain of 25-26 kDa and a light chain of 5kDa, which are linked by a dimer of disulfide. Cathepsin B may enhance the activity of other proteases, including matrix metalloproteinase , urokinase ( serine protease urokinase plasminogen activator), and cathepsin D , and thus it has an essential position for
630-408: Is upregulated in certain cancers, in pre-malignant lesions, and in various other pathological conditions. The CTSB gene is located at chromosome 8p 22, consisting of 13 exons . The promoter of CTSB gene contains a GC-rich region including many SP1 sites, which is similar to housekeeping genes . At least five transcript variants encoding the same protein have been found for this gene. Cathepsin B
SECTION 20
#1733085427925660-438: The pathogenesis of pancreatitis , by prematurely activating the digestive enzyme trypsinogen within the pancreas, leading to autodigestion of acinar cells . Cathepsin B has been shown to interact with: Cathepsin B is inhibited by: Protein family A protein family is a group of evolutionarily related proteins . In many cases, a protein family has a corresponding gene family , in which each gene encodes
690-442: The duplicated gene is free to diverge and may acquire new functions (by random mutation). Certain gene/protein families, especially in eukaryotes , undergo extreme expansions and contractions in the course of evolution, sometimes in concert with whole genome duplications . Expansions are less likely, and losses more likely, for intrinsically disordered proteins and for protein domains whose hydrophobic amino acids are further from
720-518: The latter term coined by Ernst Mayr (1965), derived from "clade". The results of phylogenetic/cladistic analyses are tree-shaped diagrams called cladograms ; they, and all their branches, are phylogenetic hypotheses. Three methods of defining clades are featured in phylogenetic nomenclature : node-, stem-, and apomorphy-based (see Phylogenetic nomenclature§Phylogenetic definitions of clade names for detailed definitions). The relationship between clades can be described in several ways: The age of
750-477: The optimal degree of dispersion along the primary sequence. This expansion and contraction of protein families is one of the salient features of genome evolution , but its importance and ramifications are currently unclear. As the total number of sequenced proteins increases and interest expands in proteome analysis, an effort is ongoing to organize proteins into families and to describe their component domains and motifs. Reliable identification of protein families
780-425: The proteolysis of extracellular matrix components, intercellular communication disruption, and reduced protease inhibitor expression. Cells may become carcinogenic when cathepsin B is unregulated. Cathepsin B has been proposed as a potentially effective biomarker for a variety of cancers . Overexpression of cathepsin B is correlated with invasive and metastatic cancers. Cathepsin B has been shown to be involved in
810-482: The relationships between organisms that the molecular biology arm of cladistics has revealed include that fungi are closer relatives to animals than they are to plants, archaea are now considered different from bacteria , and multicellular organisms may have evolved from archaea. The term "clade" is also used with a similar meaning in other fields besides biology, such as historical linguistics ; see Cladistics § In disciplines other than biology . The term "clade"
840-422: The separation of a parent species into two genetically isolated descendant species allows a gene/protein to independently accumulate variations ( mutations ) in these two lineages. This results in a family of orthologous proteins, usually with conserved sequence motifs. Second, a gene duplication may create a second copy of a gene (termed a paralog ). Because the original gene is still able to perform its function,
870-418: The significance of similarity between sequences use sequence alignment methods. Proteins that do not share a common ancestor are unlikely to show statistically significant sequence similarity, making sequence alignment a powerful tool for identifying the members of protein families. Families are sometimes grouped together into larger clades called superfamilies based on structural similarity, even if there
900-423: Was coined in 1957 by the biologist Julian Huxley to refer to the result of cladogenesis , the evolutionary splitting of a parent species into two distinct species, a concept Huxley borrowed from Bernhard Rensch . Many commonly named groups – rodents and insects , for example – are clades because, in each case, the group consists of a common ancestor with all its descendant branches. Rodents, for example, are
#924075