Content | |
---|---|
Description | The PANTHER database classifies gene products into families |
Data types captured | Gene families |
Contact | |
Research centre | University of Southern California |
Author(s) | Paul D Thomas |
Primary citation | PMID 12520017 |
Access | |
Website | [1] |
Miscellaneous | |
Bookmarkable entities | yes |
In bioinformatics, the PANTHER (protein analysis through evolutionary relationships) classification system is a large curated biological database of gene/protein families and their functionally related subfamilies that can be used to classify and identify the function of gene products.[1] PANTHER is part of the Gene Ontology Reference Genome Project[2] designed to classify proteins and their genes for high-throughput analysis.
The project combines manual curation with bioinformatics algorithms.[3] Proteins are classified according to family (and subfamily), molecular function, biological process and pathway. It is one of the databases feeding into the European Bioinformatics Institute's InterPro database.[4]
Application of PANTHER
The most important application of PANTHER is to accurately infer the function of uncharacterized genes from any organism based on their evolutionary relationships to genes with known functions.[3] By combining gene function, ontology, pathways and statistical analysis tools, PANTHER enables biologists to analyze large-scale, genome-wide data obtained from current advanced technologies, including sequencing, proteomics and gene expression experiments.[5] In short, using the data and tools on PANTHER, users are able to:[6]
In PANTHER there is a phylogenetic tree for each protein family. Each tree is annotated according to the following criteria:
To generate phylogenetic trees, PANTHER uses the GIGA algorithm. GIGA uses the species tree to guide tree construction. On every iteration it attempts to reconcile the tree in terms of speciation and gene duplication events.
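The distinction between speciation and gene duplication events can be illustrated with a small sketch. This is not the GIGA algorithm; it is a simple species-overlap heuristic, swapped in for illustration, that labels an internal node as a duplication when two of its child subtrees contain genes from the same species, and as a speciation otherwise. The dictionary-based tree layout, the `label_events` function and the gene names are all hypothetical.

```python
def label_events(node):
    """Label internal nodes in place and return the set of species below `node`.

    Assumed layout (hypothetical): a leaf is {"gene": ..., "species": ...},
    an internal node is {"children": [...]}.
    """
    if "children" not in node:
        return {node["species"]}

    seen = set()
    overlap = False
    for child in node["children"]:
        child_species = label_events(child)
        if seen & child_species:
            # The same species appears under two different children.
            overlap = True
        seen |= child_species

    node["event"] = "duplication" if overlap else "speciation"
    return seen


# Two paralogous gene copies (A and B), each present in human and mouse.
copy_a = {"children": [{"gene": "A_hs", "species": "human"},
                       {"gene": "A_mm", "species": "mouse"}]}
copy_b = {"children": [{"gene": "B_hs", "species": "human"},
                       {"gene": "B_mm", "species": "mouse"}]}
root = {"children": [copy_a, copy_b]}

label_events(root)
print(root["event"], copy_a["event"], copy_b["event"])
```

In this toy tree the root is labelled as a duplication, because both copies occur in human and mouse, while each copy's own node is labelled as a speciation.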
The process for data generation is divided into three steps:
PANTHER trees depict gene family evolution across a broad selection of fully sequenced genomes. PANTHER includes one sequence per gene so that each tree can represent the events that occurred over the course of evolution, i.e. duplication and speciation. The PANTHER genome set is selected based on the following criteria:
A family cluster in PANTHER must meet the following requirements:
For each family, a multiple sequence alignment is built using the default settings of MAFFT, and any column that is aligned in fewer than 75% of the sequences is removed. The resulting alignment is used as input for the GIGA program. The trees output by GIGA are labelled: each internal node is marked according to whether the divergence event was a speciation or a gene duplication.
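The column-trimming step can be sketched as follows. This is a minimal illustration rather than PANTHER's actual pipeline code: it assumes that "aligned" means the position holds a residue rather than a gap character, and that a column at exactly 75% is kept. The function name `filter_columns` and its parameters are hypothetical.

```python
def filter_columns(alignment, min_fraction=0.75, gap_chars="-."):
    """Drop alignment columns where fewer than `min_fraction` of rows have a residue.

    `alignment` is a list of equal-length aligned sequences (strings).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_rows = len(alignment)
    keep = [
        col
        for col in range(length)
        if sum(row[col] not in gap_chars for row in alignment) / n_rows >= min_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


msa = ["MKT-A", "MK--A", "M-T-A", "MKTLA"]
print(filter_columns(msa))
```

In the toy alignment above, only the fourth column is removed, because just one of the four sequences has a residue there.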
Each node in a PANTHER tree is annotated with heritable attributes. A heritable attribute can be of three types: subfamily membership, gene function and protein class membership. These node annotations apply to the primary sequences that were used to construct the tree. A simple evolutionary principle is used to apply them: the annotation of each node is propagated to its descendant nodes.[3]
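The propagation principle can be sketched with a small recursive traversal. This is an illustrative model only: the `Node` class, the `propagate` function and the annotation strings are hypothetical, and the sketch assumes plain inheritance in which every leaf sequence receives all annotations attached to its ancestors.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    annotations: set = field(default_factory=set)
    children: list = field(default_factory=list)


def propagate(node, inherited=frozenset()):
    """Return {leaf name: annotations inherited from the root down to that leaf}."""
    combined = set(inherited) | node.annotations
    if not node.children:
        return {node.name: combined}
    result = {}
    for child in node.children:
        result.update(propagate(child, combined))
    return result


# Hypothetical tree: the root carries a function, one internal node a subfamily.
tree = Node("root", {"function: kinase activity"}, [
    Node("n1", {"subfamily: SF1"}, [Node("seqA"), Node("seqB")]),
    Node("seqC"),
])

for leaf, annots in sorted(propagate(tree).items()):
    print(leaf, sorted(annots))
```

Here `seqA` and `seqB` receive both the function and the subfamily annotation, while `seqC` receives only the function attached to the root.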
PANTHER/LIB (PANTHER library): The library consists of a collection of books, each of which represents a protein family. For each protein family in the library there is a Hidden Markov Model (HMM), a multiple sequence alignment (MSA) and a family tree.[1]
PANTHER/X (PANTHER index): The index contains an abbreviated ontology that assists in summarizing and navigating molecular function and biological function. Although the PANTHER/X ontology has a hierarchical organization, it is a directed acyclic graph, so when it is biologically justified, child categories appear under more than one parent. PANTHER/X has been mapped to GO and arranged in a different way to facilitate large-scale analysis of proteins.[1]
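The difference between a strict hierarchy and a directed acyclic graph can be shown with a short sketch in which one category has two parents. The category names below are placeholders, not real PANTHER/X terms, and the `ancestors` function is a hypothetical helper.

```python
def ancestors(term, parents):
    """Collect every category reachable by following parent links from `term`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Placeholder ontology: "child" sits under two different parents.
parents = {
    "child": ["parent_1", "parent_2"],
    "parent_1": ["root"],
    "parent_2": ["root"],
}

print(sorted(ancestors("child", parents)))
```

Because a category may be reached through several parents, the traversal tracks visited categories so that shared ancestors such as `root` are counted once.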
PANTHER includes 176 pathways drawn using the CellDesigner tool. PANTHER pathways can be downloaded in the following file formats:
Version 6 uses UniProt[11] sequences as training sequences. There are 19132 UniProt training sequences directly associated with the pathway components. This version has ~1500 reactions in 130 pathways, and the number of pathways associated with subfamilies was expanded. PANTHER became a member of the InterPro Consortium. The availability of PANTHER data was improved (the HMMs can be downloaded by FTP). The PANTHER/LIB version 6.1 contains 221609 UniProt sequences from 53 organisms, grouped into 5546 families and 24561 subfamilies.[12] (2006)
In this version the phylogenetic trees represent speciation and gene duplication events, which makes it possible to identify gene orthologs. There is more support for alternative database identifiers for genes, proteins and microarray probes. PANTHER version 7 uses the SBGN standard to depict biological pathways. It includes a set of 48 genomes. In collaboration with the European Bioinformatics Institute’s InterPro group,[4] approximately 1000 new families from non-animal genomes were added in this version. The sources of gene sets included model organism databases, Ensembl[13] genome annotation and Entrez Gene.[14] Since this version, each node in the tree is assigned a stable identifier: a nine-digit number with the prefix PTN (standing for PANTHER Tree Node).[3][15] (2009)
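The identifier format described above (the prefix PTN followed by nine digits) can be checked with a short regular expression. This is a minimal sketch based only on that description; the function name `is_ptn_id` and the example identifiers are made up for illustration.

```python
import re

# "PTN" followed by exactly nine ASCII digits, per the description above.
PTN_PATTERN = re.compile(r"PTN[0-9]{9}")


def is_ptn_id(text):
    """Return True if `text` has the shape of a PANTHER Tree Node identifier."""
    return PTN_PATTERN.fullmatch(text) is not None


for candidate in ["PTN000123456", "PTN12345", "ptn000123456"]:
    print(candidate, is_ptn_id(candidate))
```

The check is purely syntactic: it confirms the shape of an identifier, not that a node with that identifier exists in PANTHER.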
The reference proteome[16] set maintained by the UniProt resource is used in this version of PANTHER, so the source of gene sets is UniProt. It includes a set of 82 genomes (approximately double the number in version 7) and 991985 protein-coding genes, of which 642319 (64.75%) have been used for family clusters. The PANTHER website was redesigned to facilitate common user workflows.[3]
This version contains 7180 protein families, divided into 52,768 functionally distinct protein subfamilies. Version 9.0 includes the genomes of 85 organisms.[17][6]
This version contains 78442 subfamilies and 1,064,054 annotated genes.
The home page of the PANTHER website shows several folder tabs for major workflows, including gene list analysis, browse, sequence search, cSNP scoring and keyword search. The details of each of these workflows are provided below.
This tab is selected by default because it is the most frequently used option. You can enter valid IDs in the box or upload a file, then select the list type, choose the organism of interest and select the type of analysis.
A practical example: let's try this workflow with a small gene list containing three genes: AKT1, AKT2 and AKT3. We first type these gene names in the box, separated by commas (or spaces). We select "ID list" as the list type, "Homo sapiens" (human) as the organism and "Functional classification viewed in gene list" as the type of operation, then click submit. PANTHER returns the following information for all three genes:
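When a gene list is prepared programmatically before being pasted into the box, the comma-or-space convention can be handled with a small helper. This is a sketch of client-side input clean-up only, not PANTHER's own parsing; the function name `parse_id_list` is hypothetical.

```python
import re


def parse_id_list(text):
    """Split a string of identifiers separated by commas and/or whitespace."""
    tokens = re.split(r"[,\s]+", text.strip())
    # Drop empty tokens left by leading or trailing separators.
    return [token for token in tokens if token]


print(parse_id_list("AKT1, AKT2 AKT3"))
```

Mixed separators are accepted, so "AKT1, AKT2 AKT3" yields the three gene names used in the example above.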
Using this folder tab, you can browse the different classifications by selecting the ontology you are interested in. It is also possible to select more than one ontology; in this case, the results will meet the criteria from all the selections. You can see the associations between ontology terms and PANTHER families, subfamilies and training sequences.
When a protein sequence is entered in the Sequence Search box, PANTHER searches it against a library of family and subfamily HMMs and returns the subfamily that best matches the sequence. Clicking on the subfamily name gives further details, e.g. the genes related to that subfamily and the option to view the subfamily within the larger family tree. By downloading the PANTHER scoring tool from the download page, you can score many sequences against the PANTHER HMMs.
Using this folder tab, you can perform an evolutionary analysis of coding SNPs. You must enter a protein sequence in the first box and the substitutions relative to this protein sequence in the second box; these substitutions should be entered in the standard amino acid substitution format, e.g. L46P. PANTHER will use an alignment of evolutionarily related proteins to calculate the substitution position-specific evolutionary conservation (subPSEC) score and estimate the likelihood that this nonsynonymous coding SNP has a functional effect on the protein. This tool uses data from PANTHER version 6.1 for technical reasons. To analyze a large number of SNPs, you can go to the download page and download the PANTHER Coding SNP Analysis tool.
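Before submitting substitutions, it can help to check that each one is well formed and consistent with the protein sequence. The sketch below parses a substitution such as L46P and verifies that the stated wild-type residue really occurs at that position. It assumes 1-based positions and the 20 standard amino acid letters, and it does not compute subPSEC; the function name `parse_substitution` and the toy sequence are hypothetical.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SUBSTITUTION = re.compile(rf"([{AMINO_ACIDS}])([1-9][0-9]*)([{AMINO_ACIDS}])")


def parse_substitution(substitution, sequence):
    """Return (wild_type, position, mutant) after checking it against `sequence`."""
    match = SUBSTITUTION.fullmatch(substitution.strip().upper())
    if match is None:
        raise ValueError(f"not a valid substitution: {substitution!r}")

    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if position > len(sequence):
        raise ValueError(f"position {position} is beyond the end of the sequence")
    # Positions are assumed to be 1-based, so subtract one for indexing.
    if sequence[position - 1].upper() != wild_type:
        raise ValueError(
            f"sequence has {sequence[position - 1]!r} at position {position}, not {wild_type!r}"
        )
    return wild_type, position, mutant


# Toy sequence with a leucine at position 46.
toy_sequence = "M" * 45 + "L" + "A" * 10
print(parse_substitution("L46P", toy_sequence))
```

A substitution whose wild-type residue does not match the sequence raises a `ValueError`, which catches off-by-one position errors before any analysis is run.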
When you enter a search term in the keyword search box, PANTHER returns the number of records matching your keyword for genes, families, pathways and ontology terms. You can filter the results by species of interest or refine the search using other criteria. To view the details of a gene, click on its gene identifier.