Bioinformatics Primer
An Introduction to the PANTHER Pathway Resource
doi:10.1038/pid.2009.1
Standfirst
PANTHER Pathway (http://www.pantherdb.org/pathway/), a module of the PANTHER Protein Classification System, was designed to model evolutionary sequence–function relationships on a large scale. The PANTHER Pathway database has several aims:
1. To comprehensively represent authored, "review-level" biological knowledge concerning both metabolic and signaling pathways, capturing mechanistic detail about each pathway step and literature evidence whenever possible
2. To provide clear visualization of pathways geared to biologist end users and genomic data-analysis tools that employ visualization
3. To provide structured pathway data for systems biology and bioinformatics analysis
4. To represent the evolution of each pathway, thus enabling accurate inferences across homologous genes and pathways in different model organisms
Currently, PANTHER Pathway contains 165 expert-curated metabolic and signaling pathways, comprising 2985 reactions, 20,851 proteins directly associated with the pathways through an evidence code, and 3569 distinct literature references. All pathways were created with CellDesigner, a pathway editing tool that captures data in Systems Biology Markup Language (SBML) format and displays the pathway diagram in Systems Biology Graphical Notation (SBGN). Finally, PANTHER Pathway is supported by a number of PANTHER web tools, including the protein classification tool and gene expression analysis tool.
Introduction
PANTHER (Protein ANalysis THrough Evolutionary Relationships) is a publicly available, comprehensive software system for relating protein sequence to the evolution of specific protein functions and biological roles (Mi et al. 2007). The core of the system is a large collection of phylogenetically defined protein families and subfamilies generated by computational algorithms and curated by expert biologists using an extensive software system for associating ontology terms with phylogenetically defined groups ("subfamilies") of protein-coding genes (Thomas et al. 2003). Each protein family or subfamily is composed of a cluster of evolutionarily related protein sequences and is represented by a phylogenetic tree (Figure 1a and 1b), a hidden Markov model (HMM) for classification of newly discovered protein sequences, and a multiple sequence alignment (MSA) (Figure 1c).
Figure 1a
Collapsed phylogenetic tree of a PANTHER protein family. PANTHER protein family PTHR19957, the Syntaxin family, is shown in the PANTHER TreeViewer. The phylogenetic tree was constructed computationally from sequence data. The collapsed tree shows the subfamilies as leaf nodes (blue diamond nodes). Subfamily nodes correspond to common ancestors of extant family members. Expert biologists annotate these nodes with inferred molecular functions and roles in biological processes and pathways, on the basis of experiments performed on extant proteins. Each subfamily is represented by a hidden Markov model (HMM) to allow classification of newly discovered protein sequences.
Figure 1b
Two subfamilies of the phylogenetic tree for the Syntaxin family (identified by the red arrow in Figure 1a) are expanded to show Syntaxin 11 (SF30) and Syntaxin 4 (SF32) subfamily members, each spanning sequences from different vertebrates (zebrafish, mammals). Subfamilies can be expanded and collapsed by clicking on a node of the tree.
Figure 1c
Multiple sequence alignment of a PANTHER protein family. Clicking on the "MSA" button shows the multiple sequence alignment (MSA) of the Syntaxin 11 and Syntaxin 4 subfamily member sequences, which is also used to build an HMM. Note that the MSA shows differences between amino acids conserved in different subfamilies (red arrows).
PANTHER Pathway is one of the modules within PANTHER (Mi et al. 2007) (Figure 2). A "pathway" consists of biochemical reactions that occur in a particular sequence to carry out a biological "program," generally at the cellular level. PANTHER Pathway includes both metabolic pathways (chemical transformations of small molecules) and signaling pathways (transduction of one or more signals that results in a specific cellular response). All pathways are curated by experts using the PANTHER Pathway Curation Software Module (http://curation.pantherdb.org/). They are represented by a pathway ontology stored in SBML format (Hucka et al. 2003) and displayed in diagrams with the PANTHER Pathway applet, which is based on the CellDesigner (http://www.celldesigner.org) pathway editing tool (Kitano et al. 2005).
Figure 2
PANTHER pathway module and its relationship to the PANTHER family/subfamily library module. Pathway molecule classes and molecular complex classes participate in reactions, and each reaction is part of a pathway. Pathways, molecule classes, and complex classes are annotated with locations, either a cell type or a cellular component. Since a reaction can span different cellular components (e.g., a transcription factor translocating from the cytosol to the nucleus), the reaction itself is not annotated with a location. The pathway is linked to the PANTHER family/subfamily library module through the association of each pathway molecule class with one or more sequences in the library. Each sequence belongs to a subfamily, which indirectly links a pathway to a hidden Markov model (HMM), phylogenetic tree, and multiple sequence alignment (MSA).
Data model
The PANTHER Pathway ontology uses controlled vocabulary to describe pathways, their components, and the relationships among them. The PANTHER Pathway ontology has four major classes of data: pathway, molecule, reaction, and location (cell type/cellular compartment).
Pathway class
Each pathway class represents the concept and scope of a pathway. The scope of the pathway is similar to those documented in textbooks or review articles. It is usually very well defined in metabolic pathways but much less so in signaling (or regulatory) pathways. For example, the MAP kinase signaling cascade has been referred to as a pathway on many occasions, but it is also a signaling module in some specific pathways, such as the apoptosis signaling pathway, angiogenesis pathway, or p53 pathway. A pathway class is associated with the following attributes:
- Pathway name. This is usually the name commonly referred to by biologists in the field.
- Definition. This is a text description of the pathway.
- References. If a pathway is sufficiently well established to appear in a textbook, a textbook reference is usually sufficient. Otherwise we require at least three references to support the overall structure and boundaries of the pathway.
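The three attributes and the reference rule above can be sketched as a small record type. This is only an illustration: the class and field names, and in particular the `is_textbook` flag, are assumptions made for the example and are not part of the PANTHER schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Reference:
    citation: str
    is_textbook: bool = False  # hypothetical flag, not a PANTHER field


@dataclass
class PathwayClass:
    name: str        # name commonly used by biologists in the field
    definition: str  # free-text description of the pathway
    references: list[Reference] = field(default_factory=list)

    def has_sufficient_references(self) -> bool:
        """One textbook reference is enough; otherwise at least three are needed."""
        if any(ref.is_textbook for ref in self.references):
            return True
        return len(self.references) >= 3


# Placeholder data for illustration only.
established = PathwayClass(
    "example metabolic pathway", "placeholder", [Reference("a textbook", True)]
)
emerging = PathwayClass(
    "example signaling pathway", "placeholder", [Reference("review 1"), Reference("review 2")]
)
print(established.has_sufficient_references(), emerging.has_sufficient_references())
```

A check of this kind would flag the second record, which has only two non-textbook references.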
Molecule class
A molecule class represents a specific class of molecules that play the same mechanistic role within a pathway. For example, in the MAP kinase signaling cascade, there is a class of proteins called MAPK. There are five molecule subclasses: proteins, genes/DNA, RNA, simple molecules (small organic, inorganic, or synthetic molecules), and ions. If a molecule is a protein, gene, or transcribed RNA, it is associated with protein sequences in the PANTHER protein family trees by manual curation (see below). The individual protein sequences are instances of the molecule class. In these cases, a molecule class is typically a group of orthologous and possibly also paralogous proteins that participate in the same specific biochemical reactions within the pathway. Each molecule class is also associated with the following attributes:
- Name. The name that appears on the pathway diagram. It is usually an acronym or a short version of the full name (e.g., MAPK).
- Full name. The complete, more descriptive version of the name (e.g., mitogen-activated protein kinase).
- Synonyms. All other names used to describe the molecule class (e.g., MAP kinase).
- Definition. A short description of the molecule class.
- Reference. Literature references, usually OMIM entries or review articles, are captured at this level to support the involvement of the molecule class in the pathway. However, this is not a requirement. More references are captured when sequences are associated with the molecule class.
Linking a molecule class to protein sequences in the PANTHER Protein Library:
As mentioned earlier, each PANTHER pathway molecule class represents a class of molecules that play the same mechanistic role within a pathway. This means that multiple proteins from the same organism or different organisms can potentially play the same given role in a pathway. The advantage is that a more complete pathway diagram for one organism can be constructed by also including experimental data accumulated from other organisms.
Protein sequence identifiers, primarily from the UniProt database (Wu et al. 2006), are associated with molecule classes using the PANTHER curation software (http://curation.pantherdb.org/) (Figure 2). This software system allows the curators to search for sequences based on the molecule class term and manually associate the sequences with the ontology term (Figure 3). For each annotation, Gene Ontology (GO) evidence code(s) (http://www.geneontology.org/GO.evidence.shtml) and literature reference PubMed identifier(s) must be selected as evidence (see below). Curators are allowed to associate orthologous or even paralogous sequences with the molecule class without experimental evidence, using the ISS (inferred by sequence similarity) evidence code. For example, the p38-MAPK molecule class is associated with the following species-specific UniProt identifiers: MK14_CANFA, MK14_HUMAN, MK14_MOUSE, MK14_PANTR, MK14_RAT, MK14_XENLA.
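As a small illustration of how one molecule class spans several organisms, the identifiers listed above can be grouped by the species code that follows the underscore in each UniProt entry name. The function name is an assumption made for this sketch. Evidence codes and PubMed identifiers are left out because the text does not list them for these entries.

```python
from collections import defaultdict

# Entry names quoted in the text for the p38-MAPK molecule class.
P38_MAPK_ENTRIES = [
    "MK14_CANFA", "MK14_HUMAN", "MK14_MOUSE",
    "MK14_PANTR", "MK14_RAT", "MK14_XENLA",
]


def group_by_species(entry_names):
    """Map each species code (the part after the underscore) to its entry names."""
    grouped = defaultdict(list)
    for entry in entry_names:
        protein, _, species = entry.partition("_")
        if not protein or not species:
            raise ValueError(f"not a UniProt-style entry name: {entry!r}")
        grouped[species].append(entry)
    return dict(grouped)


by_species = group_by_species(P38_MAPK_ENTRIES)
print(sorted(by_species))  # one key per organism represented in the class
```

Each key corresponds to one organism, so the size of the mapping shows how many species contribute sequences to the same mechanistic role.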
Figure 3a
Associating pathway molecule classes to subfamilies in a phylogenetic tree.
Molecule class of GABA-B receptor (in yellow) in the GABA-B receptor II signaling pathway (PANTHER accession P05731).
Figure 3b
Associating pathway molecule classes to subfamilies in a phylogenetic tree.
The phylogenetic tree of GABA-B receptor. Subfamily 4 is the GABA-B receptor 2 (PANTHER accession PTHR10519:SF4), which is composed of sequences from human (red arrow), rat, insects, and worm.
Reaction class and relationships
The reaction class represents biochemical relationships among various molecule classes. A typical reaction is a chemical transformation, either covalent or noncovalent. It has inputs (molecules that will be transformed during the reaction), outputs (the transformed molecules), and controllers (molecules that affect the reaction). Transformations include state transition, transport, and complex formation/dissociation; controls include catalysis, modulation, stimulation, inhibition, transcriptional activation/inhibition, and translational activation/inhibition.
Based on the reactions, we derive relationships among various molecule classes. For example, if a kinase catalyzes a transition of a protein from a nonphosphorylated state to a phosphorylated state, the kinase is upstream_of the protein and phosphorylates it. The more common relationships include upstream_of, downstream_of, phosphorylates, dephosphorylates, acetylates, ubiquitinates, and methylates.
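The kinase example above can be sketched as a rule that turns one reaction into relationship triples. The record layout, the `modification` field, and the treatment of downstream_of as the inverse of upstream_of are assumptions made for illustration; the actual derivation rules used by PANTHER are not spelled out in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    inputs: tuple             # molecule classes transformed by the reaction
    outputs: tuple            # the transformed molecule classes
    controllers: tuple        # (molecule_class, control_type) pairs
    modification: str = ""    # hypothetical field, e.g. "phosphorylation"


# Relationship verbs named in the text, keyed by an assumed modification label.
VERBS = {
    "phosphorylation": "phosphorylates",
    "dephosphorylation": "dephosphorylates",
    "acetylation": "acetylates",
    "ubiquitination": "ubiquitinates",
    "methylation": "methylates",
}


def derive_relationships(reaction):
    """Return (subject, relationship, object) triples implied by one reaction."""
    triples = []
    for controller, control_type in reaction.controllers:
        for target in reaction.inputs:
            triples.append((controller, "upstream_of", target))
            # Assumption: downstream_of is recorded as the inverse relationship.
            triples.append((target, "downstream_of", controller))
            if control_type == "catalysis" and reaction.modification in VERBS:
                triples.append((controller, VERBS[reaction.modification], target))
    return triples


example = Reaction(
    inputs=("substrate",),
    outputs=("substrate-P",),
    controllers=(("kinase", "catalysis"),),
    modification="phosphorylation",
)
for triple in derive_relationships(example):
    print(triple)
```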
Location class (cell type or subcellular compartment)
This class describes the location(s) where the reaction occurs and where the participating molecules are located when the reaction occurs. Each molecule class and reaction class is generally associated with one or more cell types or subcellular compartments. Currently, the cell type or component is free text entered by the curator, but we are in the process of enforcing the use of cellular component ontology terms from the Gene Ontology.
All pathways in PANTHER can be downloaded in Systems Biology Markup Language (SBML) format (Hucka et al. 2003) from the PANTHER downloads page (http://www.pantherdb.org/downloads). The file containing all associations of sequence identifiers with pathway molecule classes is also available. In addition, each pathway can be downloaded individually from the CellDesigner applet viewer (see below).
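A downloaded SBML file can be read with any XML parser. The sketch below uses Python's standard library on a tiny hand-written SBML fragment; the fragment is a toy example and not a PANTHER download. It ignores namespaces so that it does not depend on a particular SBML level or version. It also reads only the core SBML elements directly under the model, so tool-specific annotations are skipped.

```python
import xml.etree.ElementTree as ET

# Toy SBML fragment written for this example (not PANTHER data).
TOY_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_pathway">
    <listOfCompartments>
      <compartment id="c1" name="cytosol"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="s1" name="MAPK" compartment="c1"/>
      <species id="s2" name="MAPK-P" compartment="c1"/>
      <species id="s3" name="MAPKK" compartment="c1"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1">
        <listOfReactants><speciesReference species="s1"/></listOfReactants>
        <listOfProducts><speciesReference species="s2"/></listOfProducts>
        <listOfModifiers><modifierSpeciesReference species="s3"/></listOfModifiers>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

ROLE_TAGS = (
    ("listOfReactants", "speciesReference"),
    ("listOfProducts", "speciesReference"),
    ("listOfModifiers", "modifierSpeciesReference"),
)


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return element.tag.rsplit("}", 1)[-1]


def children(parent, name):
    """Direct children of `parent` with the given local tag name."""
    return [child for child in parent if local_name(child) == name]


def summarize_sbml(xml_text):
    """Return ({species id: name}, [(reaction id, {role list: [species names]})])."""
    root = ET.fromstring(xml_text)
    model = children(root, "model")[0]
    names = {}
    for species_list in children(model, "listOfSpecies"):
        for species in children(species_list, "species"):
            names[species.get("id")] = species.get("name") or species.get("id")
    reactions = []
    for reaction_list in children(model, "listOfReactions"):
        for reaction in children(reaction_list, "reaction"):
            roles = {}
            for group_tag, ref_tag in ROLE_TAGS:
                roles[group_tag] = [
                    names.get(ref.get("species"), ref.get("species"))
                    for group in children(reaction, group_tag)
                    for ref in children(group, ref_tag)
                ]
            reactions.append((reaction.get("id"), roles))
    return names, reactions


names, reactions = summarize_sbml(TOY_SBML)
print(names)
print(reactions)
```

In SBML terms, reactants, products, and modifiers correspond to the inputs, outputs, and controllers of the reaction class.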
Browsing and search
There are three ways to browse and search PANTHER Pathway:
1. Text search. Type any search term in the search box on a PANTHER web page. You can either select "pathway" in the drop-down menu next to the box (Figure 4a) or just leave it as "all." The search term can be a gene name or symbol, a protein name, or a pathway term. If "pathway" is selected, a list of pathways that contain the search term in the pathway class (or one of its associated molecule classes) will be returned. If "all" is selected, you will first get an overview of the different data types in PANTHER that match the search term, including pathways, genes, proteins, and ontology terms.
Figure 4a
PANTHER Pathway home page text search. On the home page of the PANTHER website, type the search term in the search box (blue arrow), and select "pathway" in the drop-down menu (red arrow). A list of pathways that contain the search term in the pathway class (or one of its associated molecule classes) will be returned.
2. Browse pathways using the PANTHER Prowler. Go to the Prowler page (http://www.pantherdb.org/panther/prowler.jsp) by clicking the "Browse" tab on the PANTHER home page. You can browse all PANTHER pathways together with other ontologies, such as molecular function and biological process (Figure 4b). Simply clicking the "See details" icon next to a pathway name will take you to the pathway diagram.
Figure 4b
Browse pathways using the PANTHER Prowler. Go to the PANTHER Prowler (http://www.pantherdb.org/panther/prowler.jsp) by clicking the "Browse" tab on the PANTHER home page. You can browse all PANTHER pathways together with other ontologies, such as molecular function and biological process. Simply clicking the "See Details" button will take you to the pathway diagram.
3. Search by sequence. You can score a protein sequence against the PANTHER HMM library (http://www.pantherdb.org/tools/hmmScoreForm.jsp) (Figure 4c). From the score result page, click the Subfamily ID or Subfamily Name to open the PANTHER Subfamily Information page, which gives a more detailed description of the HMM that matches your sequence, including its associated pathways. Currently, 2549 HMMs are associated with at least one pathway.
Figure 4c
Search by sequence. You can score a protein sequence against the PANTHER HMM library (http://www.pantherdb.org/tools/hmmScoreForm.jsp).
Pathway visualization and data analysis
Pathway applet
The PANTHER Pathway applet, written in Java, displays PANTHER pathways on the PANTHER website. It was developed based on the CellDesigner (Kitano et al. 2005) version 4.0 source code, with additional functionality to enhance its pathway visualization capabilities. Visualization features such as zooming, coloring, and exporting the image are available from the menu pull-down tabs at the top left. A diagram legend can be found at the bottom of the page. Pre-generated diagrams and SBML files can be exported using the "Export" drop-down menu at the top of the page. The applet is also interactive, so all notations in the diagram can be adjusted and the resulting diagram can be exported using the "Export Image" function under File. The PANTHER Pathway applet can read and display XML files created by CellDesigner 4.0, a pathway editing program that creates pathway diagrams compatible with SBGN Process Diagram Level 1 (http://sbgn.svn.sourceforge.net/viewvc/sbgn/ProcessDiagram/tags/Level1.0.0/sbgn_PD-level1.pdf). The applet can therefore display detailed molecular events and biochemical reactions of the pathways. The CellDesigner "process notation" (Kitano et al. 2005) for metabolic pathways is similar to that illustrated in most biochemistry textbooks. For signaling pathways, however, it is quite different from the conventional diagrams found in textbooks and review articles.
Conventionally, biologists are used to the "activity flow" diagram, in which, instead of detailed biochemical reactions, simple relationships such as activation or inhibition are used to illustrate the activity flow of a signaling pathway. To provide a user interface that is more familiar to biologists, the applet can display a pathway as an Activity Flow (AF, or Lite View, Figure 5b). All pathways are created as a Process Diagram (PD, or Standard View, Figure 5a) using CellDesigner, and the conversion to AF is done computationally based on a set of rules defined in the applet. Thus the conversion simplifies the diagram while still preserving information about the mechanism of each pathway step. The users can easily toggle between the two views by clicking on the "Standard View" and "Lite View" tabs above the diagram (Figure 6).
Figure 5a
Process diagram (PD) view of a pathway. The Insulin-MAPK pathway (PANTHER accession P00032) is shown in the standard CellDesigner or PD view in PANTHER Pathway applet.
Figure 5b
Activity flow (AF) view of a pathway. The Insulin-MAPK pathway is converted automatically to a Lite or AF diagram by the PANTHER Pathway applet.
By right-clicking a particular protein, you can select a list of pathway components with the following four options (Figure 6):
- Select upstream (Figure 6a). The component(s) immediately upstream of your protein. They are usually the protein(s) that catalyze/modify/regulate the reaction in which your protein is involved and often have direct physical interactions with your protein.
- Select upstream path (Figure 6b). All proteins upstream of your protein. These proteins may not have direct interactions with your protein, but any perturbations of these upstream proteins could potentially affect the function of your protein.
- Select downstream. The protein(s) directly downstream of your protein. Your protein usually catalyzes/modifies/regulates the reaction(s) in which these proteins are involved. Therefore it is often true that your protein has direct interaction with these proteins or catalyzes a reaction whose output is used as an input into the downstream reaction.
- Select downstream path. All proteins downstream of your protein of interest. These proteins may not interact directly with your protein, but any change of function of your protein may affect the function of these proteins.
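The four selections above amount to one-step and transitive lookups on a directed graph of upstream relationships. The following is a minimal sketch on a made-up four-node graph; the graph, the node names, and the function names are assumptions and do not reflect how the applet is implemented.

```python
from collections import deque

# Hypothetical toy graph: each key maps to the components immediately upstream of it.
UPSTREAM = {"D": ["C"], "C": ["A", "B"], "B": ["A"], "A": []}


def select_upstream(graph, node):
    """Components immediately upstream of `node`."""
    return set(graph.get(node, []))


def select_upstream_path(graph, node):
    """All components reachable by repeatedly following upstream links."""
    seen = set()
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against feedback loops
        seen.add(current)
        queue.extend(graph.get(current, []))
    return seen


def reverse(graph):
    """Flip every edge so that the same two functions answer downstream queries."""
    reversed_graph = {node: [] for node in graph}
    for node, parents in graph.items():
        for parent in parents:
            reversed_graph.setdefault(parent, []).append(node)
    return reversed_graph


DOWNSTREAM = reverse(UPSTREAM)
print(select_upstream(UPSTREAM, "D"))         # immediate upstream of D
print(select_upstream_path(UPSTREAM, "D"))    # everything upstream of D
print(select_upstream_path(DOWNSTREAM, "A"))  # everything downstream of A
```

Running the two functions on the reversed graph gives "Select downstream" and "Select downstream path".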
Figure 6a
Retrieve immediately upstream protein relationships captured in the pathways. The PANTHER Pathway applet utilizes relationships captured by the pathway and allows users to quickly and accurately retrieve the pathway component(s) immediately upstream of a protein by right-clicking a component and choosing "Select upstream".
Figure 6b
Retrieve all pathway components upstream of a protein. Retrieve all pathway components upstream of a protein of interest by choosing "Select upstream path." The selected components are highlighted in yellow, and their gene, transcript, protein, or PANTHER family information can be retrieved.
Since all PANTHER pathways are linked to the PANTHER Classification System, users can retrieve—for the list of proteins—genomic information, protein/transcript information, PANTHER ontology information, and so on. They can also navigate to external databases such as Entrez Gene (Maglott et al. 2005).
Pathway analysis tools
The PANTHER website has two integrated tools for analyzing data in the context of PANTHER pathways (http://www.pantherdb.org/tools/genexAnalysis.jsp; Figure 7a) (Thomas et al. 2006). The first tool, Compare Gene List, is based on the simple binomial test described by Cho and Campbell (2000). In this tool, as many as four test lists and an optional reference list (default reference lists are available on the website) are divided into groups based on PANTHER classification (molecular function, biological process, or pathway), and the binomial test is applied to determine whether the genes/proteins in each group are significantly over- or underrepresented in each test list relative to the reference list. One main use of the statistical test is to prioritize pathways that contain an abundance of genes in the uploaded list. From the statistical test results page, users can click on the pathway name to see proteins and genes highlighted according to which list or lists they appear in (Figure 7b).
The second tool, Analyze List with Gene Expression Value, analyzes a complete list of genes/proteins in which each gene or protein has an associated numerical value. Any numerical data can be used; the type most familiar to users is gene expression data, such as the fold-change value for each gene in a differential expression experiment. The tool generates a reference distribution from all values in the input list and then a distribution for each functional category. The probability that the functional category distribution was drawn randomly from the reference distribution is estimated using the Mann-Whitney rank-sum test (U test) (Clark et al. 2003). The data can be visualized for each pathway by clicking on the pathway name from the analysis results page. In this case, each protein or gene in the pathway is colored according to a "heat map" derived from the input values (Figure 7d).
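The two tests described above can be sketched with the Python standard library. This is an illustration of the statistics only, under stated assumptions: the function names are invented for the example, and the binomial test takes the category's fraction of the reference list as the expected proportion and reports a one-sided tail. The Mann-Whitney test uses the normal approximation without a tie correction. The website's exact conventions may differ.

```python
from math import comb, erfc, sqrt


def binomial_tail(k, n, p, upper):
    """P(X >= k) if `upper` else P(X <= k), for X ~ Binomial(n, p)."""
    ks = range(k, n + 1) if upper else range(0, k + 1)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in ks)


def representation_test(in_category, list_size, ref_in_category, ref_size):
    """Return ('+' or '-', p-value) for over- or underrepresentation."""
    expected_fraction = ref_in_category / ref_size
    over = in_category >= list_size * expected_fraction
    p_value = binomial_tail(in_category, list_size, expected_fraction, over)
    return ("+" if over else "-"), p_value


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney(category_values, reference_values):
    """Return ('+' or '-', two-sided p-value) using the normal approximation."""
    n1, n2 = len(category_values), len(reference_values)
    ranks = average_ranks(list(category_values) + list(reference_values))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    return ("+" if z > 0 else "-"), erfc(abs(z) / sqrt(2))


# Made-up numbers: 5 of 10 uploaded genes fall in a pathway that covers
# 100 of 1000 reference genes.
print(representation_test(5, 10, 100, 1000))
# Made-up expression values for a category and for the comparison set.
print(mann_whitney([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

When many pathways are tested at once, the raw p-values need a multiple-testing correction before they are compared against a significance threshold.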
Figure 7a
PANTHER Gene Expression Analysis tools. PANTHER provides two tools for analyzing gene expression data: "Compare gene lists tool" and "Analyze a list of genes" with expression values (Mann-Whitney test).
Figure 7b
The results from the Gene Expression Analysis Compare gene lists tool. The results for two sample gene lists are displayed in the p38 MAPK pathway (PANTHER accession P05918) using the PANTHER pathway applet. Components are shown in red and green if they are in one of the lists, in yellow if they are in both lists, and in gray if they are in neither list.
Figure 7c
The results page from the Gene Expression Analysis Mann-Whitney test. By default, pathways are sorted by the Bonferroni-corrected P value comparing the distribution of values for the genes in the pathway to the overall distribution, with "+" indicating a shift toward higher values than overall and "-" indicating lower values. Clicking on the pathway name launches the pathway applet, as shown in Figure 7d.
Figure 7d
Gene Expression Analysis Mann-Whitney test results are visualized in a pathway diagram of the p53 pathway (PANTHER accession P00059) using the PANTHER pathway applet that colors the pathway using a "heat map" derived from the input values. By default, the input values are divided into six quantiles (six bins, each with an equal number of data points); all genes in the lowest quantile are colored dark blue, with progressively warmer colors for higher-valued quantiles through to the highest quantile, which is colored red. The legend of the heat map can be found by clicking on the "Specify color ranges for pathway diagrams" button on the analysis result page as shown in Figure 7c, which also allows the user to customize the colors according to input value thresholds.
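The default six-quantile coloring described in this caption can be sketched as a rank-based binning step. The function name and the four intermediate colors are assumptions; the caption specifies only dark blue for the lowest quantile and red for the highest. Tied values may fall into different bins in this simple version.

```python
# Lowest bin first. Only the first and last colors come from the caption;
# the four in between are placeholders chosen for this example.
PALETTE = ["darkblue", "blue", "cyan", "yellow", "orange", "red"]


def quantile_bins(values, n_bins=6):
    """Return a bin index (0 = lowest) per value, with near-equal bin sizes."""
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for position, index in enumerate(order):
        bins[index] = position * n_bins // len(values)
    return bins


# Made-up input values, one per gene.
values = [5.0, -1.0, 3.0, 0.5, 2.0, 9.0]
colors = [PALETTE[b] for b in quantile_bins(values)]
print(list(zip(values, colors)))
```

Customizing the colors by input value thresholds, as the result page allows, would replace the rank-based step with fixed cut points.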
Community curation
The scientific community is encouraged to contribute to the PANTHER Pathway resource. Experts can curate remotely using the CellDesigner tools and PANTHER community curation website. Support for curation training and review of pathways is provided by the PANTHER Pathway team. Details can be found at http://curation.pantherdb.org. All authorship is attributed on the website, and pathways are made freely available.
Conclusion
PANTHER Pathway is one of the modules within PANTHER; it includes both metabolic pathways and signaling pathways. All pathways are curated by experts using the PANTHER Pathway Curation Software Module. They are represented by a pathway ontology stored in SBML format and displayed in diagrams with the PANTHER Pathway applet, which is based on the CellDesigner pathway editing tool. PANTHER Pathway is freely available to all users.
Acknowledgments
This work is supported by a grant from the U.S. National Institutes of Health (R01 GM081084).
Correspondence should be addressed to:
Evolutionary Systems Biology Group, SRI International
333 Ravenswood Ave.
Menlo Park, CA, USA
References
- Cho, R. J. & Campbell, M. J. Transcription, genomes, function. Trends Genet. 16, 409–415 (2000)
- Clark, A. G. et al. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302, 1960–1963 (2003)
- Gene Ontology Consortium. The Gene Ontology (GO) project in 2006. Nucleic Acids Res. 34, D322–D326 (2006)
- Hucka, M. et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19, 524–531 (2003)
- Kitano, H., Funahashi, A., Matsuoka, Y. & Oda, K. Using process diagrams for the graphical representation of biological networks. Nat. Biotechnol. 23, 961–966 (2005)
- Maglott, D., Ostell, J., Pruitt, K. D. & Tatusova, T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 33, D54–D58 (2005)
- Mi, H., Guo, N., Kejariwal, A. & Thomas, P. D. PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways. Nucleic Acids Res. 35, D247–D252 (2007)
- Thomas, P. D. et al. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 13, 2129–2141 (2003)
- Thomas, P. D. et al. Applications for protein sequence-function evolution data: mRNA/protein expression analysis and coding SNP scoring tools. Nucleic Acids Res. 34, W645–W650 (2006)
- Wu, C. H. et al. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 34, D187–D191 (2006)




