Pathway Interaction Database homepage

Data Representation

PID's flexible database schema stores a wide variety of information about cell signaling pathways. PID captures information at the level of the molecular interaction in a fine-grained, highly structured and unambiguous format, making use of existing standards for gene and protein sequences and for the annotation of biomolecules and biological processes.

The basic unit of information in PID is the molecular interaction. For each interaction, the database minimally identifies the biomolecule(s) involved (protein, RNA, complex, compound), the nature of the processes involving these biomolecules (reaction, binding, translocation, transcription), and the role of each biomolecule (input, output, positive regulator, negative regulator) in those processes. Additionally, activity states and post-translational modifications are assigned.

In each pathway, input biomolecules are transformed into output biomolecules, a process that may be aided by a positive regulator and hindered by a negative regulator. The output may then be re-used as the input, positive regulator or negative regulator for subsequent processes, thus creating a pathway of successive interactions.

Pathways are graphically depicted in PID with labeled nodes and edges. Nodes represent either a biomolecule or a process, and edges connect biomolecules and processes. An edge describes the role of each biomolecule (input, output, positive regulator, negative regulator) in the process. An interaction consists of one biological event or process node that is connected by incoming (input, positive regulator, negative regulator) and outgoing (output) edges to its adjacent biomolecule or process nodes.
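The node-and-edge description above can be sketched as a small data structure. This is only an illustration of the idea, not PID's actual schema: the `Role` and `Interaction` names and the example molecule names are assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """Edge labels: the role a biomolecule plays in an interaction."""
    INPUT = "input"
    OUTPUT = "output"
    POSITIVE_REGULATOR = "positive regulator"
    NEGATIVE_REGULATOR = "negative regulator"


@dataclass
class Interaction:
    """One event node plus its labeled edges to biomolecule nodes."""
    event: str                                 # e.g. "binding", "translocation"
    edges: list = field(default_factory=list)  # (biomolecule name, Role) pairs

    def add(self, biomolecule: str, role: Role) -> None:
        self.edges.append((biomolecule, role))

    def incoming(self) -> list:
        # Inputs and regulators are connected by incoming edges.
        return [b for b, r in self.edges if r is not Role.OUTPUT]

    def outgoing(self) -> list:
        # Outputs are connected by outgoing edges.
        return [b for b, r in self.edges if r is Role.OUTPUT]


# Hypothetical example; the names are placeholders, not PID records.
binding = Interaction("binding")
binding.add("ligand", Role.INPUT)
binding.add("receptor", Role.INPUT)
binding.add("inhibitor", Role.NEGATIVE_REGULATOR)
binding.add("ligand/receptor complex", Role.OUTPUT)
```

Keeping the role on the edge rather than on the biomolecule lets the same biomolecule appear as an output of one interaction and as an input or regulator of another.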

Biomolecules

Types: Most biomolecules found in PID are human and may be proteins, RNAs, compounds or complexes. A compound is any molecule that is not a protein, RNA or complex. A complex is composed of proteins, compounds, RNAs, other complexes, or any combination of these.

Naming conventions: Proteins and mRNAs are named using either the accepted HUGO gene symbol or an alias listed in Entrez Gene. Compounds, complexes and protein families may be given the common name found in the biomedical literature. Alternatively, complexes may be named by the Gene Ontology (GO) Cellular Component term.

Identifiers: Identifiers that unambiguously define a biomolecule are: UniProt sequence identifiers for proteins (except for BioCarta data where proteins are annotated with Entrez Gene identifiers); Entrez Gene identifiers for both genes and proteins; CAS Registry Numbers for compounds; and GO Cellular Component identifiers, when available, for complexes. Complexes are also defined by the set of constituent member names and identifiers.

If protein isoforms exist in multiple UniProt entries then, unless a specific isoform is reported in the reference, the UniProt parent identifier is used to define the protein.
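As a rough sketch of the isoform rule above: UniProt writes isoform accessions as the parent accession followed by a hyphen and a number (for example `P04637-2`). The helper names below are hypothetical, and how PID stores these identifiers internally is an assumption.

```python
def parent_accession(accession: str) -> str:
    """Strip a trailing '-<number>' isoform suffix, if present."""
    base, sep, suffix = accession.partition("-")
    return base if sep and suffix.isdigit() else accession


def protein_identifier(accession: str, isoform_reported: bool) -> str:
    """Keep the isoform only when the reference reports a specific one."""
    return accession if isoform_reported else parent_accession(accession)
```

For example, `protein_identifier("P04637-2", isoform_reported=False)` falls back to the parent identifier `P04637`, while the same call with `isoform_reported=True` keeps the isoform accession unchanged.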

Activity states and post-translational modifications:

Biomolecules perform different functions depending on their activity state, which in PID can be specified as a label designating a functional state ('active', 'active1', 'active2', 'inactive', etc.) and physical covalent post-translational modifications (PTMs) when known. PTMs include phosphorylation, acetylation, biotinylation, farnesylation, geranylgeranylation, glycosaminoglycan, glycosylation, methylation, myristoylation, oxidation, palmitoylation, poly(ADP-ribosyl)ation, sumoylation, and ubiquitination. PTMs may be further annotated by assigning the modified residue (designated by a single-letter amino acid code) and modified residue position (when known). If a PTM or activity state is annotated on an input biomolecule, it is implicitly required for the interaction to occur. Within complexes, component activity states may differ from the activity state of the complex as a whole.
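One way to picture the rule that an annotated state on an input is implicitly required is the sketch below. The `PTM` and `MoleculeState` classes, the `satisfies` function and the example protein are assumptions made for illustration and do not reflect PID's internal representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PTM:
    kind: str                       # e.g. "phosphorylation"
    residue: Optional[str] = None   # single-letter amino acid code, when known
    position: Optional[int] = None  # residue position, when known


@dataclass(frozen=True)
class MoleculeState:
    name: str
    activity: Optional[str] = None  # e.g. "active", "inactive"
    ptms: frozenset = frozenset()


def satisfies(required: MoleculeState, available: MoleculeState) -> bool:
    """True if `available` carries every state annotated on `required`."""
    if required.name != available.name:
        return False
    if required.activity is not None and required.activity != available.activity:
        return False
    # Every PTM annotated on the input must be present on the molecule.
    return required.ptms <= available.ptms


# Hypothetical example: "kinase X" and residue T100 are placeholders.
phospho = PTM("phosphorylation", "T", 100)
needed = MoleculeState("kinase X", "active", frozenset({phospho}))
modified = MoleculeState("kinase X", "active", frozenset({phospho}))
unmodified = MoleculeState("kinase X", "active")
```

Here `satisfies(needed, modified)` holds, whereas `satisfies(needed, unmodified)` does not, because the required phosphorylation is missing.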

Location and function:

The GO Cellular Component and Molecular Function controlled vocabularies are used to describe a biomolecule's location and function, respectively. Subcellular location terms from the GO Cellular Component controlled vocabulary are required when describing a translocation event. The GO Molecular Function vocabulary is commonly used in PID to specify a biomolecular function more specifically than the general role of input, positive regulator, negative regulator, and output.

Biological events

A range of biological events describe the general nature of each interaction. Furthermore, each biological event can have positive or negative regulators associated with its action.

Binding: Binding includes protein-protein, protein-compound and protein-RNA interactions. It also encompasses modifications to biomolecules such as phosphorylation (which is reflected in the PTM(s) of the output biomolecules), complex association/dissociation, and cleavage of a biomolecule into multiple output biomolecules, which are annotated by sequence location when that information is available.

Transcription: Transcription includes transcription from DNA to RNA as well as the steps of RNA export to the cytoplasm and translation.

Translocation: Translocation defines intracellular transport as well as extracellular secretion. The input and output biomolecules must possess different locations as annotated by the GO Cellular Component vocabulary.
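The location constraint on translocation events can be expressed as a simple check. This is a minimal sketch that assumes locations are given as GO Cellular Component term names; the function name is hypothetical.

```python
def is_valid_translocation(input_location: str, output_location: str) -> bool:
    """Return True when the input and output location annotations differ."""
    # Normalise case and surrounding whitespace before comparing term names.
    return input_location.strip().lower() != output_location.strip().lower()
```

For instance, a move from `"cytoplasm"` to `"nucleus"` passes the check, while an event whose input and output are both annotated with `"nucleus"` does not.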

Reaction: Reaction includes the covalent interactions that customarily occur in enzymatic processes.

Regardless of the biological event type, an event may possess a condition (such as G1 phase of the mitotic cell cycle), which is a prerequisite for the interaction to take place. These conditions are defined by GO Molecular Function, GO Biological Process or NCI thesaurus terms.

Biological processes

A biological process is a multi-step event, captured in a single node and often mediated by a biomolecule. Biological processes are annotated with a GO Biological Process term or an NCI thesaurus term.

Role of biomolecules

Each biomolecule plays a role in the interaction process: it can be an input, output, positive regulator or negative regulator. Inputs are transformed into outputs. The defining characteristic of positive and negative regulators is that they are not themselves transformed by the event, but act either directly or indirectly on the input. An output biomolecule can serve as the input, positive regulator or negative regulator in subsequent interactions.

Evidence codes and literature references

Interactions are annotated with information about the source of the evidence and an evidence-type tag, called an evidence code. The source of evidence is annotated using PubMed identifiers. When performing an Advanced search, users can restrict queries on NCI-Nature curated data by evidence type, so that they view and analyze only the information that matches their interest and their confidence in each evidence category. The list of evidence codes used in PID is partially derived from the GO evidence codes:

Acronym Expansion Description
IAE Inferred from Array Experiments
  • ChIP on chip
  • Protein microarrays/protein chips
  • Chemical compound arrays
IC Inferred by Curator
  • An experimentally-determined interaction, not falling under any of the other evidence codes, but still deemed to occur by the curator. (Not commonly applied.)
IDA Inferred from Direct Assay
  • Enzyme assays
  • In vitro reconstitution
  • Functional and activity assays
IFC Inferred from Functional Complementation
  • A gene from one organism complements a mutation in another species
IGI Inferred from Genetic Interaction
  • Genetic interactions such as suppressors, synthetic lethals, and rescue experiments
  • Inference about one gene drawn from the phenotype of a mutation in a different gene
IMP Inferred from Mutant Phenotype
  • Any gene mutation/knockout
  • Overexpression/ectopic expression of wild-type or mutant genes
  • Anti-sense experiments
  • RNAi experiments
  • Polymorphism or allelic variation
IOS Inferred from Other Species
  • An interaction that is inferred from another species due to a lack of evidence in human.
IPI Inferred from Physical Interaction: any physical interaction detection method. Common ones are:
  • 2-hybrid interactions
  • co-purification
  • co-immunoprecipitation
  • ion/protein binding experiments
RCA Inferred from Reviewed Computational Analysis
  • Predictions based on large-scale experiments (e.g. genome-wide two-hybrid, genome-wide synthetic interactions)
  • Predictions based on integration of large-scale datasets of several types
  • Text-based computation (e.g. text mining)
RGE Inferred from Reporter Gene Expression
  • Reporter gene expression studies
TAS Traceable Author Statement
  • Anything found in secondary literature (such as a review article or textbook) where the original experiments are traceable through that piece of literature.
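Filtering by evidence code, as described for the Advanced search, might look roughly like the following. The code table is taken from the list above; the `filter_by_evidence` function and the record layout (a dict with an `"evidence"` list) are assumptions for illustration and are not PID's query interface.

```python
EVIDENCE_CODES = {
    "IAE": "Inferred from Array Experiments",
    "IC": "Inferred by Curator",
    "IDA": "Inferred from Direct Assay",
    "IFC": "Inferred from Functional Complementation",
    "IGI": "Inferred from Genetic Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IOS": "Inferred from Other Species",
    "IPI": "Inferred from Physical Interaction",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "RGE": "Inferred from Reporter Gene Expression",
    "TAS": "Traceable Author Statement",
}


def filter_by_evidence(interactions, allowed):
    """Keep interactions annotated with at least one allowed evidence code."""
    allowed = set(allowed)
    unknown = allowed - EVIDENCE_CODES.keys()
    if unknown:
        raise ValueError(f"unknown evidence codes: {sorted(unknown)}")
    return [i for i in interactions if allowed & set(i["evidence"])]


# Hypothetical records; the ids and annotations are placeholders.
records = [
    {"id": 1, "evidence": ["IDA", "IPI"]},
    {"id": 2, "evidence": ["TAS"]},
    {"id": 3, "evidence": ["RCA", "IMP"]},
]
experimental = filter_by_evidence(records, ["IDA", "IMP"])
```

Rejecting unknown codes up front avoids silently returning an empty result when a code is mistyped.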

Pathways

A pathway in PID is defined as a biologically meaningful set of interactions; a minimal pathway consists of a single interaction. For a complete list of pathways visit the Browse pathways page.
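To illustrate how a set of interactions forms a pathway of successive steps, the sketch below links each interaction to the ones that re-use its outputs as inputs or regulators. The dict layout, the `successors` function and the molecule names are assumptions chosen for the example.

```python
def successors(pathway):
    """Map each interaction index to the indices that consume its outputs."""
    links = {}
    for i, upstream in enumerate(pathway):
        outputs = set(upstream["outputs"])
        links[i] = [
            j
            for j, downstream in enumerate(pathway)
            if j != i
            and outputs & (set(downstream["inputs"]) | set(downstream["regulators"]))
        ]
    return links


# Hypothetical two-step pathway; the names are placeholders.
pathway = [
    {"inputs": ["A", "B"], "regulators": [], "outputs": ["A/B complex"]},
    {"inputs": ["C"], "regulators": ["A/B complex"], "outputs": ["C (active)"]},
]
links = successors(pathway)
```

A minimal pathway with a single interaction simply yields one entry with no successors.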