PID's flexible database schema stores a wide variety of information about cell signaling pathways. PID captures information at the level of the molecular interaction in a fine-grained, highly structured and unambiguous format, making use of existing standards for gene and protein sequences, biomolecule and biological process annotation.
The basic unit of information in PID is the molecular interaction. For each interaction, the database minimally identifies the biomolecule(s) involved (protein, RNA, complex, compound), the nature of the processes involving these biomolecules (reaction, binding, translocation, transcription), and the role of each biomolecule (input, output, positive regulator, negative regulator) in those processes. Additionally, activity states and post-translational modifications are assigned.
In each pathway, input biomolecules are transformed into output biomolecules, a process that may be aided by a positive regulator and hindered by a negative regulator. The output may then be reused as the input, positive regulator or negative regulator for subsequent processes, thus creating a pathway of successive interactions.
Pathways are graphically depicted in PID with labeled nodes and edges. Nodes represent either a biomolecule or a process, and edges connect biomolecules and processes. An edge describes the role of each biomolecule (input, output, positive regulator, negative regulator) in the process. An interaction consists of one biological event or process node that is connected by incoming (input, positive regulator, negative regulator) and outgoing (output) edges to its adjacent biomolecule or process nodes.
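The node-and-edge description above can be sketched as a small data structure. This is only an illustration and not PID's actual schema: the class names `Role`, `Edge` and `Interaction`, and the molecule labels in the example, are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """The four edge labels named in the text."""
    INPUT = "input"
    OUTPUT = "output"
    POSITIVE_REGULATOR = "positive regulator"
    NEGATIVE_REGULATOR = "negative regulator"


@dataclass(frozen=True)
class Edge:
    biomolecule: str  # label of the biomolecule node (hypothetical names below)
    role: Role        # role the biomolecule plays in the event


@dataclass
class Interaction:
    event: str  # label of the event node, e.g. "binding"
    edges: list[Edge] = field(default_factory=list)

    def incoming(self) -> list[Edge]:
        """Edges pointing into the event node: every role except output."""
        return [e for e in self.edges if e.role is not Role.OUTPUT]

    def outgoing(self) -> list[Edge]:
        """Edges leaving the event node: outputs only."""
        return [e for e in self.edges if e.role is Role.OUTPUT]


# Hypothetical interaction: two inputs bind, hindered by a negative regulator.
example = Interaction(
    event="binding",
    edges=[
        Edge("ligand", Role.INPUT),
        Edge("receptor", Role.INPUT),
        Edge("inhibitor", Role.NEGATIVE_REGULATOR),
        Edge("ligand/receptor complex", Role.OUTPUT),
    ],
)
```

Keeping the role on the edge rather than on the biomolecule mirrors the text: the same molecule can be an output of one event and a regulator of another.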
Types: Most biomolecules found in PID are human and may be proteins, RNAs, compounds or complexes. A compound is any molecule that is not a protein, RNA or complex. A complex is composed of proteins, compounds, RNAs, other complexes, or any combination of these.
Naming conventions: Proteins and mRNAs are named using either the accepted HUGO gene symbol or an alias listed in Entrez Gene. Compounds, complexes and protein families may be given the common name found in the biomedical literature. Alternatively, complexes may be named by the Gene Ontology (GO) Cellular Component term.
Identifiers: Identifiers that unambiguously define a biomolecule are: UniProt sequence identifiers for proteins (except for BioCarta data where proteins are annotated with Entrez Gene identifiers); Entrez Gene identifiers for both genes and proteins; CAS Registry Numbers for compounds; and GO Cellular Component identifiers, when available, for complexes. Complexes are also defined by the set of constituent member names and identifiers.
If protein isoforms exist in multiple UniProt entries then, unless a specific isoform is reported in the reference, the UniProt parent identifier is used to define the protein.
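A minimal sketch of the isoform rule follows. It assumes that an isoform is written in the common UniProt style of a parent accession followed by a hyphen and an isoform number (for example `P04637-2`). The function name `protein_identifier` and the `isoform_reported` flag are hypothetical and only illustrate the rule; they are not part of PID.

```python
def protein_identifier(accession: str, isoform_reported: bool) -> str:
    """Pick the identifier used to define a protein.

    Assumption: isoform accessions look like "<parent>-<number>".
    If the reference does not report a specific isoform, fall back
    to the parent accession.
    """
    parent, _, _isoform_number = accession.partition("-")
    return accession if isoform_reported else parent


# The reference names no specific isoform, so the parent identifier is used.
print(protein_identifier("P04637-2", isoform_reported=False))
# The reference reports the isoform, so the full identifier is kept.
print(protein_identifier("P04637-2", isoform_reported=True))
```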
Activity states and post-translational modifications:
Biomolecules perform different functions depending on their activity state, which in PID can be specified as a label designating a functional state ('active', 'active1', 'active2', 'inactive', etc.) and physical covalent post-translational modifications (PTMs) when known. PTMs include phosphorylation, acetylation, biotinylation, farnesylation, geranylgeranylation, glycosaminoglycan, glycosylation, methylation, myristoylation, oxidation, palmitoylation, poly(ADP-ribosyl)ation, sumoylation, and ubiquitination. PTMs may be further annotated by assigning the modified residue (designated by a single-letter amino acid code) and modified residue position (when known). If a PTM or activity state is annotated in an input biomolecule, then it is implicitly required for the interaction to occur. Within complexes, component activity states may differ from the activity state of the complex as a whole.
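One way to picture these annotations is as a record holding an optional activity label and a set of PTMs, plus a check that an available molecule meets everything an input requires. The sketch below is an assumption-laden illustration: `PTM`, `MoleculeState`, `satisfies` and the molecule name "kinase X" are hypothetical and do not come from PID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PTM:
    kind: str                       # e.g. "phosphorylation"
    residue: Optional[str] = None   # single-letter amino acid code, if known
    position: Optional[int] = None  # residue position, if known

    def label(self) -> str:
        if self.residue is None:
            return self.kind
        site = self.residue + (str(self.position) if self.position is not None else "")
        return f"{self.kind} at {site}"


@dataclass(frozen=True)
class MoleculeState:
    name: str
    activity: Optional[str] = None       # e.g. "active", "inactive"
    ptms: frozenset[PTM] = frozenset()   # known covalent modifications


def satisfies(available: MoleculeState, required: MoleculeState) -> bool:
    """True if `available` carries every state annotated on the input `required`."""
    if available.name != required.name:
        return False
    if required.activity is not None and available.activity != required.activity:
        return False
    return required.ptms <= available.ptms  # all required PTMs must be present


phospho = PTM("phosphorylation", "S", 15)
required = MoleculeState("kinase X", activity="active", ptms=frozenset({phospho}))
unmodified = MoleculeState("kinase X", activity="active")
modified = MoleculeState("kinase X", activity="active", ptms=frozenset({phospho}))
```

Here `satisfies(unmodified, required)` is false because the annotated phosphorylation is missing, which matches the rule that a PTM on an input is implicitly required.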
Location and function:
The GO Cellular Component and Molecular Function controlled vocabularies are used to describe a biomolecule's location and function, respectively. Subcellular location terms from the GO Cellular Component controlled vocabulary are required when describing a translocation event. The GO Molecular Function vocabulary is commonly used in PID to describe a biomolecule's function more precisely than the general roles of input, positive regulator, negative regulator and output allow.
A range of biological events describe the general nature of each interaction. Furthermore, each biological event can have positive or negative regulators associated with its action.
Binding: Binding includes protein-protein, protein-compound and protein-RNA interactions. Binding also encompasses modifications to biomolecules such as phosphorylation (which is reflected in the PTM(s) of the output biomolecules), complex association/dissociation and biomolecular cleavage to multiple output biomolecules, which, if the information is available, will be annotated by sequence location.
Translocation: Translocation defines intracellular transport as well as extracellular secretion. The input and output biomolecules must possess different locations as annotated by the GO Cellular Component vocabulary.
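The location constraint on translocation events can be expressed as a simple check. This is a sketch under stated assumptions: the function `is_valid_translocation` is hypothetical, and locations are represented here as plain GO Cellular Component term names rather than identifiers.

```python
from typing import Optional


def is_valid_translocation(input_location: Optional[str],
                           output_location: Optional[str]) -> bool:
    """A translocation needs both locations annotated, and they must differ."""
    if not input_location or not output_location:
        return False  # location terms are required for translocation events
    return input_location != output_location


# Movement between two distinct compartments is a valid translocation.
print(is_valid_translocation("cytoplasm", "nucleus"))
# Same compartment on both sides is not a translocation.
print(is_valid_translocation("nucleus", "nucleus"))
```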
Regardless of the biological event type, an event may possess a condition (such as G1 phase of the mitotic cell cycle), which is a prerequisite for the interaction to take place. These conditions are defined by GO Molecular Function, GO Biological Process or NCI Thesaurus terms.
A biological process is a multi-step event, captured in a single node and often mediated by a biomolecule. Biological processes are annotated with a GO Biological Process term or an NCI Thesaurus term.
Each biomolecule plays a role in the interaction process: it can be an input, output, positive regulator or negative regulator. Inputs are transformed into outputs. The defining characteristic of positive and negative regulators is that they are not themselves transformed by the event but act, either directly or indirectly, on the input. An output biomolecule can serve as the input, positive regulator or negative regulator in subsequent interactions.
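The reuse of outputs in later interactions is what links individual interactions into a pathway. A minimal sketch of that linkage, assuming each interaction is a plain dictionary of role names mapped to sets of biomolecule labels (a hypothetical layout, not PID's storage format), might look like this:

```python
# Roles through which a biomolecule enters an event (everything except output).
INCOMING_ROLES = ("inputs", "positive_regulators", "negative_regulators")


def downstream(interactions, source_id):
    """Return ids of interactions that use any output of `source_id`
    as an input, positive regulator or negative regulator."""
    by_id = {inter["id"]: inter for inter in interactions}
    produced = by_id[source_id]["outputs"]
    result = []
    for inter in interactions:
        if inter["id"] == source_id:
            continue
        used = set()
        for role in INCOMING_ROLES:
            used |= inter.get(role, set())
        if produced & used:
            result.append(inter["id"])
    return result


# Hypothetical three-step pathway; all names are illustrative only.
steps = [
    {"id": "bind", "inputs": {"ligand", "receptor"},
     "outputs": {"ligand/receptor"}},
    {"id": "modify", "inputs": {"kinase"},
     "positive_regulators": {"ligand/receptor"},
     "outputs": {"active kinase"}},
    {"id": "move", "inputs": {"active kinase"},
     "outputs": {"nuclear active kinase"}},
]
```

In this example the complex produced by `bind` acts as a positive regulator of `modify`, and the output of `modify` becomes the input of `move`, so following `downstream` from `bind` walks the pathway in order.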
Interactions are annotated with information about the source of the evidence and an evidence-type tag, called an evidence code. The source of evidence is annotated using PubMed identifiers. When performing an Advanced search, users can restrict queries on NCI-Nature curated data by evidence type, so that they view and analyze only the interactions whose evidence categories match their interest and level of confidence. The list of evidence codes used in PID is partially derived from the GO evidence codes:
| Code | Evidence type | Notes |
|------|---------------|-------|
| IAE | Inferred from Array Experiments | |
| IC | Inferred by Curator | |
| IDA | Inferred from Direct Assay | |
| IFC | Inferred from Functional Complementation | |
| IGI | Inferred from Genetic Interaction | |
| IMP | Inferred from Mutant Phenotype | |
| IOS | Inferred from Other Species | |
| IPI | Inferred from Physical Interaction | Any physical interaction detection method, common ones are: |
| RCA | Inferred from Reviewed Computational Analysis | |
| RGE | Inferred from Reporter Gene Expression | |
| TAS | Traceable Author Statement | |
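Restricting a result set by evidence code, as described for the Advanced search, can be sketched as a lookup table plus a filter. This is not PID's query interface: the function `filter_by_evidence`, the record layout and the interaction ids are hypothetical, and only the codes and their names come from the list above.

```python
EVIDENCE_CODES = {
    "IAE": "Inferred from Array Experiments",
    "IC": "Inferred by Curator",
    "IDA": "Inferred from Direct Assay",
    "IFC": "Inferred from Functional Complementation",
    "IGI": "Inferred from Genetic Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IOS": "Inferred from Other Species",
    "IPI": "Inferred from Physical Interaction",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "RGE": "Inferred from Reporter Gene Expression",
    "TAS": "Traceable Author Statement",
}


def filter_by_evidence(interactions, allowed):
    """Keep interactions annotated with at least one of the allowed codes."""
    allowed = set(allowed)
    unknown = allowed - EVIDENCE_CODES.keys()
    if unknown:
        raise ValueError(f"unknown evidence codes: {sorted(unknown)}")
    return [i for i in interactions if i["evidence"] & allowed]


# Hypothetical annotated interactions.
records = [
    {"id": "bind", "evidence": {"IPI", "IDA"}},
    {"id": "modify", "evidence": {"TAS"}},
    {"id": "move", "evidence": {"IDA"}},
]

# Keep only interactions supported by direct assay or physical interaction.
direct = filter_by_evidence(records, {"IDA", "IPI"})
```

Rejecting unknown codes up front is a design choice of this sketch, so that a mistyped code raises an error instead of silently returning an empty result.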
A pathway in PID is defined as a biologically meaningful set of interactions; a minimal pathway consists of a single interaction. For a complete list of pathways visit the Browse pathways page.