Supplementary Info
From the following article: An introduction to the Pfam protein families database
Data Model
Each Pfam-A entry starts from a set of representative homologous sequences. These sequences are aligned to make the seed alignment, which is in turn used to build a profile HMM using the HMMER3 software package (a version that is considerably faster and more sensitive than previous versions). This profile HMM is searched against three separate sequence databases: UniProtKB, the NCBI GenPept database and a set of metagenomic sequences. To decide which matching sequences should be included in, and which rejected from, the final full alignments, curators manually determine two cutoff thresholds, known as gathering thresholds (GA): one for the sequence and one for the domain. The sequence bit score is the log-odds score for the complete sequence. Where there are additional domains on the same sequence, the domain bit score is the log-odds score for just the domain; in other words, it is the score for the whole sequence if only this single domain envelope were found. In such cases the sequence score is more or less the sum of all the individual domain scores for that sequence. To include multiple domains, the two thresholds can be set independently, the domain score cutoff usually being lower than the sequence score cutoff. Setting two separate thresholds is used particularly to identify short repeat regions such as WD40 and TPR repeats.
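The interplay of the two thresholds can be illustrated with a minimal sketch. It assumes that a domain match is kept only when the whole-sequence score reaches the sequence cutoff and the domain's own score reaches the domain cutoff; the function name, the data layout and all numerical values are hypothetical and are not taken from any Pfam family.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    start: int
    end: int
    bit_score: float  # log-odds score for this domain envelope alone


def significant_domains(sequence_score, domain_hits, seq_ga, dom_ga):
    """Return the domain hits that pass both thresholds (assumed rule)."""
    if sequence_score < seq_ga:
        return []  # the sequence as a whole is rejected
    return [hit for hit in domain_hits if hit.bit_score >= dom_ga]


# Hypothetical repeat-containing sequence with three candidate repeats.
hits = [DomainHit(10, 48, 12.5), DomainHit(55, 93, 9.1), DomainHit(100, 138, 3.0)]
# The text notes that the sequence score is roughly the sum of the domain scores.
sequence_score = sum(hit.bit_score for hit in hits)
kept = significant_domains(sequence_score, hits, seq_ga=20.0, dom_ga=8.0)
print([(hit.start, hit.end) for hit in kept])
```

With a domain cutoff lower than the sequence cutoff, individually weak repeats are retained as long as the sequence as a whole scores well, which is the situation described for WD40 and TPR repeats.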
Pfam-B families are generated at the time of a release to supplement the coverage of the Pfam-A families. Due to the substantial speed increase provided by HMMER3 relative to previous versions, it is now feasible to generate profile HMMs for the largest 20,000 of the Pfam-B families. As with all the other Pfam-Bs, however, these families have had no manual curation.
There are four types of Pfam-A entries: family, domain, repeat and motif. A family is a collection of conserved regions, whereas a domain is a structural unit that can be found in multiple protein contexts. A repeat is a short unit that is unstable in isolation but forms a stable structure when multiple copies are present; for example, a single blade of a WD40 β-propeller structure is one repeat. A motif is a short, conserved sequence-unit often indicative of specialized function, such as a ligand-binding site or a trafficking signal.
The entire Pfam database is available for downloading as a MySQL dump via the ftp site, and instructions for installing the website are available from the Help page. We also produce flat files that contain all the Pfam-A and Pfam-B alignments and profile HMMs. The Pfam-C flat file contains information about Pfam clans (see Table 1).
The data from the models and all associated information are presented to the user on the website in such a way that each family, Pfam-A or -B, clan, sequence, structure and proteome has its own web page. Each page provides access via a number of different tabs to all data relating to that entry, and every entry is hyperlinked to its relevant page. The layouts of each of these sets of pages are described in turn below.
Organization of the Website
Every web page carries six links at the top: Home, Search, Browse, FTP, Help and About.
The Pfam home page offers a number of entry points to the website.
The "Jump to" box on the home page, and on most other web pages, accepts the following accessions or identifiers and performs an exact match look-up on them (wild cards are not accepted):
- Pfam accession or identifier (for example, PF00001 or 7tm_1; PB000001 or Pfam-B_1)
- Pfam clan accession or identifier (for example, CL0053 or 4H_Cytokine)
- PDB identifier (for example, 1w9h)
- UniProtKB accession or identifier (for example, P15498 or VAV_HUMAN)
- NCBI GI number or secondary accession (for example, 349163 or AAA72015.1)
- Metagenomics identifier or accession (for example, ECP38001.1 or 141063662)
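As an illustration of exact-match look-up, the sketch below recognizes some of the accession styles listed above. The patterns are inferred only from the examples given in the list (and from the conventional four-character PDB identifier); the look-up logic actually used by the website is not described here, so the function and its labels are hypothetical.

```python
import re

# Patterns inferred from the examples in the list above (assumptions).
PATTERNS = [
    ("Pfam-A accession", re.compile(r"PF\d{5}")),
    ("Pfam-B accession", re.compile(r"PB\d{6}")),
    ("Pfam clan accession", re.compile(r"CL\d{4}")),
    ("PDB identifier", re.compile(r"\d[A-Za-z0-9]{3}")),
]


def classify(term):
    """Guess the type of a 'Jump to' query by exact pattern match."""
    term = term.strip()
    for label, pattern in PATTERNS:
        # fullmatch: the whole term must fit, so wild cards never match.
        if pattern.fullmatch(term):
            return label
    return "other"  # identifiers and sequence accessions need a database look-up


for query in ["PF00001", "PB000001", "CL0053", "1w9h", "7tm_1", "PF0000*"]:
    print(query, "->", classify(query))
```

Identifiers such as 7tm_1 or VAV_HUMAN have no fixed shape and would have to be resolved against the database itself rather than by pattern.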
The "Search" link offers users a number of search modes, including by sequence (protein or DNA), by keyword, by functional similarity, by domain organization and by taxonomy. Further details of how these can be used can be found below.
The "Browse" link allows users to display a list of all Pfam families, Pfam clans and proteomes by name.
The "FTP" link leads to the Pfam flat files and database files. The "current_release" folder contains data pertaining to the most current release of Pfam, and older releases can be found under the "releases" folder. There are several different flat files available (see Table 1). The Pfam MySQL database can be installed locally by using the files in the "database_files" directory within the "current_release" folder. The pfam_scan script (pfam_scan.pl) can be downloaded from the "Tools" directory.
The "Help" link takes the user to a page for accessing documentation about the Pfam database. The help pages include, among other things, a summary of the details of each release, such as release date, UniProtKB version, a glossary of terms used within Pfam and a list of frequently asked questions. There is also documentation describing the tables in the Pfam MySQL database with some example queries.
The "About" link gives a brief summary of the mirror sites and their URLs, the HMMER package and the license agreement.
Context-specific icons
There is a separate page for every family, clan, structure, sequence and proteome, and each of these pages has a context-specific icon panel at the top right-hand corner. This panel gives a summary of the information contained within that page (see legend to Figure 1 for explanation). The panel will always contain the same icons, although some may be grayed out if the icon is not relevant to that particular page, and each icon is associated with a number. The five icons in the panel are: architectures, sequences, interactions, species and structures. For example, on a sequence page the number associated with the "sequences" icon will be 1 (one sequence); on a family page it will be the number of sequence regions in the full alignment for that family, and on a clan page it will relate to the number of sequence regions for all the families belonging to that clan (Figure 4). Each icon leads to the appropriate tab within the page.
Figure 1
A typical Pfam family page. At the top of every Pfam page are six links (Home, Search, Browse, FTP, Help and About), each of which is described in the main text. The "Jump to" box and the "Keyword" search are available on most Pfam pages. The context-specific icons at the top of the page indicate the number of different architectures, the number of sequence segments in the full alignment, the number of documented interactions, the number of different species represented in the full alignment and the number of PDB chain IDs that can be displayed for the family concerned. Each tab gives access to different data, details of which can be found in the main text. As illustrated here for the Piwi family (Pfam accession PF02171), this "Summary" tab displays the Wikipedia article. The "Clans" tab is not grayed out, indicating that Piwi is part of a clan.
Figure 4
A typical Clan page. The "Summary" tab on the Clan page for the Pfam clan Alk_phosphatase (Pfam clan accession CL0088) shows the summary function, the literature references, the clan members, external database links and one representative PDB structure from a drop-down list of all structures in the clan. The various general and clan-specific tabs are shown on the left-hand side.
Family pages
The majority of the information in the website is organized around the family pages where all data for a Pfam family are displayed (Figure 1). Entry via the Pfam identifier or accession number, or clicking on a domain name or graphic anywhere on the website, takes the user to the "Summary" tab for that family. The user is able to navigate through several tabs on the left-hand side of the page, each providing data about some particular aspect of the family: domain organization or architectures, clan, alignments, HMM logo, trees, curation and model, species distribution, interactions and structures. The information in each of these tabs is described below.
Summary: The "Summary" tab is the front-page view, or tab, for the family. Where a Wikipedia article has been written for the family, we display that Wikipedia entry by default. Three tabs are provided at the top of the section (Figure 1) to call up any one of the three possible sets of annotations: Wikipedia, Pfam or InterPro.
In the absence of a Wikipedia article, the "Pfam" tab is displayed and will start with a brief summary of the function, if known. Sectional headings underneath may include an abstract, "Literature references," clan details and "External database" links. Where a three-dimensional structure for a member of the family exists, a static image of the structure is shown on this page; when multiple structures exist, the user can choose which structure image to show via a drop-down menu (Figure 1).
Users can click on the "InterPro annotation" tab to find the "InterPro entry," and "Gene Ontology" data if available.
Domain organization: This tab shows all the other domains that co-occur on a protein with the domain of interest. In Figure 10 the domain organization of the PepSY family (Pfam accession PF03413) is shown. The PepSY domain is rendered in green and other associated domains in other colors. Hovering the mouse over (mousing over) the domains pops up a tool tip with summary data. Beneath each graphic is a "Show" button that will toggle between the single example and the full set of all instances of that particular organization. This can be useful in determining the relative distances between the domains on the different sequences, how the different sequences might vary in overall length and whether or not any Pfam-Bs co-occur.
Figure 10
Domain organization for the PepSY family (Pfam accession PF03413). The domain organization listing of all the unique domain organizations or architectures in which the PepSY domain is found is shown. The first graphic is that of the most abundant combination and is drawn on the longest sequence carrying that domain. Each row contains the following information: the number of sequences that exhibit this architecture; a textual description of the architecture (e.g., PepSY, Peptidase_M4); a link to the sequence page for the sequence in this graphic; the UniProtKB description of the protein sequence; the number of residues in the sequence; and the Pfam graphic itself. The "show/hide" button toggles between viewing the single example and viewing all (629 in the first case) examples of a particular combination.
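The rows of such a listing can be derived with a simple grouping step, sketched below under stated assumptions: each protein is reduced to an ordered list of domain names, architectures are ranked by abundance, and the longest sequence is chosen as the example to draw. The sequence accessions and lengths are invented for illustration.

```python
from collections import defaultdict


def summarize_architectures(proteins):
    """proteins: iterable of (accession, length, ordered list of domain names)."""
    groups = defaultdict(list)
    for accession, length, domains in proteins:
        groups[", ".join(domains)].append((accession, length))

    rows = []
    for architecture, members in groups.items():
        # Draw the example graphic on the longest sequence of the group.
        example, example_length = max(members, key=lambda member: member[1])
        rows.append({
            "architecture": architecture,
            "count": len(members),
            "example": example,
            "length": example_length,
        })
    # Most abundant architecture first.
    rows.sort(key=lambda row: row["count"], reverse=True)
    return rows


# Hypothetical sequences; the accessions and lengths are made up.
proteins = [
    ("SEQ_A", 509, ["PepSY", "Peptidase_M4"]),
    ("SEQ_B", 530, ["PepSY", "Peptidase_M4"]),
    ("SEQ_C", 120, ["PepSY"]),
]
rows = summarize_architectures(proteins)
for row in rows:
    print(row)
```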
Clan: The "Clan" tab gives the name and accession number of the clan to which the family belongs, along with a list of all families in the clan. Clicking on the name or accession number of the clan brings up the full Clan page; see below.
Alignments: The "Alignments" tab offers different options for viewing the seed, full (UniProtKB), NCBI and metagenomic alignments. The alignments can be viewed in the following formats: Jalview (available for all alignments), HTML (available for all alignments), Pfam viewer (available for seed and full alignments) or heat map view (available for Pfam-A full alignment only).
Where structures have been solved for members of a family, the HTML view of both the seed and the full alignments will show the consensus secondary structure (SS) (this is the consensus of all the structures in the family) marked below each residue in the relevant sequences. Figure 11 shows the SS lines in the seed HTML alignment for the Pfam domain 3-HAO (Pfam accession PF06052). Keys are also given in the figure for interpreting both the secondary-structure symbols and the ClustalX colorings.6
Figure 11
HTML alignment of a family seed with secondary-structure mark-up. (a) The HTML view of the alignment for the seed of the 3-HAO domain (Pfam accession PF06052) showing the secondary structure (SS) taken from DSSP,19 marked up from the consensus of the PDB structures 1zvf from 3HAO_YEAST (UniProtKB: P47096) and 2qnk from 3HAO_HUMAN (UniProtKB: P46952); (b) The colorings employed in the ClustalX alignments that are used throughout the website in the HTML views, and (c) an explanation of the code for the secondary-structure letters.
In the heat map view (Figure 12), the color of each residue reflects the alignment uncertainty, as determined by the posterior probability calculated by HMMER3. The figure legend explains the different colorings. This view gives a quick and easy visual clue to sequences, or parts of sequences, that align poorly to the HMM.
Figure 12
Full family alignment in heat map format. The heat map view of the full alignment of the C6 domain (Pfam accession PF01681) shows residues colored according to the posterior probability (essentially the expected accuracy) of each aligned residue, with the most accurate residues colored bright green and the least accurate bright red, where a * means the 95–100% band, 9 means 85–95%, and so on down to 1 meaning 5–15% and 0 meaning 0–5% posterior probability. These bands equate to a color scale from bright green for *, through paler green and pale red down to bright red for 0.
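The banding described in this legend can be written as a small look-up, shown here as a sketch. It assumes that the intermediate bands (2 to 8) are each ten percentage points wide, as "and so on" suggests, and that a value lying exactly on a boundary belongs to the higher band; neither detail is stated explicitly above.

```python
from bisect import bisect_right

# Lower bounds of the bands 1..9 and '*', written out as literals.
LOWER_BOUNDS = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
SYMBOLS = "0123456789*"


def pp_symbol(probability):
    """Map a posterior probability in [0, 1] to its heat-map band symbol."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("posterior probability must lie between 0 and 1")
    # Count how many lower bounds the value has reached.
    return SYMBOLS[bisect_right(LOWER_BOUNDS, probability)]


for p in (0.97, 0.90, 0.50, 0.10, 0.02):
    print(p, pp_symbol(p))
```

A residue with posterior probability 0.97 therefore falls in the * band (bright green), whereas one with 0.02 falls in band 0 (bright red).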
The seed and full alignments are all available for downloading in a number of different formats, suitable for different alignment-application tools: Selex, FASTA, Pfam or MSF formats. For occasions when users wish to analyze the full-length sequences of members of a domain, they can download the full-length sequences under the "Downloading options" table. This table also allows users to download all alignments in a gzip compressed form.
The "Alignments" tab also gives access to the external site of MyHits,7 which provides a collection of tools for handling multiple sequence alignments. For example, the user can refine a seed alignment (by sequence addition or removal, re-alignment or manual editing) and then search databases for remote homologues using HMMER3.
Trees: We generate neighbor-joining phylogenetic trees (with bootstrap values based on 100 replicates shown on tree nodes) for all our alignments, using FastTree8 (http://www.microbesonline.org/fasttree/). The user can view the seed, full, NCBI and metagenomic trees by selecting the required alignment at the top of the page and using the "change tree" button. The trees can also be downloaded in Newick format using the links at the bottom of this page.
Curation and Model: This tab contains information about how the HMM was built, including the curated thresholds for the family. The HMM can be downloaded from this page. The source of the seed alignment is also indicated in this section. In the case of the BLIP family (see Figure 1), the seed source is given as Pfam-B_41444 (release 10.0). Many seeds are derived from Pfam-B alignments, particularly when a function has been determined or a structure has been solved for one of the members. In some instances the seed source can be from a literature reference, when it is indicated by a "[1]" (that is, having come from the first reference shown on the "Summary" tab).
Species: This tab shows, by default, a "Sunburst" rendition of the species distribution of all the sequences in the full alignment of the family (Figure 13). "Sunburst" uses a radial layout to indicate the species. The sunburst is color-coded according to the taxonomic distribution, thereby allowing rapid assessment of the taxonomic diversity found within the family. It is not interactive. This view has been implemented because many species trees are very large and slow to display.
Figure 13
A Species-distribution taxonomic tree in "Sunburst" format. The species represented in the full alignment of the family Bestrophin (Pfam accession PF01062) are displayed in "Sunburst" format, by default, which facilitates visualization of the distribution across the different taxa. The controls panel indicates the species most recently highlighted by mousing over the sunburst, in this case Escherichia coli, when a pop-up window indicates species name, number of strains and the number of sequences from that species. The display can be altered by weighting either for number of species or for number of sequences. The key shows the colors used for each taxonomic kingdom.
The interactive taxonomic tree view is also available in the adjacent tab (Figure 14). The floating tool panel on the right allows the user to toggle between different views of the distribution: The tree can be expanded or collapsed at each of the nodes; those species found in the seed can be highlighted; individual nodes of the tree can be selected and the selected members viewed either as Pfam graphics or as an alignment; selected sequences can be downloaded. Full details on the interactive nature of the tree with explanation of the colors used are given in the legend to Figure 14.
Figure 14
Interactive taxonomic tree of the species distribution of all the sequences in a family. The "Tree" button leads to the interactive taxonomic species tree showing the frequency of occurrence of the Bestrophin domain (Pfam accession PF01062) across different species. The tree is generated by counting the number of domain matches on all sequences in the full alignment at each taxonomic level (number shown in the purple box), along with the number of unique sequences on which each domain is found (number shown in green), then grouping sequences from the same organism according to the NCBI code assigned by UniProtKB and counting the number of distinct sequences on which the domain is found (number shown in pink). The NCBI species tree forms the framework for displaying organisms within the tree. For all but the largest families, the tree is interactive. Due to performance issues with Internet Explorer, users of that browser are shown the text form of the tree by default, with the option to load the interactive tree if required. The tree controls can be used to manipulate how the interactive tree is displayed: to show/hide the summary boxes; to highlight species that are represented in the seed alignment; to expand/collapse the tree to a given depth; to select a subtree or a set of species within the tree, view them graphically or as an alignment, and download the sequence; and to save a plain-text representation of the tree. Users can select species of interest, as shown in the figure, and employ the tool controls accordingly.
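The per-node numbers in such a tree can be accumulated in one pass over the domain matches. The sketch below is one reading of the legend, not the website's implementation: for every node on a sequence's lineage it counts domain matches, distinct sequences and distinct organisms (by NCBI taxonomy code). The sequence names and lineages are toy data.

```python
from collections import defaultdict


def tree_counts(domain_matches):
    """domain_matches: one (sequence, ncbi_taxid, lineage) tuple per domain match."""
    matches = defaultdict(int)
    sequences = defaultdict(set)
    organisms = defaultdict(set)
    for sequence, taxid, lineage in domain_matches:
        for node in lineage:  # every ancestor node receives the match
            matches[node] += 1
            sequences[node].add(sequence)
            organisms[node].add(taxid)
    return {
        node: (matches[node], len(sequences[node]), len(organisms[node]))
        for node in matches
    }


# Toy data: S1 carries the domain twice; S1 and S2 come from the same organism.
toy_matches = [
    ("S1", 562, ("Bacteria", "Proteobacteria")),
    ("S1", 562, ("Bacteria", "Proteobacteria")),
    ("S2", 562, ("Bacteria", "Proteobacteria")),
    ("S3", 1423, ("Bacteria", "Firmicutes")),
]
counts = tree_counts(toy_matches)
for node, triple in counts.items():
    print(node, triple)  # (domain matches, sequences, organisms)
```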
HMM logo: The HMM logos offer a means of visualizing profile HMMs using the LogoMat-M software9; logos provide an overview of HMMs in graphical form and help to point up key conserved sequence motifs. As an example we have given the logo for the very short zinc-finger family, zf-C2H2 (Pfam accession PF01530) (Figure 15), where the characteristically conserved cysteines and histidines are clearly shown.
Figure 15
A family HMM logo. The HMM logo for the zinc-finger family zf-C2H2 (Pfam accession PF01530) shows the characteristic, highly conserved, C2H2 motif displaying prominently. A full explanation of the logo is given in the text.
To read an HMM logo, consider the stack of letters of different heights; the height of each stack represents the relative entropy of the distribution of the emission probabilities of the amino acid residues. In essence, the larger the letter the more likely that that amino acid will be present in that respective position in the protein family, and letters are sorted in descending order depending on their relative probabilities. The width of the column also varies to reflect the relative contribution of that position to the overall score; the pink, empty, columns represent the insert states. The colors of the amino acids reflect their respective biological properties, from charged, through polar uncharged, and aliphatic to aromatic, each of these sets being colored through red-brown, blue-purple, orange-yellow to green, respectively.
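The stack height can be made concrete with a short calculation. The sketch below computes the relative entropy, in bits, of one position's emission probabilities against a background distribution, and then divides the stack among the letters in proportion to their probabilities. The uniform background and the proportional split are simplifying assumptions made for illustration; they are not a description of how LogoMat-M is implemented.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: a uniform background over the 20 amino acids.
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def stack_height(emissions, background):
    """Relative entropy (bits) of the emission distribution versus background."""
    return sum(
        p * math.log2(p / background[aa])
        for aa, p in emissions.items()
        if p > 0
    )


def letter_heights(emissions, background):
    """Letters in descending order of probability, each with its share of the stack."""
    total = stack_height(emissions, background)
    ordered = sorted(emissions.items(), key=lambda item: item[1], reverse=True)
    return [(aa, total * p) for aa, p in ordered]


# Hypothetical, strongly conserved position dominated by cysteine.
position = {"C": 0.90, "S": 0.05, "A": 0.05}
print(round(stack_height(position, UNIFORM_BACKGROUND), 3))
print(letter_heights(position, UNIFORM_BACKGROUND))
```

A position whose emissions equal the background has a stack height of zero, and a fully conserved position reaches the maximum of log2(20) bits under the uniform background assumed here.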
Interactions: The "Interactions" tab shows the other Pfam domains with which the family of interest interacts. Interaction data are taken from iPfam,10 a resource that describes physical interactions between Pfam domains that have a representative structure in the Protein Data Bank (PDB). Clicking on the "Interactions" tab for PDZ (Pfam accession PF00595) shows that it interacts with Peptidase_S41 (Pfam accession PF03572), Ras (Pfam accession PF00071) and Trypsin (Pfam accession PF00089). This feature can be useful for investigating biochemical pathways or complexes, and allows consideration of the relationship of the particular domain with other domains.
Structures: Where structures are available for a UniProtKB sequence in the PDB,11 these will be displayed under the "Structure" tabs on the sequence, family or clan pages in the website. They will also each have their own structure page.
For those sequences that have a structure in the PDB, we use the mapping provided by PDBe (http://www.ebi.ac.uk/pdbe/docs/References.html) between UniProtKB and the PDB coordinate systems to map Pfam domains onto three-dimensional protein structures.
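The idea of moving a domain between the two coordinate systems is sketched below. The sketch assumes that the mapping is available as contiguous segments, each pairing a UniProtKB residue range with the PDB residue number of its first residue; this layout, the function name and the numbers are hypothetical and do not describe the format of the PDBe mapping files.

```python
def map_domain_to_pdb(domain_start, domain_end, segments):
    """Map a UniProtKB domain range onto PDB residue numbers.

    segments: list of (unp_start, unp_end, pdb_start) contiguous blocks.
    Returns a list of (pdb_start, pdb_end) ranges covered by the domain.
    """
    mapped = []
    for unp_start, unp_end, pdb_start in segments:
        # Overlap between the domain and this mapped segment.
        low = max(domain_start, unp_start)
        high = min(domain_end, unp_end)
        if low <= high:
            offset = pdb_start - unp_start
            mapped.append((low + offset, high + offset))
    return mapped


# Hypothetical chain: UniProtKB residues 121-130 are not observed in the structure.
segments = [(21, 120, 1), (131, 250, 111)]
print(map_domain_to_pdb(100, 150, segments))
```

Parts of a domain that fall outside every segment are simply not mapped, which corresponds to residues missing from the structure.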
The "Structure" tab will show a table of the exact mappings for all structures that map to the Pfam domain of interest. Users can choose one of the following molecular viewer applets: Jmol (http://www.jmol.org), AstexViewer12 or SPICE,13 to view the domain(s). The Jmol viewer and AstexViewer show a representation of the whole PDB structure with the domain of interest marked in green and any other Pfam domains shown in other colors. Any regions that do not map to a Pfam domain are shown in gray, and the images can be rotated and zoomed. In AstexViewer, the protein backbone of the whole PDB structure is drawn as secondary-structure elements, with the domains themselves surrounded by semitransparent molecular surfaces that one can toggle on (Figure 16a) or off (Figure 16b). SPICE is a more complex and powerful viewer that allows the user to browse sequences, structures and their annotations on both the sequence and the structure of a particular protein.
Figure 16
The AstexView of domains on a protein structure. This is the AstexViewer images option for viewing the structure reached via the "Structure" tab on the Structure page for PDB:2g2u. The protein backbone is drawn as secondary-structure elements (cartoon format) and colored according to the different domains on it, with the domains themselves surrounded by semitransparent molecular surfaces representing the van der Waals surfaces for the atoms that fall within the domain (a), and the regions not assigned to a Pfam domain colored in gray; the molecular surfaces can be toggled on and off using the "show/hide" button (b).
Sequence pages
Each sequence in the primary databases has its own sequence page, which can be accessed in many different ways as indicated above (see Figure 2 for a typical page). The page shows the summary information and a brief description of the sequence from its parent database of UniProtKB, NCBI or, in the case of metagenomic sequences, the source of the data. The precalculated organization of domains and features on the sequence is given, with a table indicating the residue ranges for each feature. There is mark-up for active-site residues and metal-binding residues, and the positions of any disulfide bridges are indicated; the residue positions for each of these appear in a pop-up window when mousing over the feature. Positions for signal peptide, coiled coil, transmembrane and/or low-complexity regions are also shown, if present. Where there is a corresponding structure in the PDB, this will be displayed on the "Structures" tab. Where a TreeFam14 phylogenetic tree is available, this is displayed under the "TreeFam" tab.
Figure 2
A typical Sequence page. This is the sequence page reached by entering Pfam with the accession or identifier for AURE_STAAU (UniProtKB: P81177), the second sequence displayed by default for the domain organization for the sequences in the full alignment of the PepSY domain (Pfam accession PF03413) of Figure 10. The page shows all the summary information from UniProtKB with description, source organism and length, and then the Pfam domain graphic indicating visually the relative positions of the different Pfam domains and regions on the sequence, and in tabular form the start and end residues. Explanations of the individual domain graphics are given in the legend to Figure 3.
Figure 3
An illustrative domain graphic of all sequence features. This figure shows all the possible elements that might be present on a sequence; Pfam-A families and domains are represented by lozenge shapes, whereas Pfam-A repeats and motifs are shown as rectangles. See the Data Model section of Supplementary Information for the definition of each type. The match within the alignment coordinates is blocked in solid color, and that between the envelope coordinates appears in a lighter shade of the same color; where there is a partial match to an HMM, the edge of the shape is jagged. Pfam-B families are shown as a rectangle with three stripes. Disulfide bridges are marked with a gray line and nested domains with a black line. Active-site residues are depicted with a diamond-head lollipop, and metal-binding residues are depicted with a circle or square-headed lollipop. A circular head denotes that the metal-binding residue has been determined experimentally, whereas a square head means the metal-binding residue is predicted. The color of the lollipop head on metal-binding residues denotes which ion the residue binds, and the identity of the ion can be found by "mousing over" the lollipop of interest. Positions for signal peptide, coiled coil, transmembrane and low-complexity regions are shown as semitransparent rectangles in the colors orange, green, red and blue, respectively. Tool tips are available for all elements of the graphic.
Clan pages
The clan pages can be accessed in a number of ways. The user can enter a clan accession or ID into the "Jump to" or "View a clan" box on the Pfam home page; use the "Browse" menu at the top of each Pfam page to view a list of clans; or click on the "Clan" tab on a family page. The tabs on the clan page are very similar to some of those on the family page described above, and include "Summary," "Domain organization," "Alignments," "Species," "Interactions" and "Structures" (Figure 4). The clan pages also have a "Relationships" tab, which gives a graphical image of the interconnections between the clan members, determined by HHsearch15 alignment between the family HMMs (Figure 17). Clicking on any E-value in this graph will take the user to the pairwise HMM logo for that particular relationship, a feature that is particularly useful for comparing the sequence similarity between the two families.
Figure 17
PRC (Profile Comparer) alignment relationships between clan members. This shows the relationships between all the members of the Pfam clan Alk_phosphatase (Pfam clan accession CL0088). These relationships are determined by a PRC alignment between the family HMMs. Families are deemed to be closely related if their E-value is less than 10^-3 (shown with a solid line) and less closely when the E-value is between 10^-3 and 10^-1 (shown with a dashed line).
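The line styles in this figure follow a simple rule on the E-value, sketched below. The thresholds are read from the legend as 10^-3 and 10^-1; treating both ends of the dashed-line band as inclusive, and drawing no line above 10^-1, are assumptions.

```python
def relationship_line(evalue):
    """Return the line style used to connect two clan members, or None."""
    if evalue < 1e-3:
        return "solid"   # closely related families
    if evalue <= 1e-1:
        return "dashed"  # less closely related families
    return None          # assumed: no relationship is drawn


for evalue in (1e-10, 0.01, 0.5):
    print(evalue, relationship_line(evalue))
```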
Structure pages
All the data collated on a PDB structure are displayed here together with relevant structure-specific tabs (Figures 5 and 18). The front page (Figure 5) shows the source reference, links to structure-specific external databases, and additional information from the TOPSAN wiki if available (Figure 5). Where a structure has been cited in publications that are available via PubMed Central (http://www.ncbi.nlm.nih.gov/pmc/), these articles can be viewed in marked-up format via BioLit16 (Figure 18).
Figure 5
A Structure page with TOPSAN entry. The "Summary" tab of the Structure page for PDB:3due, the structure common to the PepSY clan, gives summary details of the experimental procedures, then a list of links to external structure databases, and finally the information about the structure from TOPSAN. Structure-specific tabs on the left link to the PubMed Central literature reference if available (see Figure 18), to the domain organization (i.e., a graphic of all domains on the sequence), the sequence mappings of these domains, and the different viewing possibilities for the structure.
Figure 18
A Structure page with a BioLit entry. The "Literature" tab of the Structure page for PDB:1jtg shows that the BioLit project has tagged one reference from PubMed Central. The full title, author list, journal reference, abstract and external links of this article are shown along with the figures found in the article.
Additional Search Tools
DNA searching: There is the facility for searching a single sequence (in FASTA format) of up to 80 kilobases of DNA against the library of Pfam HMMs using the program GeneWise, part of the Wise2 package (http://www.ebi.ac.uk/Tools/Wise2/index.html). This tool is accessed from the Search link at the top of the home page. The results are emailed to the user once they have been computed.
Proteome analysis: Pfam precalculates the domain compositions and architectures for all the proteomes present in Integr8 (http://www.ebi.ac.uk/integr8/) at the time of making a Pfam release. An alpha-numerical list of all proteomes can be accessed from the "Browse" link on the top of the home page, and clicking any letter brings up a table of all proteomes with their complement of domains and their coverage. By clicking on a particular organism, the user will be able to view the proteome page for that organism with a short summary. There are tabs to access the full domain organization (that is, examples of all unique domain combinations) and the domain composition (that is, a table of all domains found with the numbers of sequences and numbers of occurrences of each). A table listing all the structures solved for proteins in that proteome can also be generated, though this may take time for larger proteomes.
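The domain composition table amounts to two tallies per domain: the number of sequences that carry it and the total number of occurrences. A minimal sketch follows; the toy proteome and its sequence names are invented for illustration.

```python
from collections import Counter


def domain_composition(proteome):
    """proteome: dict mapping a sequence accession to its list of domain names.

    Returns {domain: (number of sequences, number of occurrences)}.
    """
    occurrences = Counter()
    sequences = Counter()
    for domains in proteome.values():
        occurrences.update(domains)     # every instance counts
        sequences.update(set(domains))  # each sequence counts once per domain
    return {domain: (sequences[domain], occurrences[domain]) for domain in occurrences}


# Toy proteome: P1 carries three copies of a repeat, P2 one copy plus a kinase domain.
toy_proteome = {
    "P1": ["WD40", "WD40", "WD40"],
    "P2": ["WD40", "Pkinase"],
}
composition = domain_composition(toy_proteome)
print(composition)
```

Keeping the two numbers apart matters for repeats, for which the number of occurrences can greatly exceed the number of sequences.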
Taxonomy searching: From the "Search" link at the top of the home page a "Taxonomy" tab allows querying with Boolean logic for domains present or absent in particular species or phyla—along the lines of "What domains are present in Fungi and in Mammalia?"—or to find families or domains that are unique to a particular species (note: this can be very slow).
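Queries of this kind reduce to set operations once the domains found in each taxon are known. The sketch below illustrates that logic only; the taxon-to-domain table is toy data with made-up domain names, and the function names are hypothetical.

```python
def domains_matching(domains_by_taxon, present=(), absent=()):
    """Domains found in every taxon in 'present' and in no taxon in 'absent'."""
    result = set().union(*domains_by_taxon.values())
    for taxon in present:
        result &= domains_by_taxon[taxon]  # logical AND
    for taxon in absent:
        result -= domains_by_taxon[taxon]  # logical NOT
    return result


def unique_to(domains_by_taxon, taxon):
    """Domains found in 'taxon' and in no other taxon of the table."""
    others = set().union(
        *(domains for name, domains in domains_by_taxon.items() if name != taxon)
    )
    return domains_by_taxon[taxon] - others


toy_table = {
    "Fungi": {"DomA", "DomB", "DomC"},
    "Mammalia": {"DomA", "DomC", "DomD"},
    "Bacteria": {"DomC", "DomE"},
}
print(sorted(domains_matching(toy_table, present=["Fungi", "Mammalia"])))
print(sorted(domains_matching(toy_table, present=["Fungi", "Mammalia"], absent=["Bacteria"])))
print(sorted(unique_to(toy_table, "Bacteria")))
```

Finding domains unique to one taxon requires comparison against every other taxon, which is consistent with the note that such queries can be slow.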
Domain architecture searching: From the "Search" link at the top of the Pfam home page the "Domain architecture" tab allows the user to find proteins with a specified set of domain combinations (architectures). The query tool gives the options to specify domains that must appear in an architecture, and domains that must not appear in that architecture. For a more detailed analysis of domain architectures the user could go to the application PfamAlyzer,17 a tool that allows the user both to request a particular domain combination (and restricted by amino acid distances, if desired) and to restrict the species returned to Archaea, Bacteria and/or Eukaryota (http://pfamalyzer.sbc.su.se/pfamalyzer/).
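The include/exclude logic of such a query can be expressed as a simple filter, sketched here. The function name and the sequence accessions are hypothetical, and the sketch ignores domain order and copy number, which a fuller tool such as PfamAlyzer can take into account.

```python
def architecture_matches(architecture, required=(), forbidden=()):
    """True if every required domain is present and no forbidden domain is."""
    present = set(architecture)
    return set(required) <= present and present.isdisjoint(forbidden)


# Hypothetical sequences with their ordered domain lists.
toy_proteins = {
    "SEQ_A": ["PepSY", "Peptidase_M4"],
    "SEQ_B": ["PepSY"],
    "SEQ_C": ["Peptidase_M4"],
}
selected = [
    accession
    for accession, architecture in toy_proteins.items()
    if architecture_matches(architecture, required=["PepSY"], forbidden=["Peptidase_M4"])
]
print(selected)
```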
Keyword text searching: The Keyword search from the "Search" link allows users to search Pfam using a keyword (for example, apoptosis). Searches are case-insensitive and will not tolerate special characters or wild cards; they also do not use Boolean logic. The search currently returns results from the following sections of the database:
- Text fields within Pfam entries, such as description and comments;
- Sequence description and species fields;
- HEADER and TITLE records from PDB files;
- Gene ontology (GO) IDs and terms;
- InterPro abstracts
Communication
There are several ways in which users can contact the Pfam team to give feedback on families, submit potential new Pfam-A families or request further assistance in using the website.
Contacting Pfam: At the bottom of every page there is an email address for the Pfam help desk (pfam-help@sanger.ac.uk), and users are encouraged to get in touch with the Pfam team regarding any problems, comments or questions they might have. Users can also submit new families using this address.
Add annotation or edit Wikipedia article button: On every family or clan page with no linked Wikipedia article there is a blue "Add annotation" button next to the family or clan descriptor. Clicking this button leads to a form that attempts to collect the user's comments in a semistructured format, and sends those comments directly to Pfam.
For pages where the Summary is the Wikipedia entry for this family or clan, the wording inside the blue button is "Edit Wikipedia article." Clicking this button now leads directly to the editable screen-page for that Wikipedia article. Although the Wikipedia article will be immediately updated, the Pfam page will be screened overnight and so will not be updated until the following day.
Both emails and annotation comments are handled by a request-tracker (RT) system at the WTSI that ensures queries do not get lost. The sender is supplied with an RT ticket number that links all further correspondence on this query.
About: The "About" link at the top of every web page gives brief details on: the version of the current release, the sequence databases on which we base our searches for that release, and the software underlying the database. Links to the other mirror sites are also given.
Pfam Blog: We write a Pfam blog, and the most recent posts can be seen at the bottom of the Pfam home page. We use this to communicate current work and future plans, for discussing Pfam-related topics and for announcing new releases. Users are encouraged to subscribe to the RSS feed for the blog, which is low traffic, and to leave comments as appropriate.




