Bioinformatics Primer

The RCSB Protein Data Bank: site functionality and bioinformatics use cases


The Protein Data Bank (PDB) archive currently contains the atomic coordinates, sequences, annotations, and experimental details of more than 70,000 proteins, nucleic acids and complex assemblies. Navigating and analyzing this large and rapidly expanding volume of structural information by sequence, structure, function and other criteria is becoming increasingly difficult. The RCSB PDB website provides powerful query, analysis and visualization tools to aid the user in mining the PDB. In addition, the RCSB PDB integrates structural data with information about taxonomy, biological function, protein domain structure, literature, Molecule of the Month articles and other resources, to present the data in a biological context. The three-dimensional (3D) structures make possible an atomic-level understanding of biological phenomena and diseases, and allow the design of new therapeutics.

The PDB is the single worldwide repository for data on the 3D structures of biological macromolecules. The archive is managed by the partners of the Worldwide Protein Data Bank1 (wwPDB): the RCSB PDB,2 PDBe3 (Europe), PDBj (Japan) and the Biological Magnetic Resonance Databank (BMRB, United States). wwPDB members host data deposition and annotation sites, distribute data, and collaborate on issues of policy, formats, standards and curation. Each member also develops different tools and resources to study and utilize the data. The RCSB PDB website provides resources and tools to mine and analyze sequences, structures, ligands and annotations. This primer describes data representation, educational resources, and search workflows; several examples demonstrate searching by sequence, structure and function.

Data representation

The original representation of macromolecular structure data is the PDB file format, which was created in the 1970s4 and is still widely used today. To address inherent limitations of the PDB format, the macromolecular crystallographic information file format (mmCIF) was created.6 A direct translation of this format into XML is available as the PDBML/XML file format. In addition, residues and small-molecule components found within PDB entries are described by the Chemical Component Dictionary.

The PDB Exchange Dictionary uses the mmCIF file format and consolidates content from a variety of crystallographic dictionaries. The basic structure of these dictionaries is to organize the data describing a macromolecular structure into categories of related data items, such as author or citation information, atom coordinates, refinement details (for x-ray structures) and so forth. The PDB Exchange Dictionary forms the basis for data processing and database management of the RCSB PDB website. In fact, each mmCIF category forms a relational table and thus defines the schema for the database. At the time of writing, the production version of the PDB Exchange Dictionary (version 1.0680) contains 3,963 items in 337 categories.
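To make the correspondence between an mmCIF category and a relational table concrete, the following minimal Python sketch reads a single loop_ block and returns the category name together with one row per record. The function name parse_loop and the sample coordinate values are illustrative assumptions; the sketch handles only unquoted, whitespace-separated values and is not a complete mmCIF reader.

```python
def parse_loop(text):
    """Parse one simple mmCIF loop_ block into (category, rows)."""
    columns, rows, category = [], [], None
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_" or line.startswith("#"):
            continue
        if line.startswith("_"):
            # Item names have the form _category.item; the item becomes a column.
            category, item = line[1:].split(".", 1)
            columns.append(item)
        else:
            # Each data line is one row of the table (no quoted values handled).
            rows.append(dict(zip(columns, line.split())))
    return category, rows


# Illustrative sample only; the numeric values are placeholders.
SAMPLE = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
ATOM 1 N VAL 6.204
ATOM 2 CA VAL 6.913
"""

category, rows = parse_loop(SAMPLE)
print(category, len(rows), rows[1]["label_atom_id"])
```

In this picture the category (atom_site) plays the role of the table name, each item is a column, and each data line is a row.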

In addition to the primary data contained within the PDB entries, external annotations are loaded into derived database tables.8, 9 These data are obtained from external resources such as the National Center for Biotechnology Information (NCBI) Taxonomy, Gene Ontology (GO), Enzyme Classification (EC), Structural Classification of Proteins (SCOP) and many others, and from the Structure Integration with Function, Taxonomy and Sequence (SIFTS) project, which is a joint initiative of Universal Protein Resource (UniProt) and the PDBe.

Educational resources and tutorials

The RCSB PDB website provides a variety of resources that explain important structures as well as technical details about structure determination.10 Since 2000, the Molecule of the Month features have introduced the structure and function of a molecule, visualized structural features and discussed its relevance to health and disease. To learn about structure determination methods and the measures of structure quality, consult the Understanding PDB Data tutorial. A Glossary of Technical Terms explains frequently used terms, and the Help System provides detailed descriptions and video tutorials.

Overview of site functionality

The RCSB PDB website offers several options to search and browse the content, including Simple Search, Advanced Search, Tree Browsers, and Query by Example. These search methods form part of the functionality included in the workflow diagram (Fig. 1).

Figure 1:

Schematic diagram of major components of the RCSB PDB website and possible workflows. Each element of this diagram is explained further in a section of this primer.

Simple search. The search box at the top of the website (Fig. 2) provides by default a PDB ID lookup or a text search. PDB entries are uniquely identified by four-character IDs (e.g., 2CPK). A PDB ID lookup from the top search box takes the user directly to the Structure Summary page, which is described further below. Text searches look for the text string anywhere in the mmCIF structure file. As a consequence, the search results are very inclusive and may include structures that are less relevant. For example, searching for “kinase” currently returns more than 4,200 structures. As described further below, searches can be refined via faceted browsing or advanced search. In addition, a few frequently used search options are available from the top search box by selecting them from the pull-down menu (Fig. 2), such as the “Author” search (which provides an auto-complete feature), and searches for ligands (which are bound to macromolecules) by their chemical name or their one- to three-character ID as defined in the Chemical Component Dictionary (e.g., HEM).

Figure 2:

Screenshot of the simple search in the top navigation bar of the RCSB PDB website. By default, a search will be interpreted as a PDB ID lookup if it matches a valid PDB ID (e.g., 4HHB) or otherwise a full-text search of each complete structure file. Alternatively, users can choose other search options from the left-hand menu: “Author” (search by author name, either structure author or citation author; auto-completion available); “Structural Genomics Center” (pull-down menu of Structural Genomics Center name); “Chemical Name” (search by name of a small molecule contained in the entry; auto-completion available); “Chemical ID” (search by the Chemical Component Dictionary ID of a small molecule contained in the entry, e.g., HEM); “PubMed Abstract” (text search of the PubMed abstract of primary citations associated with PDB entries).
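The default behavior of the top search box can be sketched as a small dispatch function. This is only an illustration in Python, not the website's implementation: the function name classify_query is hypothetical, and the pattern assumes the classic PDB ID format of a digit from 1 to 9 followed by three letters or digits. It checks the format only, whereas the website also has to confirm that the ID exists in the archive.

```python
import re

# Assumption: a classic PDB ID is a digit 1-9 followed by three alphanumerics.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def classify_query(query):
    """Return ("pdb_id", ID) for ID-shaped input, otherwise ("text", query)."""
    query = query.strip()
    if PDB_ID.fullmatch(query):
        # PDB IDs are case-insensitive; normalize to upper case.
        return ("pdb_id", query.upper())
    return ("text", query)


for example in ("4hhb", "2CPK", "kinase"):
    print(example, "->", classify_query(example))
```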

Results browser. Any search that returns more than one PDB entry (e.g., the above text search for “kinase”) takes the user to the Results Browser page (Fig. 3). Related results, such as “Unreleased Entries,” “Citations,” “Ligand Hits” and “Web Page Hits,” are available from tabs at the top of the page. Users can navigate through multiple pages via the controls at the bottom of the page, or perform other actions as described subsequently. However, the user will probably want to refine the search results first, rather than sort through, for example, more than 4,200 “kinase” search results.

Figure 3:

Results Browser. This page summarizes search results whenever a query returns more than one PDB entry. Tabs at the top link to associated result sets, such as unreleased entries, citations, ligands and web pages. The next box underneath the tabs allows for queries to be refined through faceted browsing (see Fig. 4) or advanced search (see Fig. 7). The next box provides menus for file downloads, tabular reports (see Fig. 8) and sorting of results. A short summary for each entry follows. PDB IDs (e.g., 2X51), structure titles and thumbnail images each link to the Structure Summary page (see Fig. 5) for the respective entry.

Faceted browsing. The most immediate way to refine queries is through faceted browsing, which is available from the Query Refinements box near the top of the page. The contents of this box may initially be hidden, but clicking on “Show” in the top right corner (or anywhere inside the Query Refinements box) causes all query refinement options to be displayed as the pie charts shown in Figure 4. The query results are broken down by several criteria (called facets), including organism, resolution and SCOP domain annotation. This “drill down” technique is frequently used on e-commerce sites to browse catalogues of items, and can be similarly applied to browsing sets of structures. Clicking on any of the breakdowns (e.g., “Homo sapiens (man)”) refines the query according to that criterion. This process can be repeated, for example, to include only high-resolution x-ray structures (“less than 1.5 Å”) that were released recently (“2010–today”).

Figure 4:

Faceted browsing. Screenshot of the expanded Query Refinements section of the Results Browser page. Each link (e.g., Homo sapiens) constitutes a single-click refinement of the search results. Additional help is available by clicking on the question mark next to “Query Refinements”.
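The drill-down idea behind faceted browsing can be illustrated with a few lines of Python. The result set below is entirely hypothetical (the IDs and values are placeholders, not PDB data), and facet_counts and refine are illustrative names; the sketch only shows how counts per facet value and successive single-criterion refinements fit together.

```python
from collections import Counter

# Hypothetical result set; IDs and values are placeholders, not PDB data.
RESULTS = [
    {"id": "A001", "organism": "Homo sapiens", "resolution": 1.4, "year": 2010},
    {"id": "A002", "organism": "Homo sapiens", "resolution": 2.1, "year": 2008},
    {"id": "A003", "organism": "Mus musculus", "resolution": 1.2, "year": 2011},
    {"id": "A004", "organism": "Homo sapiens", "resolution": 1.3, "year": 2011},
]


def facet_counts(entries, key):
    """Count how many entries fall under each value of one facet."""
    return Counter(entry[key] for entry in entries)


def refine(entries, predicate):
    """Keep only the entries that satisfy one refinement criterion."""
    return [entry for entry in entries if predicate(entry)]


print(facet_counts(RESULTS, "organism"))

# Three successive refinements, mirroring the example in the text.
human = refine(RESULTS, lambda e: e["organism"] == "Homo sapiens")
sharp = refine(human, lambda e: e["resolution"] < 1.5)
recent = refine(sharp, lambda e: e["year"] >= 2010)
print([e["id"] for e in recent])
```

Each refinement operates on the output of the previous one, which is why the counts shown for the remaining facets shrink as the user drills down.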

Structure summary. The Structure Summary page (Fig. 5) provides access to detailed data for each entry in the PDB. This page is reached from the top search box via the entry's PDB ID (e.g., 2CPK) or by clicking on the ID, title or image of any entry on the Results Browser page (see Fig. 3). Individual components (widgets) can be rearranged or hidden to customize the appearance of the page (this can be undone with “Restore Layout” on the bottom of the page). Tabs on the top of the page link to further data for the structure such as a sequence page, external annotations, sequence and structural neighbors, and literature.

Figure 5:

Structure Summary page for an individual PDB entry. Components (widgets) of the page can be rearranged or hidden. The tabs on the top provide access to additional pages as follows: Sequence (sequence views and secondary-structure annotations); Annotations (external data from SCOP, CATH, GO, Pfam, Structural Biology Knowledgebase); Seq. Similarity (sequence clusters—i.e., entries with similar sequences); 3D Similarity (entries with similar 3D structures); Literature (data from the primary citation and related articles); Biol. & Chem. (further biological and chemical details); Methods (experimental details on the structure determination method(s)); Geometry (geometry data, e.g., bond lengths and angles, Ramachandran plot); Links (hyperlinks to external resources with data related to the specific PDB entry).

Visualization. Structures can be visualized using several tools. The Structure Summary page shows a two-dimensional still image for the structure. For x-ray entries this view can be toggled between one or more biological assemblies and the asymmetric unit (for a definition of these terms consult the Glossary of Technical Terms). These can be the same (e.g., 2CPK) or different (e.g., 1AEW). Links to several 3D viewers are available underneath the image (note that some are only available from the asymmetric unit view). The most commonly used visualization tool is Jmol (Fig. 6). A multitude of display options are available both from the Jmol applet itself (by right-clicking on the image) and through the Jmol Script text box, pull-down menus, action buttons and other page elements underneath the image.

Figure 6:

3D visualization with Jmol (an open-source Java viewer for chemical structures in 3D). The display can be modified by right-clicking on the image (Jmol applet) and through the Jmol Script text box, pull-down menus, action buttons and other page elements underneath the image.

File download. Links to display or download files related to a single entry in the PDB are available in the top right corner of the Structure Summary page (see Fig. 5). File downloads for multiple structures are available from the “Display/Download:” pull-down menu on the Results Browser page (see Fig. 3), or the “Download: Entries|Ligands” links in the “Tools” section of the left-hand menu.

Tree browsers. Another approach to searching for PDB entries is to use the tree browsers available from the “Browse Database” link in the Search section of the left-hand menu. These browsers group PDB entries according to one of several criteria with a hierarchical organization, such as GO terms, EC numbers and SCOP classifications. One example of using a tree browser is described further below under “Search by functional and other annotations.”

Advanced search. A first approach to refining a search was discussed under “Faceted Browsing.” A second approach is the query refinement via advanced search. After an initial search (e.g., the earlier “kinase” text search) the Advanced Search interface is reached by clicking on the “Refine Query” link at the bottom of the Query Refinement box on the Results Browser page (see Fig. 3). Note that the previous search has already filled in one row of the Advanced Search interface (Fig. 7). Additional search criteria (e.g., “Publication” ≫ “Citation” ≫ “Journal: Nature”) can now be added. An advanced search can also be started de novo (i.e., not as a refinement of a previous search) by clicking on “Advanced Search” on the top right of the website.

Figure 7:

Advanced search. The top portion was automatically filled in after doing a simple search for “kinase” and clicking on “Refine Query” on the Results Browser page. The Citation search was added after clicking on “Add Search Criteria.” The “Results Count” button calculates the number of results for each individual part of the query.

Tabular reports. Once a search has been sufficiently refined, a user might want to create a report on the resulting set of structures. These reports are available from the “Generate Reports:” pull-down menu on the Results Browser page (Fig. 8a). For example, Figure 8b shows the Primary Citation summary report for the refined “kinase” query. Reports can also be customized to include columns chosen by the user (“Custom Reports” ≫ “Customizable Table”). Data within a tabular report can be sorted or further filtered. Tabular reports can be exported and saved in CSV or Microsoft Excel format for further offline analysis.

Figure 8a:

(a) Tabular report menu. The user can select from custom reports, predefined summary reports and method-specific experimental details reports.

Figure 8b:

(b) Tabular report for primary citation. Tabular reports can be sorted, filtered (using the text boxes underneath the column headers or the “Filter Results” link) and downloaded. Further help is available by clicking on the help icon “?” in the top bar.

Saving a query/MyPDB. MyPDB is a service that allows registered users to customize the RCSB PDB. One MyPDB feature is the ability to store queries, have them run automatically on a weekly or monthly basis, and be notified by email of new entries that match the query. New users can register for an account from the MyPDB section of the left-hand menu of the site. The “Save Query to MyPDB” link in the “Search” ≫ “Results” section of the left-hand menu adds the last query to the user's list of stored queries. Note that this link remains available even after the user has navigated to the Structure Summary page for one entry in the last result set. From the Saved Query Manager (Fig. 9), one can update the name or description of a stored query, toggle email notifications on or off, run a query on demand or delete it.

Figure 9:

MyPDB saved query manager shows a saved query and its current status. Email notification is turned on for this query. Other MyPDB functionality can be accessed from the top of the page: manage personal annotations, display settings, and user account (e.g., frequency of email notifications).

Query by example. Another way of quickly executing a search is the so-called query by example. Items on the Structure Summary page (see Fig. 5) that are identified by a simple magnifying-glass icon are hyperlinked such that clicking on the item will directly execute a search for all other structures in the PDB that share that data value. For example, clicking on an author name will retrieve all other structures by that author.

The following sections illustrate how to query by sequence, structure and function.

Search by sequence

Sequence similarity search. A protein sequence-based search is the most reliable method to find specific proteins because it does not rely on text or annotations that may be incomplete or inconsistent. The PDB offers the standard sequence search methods: Basic Local Alignment Search Tool (BLAST), FASTA,11 and Position-Specific Iterative (PSI)-BLAST. The last method is able to find more distantly related proteins than the first two methods. If the sequence of a protein of interest is not readily available, the sequence of a representative structure in the PDB can be used for a sequence search by typing in the PDB ID and choosing one of the polypeptide chain sequences as a search option (Fig. 10). To search the PDB with a nucleotide sequence, use the BLASTX Translated Nucleotide search; this translates the nucleotide sequence in all six reading frames to find matching protein sequences.

Figure 10:

Sequence search. This example shows how the polypeptide sequence of a PDB entry (4HHB) can be used in a sequence search. Alternatively, the user can paste a sequence into the sequence text box. The help icon “?” links to a description of available search tools and their parameters.
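The translated nucleotide search relies on translating the query in six reading frames (three on each strand). The short Python sketch below shows what that step produces, assuming the standard genetic code; it is an illustration of the concept rather than of BLASTX itself, and the function names are hypothetical. Codons containing characters other than A, C, G or T are rendered as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of the string."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Translate frames +1..+3 (forward) and -1..-3 (reverse strand)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGGCC")
print(frames)
```

Each of the six protein strings would then be compared against the protein sequences in the database.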

Sequence motif search. Motifs are short sequence fragments or sequence patterns in protein and nucleotide sequences. Examples of sequence motifs include affinity tags,12 epitopes and functional motifs.13 A sequence pattern can be defined by regular expression syntax, a powerful notation for defining complex patterns. A frequently used tag combines a poly-histidine affinity tag containing five or more consecutive histidines (HHHHH) with a thrombin-specific cleavage site (LVPRGS) through a linker region of 0–20 residues of any type (X). Using regular expression syntax, this motif can be written as HHHHHX{0,20}LVPRGS. Table 1 shows examples of frequently used motif search types.

The Sequence Motif page under Help Topics on the PDB website lists further examples of motif searches. Other examples of sequence motifs can be found in the PROSITE database.13 PROSITE motifs are specified in a custom notation; for example, N-{P}-[ST]-{P} defines an N-glycosylation site. These patterns must be converted to regular expression syntax before they can be used with the motif search.
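The two notations discussed above can be tried out with Python's re module. The sketch below makes two assumptions that are not taken from the website: that the X in the motif stands for any residue and therefore maps to "." in Python, and that only the common PROSITE elements (x, [..], {..}, repeat counts in parentheses, and the < and > anchors) need to be handled. The function names and the tagged test sequence are hypothetical.

```python
import re


def motif_to_regex(motif):
    """Assumption: X means 'any residue', written as '.' in Python regex."""
    return motif.replace("X", ".")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern to regular expression syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        start = "^" if element.startswith("<") else ""
        end = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        repeat = ""
        if element.endswith(")"):
            # x(2,4) style repeat count becomes {2,4}.
            element, count = element[:-1].split("(")
            repeat = "{" + count + "}"
        if element in ("x", "X"):
            core = "."                          # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"   # excluded residues
        else:
            core = element                      # literal residue or [..] set
        parts.append(start + core + repeat + end)
    return "".join(parts)


his_thrombin = re.compile(motif_to_regex("HHHHHX{0,20}LVPRGS"))
tagged = "MGSSHHHHHHSSGLVPRGSHMASMT"  # hypothetical tagged construct
match = his_thrombin.search(tagged)
print(match.start(), match.group())

glyco = prosite_to_regex("N-{P}-[ST]-{P}")
print(glyco, bool(re.search(glyco, "AANGTAA")), bool(re.search(glyco, "AANPTAA")))
```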

Note that the sequences in the PDB represent the entire sequence of the macromolecule sample used for crystallization (SEQRES record in the PDB file). This entire sequence is used for sequence searches. Some regions of a structure may not be visible, for example, in the electron density map of an x-ray structure. These unobserved or disordered regions are not listed in the ATOM records (atomic coordinates for standard residues) of the PDB file.

Sequence similarity clusters. The RCSB PDB website offers pre-calculated protein sequence clusters from 30% to 100% sequence identity. A description of the clusters can be found at the website's Redundancy page. Sequence clusters are useful to find identical and homologous proteins, or natural and engineered mutants of proteins.

For a given PDB structure, the sequence clusters are accessible from the “Sequence Similarity” tab on the Structure Summary page (Fig. 11a; see Fig. 5). Each cluster can be expanded to view its members, their NCBI taxonomy ID, and the EC number if available (Fig. 11b). The members within a cluster are ranked by resolution and R-value. Structures with better resolution and lower R-values are preferred. The member with rank no. 1 is the cluster leader, a representative for this cluster. Because the protein sequences in the PDB are highly redundant, the sequence clusters can also be used to reduce search results to a set of representative sequences. Figure 12 shows how similar structures can be removed from a result set.

Figure 11a:

(a) Sequence similarity tab (see Fig. 5). Sequence clusters from 30% to 100% sequence identity for PDB entry 4HHB. This entry has two distinct polypeptide chains (α and β), and clusters for each chain are listed. The user can click on a cluster to display the list of homologous protein chains in the PDB.

Figure 11b:

(b) Sequence cluster. The cluster members (homologs) for the hemoglobin α-chain at 95% sequence identity threshold. At this high level of sequence identity all members have the same NCBI Taxonomy ID: 9606 (Homo sapiens). Clusters at a lower level of sequence identity contain sequences of other species (e.g., orthologs). For enzymes, the EC number is listed to provide information about function conservation.

Figure 12:

Remove similar sequences. A text search for hemoglobin returns more than 500 hits. To remove redundant sequences, the user can select a sequence similarity threshold. A lower sequence identity threshold will result in fewer representative structures.
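The ranking of cluster members and the reduction of a result set to representatives can be sketched as follows. The chain records are hypothetical placeholders, and sorting by resolution first and R-value second is an assumption about how the two criteria are combined; the text states only that both are used and that lower values are preferred.

```python
from collections import defaultdict

# Hypothetical chains: (chain ID, cluster ID, resolution in Å, R-value).
CHAINS = [
    ("X001.A", 1, 1.74, 0.16),
    ("X002.A", 1, 1.25, 0.14),
    ("X003.B", 2, 2.10, 0.21),
    ("X004.A", 1, 1.25, 0.19),
    ("X005.B", 2, 1.90, 0.20),
]


def rank_clusters(chains):
    """Group chains by cluster and rank members, best first."""
    clusters = defaultdict(list)
    for chain in chains:
        clusters[chain[1]].append(chain)
    # Assumption: resolution is the primary key, R-value breaks ties.
    return {
        cluster_id: sorted(members, key=lambda c: (c[2], c[3]))
        for cluster_id, members in clusters.items()
    }


def representatives(chains):
    """Return the rank no. 1 member (cluster leader) of every cluster."""
    ranked = rank_clusters(chains)
    return [ranked[cluster_id][0][0] for cluster_id in sorted(ranked)]


print(representatives(CHAINS))
```

Keeping only the cluster leaders is what removes redundant sequences from a result set; a lower identity threshold merges more chains into each cluster and therefore leaves fewer representatives.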

Search by structure

Search for structural neighbors. Sequence-based searches can be used to find homologous proteins in the PDB. Generally, proteins with at least 40% sequence similarity are assumed to have similar structures.14 However, more distantly related structures cannot be found by this approach, because structure is more highly conserved than sequence during evolution. To find similar structures with low or no detectable sequence identity, a geometric alignment of the CA atoms (Cα, the carbon atom to which the amino acid side chain is attached) can be used to find structural neighbors. We have structurally aligned all representative protein chains from the 40% sequence identity clusters described in the previous section using the jFATCAT program.15 For a given PDB structure, the “3D Similarity” tab on the Structure Summary page (Fig. 13a; see Fig. 5) lists structurally similar protein chains in a table. This table can be sorted, for example, by the P value (significance of the alignment; lower numbers are better) or by the structural similarity expressed as root mean square deviation (RMSD). The links in the Results column lead to the Structure Alignment View pages, which display a superposition of the two aligned chains (Fig. 13b) along with the sequence alignment.
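For readers unfamiliar with the RMSD measure, the following Python sketch computes it for two equally long lists of Cα coordinates. It assumes the two chains have already been superposed and that atom i in one list is aligned with atom i in the other; finding that superposition and alignment is the job of programs such as jFATCAT and is not attempted here. The coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired 3D coordinates (in Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up coordinates: the second set is the first shifted by 3 Å along z.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(chain_a, chain_b))
```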

Figure 13a:

(a) 3D similarity. Structural neighbors of a green fluorescent protein (PDB ID 2WUR chain A). By default, a list of structural neighbors is sorted by the significance of the structural alignment (P value). Some of the top-scoring structural neighbors include yellow fluorescent protein and killer red. However, the list also contains unexpected similarities such as nidogen-1 (PDB ID 1GL4 chain A). Nidogen-1, for example, shares a conserved 11-stranded β-barrel and an internal helix with the fluorescent protein structures, but it does not contain a chromophore. The structural similarity can be expressed by the RMSD (root mean square deviation from the query structure in Å). For example, nidogen-1 has an RMSD of 3 Å. A description of the other columns of this table is available on the website.

Figure 13b:

(b) Structural alignment of green fluorescent protein (PDB ID 2WUR chain A, orange) with nidogen-1 (PDB ID 1GL4 chain A, cyan). Nonaligned regions are colored in gray. The high structural similarity (3 Å RMSD) suggests that the two proteins have evolved from a common ancestor. A standard sequence search does not find this relationship because there is only 8.8% sequence identity between the two proteins. On the other hand, 91% of green fluorescent protein residues align structurally with nidogen-1.

Pairwise structural alignments. The “Compare Structure” tool, accessed via the search box in the middle of Figure 13b (Fig. 14a), aligns pairs of protein chains using multiple algorithms. jFATCAT and jCE15 perform a rigid alignment. For proteins that undergo structural changes (i.e., domain movements) the jFATCAT flexible method can be used. It is also possible to align circularly permuted proteins using the jCE circular permutation option. Figure 14b shows the structural alignment of glucanase and several circular permutations. A detailed description of this example is available from the “Concanavalin A and Circular Permutation” Molecule of the Month article.

Figure 14a:

(a) Comparison tool: The user types in the PDB IDs of two protein structures and selects two protein chains to be aligned. Then, either a pairwise sequence or structural alignment method can be selected.

Figure 14b:

(b) Circular permutations. Glucanase structures with circular permutation aligned with jCE. The blue and red ends of the protein chain are the N-terminal and C-terminal regions, respectively. This example is described in the “Concanavalin A and Circular Permutation” Molecule of the Month article (see text; image created by David Goodsell).

Structural neighbor search against the PDB. For a new protein structure that has no significant sequence similarity to any known structure, a structural similarity search can be run. Clicking on the “Align custom files” link on the Protein Comparison Tool page (Fig. 15) will bring up the structural alignment tool, which can be installed on a user's computer using the Java Webstart protocol. Once the installation is complete, the user can load a PDB file with a new structure and then run a comparison against the representative PDB structures. This search is carried out locally on the user's computer, and results are also stored locally. After the run finishes (this may take several hours), a ranked list of hits is presented in a tabular format. The user can now browse through the hits and view the structural and sequence alignments.

Figure 15:

Database search with the “Pairwise Structure Alignment” tool. To perform a structural similarity search the user selects the “Database Search” and “Custom files” options. The user then browses to the file that contains the query structure and selects an output directory where the results of the search are stored. After the search completes, a sorted list of hits is presented to the user.

Search by functional and other annotations

The PDB can be searched and browsed by a number of functional annotations, source organism and genome location, and domain annotations listed in Table 2. These annotations are based on controlled vocabularies or ontologies, which standardize common terms. Annotations are retrieved from third-party sites (i.e., SCOP, CATH, Transporter Classification) or mapped via the UniProtKB accession number (i.e., GO, EC, Entrez Gene). It is important to note that annotations are incomplete; for example, the SCOP database was last updated in June 2009, and so newer structures are missing SCOP annotations.

Figure 16a shows how to search the GO Molecular Function tree (available from the home page using the “Browse Database” link) for structures related to transcription factors. An auto-complete feature lists matches as the text is typed, and the user can select the most suitable match. Alternatively, the hierarchy can be navigated directly by expanding nodes in the hierarchy (Fig. 16b). Moving the mouse over a term displays the number of related PDB entries. Clicking on a term retrieves the associated PDB structures.

Figure 16a:

(a) Molecular Function browser. This browser was selected by clicking on the associated tab at the top of the page in the GO Molecular Function tree. The user then types in a term in the text box. While the user is typing, the auto-complete feature suggests possible matches.

Figure 16b:

(b) After searching for a term, a match in the GO Molecular Function hierarchy is expanded. Terms that have associated structures in the PDB are highlighted in blue. Mousing over blue terms displays the number of entries in the PDB, and clicking on a term retrieves the associated structures.
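The two operations offered by a tree browser, finding terms by text and retrieving the entries associated with a term, can be sketched with a small recursive data structure. Everything in this Python example is hypothetical: the term names form a made-up fragment of a hierarchy, the entry IDs are placeholders, and counting the entries of all descendant terms toward a parent term is an assumption rather than a documented behavior of the website.

```python
# Hypothetical fragment of a term hierarchy with placeholder entry IDs.
TREE = {
    "name": "molecular function",
    "entries": [],
    "children": [
        {"name": "binding", "entries": ["E1"], "children": [
            {"name": "DNA binding", "entries": ["E2", "E3"], "children": []},
        ]},
        {"name": "catalytic activity", "entries": ["E3", "E4"], "children": []},
    ],
}


def find_terms(node, text):
    """Return names of all terms containing the typed text (auto-complete)."""
    hits = [node["name"]] if text.lower() in node["name"].lower() else []
    for child in node["children"]:
        hits.extend(find_terms(child, text))
    return hits


def collect_entries(node):
    """Collect the distinct entries annotated with a term or its descendants."""
    entries = set(node["entries"])
    for child in node["children"]:
        entries |= collect_entries(child)
    return entries


print(find_terms(TREE, "bind"))
print(len(collect_entries(TREE)))
```

Using a set means that an entry annotated with several terms (E3 in this example) is counted only once.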


The RCSB PDB website offers powerful ways to search and browse the database by sequence, structure and functional annotation. Initial query results can be easily refined via faceted browsing or the advanced search. Results can be viewed and sorted, displayed and exported in tabular reports, or downloaded as PDB files for further analysis. Sequence clusters and structural neighbors are efficient ways to find homologous structures, or structural representatives in the PDB. Custom sequence and structural alignment tools allow readers to compare their own data against the PDB.

To keep current with the latest features of the RCSB website and to learn about features not mentioned in this primer, visit the What's New page and read a description about the redesigned RCSB PDB website and web services.16


This work was supported by the National Science Foundation (NSF DBI 0829586), National Institute of General Medical Sciences (NIGMS), Office of Science, Department of Energy (DOE), National Library of Medicine (NLM), National Cancer Institute (NCI), National Institute of Neurological Disorders and Stroke (NINDS), and the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). The RCSB PDB is managed by two members of the RCSB: Rutgers University and the University of California at San Diego.

Peter W. Rose1,*, Wolfgang F. Bluhm1,*, Bojan Beran1, Chunxiao Bi1, Dimitris Dimitropoulos1, David S. Goodsell2, Andreas Prlić1, Gregory B. Quinn1, Benjamin Yukich1, Helen M. Berman3 & Philip E. Bourne1,4

1. San Diego Supercomputer Center, University of California San Diego, La Jolla, California, USA.
2. Department of Molecular Biology, The Scripps Research Institute, La Jolla, California, USA.
3. Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA.
4. Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, USA.
* Address correspondence to:


  1. Berman, H., Henrick, K. & Nakamura, H. Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 10, 980 (2003).
  2. Berman, H.M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
  3. Velankar, S. et al. PDBe: Protein Data Bank in Europe. Nucleic Acids Res. 39, D402–D410 (2011).
  4. Bernstein, F.C. et al. The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542 (1977).
  5. Westbrook, J. & Fitzgerald, P.M. The PDB format, mmCIF, and other data formats. in Structural Bioinformatics (eds Bourne, P.E. & Weissig, H.) 161–179 (John Wiley & Sons, Inc., Hoboken, NJ, 2003).
  6. Fitzgerald, P.M.D. et al. Definition and exchange of crystallographic data. in International Tables for Crystallography (eds Hall, S.R. & McMahon, B.) Vol. G., 295–443 (Springer, Dordrecht, The Netherlands, 2005).
  7. Westbrook, J., Ito, N., Nakamura, H., Henrick, K. & Berman, H.M. PDBML: the representation of archival macromolecular structure data in XML. Bioinformatics 21, 988–992 (2005).
  8. Deshpande, N. et al. The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res. 33, D233–D237 (2005).
  9. Bourne, P.E. The Evolution of the RCSB Protein Data Bank Website (WIREs Computational Molecular Science, Wiley-Blackwell, 2011, in press).
  10. Dutta, S., Zardecki, C., Goodsell, D.S. & Berman, H.M. Promoting a structural view of biology for varied audiences: an overview of RCSB PDB resources and experiences. J. Appl. Crystallogr. 43, 1224–1229 (2010).
  11. Pearson, W.R. & Lipman, D.J. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448 (1988).
  12. Terpe, K. Overview of tag protein fusions: from molecular and biochemical fundamentals to commercial systems. Appl. Microbiol. Biotechnol. 60, 523–533 (2003).
  13. Sigrist, C.J. et al. PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res. 38, D161–D166 (2010).
  14. Rost, B. Twilight zone of protein sequence alignments. Protein Eng. 12, 85–94 (1999).
  15. Prlić, A. et al. Pre-calculated protein structure alignments at the RCSB PDB website. Bioinformatics 26, 2983–2985 (2010).
  16. Rose, P.W. et al. The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Res. 39, D392–D401 (2011).

© 2011 Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.