Bioinformatics Primer
An Introduction to the Human Protein Atlas: Exploration of Protein Profiles in Human Tissues, Cells and Cell Lines
doi:10.1038/pid.2009.2
Standfirst
The Human Protein Atlas (http://www.proteinatlas.org) is a publicly available resource for exploration of protein expression profiles in human normal and diseased tissues, cells and cell lines. The profiles are retrieved through immunohistochemistry- and immunofluorescence-based methods, using validated antibodies generated in-house or provided by external collaborators. The protein expression profiles are presented on the Human Protein Atlas, both as aggregated expression values for each included cell type and as high-resolution images. The search options available on the web site allow for combined advanced searches and can be used for identification of potential biomarkers. To date, the Human Protein Atlas contains protein expression profiles for ~33% of the human protein-coding genes.
Introduction
In the post-genome era, focus has turned to the exploration of the function of molecules encoded by the genome. The current estimate of the number of protein-coding genes in the human genome is 20,000 to 21,500 (The UniProt Consortium, 2008; Hubbard et al., 2009; Clamp et al., 2007). In the Swedish Human Protein Resource (HPR) program, a high-throughput system for generation of monospecific antibodies has made it possible to explore the human proteome (Nilsson et al., 2005). The antibodies in this program are used primarily for protein profiling of human cells, tissues and cell lines, with the results provided on a publicly available web portal for the Human Protein Atlas (http://www.proteinatlas.org) (Berglund et al., 2008).
In the Human Protein Atlas, immunohistochemistry-based protein profiling conducted on 48 normal tissues, 20 cancer types and 59 cell lines and primary cells is published as high-resolution images together with aggregated information about the protein expression levels (Uhlén et al., 2005; Strömberg et al., 2007; Figure 1a). The subcellular localization of a large number of proteins, as determined by confocal microscopy using three cell lines and fluorescently labeled antibodies (Barbe et al., 2008), is also available from the Human Protein Atlas (Figure 1b). In addition to the in-house generated antibodies, antibodies collected from external collaborators have been used for protein profiling in the program. The Human Protein Atlas currently contains expression profiles for the protein products from 6,844 human genes, or ~33% of the estimated number of protein-coding genes.
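The ~33% coverage figure can be checked against the gene-count estimates quoted in the Introduction. The short Python sketch below is only an illustrative calculation; the variable and function names are ours and are not part of the Human Protein Atlas.

```python
# Coverage of the protein-coding genome implied by the figures quoted in the text.
GENES_WITH_PROFILES = 6844              # genes with expression profiles in the atlas
GENE_COUNT_ESTIMATES = (20000, 21500)   # estimated range of human protein-coding genes


def coverage_percent(profiled: int, total: int) -> float:
    """Return the percentage of `total` genes that have an expression profile."""
    return 100.0 * profiled / total


coverage_range = [coverage_percent(GENES_WITH_PROFILES, n) for n in GENE_COUNT_ESTIMATES]

for n, pct in zip(GENE_COUNT_ESTIMATES, coverage_range):
    print(f"{n} estimated genes -> {pct:.1f}% covered")
```

Both ends of the estimate give a value close to one-third, consistent with the rounded figure used throughout the text.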
Figure 1a
Expression of the potential transcription factor activity-dependent neuroprotector homeobox protein encoded by ADNP. Nuclear staining of glandular cells in human thyroid gland, as detected by immunohistochemistry using a monospecific antibody (HPA006371) to ADNP.
Figure 1b
Nuclear, but not nucleolar, localization of ADNP in the cell line U-2 OS, as detected by immunofluorescence and confocal microscopy using HPA006371.
Browsing and Search
The Human Protein Atlas is gene-centric, meaning that all human protein-coding genes defined by the Ensembl annotation system (Hubbard et al., 2009) are included. There are four ways to search the Human Protein Atlas (Figure 2a):
- Free text search. A free text search can be done from the homepage of the Human Protein Atlas by entering all or part of one of the following in the search box: HUGO Gene Nomenclature Committee (HGNC) gene symbol, gene description, Ensembl identifier, Ensembl gene synonym, Human Protein Atlas identifier (antibody identifier) or tissue expression annotation summary.
- Browse by chromosome. All genes localized on a specific human chromosome can be retrieved by selecting that chromosome on the homepage. The chromosome named "OTHER" will list antibodies not linked to an Ensembl gene or linked to a gene with unmapped chromosome localization.
- Browse by protein class. Lists of proteins classified based on potential function or found in certain studies (plasma proteins or candidate cancer markers, for example) have been assembled from different external sources (Berglund et al., 2008). All genes encoding these proteins can be listed by selecting one of the main classes given on the homepage, or by selecting "More..." to get a longer list of available classes.
- Advanced search. An extended search option is available through the "Advanced search" link on the homepage. This search option allows for combined searches involving all three search options given above, as well as searches on protein expression levels in normal and cancer tissue and in cells and cell lines (Björling et al., 2008; Figure 2b). Through the Advanced search, it is possible to exclude or include certain features (AND, AND NOT) in the search and to set thresholds on features.
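The logic of a combined search (include one feature, exclude another, and apply an expression threshold) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the web site's implementation: the `GeneRecord` structure, the `advanced_search` function and the gene symbols `GENE_A` to `GENE_D` are all hypothetical, and the four-step expression scale is taken from the annotation levels described for the expression profiles. The query mirrors the example in Figure 2b.

```python
from dataclasses import dataclass, field

# Annotation levels ordered from weakest to strongest.
LEVELS = ("none", "weak", "moderate", "strong")


@dataclass
class GeneRecord:
    """Hypothetical in-memory record; not the atlas's real data model."""
    symbol: str
    chromosome: str
    protein_classes: set = field(default_factory=set)
    # (tissue, cell type) -> annotated expression level
    expression: dict = field(default_factory=dict)


def at_least(level: str, threshold: str) -> bool:
    """True if `level` is at or above `threshold` on the ordered scale."""
    return LEVELS.index(level) >= LEVELS.index(threshold)


def advanced_search(records, include_class, exclude_chromosome, cell, min_level):
    """Class AND NOT chromosome AND expression >= threshold in one cell type."""
    hits = []
    for record in records:
        if include_class not in record.protein_classes:
            continue                      # AND: must belong to the class
        if record.chromosome == exclude_chromosome:
            continue                      # AND NOT: drop the excluded chromosome
        level = record.expression.get(cell)
        if level is None or not at_least(level, min_level):
            continue                      # threshold on the expression level
        hits.append(record.symbol)
    return hits


LEYDIG = ("testis", "Leydig cells")
# Invented toy data for illustration only.
records = [
    GeneRecord("GENE_A", "7", {"Tf"}, {LEYDIG: "strong"}),
    GeneRecord("GENE_B", "13", {"Tf"}, {LEYDIG: "strong"}),   # on chromosome 13
    GeneRecord("GENE_C", "2", {"Tf"}, {LEYDIG: "weak"}),      # below threshold
    GeneRecord("GENE_D", "X", {"Cb"}, {LEYDIG: "strong"}),    # not a transcription factor
]

hits = advanced_search(records, "Tf", "13", LEYDIG, "strong")
print(hits)
```

Only `GENE_A` satisfies all three conditions, which is the behavior expected from chaining AND and AND NOT criteria with a threshold.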
Figure 2a
The home page of the Human Protein Atlas allows for simple free text search (1), browsing by chromosome (2), browsing by protein class (3; select "More..." for the full list) and Advanced search (4).
Figure 2b
The Advanced search page. The example shows a combined search for potential transcription factors not encoded by chromosome 13 and showing strong expression in the Leydig cells of normal testis.
Search results
The search result is displayed as a list of genes fulfilling the search criteria (Figure 3). If there is an expression profile available for a protein encoded by the listed gene, an antibody ID is displayed, where the prefix "HPA" implies an antibody generated within the HPR program and the prefix "CAB" implies an antibody retrieved from an external collaborator. One criticism of antibody-based proteomics is the possibility that part of or all of the expression patterns detected using an antibody could be caused by cross-reactivity to proteins other than the intended target. The antibodies used in the HPR program are validated by several methods, and the validation scores for the antibodies are shown in the results list. Details of the validation methods and scores are described on the quality assurance page (http://www.proteinatlas.org/qc.php).
Figure 3
The result page. There is an option to display only genes with available protein expression profiles in the result list (select "Filter: Tissue profiles only" in the upper right corner of the result page). The leftmost column of the result list is the hit number, followed by the gene name according to Ensembl (the HGNC gene symbol or a clone ID) and the description and chromosomal localization of the gene. The "Links" column provides direct links to the corresponding entry in Ensembl (E), NCBI Entrez Gene (N) and UniProt (U) when available. The "Class" column gives the short name of the protein class(es) to which the corresponding proteins belong. A mouse-over shows the full name of the protein class. Identifiers for antibodies to the proteins encoded by the gene are listed after the protein class column (CAB or HPA, or both), linking to the protein expression profiles page. The validation results for the antibody in immunohistochemistry (IH), immunofluorescence (IF; if available), western blotting (WB) and protein array (PA; only for HPA antibodies) are displayed in the rightmost column of the result list as color-coded boxes, where green indicates supportive results, yellow indicates uncertain results, and red indicates non-supportive results.
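The identifier prefixes and the color-coded validation boxes described for the result list lend themselves to a simple lookup. The Python sketch below is illustrative only: the function names are hypothetical, and the box colors passed in the example are invented rather than the actual validation results for HPA006371.

```python
# Meaning of the antibody identifier prefixes, as described in the text.
ANTIBODY_SOURCES = {
    "HPA": "generated within the HPR program",
    "CAB": "provided by an external collaborator",
}

# Meaning of the color-coded validation boxes.
VALIDATION_VERDICTS = {
    "green": "supportive",
    "yellow": "uncertain",
    "red": "non-supportive",
}

# Abbreviations used in the rightmost column of the result list.
ASSAYS = {
    "IH": "immunohistochemistry",
    "IF": "immunofluorescence",
    "WB": "western blotting",
    "PA": "protein array",
}


def antibody_source(antibody_id: str) -> str:
    """Translate the identifier prefix (HPA or CAB) into the antibody's origin."""
    prefix = antibody_id[:3].upper()
    if prefix not in ANTIBODY_SOURCES:
        raise ValueError(f"unrecognized antibody prefix in {antibody_id!r}")
    return ANTIBODY_SOURCES[prefix]


def describe_validation(antibody_id: str, boxes: dict) -> dict:
    """Map assay abbreviations and box colors to readable verdicts."""
    if "PA" in boxes and not antibody_id.upper().startswith("HPA"):
        raise ValueError("protein array results are shown only for HPA antibodies")
    return {ASSAYS[assay]: VALIDATION_VERDICTS[color] for assay, color in boxes.items()}


# Invented colors, for illustration only.
summary = describe_validation("HPA006371", {"IH": "green", "WB": "yellow", "PA": "green"})
print(antibody_source("HPA006371"))
print(summary)
```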
Gene, antibody and protein expression information
The gene name in the search result is linked to a web page with a summarized view of the gene (Figure 4a) and a navigation panel (Figure 4b) listing the antibodies generated to this target. There are also links to pages with the protein expression profiles in human tissues, cell lines and primary cells (Figure 4c) and to pages with information about the generated antigen and antibody (Figure 4d). The navigation panel can also be accessed through the antibody identifier in the search result. If there is more than one antibody targeting proteins from the same gene, it is possible to switch between the antibodies through the navigation panel.
Figure 4a
The Gene/Protein page displays data about the gene and the encoded proteins retrieved from Ensembl, UniProt, Gene Ontology and other external sources. For each encoded protein, there is a visualization tool displaying protein features such as antigen position in the protein sequence (A), sequence identity of the protein to proteins from other genes based on a 10- or a 50-amino acid sliding window (B and C), predicted transmembrane segment (D) or signal peptide (E), shared or exclusive regions between splice variants (F), low-complexity regions (G), InterPro regions (H) and a protein amino acid scale (I).
Figure 4b
The navigation panel for each gene can be reached through the result list by clicking on the gene name or antibody identifier. Results for multiple antibodies to the same target can be compared by switching between the listed antibodies when browsing the protein expression patterns.
Figure 4c
The Expression Profiles page starts with a short summary of the gene, followed by the expression profile in normal tissues and patient tumor samples (alphabetically or histologically sorted) based on manual annotation of immunohistochemical staining patterns. For normal tissues, samples from three different individuals are used for each tissue. For cancer patient samples, duplicate samples from 12 different individuals are used (for most cancer types). Representation of expression values is based on the quantity and density of the staining, where red represents strong expression, orange is moderate expression, yellow is weak expression, white is no expression, and black indicates missing data. Immunohistochemistry is also used for protein expression analysis in cell lines and cells, but the expression values are retrieved through automated image analysis software (Strömberg et al., 2007). Cell size-corrected expression values for the protein (Lundberg et al., 2008) are displayed in a cell expression diagram. In many cases, the list of expression profiles ends with results from immunofluorescence studies suggesting the subcellular localization of the protein.
Figure 4d
The Antigen/Antibody page displays information such as the sequence of the antigen (Protein Epitope Signature Tag, or PrEST) used for generation of the HPA antibodies, the sequence identity of the antigen to its target protein and the two best hits to proteins from other genes, the antibody provider and other information about the antibody, and details on the validation based on immunohistochemistry, immunofluorescence, protein arrays and western blotting.
Biomarker discovery
The advanced search can be used for discovery of potential biomarkers by combined searches on protein expression values in the Human Protein Atlas (Figure 5). As an example, we performed a combined search for proteins strongly expressed in at least half (6 of 12) of the breast cancer patient samples and with less than moderate expression in normal breast tissue (Figure 5a). A list of 32 genes was retrieved (Figure 5b), of which at least six (CEACAM5, EPHB3, PLAT, PRKCB, SP1 and TFF1) have previously been associated with cancer, as indicated by the protein classification "Cb" (candidate cancer biomarker). For example, expression of TFF1 has been shown to be induced by estrogen in the metastatic breast adenocarcinoma cell line MCF-7 (Jakowlew et al., 1984). The MCF-7 cell line also shows strong expression of TFF1 in the Human Protein Atlas (Figure 5c). Two transcription factor-coding genes (protein class "Tf"), ALX1 and SP1, are found in the results list; SP1 is already classified as a candidate cancer biomarker. Both show stronger expression in breast tumor tissue than in normal breast tissue (Figure 5d). ALX1 has been reported to encode a transcription repressor (Gordon et al., 1996), whereas SP1 encodes a transcription activator (Jackson & Tjian, 1988). Because ALX1 is a repressor, genes that are downregulated in breast cancer tissue compared to normal breast tissue are potential ALX1 targets; a search for such genes (combining at least six patient breast samples with less than moderate expression and normal breast samples with strong expression) returns 187 hits. Similarly, some of the other genes in the result list may be regulated by SP1.
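The two search criteria used in this example can be written as simple predicates over per-sample annotations. The Python sketch below assumes one annotation per patient sample and treats the normal-tissue condition as applying to every normal sample; how the web site aggregates normal samples is not specified here, so that choice, like the function names and the toy profiles, is our assumption.

```python
# Annotation levels ordered from weakest to strongest.
LEVELS = ("none", "weak", "moderate", "strong")
RANK = {name: i for i, name in enumerate(LEVELS)}


def below_moderate(level: str) -> bool:
    """True for 'none' or 'weak' expression."""
    return RANK[level] < RANK["moderate"]


def is_upregulated_candidate(cancer, normal, min_patients=6):
    """Strong in at least `min_patients` cancer samples, below moderate in normal tissue."""
    strong_in_cancer = sum(level == "strong" for level in cancer)
    return strong_in_cancer >= min_patients and all(below_moderate(l) for l in normal)


def is_downregulated_candidate(cancer, normal, min_patients=6):
    """Below moderate in at least `min_patients` cancer samples, strong in normal tissue."""
    low_in_cancer = sum(below_moderate(level) for level in cancer)
    return low_in_cancer >= min_patients and all(l == "strong" for l in normal)


# Invented toy profiles: 12 patient samples and 3 normal samples per gene.
up_profile = (["strong"] * 7 + ["moderate"] * 5, ["weak", "none", "weak"])
down_profile = (["none"] * 8 + ["moderate"] * 4, ["strong", "strong", "strong"])

print(is_upregulated_candidate(*up_profile))      # matches the first search
print(is_downregulated_candidate(*down_profile))  # matches the reverse search
```

Lowering `min_patients` relaxes the search, and raising it makes the search stricter, in the same way as adjusting the threshold on the Advanced search page.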
Figure 5a
Advanced search for biomarker discovery. Combined search for proteins with strong expression in at least six breast cancer patient samples and less than moderate expression in normal breast tissue.
Figure 5b
Results from the search described in Figure 5a. Thirty-two genes are found, of which at least six have previously been associated with cancer (shown as belonging to protein class Cb, candidate cancer biomarkers).
Figure 5c
Expression of TFF1 in human cell lines. The metastatic breast adenocarcinoma cell line MCF-7 is unique in showing strong expression of TFF1.
Figure 5d
Expression of transcription factors ALX1 (top) and SP1 (bottom) in normal breast tissue (left) and breast cancer tissue (right).
A simple literature search reveals that, of the other genes in the result list, AGR2, KPNA2, MRPL40, NUP214, RAB3D and SCARB2 have been associated with breast cancer in earlier studies (Harris et al., 2002; Hildebrandt et al., 1999; Sehgal et al., 1997; Sjöblom et al., 2006). AGR2, for example, encodes a protein whose coexpression with the estrogen receptor has been suggested to contribute to the hormonally responsive breast cancer phenotype (Thompson & Weigel, 1998). The protein has been shown to bind to a GPI-anchored, metastasis-associated protein and an extracellular α-dystroglycan, which indicates a potential role in tumor metastasis through modulation of receptor adhesion and function (Fletcher et al., 2003).
The advanced search can also be used to find differentially expressed proteins among samples from patients with cancer, where expression levels can indicate prognosis and drug treatment response. For example, it is possible to search for proteins with strong expression in some breast cancer samples and no or low expression in other breast cancer samples.
The expression patterns of all genes found by the advanced search need to be investigated thoroughly in extended studies, where more patient and control samples are included, to validate any new breast cancer biomarkers. The Advanced search in the Human Protein Atlas could, however, be a first step in the quest for new biomarkers.
Conclusions
The Human Protein Atlas is a unique public resource providing protein expression profiles in a wide range of normal and diseased human tissues, cell lines and cells based on immunohistochemistry and immunofluorescence methods using validated antibodies. The expression profiles for proteins from ~33% of the human protein-coding genes can be retrieved from the web site by simple and advanced search options, allowing for new discoveries in biological and medical research.
Future directions
The first version of the Human Protein Atlas was released in 2005, containing expression profiles for the protein products of 650 human genes. The number of available expression profiles has almost doubled every year since. The latest version of the Human Protein Atlas contains protein profiles corresponding to one-third of the human protein-coding genes. The aim of the HPR program is to have a first draft of the human proteome (defined as one protein expression profile for each human protein-coding gene) by 2014.
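The statement that the number of profiles has almost doubled every year can be checked with a short calculation. The Python sketch below assumes that the current version dates from 2009, four years after the first release; that year is our assumption, as are the variable names.

```python
# Average annual growth factor implied by the figures quoted in the text.
FIRST_RELEASE_YEAR = 2005
FIRST_RELEASE_GENES = 650    # genes with profiles in the first version
CURRENT_YEAR = 2009          # assumption: year of the version described here
CURRENT_GENES = 6844         # genes with profiles in the current version

years_elapsed = CURRENT_YEAR - FIRST_RELEASE_YEAR

# Geometric mean of the yearly growth: (current / first) ** (1 / years).
annual_factor = (CURRENT_GENES / FIRST_RELEASE_GENES) ** (1 / years_elapsed)

print(f"Average growth factor per year: {annual_factor:.2f}")
```

Under this assumption the average factor comes out at roughly 1.8 per year, that is, slightly less than a doubling.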
In addition to having more protein expression profiles on the Human Protein Atlas, we plan to add information about the subcellular localization of all studied proteins, more protein classes and pathway data. We also expect to add more normal tissue types (retina, for example) and a wider representation of disease tissue (other than cancer) to the protein expression profiles.
Acknowledgments
We are most grateful to the HPR teams in Uppsala, Stockholm and Mumbai for their tremendous efforts. This work was supported by grants from the Knut and Alice Wallenberg Foundation.
1School of Biotechnology, AlbaNova University Center, Royal Institute of Technology (KTH), 106 91 Stockholm, Sweden
2Department of Genetics and Pathology, Rudbeck Laboratory, Uppsala University, 751 85 Uppsala, Sweden
*Address correspondence to: Prof. Mathias Uhlén, Tel: +46 8 5537 8325, Fax: +46 8 5537 8482, E-mail: mathias@biotech.kth.se
References
- Barbe, L. et al. Toward a confocal subcellular atlas of the human proteome. Mol. Cell. Proteomics 7, 499–508 (2008)
- Berglund, L. et al. A genecentric Human Protein Atlas for expression profiles based on antibodies. Mol. Cell. Proteomics 7, 2019–2027 (2008)
- Björling, E. et al. A web-based tool for in silico biomarker discovery based on tissue-specific protein profiles in normal and cancer tissues. Mol. Cell. Proteomics 7, 825–844 (2008)
- Clamp, M. et al. Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl. Acad. Sci. USA 104, 19428–19433 (2007)
- Fletcher, G.C. et al. hAG-2 and hAG-3, human homologues of genes involved in differentiation, are associated with oestrogen receptor-positive breast tumours and interact with metastasis gene C4.4a and dystroglycan. Br. J. Cancer 88, 579–585 (2003)
- Gordon, D.F. et al. Human Cart-1: structural organization, chromosomal localization, and functional analysis of a cartilage-specific homeodomain cDNA. DNA Cell Biol. 15, 531–541 (1996)
- Harris, R.A. et al. Cluster analysis of an extensive human breast cancer cell line protein expression map database. Proteomics 2, 212–223 (2002)
- Hildebrandt, T. et al. Identification of URIM, a novel gene up-regulated in metastasis. Anticancer Res. 19, 525–530 (1999)
- Hubbard, T.J. et al. Ensembl 2009. Nucleic Acids Res. 37, D690–D697 (2009)
- Jackson, S.P. & Tjian, R. O-glycosylation of eukaryotic transcription factors: implications for mechanisms of transcriptional regulation. Cell 55, 125–133 (1988)
- Jakowlew, S.B., Breathnach, R., Jeltsch, J.M., Masiakowski, P. & Chambon, P. Sequence of the pS2 mRNA induced by estrogen in the human breast cancer cell line MCF-7. Nucleic Acids Res. 12, 2861–2878 (1984)
- Lundberg, E. et al. The correlation between cellular size and protein expression levels--normalization for global protein profiling. J. Proteomics 71, 448–460 (2008)
- Nilsson, P. et al. Towards a human proteome atlas: high-throughput generation of mono-specific antibodies for tissue profiling. Proteomics 5, 4327–4337 (2005)
- Sehgal, A. Isolation and characterization of a novel gene from human glioblastoma multiforme tumor tissue. Int. J. Cancer 71, 565–572 (1997)
- Sjöblom, T. et al. The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274 (2006)
- Strömberg, S. et al. A high-throughput strategy for protein profiling in cell microarrays using automated image analysis. Proteomics 7, 2142–2150 (2007)
- The UniProt Consortium. The universal protein resource (UniProt). Nucleic Acids Res. 36, D190–D195 (2008)
- Thompson, D.A. & Weigel, R.J. hAG-2, the human homologue of the Xenopus laevis cement gland gene XAG-2, is coexpressed with estrogen receptor in breast cancer cell lines. Biochem. Biophys. Res. Commun. 251, 111–116 (1998)
- Uhlén, M. et al. A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol. Cell. Proteomics 4, 1920–1932 (2005)




