The Laboratory Mouse as a Model System for Human Biology and Disease: An Introduction to the Mouse Genome Informatics (MGI) Database
The community model organism database for the laboratory mouse, the Mouse Genome Informatics (MGI) database (http://www.informatics.jax.org), integrates diverse genetic and genomic data to facilitate the use of the mouse as a model system for understanding human biology and disease. MGI contains a comprehensive catalog of mouse genes and other genome features that are integrated with manually curated information from the scientific literature and other informatics resources about developmental gene expression, genome variation, gene function, phenotype associations, pathway membership, mammalian orthology and associations with human disease genes. MGI is a unique resource of information on the relationship of mouse models (that is, allelic combinations on specific genetic backgrounds) to human disease.
The laboratory mouse is the premier animal model for understanding the genetic and molecular basis of human biology and disease1. Providing access to genetic and genomic data to support the use of the laboratory mouse as a model organism is one of the primary goals of the Mouse Genome Informatics (MGI) database resource2. To achieve this goal, MGI maintains a comprehensive catalog of mouse genes and other genome features and associates these features with orthologous genes in other mammals, human diseases, functional annotation, mouse phenotype descriptions, DNA and protein sequence data, genome scale variation (single nucleotide polymorphisms or SNPs), and developmental gene expression information. The data in MGI are obtained through manual curation of the biomedical literature, direct contributions from investigators' laboratories, and cooperative data exchange with other major informatics resource centers (for example, NCBI, Ensembl, UniProt). A summary of the current data content of MGI is shown in Table 1.
The MGI database is updated daily; search results shown in this tutorial may differ from those obtained at a later date.
The application of nomenclature and bio-ontology standards is critical to MGI's strategies for data integration and access3. MGI, the authoritative source of names and symbols for mouse genes and strains, enforces the nomenclature rules determined by the International Committee on Standardized Genetic Nomenclature for Mice (http://www.informatics.jax.org/mgihome/nomen/index.shtml). Members of the MGI consortium also play key roles in developing biological ontologies, such as the Gene Ontology4, the Mammalian Phenotype Ontology5 and the mouse Anatomical Dictionary6, and in applying them to knowledge representation.
MGI contains a plethora of data about the genetics and genomics of the laboratory mouse; the focus of this tutorial is to describe strategies for using MGI to obtain information about mouse models of human disease and the availability of mouse resources using gene-centric and phenotype-centric database searches.
The home page for the MGI database provides numerous options for accessing data and information, including a Quick Search keyword search tool, links to data portals for the primary data types in MGI, and a navigation bar with links to reports, data submission forms, and other information about the database and the MGI consortium (Figure 1).
Screenshot of the Mouse Genome Informatics (MGI) home page (http://www.informatics.jax.org). The home page provides three primary access paths to MGI's data content: (a) the navigation bar with links to reports, search pages, analysis tools and information; (b) the Quick Search tool; and (c) the data content portals.
Searches using gene symbols or names are common ways to access information in MGI. There are two primary interfaces for such searches, the Quick Search tool and the Genes and Markers Query Form. Although MGI enforces official gene nomenclature standards, past symbols and synonyms for genes are maintained in the database as part of a gene's history. Searches using unofficial symbols and names are supported in MGI and will return results that display the current, official nomenclature.
The Quick Search tool is available on the MGI home page and in the top right-hand corner of most other pages at the website. Figure 2 shows the results for a search using the keyword "caveolin." The results list all matching items in MGI, ranked by relevance. The first block of results lists the genes and other genome features in the mouse genome that match the keyword. The second block displays matches of the keyword with terms in the various vocabularies and ontologies that are used to standardize knowledge representation in MGI. Entries in the search results are hyperlinked to detailed information in MGI.
Results from a search of the keyword, caveolin, using the MGI Quick Search tool. Section a shows genes and other genome features that best matched the search term. Section b shows terms in controlled vocabularies and ontologies used to standardize data descriptions in MGI that best matched the search term.
A second approach to gene-centric searching at MGI is through the Genes and Markers Query Form (available in the navigation bar via Search → Genes → Genes and Markers Query or via the link http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=markerQF). Data integration at MGI allows diverse data about a gene from different studies to be related in new contexts according to the interests of an individual investigator. The Genes and Markers Query Form allows researchers to retrieve records for specific genes or for groups of genes that share biological properties. For example, Figure 3 illustrates the use of the form to search for all mouse genes that are mapped to chromosome 6, have been annotated (using Gene Ontology terms) as being involved in transcription factor activity and are associated with the disease term "lung cancer." Although each of these attributes (location, function, disease association) has been determined in different studies and genome analyses, they can be combined into a single query using the MGI web query forms (Figure 4). One of the genes returned by this query is the Kras oncogene (MGI: 96680). Although Kras does not itself function as a transcription factor, it is returned because it has been annotated to the Gene Ontology biological process term "positive regulation of NF-κB transcription factor activity," which matches the search text. By default, all three ontologies (Molecular Function, Biological Process and Cellular Component) are searched. Users can refine their queries by choosing to search only one of the ontology categories using the selection boxes on the query form (Figure 3).
Screenshot of the Genes and Markers Query Form in MGI highlighting the construction of a query for all genes on chromosome 6 that are transcription factors and that are associated with lung cancer. The web-based query forms in MGI allow users to search the database for genes and other genome features that share multiple criteria.
Results from a search for "All genes on chromosome 6 that are transcription factors and associated with lung cancer."
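The way the query form combines independent criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MGI's implementation: the `GeneRecord` class, the `search` function, the case-insensitive substring matching, and the `HypoA` and `HypoB` records are all hypothetical. Only the Kras record reflects annotations mentioned in this tutorial.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical, simplified stand-in for a gene record."""
    symbol: str
    chromosome: str
    # (ontology category, term) pairs
    go_annotations: list = field(default_factory=list)
    diseases: list = field(default_factory=list)


def search(records, chromosome=None, go_text=None, go_category=None, disease=None):
    """Return symbols of records that satisfy every criterion that was supplied."""
    hits = []
    for rec in records:
        if chromosome is not None and rec.chromosome != chromosome:
            continue
        if go_text is not None:
            # Optionally restrict the text match to one ontology category.
            terms = [term for category, term in rec.go_annotations
                     if go_category is None or category == go_category]
            if not any(go_text.lower() in term.lower() for term in terms):
                continue
        if disease is not None and not any(
                disease.lower() in d.lower() for d in rec.diseases):
            continue
        hits.append(rec.symbol)
    return hits


RECORDS = [
    # Annotations for Kras as described in the text above.
    GeneRecord("Kras", "6",
               [("Biological Process",
                 "positive regulation of NF-κB transcription factor activity")],
               ["lung cancer"]),
    # Hypothetical placeholder records, invented for illustration only.
    GeneRecord("HypoA", "6",
               [("Molecular Function", "transcription factor activity")], []),
    GeneRecord("HypoB", "1",
               [("Molecular Function", "transcription factor activity")],
               ["lung cancer"]),
]

all_ontologies = search(RECORDS, chromosome="6",
                        go_text="transcription factor activity",
                        disease="lung cancer")
function_only = search(RECORDS, chromosome="6",
                       go_text="transcription factor activity",
                       go_category="Molecular Function",
                       disease="lung cancer")
print(all_ontologies)  # ['Kras']
print(function_only)   # []
```

Restricting the match to the Molecular Function category drops Kras from the results, which mirrors the refinement offered by the selection boxes on the query form.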
Gene detail pages
Search results in MGI are linked to detail pages that provide an integrated overview of important genomic and biological data for each gene (or other genome feature), including genome location, gene structure, functional annotation, developmental stage and tissue expression, mammalian orthology, and phenotype annotation. An example of a Gene Detail page for the caveolin 1 gene (Cav1; MGI: 102709) is shown in Figure 5. Although each gene detail page provides a wealth of information, the section most relevant for finding phenotype and disease associations for mouse genes is entitled "Alleles and phenotypes." This section provides a high-level summary of the alleles (if any) that have been reported for a gene, a brief synopsis of the phenotype of mice that have mutations in the gene, and curated associations to human disease terms, if appropriate. The high-level information is linked to additional information about genotype-to-phenotype associations. In the case of the caveolin 1 gene, following the link for "All alleles" shows the four alleles currently reported in the literature. Only the targeted allele (tm1) on a specific genetic background from the laboratory of Michael P. Lisanti (Mls) (Cav1tm1Mls; MGI: 2180364) has been described as a model for breast cancer (Figure 6). This example highlights that the same or similar alleles on different genetic backgrounds can result in different phenotypes. Genotype-specific differences can greatly influence the utility of mutant mice as models for human biology and disease processes. The links on the "All alleles" summary page shown in Figure 6 take users to detail pages about the mutation and the phenotypes associated with a particular allele.
Screenshot of the Gene Detail page for the caveolin 1 gene (Cav1; MGI: 102709) highlighting the "Alleles and phenotypes" section of the report. This section shows the number and types of alleles for a gene in the mouse as well as any associations of the genes and alleles to human disease conditions.
An example of an allele detail page in MGI, which lists observed phenotypes for allelic combinations on different genetic backgrounds.
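The structure of a targeted allele symbol such as the one discussed above (gene symbol, "tm" plus a serial number, then a laboratory code) can be illustrated with a small parser. This is a minimal sketch under stated assumptions: the symbol is written in plain text with the superscript portion in angle brackets (for example `Cav1<tm1Mls>`), the serial number is a plain integer, and only targeted ("tm") alleles are handled. The function name and the regular expression are hypothetical and do not cover the full nomenclature rules.

```python
import re
from typing import NamedTuple


class TargetedAllele(NamedTuple):
    gene: str      # gene symbol, e.g. "Cav1"
    serial: int    # serial number of the targeted mutation, e.g. 1
    lab_code: str  # laboratory code, e.g. "Mls"


# Assumed plain-text form: GENE<tmNLab>, with the superscript in angle brackets.
_TARGETED = re.compile(
    r"^(?P<gene>[A-Za-z0-9]+)<tm(?P<serial>\d+)(?P<lab>[A-Za-z]+)>$"
)


def parse_targeted_allele(symbol: str) -> TargetedAllele:
    """Split a targeted allele symbol into gene, serial number and lab code."""
    match = _TARGETED.match(symbol)
    if match is None:
        raise ValueError(f"not a recognized targeted allele symbol: {symbol!r}")
    return TargetedAllele(match["gene"], int(match["serial"]), match["lab"])


allele = parse_targeted_allele("Cav1<tm1Mls>")
print(allele.gene, allele.serial, allele.lab_code)  # Cav1 1 Mls
```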
Searches by phenotype and disease terms
As described above for searches by gene names and symbols, keyword searches using the Quick Search tool can be used to find information in MGI related to phenotypes and diseases. Researchers can also search MGI for sets of genes and alleles that share such characteristics as genomic location, allele type, and/or association to phenotype and disease terms using the Phenotypes, Alleles, and Disease Query Form (available in the navigation bar via Search → Phenotypes → Phenotypes, Alleles, and Disease Query Form or via http://www.informatics.jax.org/searches/allele_form.shtml). One example of a search that can be constructed using the Phenotypes, Alleles, and Disease Query Form is "Return all genes on mouse chromosome 1 that have floxed alleles and have curated associations with the Mammalian Phenotype Ontology term 'abnormal eye morphology' (MP: 0002092)" (Figures 7 and 8).
Screenshot of the Phenotypes, Alleles & Disease Models Query Form in MGI, highlighting the fields used to search for all genes on chromosome 1 with floxed alleles that are associated with the Mammalian Phenotype term "abnormal eye morphology."
Results from a search for genes on mouse chromosome 1 that have floxed alleles and are associated with abnormal eye morphology.
MGI also supports searches using human disease terms from the Online Mendelian Inheritance in Man (OMIM7; http://www.ncbi.nlm.nih.gov/omim) database via the Human Disease Vocabulary Browser (available in the navigation bar via Search → Phenotypes → Human Disease Vocabulary Browser or via http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=omimVocab&subset=A). Users can enter an OMIM disease term or browse the disease term vocabulary to find a human disease of interest. If mouse models have been reported for the human disease, this is indicated in the search results. The results of a search for the disease term "retinitis pigmentosa" are shown in Figure 9. According to OMIM, retinitis pigmentosa (RP) has several distinct disease subclasses. Each of these subclasses is listed separately and has a different OMIM database identifier. Mouse models in MGI are associated with disease terms in OMIM as specifically as possible, according to what an author states in a publication.
Screenshot of the Human Disease Vocabulary Browser at MGI with results for a search using the term "retinitis pigmentosa." Diseases for which mouse models have been reported are indicated on the search results.
The details for mouse models of RP (OMIM: 268000) are displayed on the MGI Human Disease and Mouse Model Detail pages (Figures 10 and 11). In the case of RP there is one gene, cyclic nucleotide–gated channel β1 (Cngb1; MGI: 2664102), for which mutations in the mouse gene and its human ortholog both result in an RP disease phenotype (Figure 10). In the mouse there are also seven genes (Pde6a, Pde6b, Pde6g, Prph2, Rho, Rpe65, Slc6a6) associated with an RP phenotype for which a disease association has not yet been reported in humans. This category of genes represents potential new candidate genes for RP in humans. Conversely, there are six genes (C2orf71, CRX, MERTK, RLBP1, SPATA7, USH2A) that have been reported to be associated with RP in humans but have not yet been associated with this disease phenotype in the mouse. This category of genes represents potential research opportunities for the development of new mouse models for RP. Farther down on the Human Disease and Mouse Model Detail page is a list of all of the mouse models for a specific human disease that have been reported in the scientific literature (Figure 11). Each model includes the allelic composition, the genetic background and a link to the original report in the literature describing the evidence for how the strain serves as a model for the disease.
Example of the report showing genes associated with retinitis pigmentosa (RP; OMIM: 268000). (a) Orthologous genes in mouse and human where mutations in the gene result in an RP disease phenotype. (b) Genes in the mouse where mutations result in an RP disease phenotype but where the human gene is not associated with RP. (c) Genes in the human where mutations result in RP but where the mouse gene has not been associated with RP.
Example of the MGI report for mouse models associated with RP (OMIM: 268000). For each mouse model, the allelic composition and genetic background are reported along with a link to the published reference that describes the mouse model.
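The three gene categories on the disease detail page amount to a set comparison between mouse and human disease associations. The sketch below reproduces that comparison for the RP gene lists given above. It rests on one loud simplification: orthologs are paired by upper-casing the gene symbol (the hypothetical `ortholog_key` function), whereas MGI relies on curated orthology and orthologs do not always share a symbol. The function names are assumptions made for illustration.

```python
# Gene lists as given in the text for retinitis pigmentosa (OMIM: 268000).
MOUSE_RP_GENES = {"Cngb1", "Pde6a", "Pde6b", "Pde6g",
                  "Prph2", "Rho", "Rpe65", "Slc6a6"}
HUMAN_RP_GENES = {"CNGB1", "C2orf71", "CRX", "MERTK",
                  "RLBP1", "SPATA7", "USH2A"}


def ortholog_key(symbol: str) -> str:
    """Simplifying assumption: pair orthologs by case-insensitive symbol.

    Real orthology assignments are curated and cannot be inferred from names.
    """
    return symbol.upper()


def compare_disease_genes(mouse_genes, human_genes):
    """Partition genes into shared, mouse-only and human-only associations."""
    mouse_keys = {ortholog_key(s) for s in mouse_genes}
    human_keys = {ortholog_key(s) for s in human_genes}
    return {
        "both": sorted(mouse_keys & human_keys),
        "mouse_only": sorted(mouse_keys - human_keys),
        "human_only": sorted(human_keys - mouse_keys),
    }


categories = compare_disease_genes(MOUSE_RP_GENES, HUMAN_RP_GENES)
for name, genes in categories.items():
    print(name, len(genes), genes)
```

The mouse-only group corresponds to candidate genes for the human disease, and the human-only group to opportunities for new mouse models.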
MGI Batch Query tool
Researchers often begin a search for phenotype and disease associations with a list of genes generated from a genetic linkage analysis, genome-wide association study or microarray experiment. The MGI Batch Query tool (available in the navigation bar via Search → Genes → Genes and Markers Query or via the link http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=batchQF) allows researchers to search the database using a list of gene identifiers, including gene symbols, Affymetrix probeset identifiers and GenBank sequence accession identifiers (Figure 12). Researchers can select the data to be returned from a batch query, including genome location, Mammalian Phenotype Ontology terms and OMIM disease terms. Batch Query results can be viewed as a web page or a text file, or downloaded as an Excel spreadsheet.
Screenshot of the results for the Batch Query tool at MGI showing the human disease annotations for a list of uploaded mouse gene symbols. (a) Users can upload many different kinds of gene identifiers into the Batch Query tool and ask for numerous gene attributes and additional annotations as output. (b) The results of a search for gene nomenclature and human disease terms from OMIM for a list of mouse genes.
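Because Batch Query results can be saved as a text file, they are easy to post-process with a short script. The following sketch groups disease terms by gene symbol from tab-delimited text. The column names (`Input`, `MGI ID`, `Symbol`, `Disease Term`) and the layout are assumptions and may not match the actual download format. The sample rows are not real Batch Query output: they were assembled from genes and disease associations mentioned in this tutorial.

```python
import csv
import io
from collections import defaultdict

# Assumed column names; adjust to match the header of the actual file.
SYMBOL_COLUMN = "Symbol"
TERM_COLUMN = "Disease Term"

# Illustrative rows only, built from associations mentioned in this tutorial.
SAMPLE_RESULTS = (
    "Input\tMGI ID\tSymbol\tDisease Term\n"
    "Kras\tMGI:96680\tKras\tlung cancer\n"
    "Cngb1\tMGI:2664102\tCngb1\tretinitis pigmentosa\n"
    "Cav1\tMGI:102709\tCav1\tbreast cancer\n"
)


def diseases_by_gene(tsv_text: str) -> dict:
    """Map each gene symbol to a sorted list of its disease terms."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped = defaultdict(set)
    for row in reader:
        symbol = row[SYMBOL_COLUMN]
        term = (row.get(TERM_COLUMN) or "").strip()
        grouped[symbol]  # keep genes that have no disease term
        if term:
            grouped[symbol].add(term)
    return {symbol: sorted(terms) for symbol, terms in grouped.items()}


annotations = diseases_by_gene(SAMPLE_RESULTS)
for symbol, terms in annotations.items():
    print(symbol, terms)
```

To process a saved file instead of the sample string, read the file's contents and pass the text to `diseases_by_gene`.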
Finding mouse resources
MGI supports access to information about the location and availability of mouse strains and cell lines through its integration with the International Mouse Strain Resource (IMSR)8. IMSR is a centralized database of mouse strains and cell lines distributed by numerous repositories from around the world. From the IMSR website, researchers can generate lists of available resources and link directly to individual repository websites to find details on availability and cost. From within MGI, researchers can access the IMSR from links on allele detail pages. For example, the "Find Mice" section of the allele detail page for a targeted allele of the Cav1 gene (Cav1tm1Krc) is shown in Figure 13. The links from this page indicate that there are no mice or cell lines in any of the IMSR repositories that carry the specific Cav1tm1Krc allele; there are, however, two strains available that have different mutations in the Cav1 gene.
Screenshot of the allele detail page for the targeted allele of the caveolin 1 gene (Cav1tm1Krc), with highlights of links to the International Mouse Strain Resource (IMSR). (a) The IMSR home page can be accessed from the "Find Mice" link in the navigation bar that appears on all MGI web pages. (b) Links to mouse strains or cell lines for a specific allele are provided in the "Find Mice" section of the allele detail page.
Users can also query IMSR directly by following the "Find Mice (IMSR)" link in the navigation banner that appears on every page at the MGI website. This link takes the user to the IMSR home page (http://www.findmice.org/), from which the IMSR search form can be accessed. The IMSR search form allows researchers to query for mouse strains and cell lines by such criteria as strain name, strain type (congenic, inbred, recombinant congenic, etc.), gene or allele symbol, mutation type (deletion, chemically induced, targeted, gene trap, etc.) and contributing repository.
Conclusions and Future Directions
Identifying appropriate mouse models of human disease is one of the central goals of the MGI community database. To achieve this goal, MGI integrates a wealth of genetic and genomic data for the laboratory mouse and uses a combination of data standards and expert manual curation to link these data to human genes and diseases. Thus far MGI has relied on the OMIM database as the main source of human disease terms and information. Future directions for enhancing MGI as a resource for mouse-human comparative biology include (i) the expansion of human diseases and disease terms that are associated with mouse genes and genotypes and (ii) the implementation of chromosome maps that support seamless navigation between the conserved syntenic regions of the mouse and human genomes.
The MGI database is maintained by a consortium of principal investigators including the author (C.J.B.), Judith A. Blake, Janan T. Eppig, James A. Kadin, Martin Ringwald, and Joel E. Richardson. MGI is supported by NIH HG00330, HG003622, HD033745, HG002273 and CA089713. The contributions of the administrative, software and curation teams at MGI are gratefully acknowledged.
1The Jackson Laboratory, 600 Main Street, Bar Harbor, Maine 04609, USA
- Rosenthal, N. & Brown, S. The mouse ascending: perspectives for human-disease models. Nat. Cell Biol. 9, 993–999 (2007)
- Bult, C.J. et al. The Mouse Genome Database: enhancements and updates. Nucleic Acids Res. 38 (Database Issue), D586–D592 (2010)
- Blake, J.A. & Bult, C.J. Beyond the data deluge: data integration and bio-ontologies. J. Biomed. Inform. 39, 314–320 (2006)
- Gene Ontology Consortium The Gene Ontology in 2010: extensions and refinements. Nucleic Acids Res. 38 (Database Issue), D331–D335 (2010)
- Smith, C.L. & Eppig, J.T. The Mammalian Phenotype Ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip. Rev. Syst. Biol. Med. 1, 390–399 (2009)
- Hayamizu, T.F. et al. The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data. Genome Biol. 6, R29 (2005)
- Amberger, J. et al. McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Res. 37 (Database Issue), D793–D796 (2009)
- Eppig, J.T. & Strivens, M. Finding a mouse: the International Mouse Strain Resource (IMSR). Trends Genet. 15, 81–82 (1999)