A guide to using a genome browser (Peter R Cook)

A visit to the human genome at the Sanger Centre (use GAPDH as an example).

Go to PRC's 'home page'.
Use the link at the bottom to go to 'Sanger'.
Go to 'Ensembl Genome Data Resources, Human'.
At top right 'Search', select 'Gene' from the drop-down list, insert 'GAPDH' in the box, and hit 'Go'. [The correct abbreviation is GAPD, but GAPDH works too.]
We find 4 genes. The first is testis-specific; the second, expressed in liver, is the one we will choose.
Go to 'ENSG00000111640'.
Under 'Description', go to 'EC 1.2.1.12.' You should see biochemical pathways illustrating the role of the enzyme.
Go back one page (to 'GeneView'). Note 'Help' at top right (most pages have 'Help' here).
Look at 'Transcripts'. See 'DNA (contigs)' with transcripts from top and bottom strands (above and below). Note directions, introns, exons, position in base-pairs from the telomere.
Go to 'Features' at the top of the yellow 'Transcript' box; select 'SNPs' from drop-down list and 'close menu'; now you see the SNPs. From 'Features' select 'Regulatory regions', and 'close menu'; now you see the 'regulatory regions' upstream of GAPDH.
Scroll down to 'Orthologue prediction'. Note the orthologues detected by BLAST (we will use this later). Under 'Saccharomyces cerevisiae' go to 'Align'; see how closely the two proteins are related (in each pair, the human sequence is on top, yeast in the middle, and identities are shown by a *, similarities by :). Go back one, and under 'Mus musculus' see that the two proteins are more closely related.
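To put a number on 'how closely related', you can tally the markup line of an alignment. This is a minimal sketch, assuming you have copied the markup line (the one with * and :) into a string; the function name and the example string are invented for illustration and are not real GAPDH data.

```python
def summarise_markup(markup: str) -> dict:
    """Count identities ('*') and similarities (':') in an alignment markup line."""
    columns = len(markup)
    identities = markup.count("*")
    similarities = markup.count(":")
    return {
        "columns": columns,
        "identities": identities,
        "similarities": similarities,
        # Percentage of alignment columns that are exact identities.
        "percent_identity": 100.0 * identities / columns if columns else 0.0,
    }


# Invented markup line, for illustration only.
example_markup = "**:* **:*."
print(summarise_markup(example_markup))
```

Running the same tally on the yeast and mouse alignments would let you compare the two numbers directly.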
Go back one, scroll down to the bottom to see what is there, and scroll back to the top.
In 'Genomic Location' near the top of the grey panel, go to '6,513,918-6,517,797'.
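If you want to know how long this region is, the location string can be turned into numbers. A minimal sketch, assuming both coordinates are inclusive (an assumption, not something stated on the page); the function names are hypothetical.

```python
def parse_location(text: str) -> tuple[int, int]:
    """Turn a 'start-end' string with thousands separators into two integers."""
    start_text, end_text = text.split("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    return start, end


def span_length(start: int, end: int) -> int:
    """Length in base-pairs, assuming both ends are included."""
    return end - start + 1


start, end = parse_location("6,513,918-6,517,797")
print(start, end, span_length(start, end))  # 6513918 6517797 3880
```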
We are now in 'ContigView'.
Under 'Chromosome 12' (the '+' and '-' in the box expands or collapses the view) note the ideogram, position of centromere, p and q arms, telomeres, bands; GAPDH is in the red rectangle in p13.31.
Under 'Overview', we have zoomed into this rectangle; note the synteny, the contigs, and 'Ensembl genes' with GAPDH in the middle (with the red rectangle running through it - we zoom into this rectangle in 'Basepair view'). Still in 'Overview' note how closely genes are packed, the pseudogenes, the ncRNA.
Under 'Basepair view', we can see 'DNA contigs' with the DNA sequence of the upper strand with its 3 reading frames above (the open green boxes in the 'cDNA' track show we are in an intron) and the DNA sequence of the lower strand with its 3 reading frames below. On the upper strand (coding GAPDH), note the red boxes with stars marking stop codons in all three frames - this is typical of introns. [Only ~10% of a typical gene is exonic.]
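The stop codons marked in each frame can also be found by hand or by a few lines of code. A minimal sketch, assuming a plain DNA string for the upper strand; the function name and the example sequence are invented, and only the three frames of one strand are scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # the three stop codons of the standard code


def stops_by_frame(dna: str) -> dict[int, list[int]]:
    """Return, for frames 1-3, the 0-based start positions of stop codons."""
    dna = dna.upper()
    result = {}
    for frame in range(3):
        positions = []
        # Step through the sequence three bases at a time from this frame's offset.
        for i in range(frame, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                positions.append(i)
        result[frame + 1] = positions
    return result


# Invented sequence with one stop codon in each frame (intron-like).
print(stops_by_frame("TAACTGACTAG"))  # {1: [0], 2: [4], 3: [8]}
```

A frame whose list stays empty over a long stretch is a candidate coding frame; frames full of stops are not.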
At the top right of the light yellow 'Basepair view' panel, click on 'Window >'; now we move to an intron/exon junction (open green changes to filled green boxes under 'cDNAs'). The 1st reading frame in the top strand encodes GAPDH. Note the asp or D at the intron/exon junction must be encoded partly by the previous exon. Look at the next leu (L) that is encoded by TTG in the DNA and UUG in the RNA; reflect on which DNA strand the RNA polymerase used as a template. Note the next two vals (VV) are encoded by GTC and GTA (reflect on the redundant code and wobble position). Note the sequence (TAGA) typical of a splice acceptor site. Note there are no stop codons in the 2nd reading frame on the upper strand (so this could be the coding frame).
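The codons mentioned here (TTG for leu, GTC and GTA for val) can be checked with a small sketch. It assumes the sequence shown is the coding (upper) strand; the codon table is only the subset of the standard genetic code needed for this example, and the function names are hypothetical.

```python
# Subset of the standard genetic code, just the codons discussed in the text.
CODONS = {"TTG": "L", "GTC": "V", "GTA": "V"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(coding_dna: str) -> str:
    """The RNA has the same sequence as the coding strand, with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def template_strand(coding_dna: str) -> str:
    """The strand the polymerase reads: the reverse complement, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(coding_dna.upper()))


def translate(coding_dna: str) -> str:
    """Translate codon by codon; raises KeyError for codons outside the subset."""
    dna = coding_dna.upper()
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


coding = "TTGGTCGTA"
print(transcribe(coding))       # UUGGUCGUA
print(template_strand(coding))  # TACGACCAA
print(translate(coding))        # LVV
```

GTC and GTA differ only in the third (wobble) position yet give the same amino acid.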
Click on 'Window >'. Now we see two stop codons in the 2nd reading frame in the exonic region (so this can't be the coding frame). In the first reading frame, we don't get a stop codon until we are out of the exon. Note the splice donor site (GTGAGT at the beginning of the intron).
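Most introns begin with GT (the donor site) and end with AG (the acceptor site). A minimal sketch of that check, assuming you have the intron sequence of the coding strand as a string; the function name and the example introns are invented.

```python
def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG (the GT-AG rule)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Invented intron: the donor GTGAGT from the text, filler, then an AG acceptor.
example_intron = "GTGAGT" + "CCCTTTCCC" + "TAG"
print(has_canonical_splice_sites(example_intron))  # True
print(has_canonical_splice_sites("GCGAGTCCTAG"))   # False
```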
Scroll down to see the 'Restriction sites' and the 'Tilepath'.
Scroll up and look at 'Detailed view' (expand if necessary). Click on 'Features' at the top left of the light yellow panel to get an idea of what is there (but don't check any of the boxes, otherwise some of us will get lost!). Click on all the other down arrows (but don't check!) in turn.
Scroll up to the top.
Now we will run a BLAST search.
In the yellow panel at the left, click on 'Export sequence as FASTA', select HTML as 'output format', and click 'Continue'; you should see the sequence of this region of chromosome 12. Select from the beginning to include slightly less than half of the top line. Copy (Control C). Look in the yellow panel at the left, go to 'Use Ensembl to...' and click on 'Run a BLAST search'.
You are now in 'BlastView' at the 'Setup' stage. Paste (Control V) the copied sequence into the window under 'Either'. Select the database as 'Homo_sapiens' and dna database (obviously not 'peptide'). Select the search tool as 'BLASTN' (the N is for nucleic acid). In 'Search sensitivity' select 'Near-exact matches' (reflect on what the others in the pull-down menu will do). Click on Run. Your job has been sent off to Cambridge and queued on their server. Your sequence will be 'BLASTED' against the human genome; reflect on what happens. The results are usually sent back pretty quickly, and can be recovered by pressing 'Retrieve'. Don't keep pressing 'Retrieve', as this makes it difficult for you to refind us later if you get lost. While waiting, reflect on what is in the dark yellow panel at the right (especially under 'configure').

While waiting, we will go and look at OMIM (good for disease genes). Open up a new browser window, go to www.google.com, type in 'OMIM', go there, and you should see the home page of 'Online Mendelian Inheritance in Man' at the NCBI. In passing, note all the resources in the black bar at the top. In the search box, type in 'GAPDH' and 'Go'. Click on the first (ie *13840) and you should get a good summary of GAPDH. Click on 'Gene map locus'; you see the genes on one side of GAPDH with their associated disorders. Click on the location opposite GAPDH at top left, and you go to the NCBI Map viewer. [This has different strengths/weaknesses from the one at the Sanger Centre.]
Go back to the OMIM home page, type in a disease of interest to you or your family (eg alzheimer), and have a look.
When finished, close this browser window.

We now return to get the results of your BLAST search. Return to 'BlastView', press 'Retrieve'. We have now progressed to the 'Results' window, and you should have some alignments and hits. Go to 'view'; in the 'Display' window you see the hits throughout the genome. One should be on the short arm of chromosome 12!
Scroll down to the bottom, and you should see all the hits. The one at the top has 100% ID (identity) over all nucleotides selected (as it should); the ones below have shorter matches.
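The '% ID' figure is simply the fraction of aligned positions that match. A minimal sketch, assuming two aligned sequences of equal length with no gaps (a simplification; real BLAST alignments can contain gaps); the function name and the sequences are invented.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    if not query:
        return 0.0
    matches = sum(q == s for q, s in zip(query.upper(), subject.upper()))
    return 100.0 * matches / len(query)


# Invented sequences, for illustration only.
print(percent_identity("ACGTACGTAC", "ACGTACGTAC"))  # 100.0
print(percent_identity("ACGTACGTAC", "ACGTACGAAC"))  # 90.0
```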

Go back and back until you reach 'ContigView' showing chromosome 12 and GAPDH (which is in a gene-rich region of the genome). Note in the 'Overview' window the rough gene density (ie number of genes/Mbp), as we will now compare this region with a gene-poor region. In the 'Chromosome 12' ideogram, click on q12 near the centromere; this recentres the view. Under 'Detailed view', zoom out the most; under 'Overview', you don't see many genes. Under 'Detailed view', click on the '-' next to 'Zoom' to zoom out even further (the 'Detailed view' collapses) but you still don't see many genes in 'Overview'. Repeat, and you will see this is a very gene-poor region. Move the window (using '< Window') towards the centromere, and you will see the contigs disappear; the sequence is incomplete (the repeats in and around the centromere prevent contig building).
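The gene density comparison is simple arithmetic: count the genes in the window and divide by its length in Mbp. A minimal sketch, assuming inclusive coordinates; the function name, gene counts and window coordinates are invented, so substitute what you read off 'Overview'.

```python
def genes_per_mbp(gene_count: int, start: int, end: int) -> float:
    """Gene density of a window, in genes per million base-pairs."""
    length_bp = end - start + 1  # assumes both ends are included
    return gene_count / (length_bp / 1_000_000)


# Invented numbers, for illustration only.
print(genes_per_mbp(30, 1, 1_000_000))  # 30.0 (a gene-rich window)
print(genes_per_mbp(2, 1, 2_000_000))   # 1.0 (a gene-poor window)
```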

Have a look at the same region in different viewers; in 'ContigView', scroll to the top, and (in the yellow panel on the left) go to 'View region in NCBI browser' or '... in UCSC browser'.

Enough!

Nuclear Structure and Function Research Group
