A visit to the human genome at the Sanger Centre (using GAPDH as an example).
Go to PRC's 'home page'.
Use the link at the bottom to go to 'Sanger'.
Go to 'Ensembl Genome Data Resources, Human'.
At top right under 'Search', select 'Gene' from the drop-down list,
insert 'GAPDH' in the box, and hit 'Go'. [The correct abbreviation is
GAPD, but GAPDH works too.]
We find 4 genes. The first is testis-specific; the second, found in
liver, is the one we will choose.
Go to 'ENSG00000111640'.
Under 'Description', go to 'EC 1.2.1.12'. You should see biochemical
pathways illustrating the role of the enzyme.
Go back one page (to 'GeneView').
Note 'Help' at top right (most pages have a 'Help' link here).
Look at 'Transcripts'. See 'DNA (contigs)' with
transcripts from top and bottom strands (above and below). Note the
directions, introns, exons, and position in base pairs from the telomere.
Go to 'Features' at the top of the yellow 'Transcript' box; select
'SNPs' from drop-down list and 'close menu'; now you see the
SNPs. From 'Features' select 'Regulatory regions', and 'close
menu'; now you see the 'regulatory regions' upstream of GAPDH.
Scroll down to 'Orthologue prediction'. Note the orthologues
detected by BLAST (we will use this later). Under 'Saccharomyces
cerevisiae' go to 'Align'; see how closely the two proteins are
related (in each pair, the human sequence is on top, yeast in the
middle, and identities are shown by a *, similarities by :). Go back
one, and under 'Mus musculus' see that the two proteins are more
closely related.
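How closely two aligned proteins are related can be put into a number. The following is a minimal sketch, assuming two already-aligned sequences of equal length with '-' marking gaps. The function names and the short peptide strings are made up for illustration; they are not the real human, yeast or mouse GAPDH sequences, and this is not how the alignment page itself computes its marks.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of gap-free alignment columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def match_line(aligned_a: str, aligned_b: str) -> str:
    """Build a marker line with '*' under identical residues (identities only)."""
    return "".join(
        "*" if a == b and a != "-" else " "
        for a, b in zip(aligned_a, aligned_b)
    )


# Made-up aligned fragments, for illustration only.
top = "MGKVKVGVNG"
middle = "M-KVRVGING"
print(top)
print(middle)
print(match_line(top, middle))
print(round(percent_identity(top, middle), 1), "% identity")
```

Running the same function on a human/yeast pair and a human/mouse pair copied from the alignment pages would give two numbers to compare directly.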
Go back one, scroll down to the bottom to see what is there, and scroll
back to the top.
In 'Genomic Location' near the top of the grey panel, go to '6,513,918-6,517,797'.
We are now in 'ContigView'.
Under 'Chromosome 12' (the '+' and '-' in the box expand or collapse
the view) note the ideogram, position of centromere, p and q arms,
telomeres, bands; GAPDH is in the red rectangle in p13.31.
Under 'Overview', we have zoomed into this rectangle; note the synteny,
the contigs, and 'Ensembl genes' with GAPDH in the middle (with the
red rectangle running through it; we zoom into this rectangle in
'Basepair view'). Still in 'Overview', note how closely the genes are
packed, the pseudogenes, and the ncRNA.
Under 'Basepair view', we can see 'DNA contigs' with the DNA sequence
of the upper strand with its 3 reading frames above (the open green
boxes in the 'cDNA' track show we are in an intron), and the DNA sequence
of the lower strand with its 3 reading frames below. On the upper
strand (coding GAPDH), note the red boxes with stars marking stop
codons in all three frames; this is typical of introns. [Only
~10% of a typical gene is exonic.]
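The stop-codon marks in the three reading frames can be reproduced by hand. Here is a minimal sketch, assuming the standard stop codons TAA, TAG and TGA; the function names and the short test sequence are made up for illustration and are not taken from chromosome 12.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(dna: str, frame: int) -> list[int]:
    """0-based start positions of stop codons in one forward frame (0, 1 or 2)."""
    dna = dna.upper()
    return [
        i for i in range(frame, len(dna) - 2, 3)
        if dna[i:i + 3] in STOP_CODONS
    ]


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Made-up sequence with a stop codon in every forward frame.
seq = "ATGTAAGCTGACCTAGG"
for frame in range(3):
    print("upper strand, frame", frame + 1, stop_positions(seq, frame))

# The three frames of the lower strand are the forward frames
# of the reverse complement.
lower = reverse_complement(seq)
for frame in range(3):
    print("lower strand, frame", frame + 1, stop_positions(lower, frame))
```

A region with stops in all three frames, like the made-up sequence here, cannot be translated far in any frame, which is the pattern the browser shows inside an intron.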
At the top right of the light yellow 'Basepair view' panel, click on
'Window >'; now we move to an intron/exon junction (open green changes
to filled green boxes under 'cDNAs'). The 1st reading frame in the top
strand encodes GAPDH. Note the asp or D at the intron/exon
junction must be encoded partly by the previous exon. Look at the
next leu (L) that is encoded by TTG in the DNA and UUG in the RNA;
reflect on which DNA strand the RNA polymerase used as a
template. Note the next two vals (VV) are encoded by GTC and GTA
(reflect on the redundant code and wobble position). Note the
sequence (TAGA) typical of a splice acceptor site. Note there are
no stop codons in the 2nd reading frame on the upper strand (so this
could be the coding frame).
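The points about TTG/UUG, the template strand and the redundant code can be checked with a few lines. This is a minimal sketch using only a partial codon table (Asp, Leu and Val from the standard genetic code); the function names are made up for illustration, and codons outside the partial table are reported as 'X'.

```python
# Partial standard genetic code: only the amino acids discussed above.
CODON_TABLE = {
    "GAT": "D", "GAC": "D",
    "TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
    "GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",
}


def transcribe_coding_strand(dna: str) -> str:
    """The mRNA has the same sequence as the coding strand, with U for T."""
    return dna.upper().replace("T", "U")


def template_strand(coding: str) -> str:
    """The strand the polymerase reads: reverse complement of the coding strand."""
    return coding.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate frame 1 of a coding-strand sequence; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def synonymous_codons(amino_acid: str) -> list[str]:
    """All codons in the partial table that encode the given amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(transcribe_coding_strand("TTG"))  # the leu codon as it appears in the RNA
print(template_strand("TTG"))           # what the polymerase actually read
print(translate("TTGGTCGTA"))           # leu followed by two vals
print(synonymous_codons("V"))           # val codons differ only at position 3
```

The four val codons share GT and differ only in the third base, which is the wobble position referred to above.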
Click on 'Window >'. Now we see two stop codons in the 2nd
reading frame in the exonic region (so this can't be the coding
frame). In the first reading frame,
we don't get a stop codon until we are out of the exon. Note the
splice donor site (GTGAGT at the beginning of the intron).
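Splice-site motifs can also be looked for by simple text search. The sketch below assumes the common rule that introns begin with GT and end with AG, and uses the GTGAGT donor motif noted above; the function names and sequences are made up for illustration, and a real splice-site predictor weighs much more context than this.

```python
def find_motif(dna: str, motif: str) -> list[int]:
    """0-based positions of every (possibly overlapping) occurrence of motif."""
    dna, motif = dna.upper(), motif.upper()
    positions = []
    start = dna.find(motif)
    while start != -1:
        positions.append(start)
        start = dna.find(motif, start + 1)
    return positions


def has_gt_ag_boundaries(intron: str) -> bool:
    """True if a candidate intron starts with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Made-up example: two exonic bases, then the start of an intron.
print(find_motif("AAGGTGAGTCC", "GTGAGT"))
print(has_gt_ag_boundaries("GTGAGTCCTTCTCTTTTAG"))
```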
Scroll down to see the 'Restriction sites' and the 'Tilepath'.
Scroll up and look at 'Detailed view' (expand if necessary).
Click on 'Features' at the top left of the light yellow panel to get an
idea of what is there (but don't check any of the boxes, otherwise some
of us will get lost!). Click on all the other down arrows (but don't
check!) in turn.
Scroll up to the top.
Now we will run a BLAST search.
In the yellow panel at the left, click on 'Export sequence as FASTA',
select HTML as 'output format', and click 'Continue'; you should see
the sequence of this region of chromosome 12. Select from the
beginning to include slightly less than half of the top line.
Copy (Control-C). Look in the yellow panel at the left, go to
'Use Ensembl to...' and click on 'Run a BLAST search'.
You are now in 'BlastView' at the 'Setup' stage. Paste (Control-V)
the copied sequence into the window under 'Either'. Select the
database as 'Homo_sapiens' and the dna database (obviously not
'peptide'). Select the search tool as 'BLASTN' (the N is for
nucleic acid). In 'Search sensitivity' select 'Near-exact
matches' (reflect on what the others in the pull-down menu will
do). Click on 'Run'. Your job has been sent off to Cambridge
and queued on their server. Your sequence will be 'BLASTED'
against the human genome; reflect on what happens. The results are
usually sent back pretty quickly, and can be recovered by pressing
'Retrieve'. Don't keep pressing 'Retrieve', as this makes it difficult
for you to find us again later if you get lost. While waiting, reflect
on what is in the dark yellow panel at the right (especially under
'configure').
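To get a feel for what happens to the pasted sequence, here is a deliberately crude sketch: it reads FASTA text, takes the first few bases as the query, and reports every exact occurrence in a set of target sequences. This is plain substring search, not the BLAST algorithm (which uses short seed words and extends them into scored alignments); all names and sequences below are made up for illustration.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Read FASTA text into {record name: sequence}, joining wrapped lines."""
    parts: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else "unnamed"
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def exact_hits(query: str, targets: dict[str, str]) -> list[tuple[str, int]]:
    """(target name, 0-based position) for every exact occurrence of query."""
    hits = []
    for target_name, seq in targets.items():
        start = seq.find(query)
        while start != -1:
            hits.append((target_name, start))
            start = seq.find(query, start + 1)
    return hits


# Made-up exported region and made-up 'genome'.
exported = ">region made-up example\nACGTTGCAAGGCTTAACCGG\nTTAGGCATCG\n"
region = parse_fasta(exported)["region"]
query = region[:8]  # like selecting part of the top line
genome = {"chrA": "GGGACGTTGCATTT", "chrB": "ACGTTGCAACGTTGCA"}
print(query, exact_hits(query, genome))
```

A short query like this one matches in several places, which is one reason to expect more than one hit from a real search as well.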
While waiting, we will go and look at OMIM (good for disease
genes). Open up a new browser window, go to www.google.com, type in 'OMIM', go there, and you
should see the home page of 'Online Mendelian Inheritance in Man' at
the NCBI. In passing, note all the resources in the black bar at
the top. In the search box, type in 'GAPDH' and click 'Go'. Click
on the first (i.e. *13840) and you should get a good summary of
GAPDH. Click on 'Gene map locus'; you see the genes on one side
of GAPDH with their associated disorders. Click on the
location opposite GAPDH at top left, and you go to the NCBI Map
viewer. [This has different strengths/weaknesses from the one at
the Sanger Centre.]
Go back to the OMIM home page, type in a disease of
interest to you or your family (e.g. Alzheimer), and have a look.
When finished, close this browser window.
We now return to get the results of your BLAST search. Return to
'BlastView', press 'Retrieve'. We have now progressed to the
'Results' window, and you should have some alignments and hits.
Go to 'view'; in the 'Display' window you see the hits throughout the
genome. One should be on the short arm of chromosome 12!
Scroll down to the bottom, and you should see all the hits. The
one at the top has 100% ID (identity) over all nucleotides selected
(as it should); the ones below have shorter matches.
Go back and back until you reach 'ContigView' showing chromosome 12 and
GAPDH (which is in a gene-rich region of the genome). Note in the
'Overview' window the rough gene density (i.e. number of genes/Mbp), as
we will now compare this region with a gene-poor region. In the
'Chromosome 12' ideogram, click on q12 near the centromere; this
recentres the view. Under 'Detailed view', zoom out as far as possible;
under 'Overview', you don't see many genes. Under 'Detailed view',
click on the '-' next to 'Zoom' to zoom out even further (the 'Detailed
view' collapses), but you still don't see many genes in 'Overview'.
Repeat, and you will see this is a very gene-poor region. Move
the window (using '< Window') towards the centromere, and you will
see the contigs disappear; the sequence is incomplete (the repeats in
and around the centromere prevent contig building).
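The rough gene density is simple arithmetic: genes counted in the window divided by the window length in Mbp. A minimal sketch follows; the function names are made up, and the gene counts in the example are invented placeholders to be replaced by what you actually count in 'Overview'. Only the GAPDH coordinates are taken from the 'Genomic Location' link used earlier.

```python
def parse_region(text: str) -> tuple[int, int]:
    """Turn a location such as '6,513,918-6,517,797' into (start, end) in bp."""
    start, end = text.replace(",", "").split("-")
    return int(start), int(end)


def region_length_bp(start: int, end: int) -> int:
    """Length of an inclusive base-pair interval."""
    return end - start + 1


def gene_density(gene_count: int, start: int, end: int) -> float:
    """Genes per Mbp over the inclusive interval start..end."""
    return gene_count / (region_length_bp(start, end) / 1_000_000)


# Length of the GAPDH location used earlier in this visit.
start, end = parse_region("6,513,918-6,517,797")
print(region_length_bp(start, end), "bp")

# Placeholder counts, NOT real data: substitute your own observations.
print(gene_density(25, 1, 1_000_000), "genes/Mbp (made-up gene-rich window)")
print(gene_density(2, 1, 2_000_000), "genes/Mbp (made-up gene-poor window)")
```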
Have a look at the same region in different viewers; in 'ContigView',
scroll to the top, and (in the yellow panel on the left) go to 'View
region in NCBI browser' or '... in UCSC browser'.
Enough!