Transcription factories in a HeLa cell [from Cook PR (1999) Science 284, 1790]

Nuclear Structure and Function Research Group

Peter R Cook's reading lists, etc

based on his book 'Principles of Nuclear Structure and Function'


A visit to the genome browser at UCSC

During this workshop, you will sit at a computer and be introduced to the UCSC browser (using a human gene as an example); you will be shown how to add tracks showing the positions of your own PCR primers, clones, and results of RNA-seq/ChIP-seq experiments.

1. Some housekeeping.
(i) As we want to come back to this text, open another ('working') page:
• Open a duplicate tab: right-click on this link - PRC's 'home page' - and select 'open in a new window' (or similar).
• Arrange windows, with the new window on the top half, and both filling the width of the monitor; pull down the top window as far as 'A visit to ...' on the bottom one (so the top window occupies about two-thirds of the height).
• In the new (top) window, use the link at bottom-right to go to 'UCSC', and then the link at top-left to 'Genome browser'.
(ii) The browser remembers the previous view used on the computer, so to ensure a common view in the class:
• Check under 'assembly' at the top, and make sure 'Feb 2009 (GRCh37/hg19)' is shown (otherwise select it from the pull-down menu).
(iii) We all want to use exactly the same gene model at all stages throughout the workshop. So, we will use exactly the same search pattern throughout: in the 'search term' box type 'SAMD4', select 'SAMD4A' from the drop-down menu, and 'submit'; this should open the browser window. [Please always use this search sequence in what follows; if you are presented with lots of gene models of SAMD4A, you have used a slightly different sequence - so please go back and try again!]
(iv) Orientation:
From the top down we have:
• A row of buttons allowing you to move the window.
• A search box.
• An ideogram of chr 14 (window highlighted in red).
• In the main window below, you should see 'tracks' showing maps of SAMD4A.
• To the left of each track, you will find a grey rectangle; hover over it, and the track is highlighted.
• Immediately below the main window, find a row of grey buttons ('track search' is on the left).
• Below these are rows of white-on-blue headings, which you can toggle open or closed ('+' and '-').
(v) Again to maintain a common view:
• Scroll down to the bottom of the main window, and select 'default tracks' from the left of the row of grey buttons, followed by 'refresh' from the right of the row.
• Repeat by selecting 'default order'.
(vi) As we progress, I'll show you different ways of doing the same thing in order to illustrate the possibilities, and I will include some deliberate (illustrative) errors; please bear with me!

2. Moving around the promoter and the first exon.
(i) Hit '<<' to move the window; now the promoter of SAMD4A should be in the middle. Note associated features (histone mark, DNase sensitivity, conservation).
(ii) Zoom in to the promoter by typing in the search box 'chr14:55034000 +10000'; you should see the 5'UTR, exon 1 and the beginning of intron 1.
(iii) Hover over the middle of the 5'UTR, press the shift key and keep it down, use the cross-hairs to select up to the middle of exon 1; you should zoom in.
(iv) Select 'base' from the row of grey buttons at the top; you should now see the base sequence in the main window.
(v) Scroll down below the main window, go to 'Mapping and Sequencing', and toggle open ('+') the options.
(vi) In the 'Base Position' window (first on the left) select 'full', and 'refresh'. You should now see (below the base sequence) the amino acid sequence in the three reading frames, and the 'met' and 'ATG' at the beginning of the ORF (at the top, beginning MMFRD...).
(vii) Move to the right '>>>' (moves almost a whole window to the right); see the stop codons in the other reading frames. If needed, go to the right again until you get to the end of the first exon, and see the GT splice-donor site at the beginning of intron 1 (which has stop codons in all frames).
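
What the 'full' Base Position view does in step (vi) can be sketched in a few lines of Python. This is only an illustration, not the browser's own code: the function names are mine, and the 15-base sequence is hypothetical (chosen so that frame 0 spells the MMFRD seen at the start of the ORF; it is not the real SAMD4A sequence).

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate complete codons from the first base; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons with unknown bases
        for i in range(0, len(dna) - 2, 3)
    )


def three_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna[offset:]) for offset in range(3)]


# Hypothetical sequence (an assumption, not taken from the genome)
example = "ATGATGTTTCGTGAT"
for offset, protein in enumerate(three_frames(example)):
    print(offset, protein)
```

Frame 0 reads through as protein, while the other two frames soon hit a stop ('*'), which is what you see in the browser when you move along the exon.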

3. Getting sequence.
Imagine you want to get the sequence in the current window.
(i) Go to the 'View' tab at top right, and select 'DNA' from the pull-down menu.
(ii) Look at the options available, and hit 'get DNA' at lower left.
(iii) You should see how boring sequences are!
(iv) Go back to the main browser window.

4. Let's open another 'track' showing histone marks.
(i) Search for 'SAMD4A' as you did before, and zoom out '3x' (top right). You should see SAMD4A in the middle, and GCH1 to the right (arrowheads show it's on the other strand).
(ii) Under the blue row labelled 'Regulation', find 'ENC Histone'; select 'show', and then 'refresh'. See how some marks pick out promoters.
(iii) Now 'hide' the track(s) you have just added (go to 'ENC Histone' select 'hide', hit 'refresh'). [Come back on your own later, open up some of the other hidden tracks, and see what there is!]

5. Searching for a sequence (perhaps that of a PCR primer you made).
(i) Select 'Tools' from the tab at the very top of the page, then 'BLAT' from the drop-down menu.
(ii) Copy in 'CTTATACCACATATACTAGA', and 'submit'; BLAT returns a single hit.
(iii) Hit 'browser', and you should see your sequence (labeled 'YourSeq') as a black line filling the width of the window.
(iv) Search for SAMD4A as you did before, and you can see the sequence maps to the 5' end of intron 1.
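
To get a feel for what a sequence search involves, here is a minimal Python sketch that looks for exact copies of the primer on either strand of a target sequence. It is not how BLAT works (BLAT uses an index of the genome and tolerates mismatches); the function names and the toy target are my own assumptions, made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_primer(primer, target):
    """Return (start, end, strand) for each exact match.

    Coordinates are 0-based with an exclusive end, as in a .bed file.
    """
    primer, target = primer.upper(), target.upper()
    hits = []
    for strand, query in (("+", primer), ("-", reverse_complement(primer))):
        start = target.find(query)
        while start != -1:
            hits.append((start, start + len(query), strand))
            start = target.find(query, start + 1)
    return hits


primer = "CTTATACCACATATACTAGA"
# Toy target (hypothetical): the primer flanked by a few extra bases
toy_target = "GGGG" + primer + "CCCC"
print(find_primer(primer, toy_target))
```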

6. Writing code, and adding your own track (perhaps showing positions of two clones you made).
(i) Below the main window you will see a row of grey buttons; select 'add custom tracks'. [If this button has been hit recently, it is renamed 'manage custom tracks'.]
(ii) Now we'll make a 'bed' file that we will paste in. Imagine your two clones have sequences at positions 55050000-55051000 and 55052000-55055000 in SAMD4A.
(iii) Open up 'Notepad' on your computer (press the 'Windows' button, select 'All programs', 'Accessories', 'Notepad').
(iv) Copy 'chr14tab55050000tab55051000hardreturnchr14tab55052000tab55055000' into Notepad; then replace each 'tab' with a real tab (the 'Tab' key), and 'hardreturn' with a hard return (the 'Enter' key). This should give 2 rows of 3 columns, tab-delimited. It should look like:
chr14 55050000 55051000
chr14 55052000 55055000
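(v) This basic format is used by many sequence databases: chromosome, starting base, ending base. These three columns are the minimum required for .bed files, which usually have additional columns. [Strictly, .bed files count the starting base from zero and do not include the ending base, so the first clone covers bases 55050001-55051000.]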
(vi) Paste into the appropriate box in the browser, hit 'submit', and this takes you to a page called 'Manage Custom Tracks'.
(vii) Open 'User Track', and in the 'Edit configuration' box call your 'track name' 'My clones' and its 'description' 'My two clones'; hit 'submit' and then 'go to genome browser'.
(viii) Enter SAMD4 into the search box, select SAMD4A from the drop-down menu, hit 'go', and you should see that your two clones are from the first intron.
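
If you have more than a couple of clones, typing tabs by hand gets tedious. The same two-line bed text from step (iv) could be generated with a short Python script; this is just a sketch, and the names 'clones' and 'to_bed' are my own.

```python
# The two clones from step (ii): chromosome, start, end
clones = [
    ("chr14", 55050000, 55051000),
    ("chr14", 55052000, 55055000),
]


def to_bed(intervals):
    """Return minimal 3-column, tab-delimited bed text, one interval per line."""
    lines = []
    for chrom, start, end in intervals:
        if not 0 <= start < end:
            raise ValueError(f"bad interval: {chrom} {start} {end}")
        lines.append(f"{chrom}\t{start}\t{end}")
    return "\n".join(lines) + "\n"


bed_text = to_bed(clones)
print(bed_text, end="")  # paste this output into the custom-track box
```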

7. Add restriction sites in/around these clones.
(i) Under the main window, you'll find a blue row called 'Mapping and Sequencing'; open ('+') this section, go to the bottom right, and hit 'Restr Enzymes' above the pull-down menu; this opens 'Restr Enzymes Track Settings'.
(ii) Under 'Display mode' select 'full', in 'Filter display by enzymes' type in 'hindiii', and 'Submit'.
(iii) Enter SAMD4 into the search box again, select SAMD4A from the drop-down menu, hit 'go'; now you can see the cut sites in/around your clones.
(iv) The display is getting overloaded, so go to the blue row 'Mapping and Sequencing', then to the bottom right of this section, select 'squish' from the pull-down menu, and hit 'refresh' to see a compact view.
(v) Now, let's add another enzyme. In the main window, right click on the grey rectangle to the left of the track named 'Restriction enzymes from REBASE', choose 'Configure Rest Enzymes', add 'ecori' after a comma (giving 'hindiii,ecori'), and hit 'OK'. Now you can see where both enzymes cut.
(vi) Get rid of this track by right-clicking on the vertical bar to the left, and selecting 'hide'.
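
The restriction-enzyme track is, in essence, a search for short recognition sequences. A minimal Python sketch of the idea follows, using the recognition sites of HindIII (AAGCTT) and EcoRI (GAATTC); the function name and the toy sequence are hypothetical, and the real track draws on the full REBASE list rather than this two-entry table.

```python
import re

# Recognition sequences; both are palindromic, so scanning one strand is enough
RECOGNITION_SITES = {"HindIII": "AAGCTT", "EcoRI": "GAATTC"}


def find_sites(sequence, enzymes):
    """Return sorted (start, end, enzyme) tuples; 0-based, end exclusive."""
    sequence = sequence.upper()
    hits = []
    for name in enzymes:
        site = RECOGNITION_SITES[name]
        # The lookahead lets overlapping occurrences be reported too
        for match in re.finditer(f"(?={site})", sequence):
            hits.append((match.start(), match.start() + len(site), name))
    return sorted(hits)


# Toy sequence (hypothetical) with two HindIII sites and one EcoRI site
toy_sequence = "CCAAGCTTGGGAATTCAAGCTTAA"
for hit in find_sites(toy_sequence, ["HindIII", "EcoRI"]):
    print(hit)
```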

8. Hiding and moving tracks.
(i) You can right-click most vertical bars on the left to reconfigure tracks. For example, I can find it irritating to have the main window filled with all the different versions of the 'UCSC Genes' and 'RefSeq Genes'; you can get rid of the first set by right clicking on the grey vertical bar on the left and selecting 'hide', and collapse the second set into one row by selecting 'dense'.
(ii) You can move tracks up or down in the main window by clicking on the grey bar on the left and dragging (try inverting the last two tracks).

9. Printing views. Go to the menus at the top, 'View', select 'PDF/PS', and follow directions...

10. Adding your own tracks. Now imagine you have done a ChIP-seq/RNA-seq/3C-seq experiment, and you want to see the output from the sequencer and bioinformatics pipeline (which is often in the form of a .bed file). I'll use results from a 3C-seq experiment (using the SAMD4A promoter as a viewpoint, showing contacts it makes with all other regions of the genome); you could do exactly the same with results from the other approaches.
(i) The reads from the sequencer have been mapped to the genome, and they are in a .txt file at
http://users.path.ox.ac.uk/~pcook/students/browser/Samd4a_30.txt
This file originally had a .bed extension, but I just renamed this to .txt to facilitate things here.
(ii) Copy the link to this .txt file.
(iii) Let's look at this file.
First, open up a new tab in your browser ('Ctrl K' in Internet Explorer, 'Ctrl T' in most other browsers).
Second, paste the link to this .txt file into the address/URL bar, and hit 'Return'.
You should now see that the header, and the first two lines of this file are:
'track name=Samd4a_30 description="reads per million" type=bedGraph
chr1 166404 167184 11
chr1 554538 554780 1'.
This is telling you that the file is a 'bedGraph' - a stripped down variant of a .bed file in which columns 1-3 are as before, with the 'dataValue' (here, the number of reads) in column 4.
(iv) Now go to the main browser window, select 'manage custom tracks' from the row just below the bottom of the main window (it was originally called 'add custom tracks'), go to 'add custom tracks', and you will be in a window with two main panels.
(v) Paste the URL to the .txt file (not the whole list!) into the upper-most panel, and 'Submit'.
(vi) You will see an error report! Look at it - because you will probably see similar things in many other situations! It is telling you that some mappings lie beyond the end of a chromosome, which is symptomatic of using the wrong genome assembly (the file you downloaded used assembly hg18, but the browser is using hg19, and the two assemblies are quite different!). So we must rectify this.
(vii) At the top of this window with the error report (called 'Add custom tracks'), select 'Mammal', 'human', and 'hg18'; then paste in the link again, 'Submit', and (when transferred) 'go to genome browser'.
(viii) The system automatically takes you to the first read found on the first chromosome, and this is shown under a new track at the top labeled 'Samd4a_30' that fills the window width.
(ix) Zoom out 100-fold, and you will see it is a lonely read (presumably representing a rare contact).
(x) Now type SAMD4 in the search window, select 'SAMD4A' from the drop-down menu and 'go'; this should take you back to the whole of SAMD4A on chromosome 14.
(xi) In our added 'Samd4a_30' track, you can see a black rectangle at the very left (indicating many reads in this region), with grey rectangles to the right (indicating fewer reads). [The imported .bed file has an extra tabbed column compared to the one you typed, and this tells the browser to shade lines according to read number.]
(xii) Zoom out 10x, and you can see many reads at the SAMD4A promoter, some in/around the long gene, and fewer flanking it (reflecting local contacts made within this chromatin domain).
(xiii) Now we'll reconfigure this track. Go to the vertical side bar on the left, right-click, and select 'full'; you see a big spike at the SAMD4A promoter (and almost nothing anywhere else).
(xiv) Go back to the vertical bar, right-click, and select 'Configure'; this opens a box in which you can set 'vertical range setting' to a 'min' of '0' and a 'max' of '1000', and 'data view scaling' to 'use vertical viewing range setting'.
(xv) Once you hit 'OK', significant contacts in/around SAMD4A are revealed (previously hidden due to choice of scale). [Again, you can change the display of most tracks in this way.]
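
The kind of check behind the error report in step (vi) can be sketched in Python: read the track line and the data lines of a bedGraph, then flag any row that ends beyond the end of its chromosome. This is a sketch under stated assumptions, not the browser's own code; the function names are mine, and the chromosome length in 'TOY_SIZES' is hypothetical (deliberately too short, so that the second row fails).

```python
import shlex


def parse_bedgraph(text):
    """Return (track_attributes, records) from bedGraph text.

    Each record is (chrom, start, end, value).
    """
    attributes, records = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "browser")):
            continue
        if line.startswith("track"):
            # shlex keeps quoted values such as "reads per million" together
            for token in shlex.split(line)[1:]:
                key, _, value = token.partition("=")
                attributes[key] = value
            continue
        chrom, start, end, value = line.split()[:4]
        records.append((chrom, int(start), int(end), float(value)))
    return attributes, records


def out_of_range(records, chrom_sizes):
    """Return records on unknown chromosomes or ending past the chromosome end."""
    return [
        rec for rec in records
        if rec[0] not in chrom_sizes or rec[2] > chrom_sizes[rec[0]]
    ]


# Header and first two lines of the file, as shown in step (iii)
EXAMPLE = """track name=Samd4a_30 description="reads per million" type=bedGraph
chr1 166404 167184 11
chr1 554538 554780 1
"""
TOY_SIZES = {"chr1": 500000}  # hypothetical length, NOT a real assembly value

attributes, records = parse_bedgraph(EXAMPLE)
print(attributes["type"], len(records))
print(out_of_range(records, TOY_SIZES))
```

With real data you would replace 'TOY_SIZES' with the chromosome lengths of the assembly you think the file was mapped to; rows that fail the check suggest the wrong assembly.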

11. Marrying data from different assemblies. Go to 'Tools' in the menu at the top, select 'LiftOver', and choose the appropriate options.

12. Adding data from an RNA-seq experiment. Imagine you have done a stranded RNA-seq experiment, you have a .bed file of the output, and you want to see where the + and - reads map.
(i) As before, sequencer reads have been mapped to the genome, and they are in a .txt file at:
http://users.path.ox.ac.uk/~pcook/students/browser/ShortRNAsAroundSAMD4A.txt
(ii) Copy this link.
(iii) Let's look at this file. As before:
First, open up a new tab in your browser ('Ctrl K' in Internet Explorer, 'Ctrl T' in most other browsers).
Second, paste the link to this .txt file into the address/URL bar, and hit 'Return'.
You should now see that the header, and the first two lines of this file are:
'track name="ColoredStrand" description="Colored strand" visibility=2 colorByStrand="255,0,0 0,0,255"
chr14 53001640 53001690 pos 0 +
chr14 53004644 53004694 pos 0 +'.
The browser is treating this file as a .bed file using the 'colorByStrand' attribute. Columns 1-3 are as before, with additional columns telling the browser to color the reads red and blue according to strand.
(iv) The link to this file should still be in memory, so go back to the main browser window, select 'Manage custom tracks', then go to 'add custom tracks', paste the link into the upper-most panel, 'Submit', and (when transferred) 'go to genome browser'.
(v) Once again, the system automatically takes you to the first read found on the first chromosome, and this is shown under a new track labeled 'ColoredStrand' that fills the width. [In this case, the first hit is on chromosome 14, because I selected sequences from just one part of this chromosome to give a small file.]
(vi) Now go to SAMD4A as before (type SAMD4 in the search box, select SAMD4A from the drop-down menu, and 'go'); zoom out 10-fold. You can see SAMD4A is covered with red hits (from the top strand). Here the browser is reading a line in the file that tells it to colour the + and - reads differently. [Toggle between 'dense' and 'full' views using the vertical grey bar on the left of the main window; end up in 'dense' to see the compact view.]
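
How the 'colorByStrand' line and the strand column work together can be sketched in Python: the first colour in the track line is used for + reads and the second for - reads. The function names are my own, and the third data line below is hypothetical (added only so that the example contains a - read); the first two are the lines shown in step (iii).

```python
import shlex
from collections import Counter


def strand_colors(track_line):
    """Map '+' and '-' to the RGB tuples given by colorByStrand."""
    attributes = dict(
        token.partition("=")[::2] for token in shlex.split(track_line)[1:]
    )
    forward, reverse = attributes["colorByStrand"].split()

    def to_rgb(text):
        return tuple(int(part) for part in text.split(","))

    return {"+": to_rgb(forward), "-": to_rgb(reverse)}


def count_strands(bed_lines):
    """Count reads per strand (column 6 of a .bed file)."""
    return Counter(line.split()[5] for line in bed_lines if line.strip())


header = ('track name="ColoredStrand" description="Colored strand" '
          'visibility=2 colorByStrand="255,0,0 0,0,255"')
reads = [
    "chr14 53001640 53001690 pos 0 +",
    "chr14 53004644 53004694 pos 0 +",
    "chr14 53005000 53005050 neg 0 -",  # hypothetical line for illustration
]

colors = strand_colors(header)
for strand, n in count_strands(reads).items():
    print(strand, n, colors[strand])
```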

13. Help. This is page specific. For example, for help on file formats that can be accepted when loading your own tracks, go to the 'Manage custom tracks' page by hitting the 'manage custom tracks' button, hover over the 'Help' tab at top right, and select 'Help on Custom Tracks' from the pull-down menu; there, you will find information on acceptable file types, the format that those files should have (with examples), etc...

Enough! I hope this has introduced you to the basics of moving around the browser window. You have also written code (it is not that difficult!), and taken your first steps to becoming a bioinformatician. You should now be able to import data published by others (e.g., a .bed file) and put it on the browser. However, to convert the output from a sequencer into something you can see on a browser, you must use a program like 'Galaxy' (if you have got this far here, you should be able to use it); alternatively, you could become a real pro and learn 'R'...

Maintained by Peter Cook