1. Installing Ubuntu/Linux via ‘VirtualBox’

 

[Example – AF’s computer: Windows 7 Home Basic; Intel® Core™ i3-2330M CPU @ 2.20 GHz; 4.00 GB RAM; 64-bit OS.]

 

1.1 Download and install Ubuntu

(i)      Download VirtualBox for Windows (here: v5.0.4 or latest from https://www.virtualbox.org/wiki/Downloads).

 

(ii)     Download Linux (here “Ubuntu 14.04.3 LTS; 64-bit” from http://www.ubuntu.com/download/desktop). [LTS = Long_Term_Support recommended; make a donation!]

 

(iii)    Install VirtualBox. Double-click the file downloaded in step (i) and press “Run”. A wizard opens; press “Next”. Keep the default settings. As you go, you may be asked to install device software (e.g., network adapters) – install all of it.

 

(iv)    Open VirtualBox (might open by default), click “New” and create a virtual machine (fill in a “Name”, “Type” is Linux, “Version” is Ubuntu-64-bit); press “Next”. Allocate memory (stay in green zone). Check “Create a virtual hard-drive now”; click “Create”, check VDI, click “Next”. For ‘Storage…’, choose “dynamically allocated”, click “Next”, provide a name for the new virtual hard-disk file and select the size of the virtual hard disk, click “Create”.

 

(v)     You should automatically return to “VirtualBox Manager”. [Click (orange) “Settings”, which opens a menu. Note what can be accessed. On left: e.g., “General”, “System”, etc… On right: choices. Look around; e.g., under “Advanced” allow “DragNDrop” bidirectionality.]

          (a) Under “System”, change boot order and place Optical on top (we’ll boot from virtual optical file). Press ‘OK’.

          (b) Under “Storage”, select “Controller: IDE”. In the highlighted area, there are two pluses – click the first (“Adds optical drive”), and “Choose Disk”. Select the Ubuntu file you downloaded in step (ii) (here “ubuntu-14.04.3-desktop-amd64.iso”), and click “Open”. Select this file in the “Controller: IDE” tree, and press “OK”.

          (c) Start the virtual machine by pressing “Start” (green). At “Welcome”, click on “Install Ubuntu”. You’ll be prompted to make choices – choose the default settings. A screen “Installation type” should appear. High-stress moment! It should say: ‘This computer currently has no detected operating systems. What would you like to do?’ This means that your virtual machine (which is separate from the rest of the computer) contains no operating system – so you can check “Erase disk and install Ubuntu” despite the horrendous warning: only the virtual hard disk is erased, not your Windows installation. Click “Install Now”, and select options as appropriate.

 

(vi)    Various things may happen after installation (e.g., a request to switch off your computer). Next, open (or return to) Oracle VM VirtualBox. Click on “System” on the right, and (in the “Motherboard” tab) check the boot order – put “Hard Disk” at the top. [To increase the number of CPUs, select the “Processor” tab, and increase it.]

 

(vii)   On the “Manager” screen of VirtualBox, press “Start” (green arrow). The Ubuntu Desktop opens within VirtualBox. To get a full Ubuntu screen, open the ‘Devices’ menu, select ‘Insert Guest Additions CD image’ (this installs the VirtualBox Guest Additions), select “Run”, provide your password, and press return when asked. Then restart Ubuntu (press the cog at top right, and select restart). Alternatively, shut Ubuntu down and reopen it using the green arrow in VirtualBox – Ubuntu should now fill the VirtualBox screen.

 

(viii)  You’ve done it! Welcome to Linux/Ubuntu.

 

(ix)    Some housekeeping. You may want to share folders or hard-drives between your host (Windows) and guest (Ubuntu) systems. To do so, create a folder in your host system (e.g. on the Desktop in Windows). Here we make a folder called “shared_folder”. Open VirtualBox, and go to “Settings” (orange). At the lower end of the sub-menu click on “Shared Folders”. At the right side of the following pop-up window you can find a folder symbol with a green “+”. Click on it and set the Folder Path to the “shared_folder” (for example, if you made the “shared_folder” on the Windows Desktop, direct the Path to the folder on the Windows Desktop). Also, tick auto-mount. When done click “OK”.

 

1)             Start Ubuntu (press green “Start”). Once you are logged into Ubuntu we’ll make the “shared_folder” accessible to Ubuntu.

 

 

In Ubuntu, open a terminal (also called console or command line) with “Ctrl+Alt+t”. A terminal window will open. If you can’t understand the following yet, just hang on – it’ll become clear. We’ll first need to tell Ubuntu where we will be “mounting” the “shared_folder” (that we originally created in Windows). We will be mounting the “shared_folder” to a directory called “/mnt/share”. The folder “mnt” already exists on our Ubuntu system, but the folder “share” needs to be created. In the terminal type (without the “$”, which in the following just indicates that we are typing something in the terminal):

 

$ sudo mkdir /mnt/share

 

You’ll probably be asked for your password (type the password that you set to log into Ubuntu – while typing, the password will not be shown; keep on typing and press ENTER afterwards). If you made no typing mistake, you’ll now have a new folder and the terminal will go back to the “$” sign, waiting for the next input.

 

Next, we’ll be mounting the “shared_folder” (made in Windows) to the “/mnt/share” folder (made in Ubuntu).

 

$ sudo mount -t vboxsf shared_folder /mnt/share

 

(If required, provide your password.)
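To confirm that the mount worked, one can ask Ubuntu whether “/mnt/share” is currently a mount point. The following is only a minimal sketch: the helper name is_mounted is made up for this illustration, and it assumes that the mountpoint utility is available, as it normally is on Ubuntu.

```shell
# is_mounted is a hypothetical helper name, not part of VirtualBox or Ubuntu.
# It succeeds only if the given directory is currently a mount point.
is_mounted() {
    mountpoint -q "$1" 2>/dev/null
}

if is_mounted /mnt/share; then
    echo "/mnt/share is mounted"
    ls /mnt/share    # should list the files placed in shared_folder on Windows
else
    echo "/mnt/share is not mounted - repeat the mount command above"
fi
```

If the folder is reported as not mounted, it is worth checking that the Guest Additions from step (vii) are installed, since the vboxsf file system type comes with them.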

 

With this basic setup, we are ready to turn to NGS-data analysis.

 

At the very end of this document, we’ll have a compiled (non-exhaustive) list of useful Linux commands (since they are not specifically explained throughout the data analysis steps).

 

 

 

NGS-sequencing Data in Linux/Ubuntu

 

-                 Quality control (using fastqc).

 

0)             The way we will be running the fastqc-application is used to demonstrate some principles of using a Linux/Ubuntu system. Those principles hold true for most of the other software/programs we’ll be using later on and will give you an idea how Linux/Ubuntu ticks.

 

1)             For an initial quality check of the sequencing data (ChIP-seq or RNA-seq), one can use FASTQC.

 

2)             Download here - http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Click on “FastQC v0.11.3 (Win/Linux zip file)” (or a more recent version, if available) and Save File.

 

Unless you change the destination folder for downloads, all files will be saved in the “Downloads” folder. You can access that folder via the terminal (“Ctrl+Alt+t”) by typing

$ cd Downloads

 

Alternatively, click on the icon “Files” in the left toolbar, then double-click the folder “Downloads”. Move the downloaded zip-file to the Desktop (“Ctrl+x, Ctrl+v” cuts and pastes in Linux just as in Windows; use “Ctrl+c” instead of “Ctrl+x” if you prefer to copy).

 

Once the FastQC zip-file is on the Desktop, select it, right-click, and choose “Extract here” to unpack it. In order to run fastqc in interactive mode (displaying a user interface), the file fastqc must be made executable.

 

First, let’s have a look at what is in the FastQC folder itself. Open a terminal (“Ctrl+Alt+t”) and go into the FastQC folder by typing

 

$ cd Desktop/FastQC

 

Type

 

$ ls -al

 

You will see files and directories that are within the FastQC folder.

At the very left side, you can see the so-called ‘permission bits’ (e.g. -rwxr-xr--, or drwxrwxrw-; the “d” designates something as a directory, r = read, w = write, x = execute).

 

If you take a look at the file ‘fastqc’, the permission is initially set to

-rw-rw-r--

Let’s take this apart, reading from left to right:

 

-rw-rw-r--

 

          - means that it is a file (it would say “d” if it were a directory)

          rw- means that the “user” (the owner of the file) can read and write the file, but not execute it

          rw- means that the “group” can read and write the file, but not execute it

          r-- means that “all others” can read the file, but not write to it or execute it

 

The permission bits are summed up separately for user, group and others.

 

For the file “fastqc” the original permission is

-rw-rw-r--

 

which corresponds to the number/bit code 664 (r=4, w=2, x=1):

 

6 (user: r=4 + w=2)

6 (group: r=4 + w=2)

4 (others: r=4)
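The arithmetic above can be reproduced in the terminal. The following is a minimal sketch, not a standard Linux command: perm_to_octal is a helper name invented for this illustration. It takes the nine permission characters (without the leading file-type character) and adds up r=4, w=2, x=1 for each of the three groups.

```shell
# perm_to_octal is a hypothetical helper written for this illustration.
# Input:  nine permission characters, e.g. rw-rw-r--
# Output: the three-digit code, using r=4, w=2, x=1 per triplet.
perm_to_octal() {
    perms=$1
    result=""
    for start in 1 4 7; do
        # Cut out the triplet for user (1-3), group (4-6) or others (7-9).
        triplet=$(printf '%s' "$perms" | cut -c"$start"-"$((start + 2))")
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac
        case $triplet in ??x) digit=$((digit + 1)) ;; esac
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

perm_to_octal rw-rw-r--    # prints 664
perm_to_octal rwxr-xr-x    # prints 755
```

The second call shows the code for a file that everyone may read and execute but only the user may write to.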

 

 

The following command changes the file permission to 755 (rwxr-xr-x), which makes the file executable for everyone

 

$ chmod 755 fastqc

 

 

This will make the file executable. If you receive a ‘permission denied’ error, retype the above command like this

 

$ sudo chmod 755 fastqc

 

Check the permission bits of the fastqc file again by typing

 

$ ls -al

 

It has now changed to -rwxr-xr-x (which equals 755 in permission bits).
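To see the numeric code and the rwx notation side by side, one option is the stat command. The sketch below assumes the GNU version of stat that ships with Ubuntu (its -c option is not available in every other Unix), and it works on a throwaway file so that the real fastqc file is left alone.

```shell
# Use a throwaway file so the real fastqc file is not touched.
scratch_file=$(mktemp)

chmod 664 "$scratch_file"
stat -c '%a %A' "$scratch_file"    # numeric code, then the rwx notation

chmod 755 "$scratch_file"
stat -c '%a %A' "$scratch_file"

rm -f "$scratch_file"
```

The first stat line should show the state before the change (664) and the second the state after chmod 755, mirroring what happened to fastqc.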

 

Try to run fastqc from within the FastQC folder by typing

 

$ ./fastqc

 

That will likely return an error message saying that it can’t exec “java”… etc. fastqc is written in the “java” programming language, which is not yet installed on the Ubuntu distribution we initially installed (14.04.3 LTS).

 

Type

 

$ java

 

You will now see that the ‘program’ java can be found in the following packages – default-jre, … .

 

In order to install ‘java’ type

 

$ sudo apt-get install default-jre

 

Provide your password, and answer “y” when asked whether the installation may use additional disk space.

 

After the installation is complete, type

 

$ java -version

 

which tells you which version of java is now on your Ubuntu system.
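The same check can be wrapped in a small test that first asks whether a command exists at all before running it. This is a minimal sketch; have_command is a helper name chosen for this illustration, built on the shell’s own “command -v”.

```shell
# have_command is a hypothetical helper name used only in this sketch.
# It succeeds if the shell can find a command with the given name.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command java; then
    java -version
else
    echo "java not found; install it with: sudo apt-get install default-jre"
fi
```

The same pattern works for any other program used later on; only the name passed to have_command changes.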

 

Let’s try again to run fastqc.

Go to the FastQC folder (in case you forgot how to get there, open a terminal with “Ctrl+Alt+t”, type

$ cd Desktop/FastQC

and press ENTER).

 

Now type

 

$ ./fastqc

 

which should start FASTQC in interactive mode.

 

You can now load your .fastq files (zipped or unzipped) into the application to check their quality.

 

Please note – the way we called the fastqc program was from WITHIN the FastQC folder itself. 

(user@user-VirtualBox:~/Desktop/FastQC$ ./fastqc).

 

If you’d just typed  

 

user@user-VirtualBox:~/Desktop/FastQC$ fastqc

 

that is, without the “./” before the word “fastqc”, it would not have worked.

 

Why?

 

Whatever you type in the terminal, Linux will try to find the command/application/program you want to run, but by default it only looks in the so-called PATH (which is nothing more than a list of folders that Linux goes through to look for the command/application/software).

To take a look at what the PATH is, type

 

$ echo $PATH

 

 

This will show something like this

 

/usr/local/sbin:/usr/local/bin:……,……/usr/local/games

 

These are all the folders that Linux will look into (in order from left to right), and if it finds the respective command it will execute it – otherwise, it will give you an error (The program ‘xyz’ is currently not installed, or Command not found, etc.).
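The PATH is easier to read with one folder per line. The sketch below uses a helper called split_path, a name invented for this illustration, and then asks the shell which file it would actually run for a given command name.

```shell
# split_path is a hypothetical helper: it prints a colon-separated list
# with one entry per line, in the order in which Linux searches it.
split_path() {
    printf '%s\n' "$1" | tr ':' '\n'
}

split_path "$PATH"

# "command -v" reports which file would be run for a given name.
command -v ls
```

The folder reported for ls should be one of the folders listed by split_path.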

 

However, if you don’t want Linux to look into the PATH, but only within your current working directory (for example within the FastQC directory), you’ll need to run the command like this –

 

user@user-VirtualBox:~/Desktop/FastQC$ ./fastqc

 

since the sign “./” will instruct Linux to only look within your current working directory. This also means that currently the fastqc application can only be executed from the FastQC folder itself.

 

There are several ways to make fastqc (and other applications that we will be using soon) runnable from anywhere and not only from within the FastQC folder. We’ll cover two of them (one is a transient solution, the other a permanent solution).

 

 

Transient solution:

Open a terminal (“Ctrl+Alt+t”) and go to the FastQC folder.

 

$ cd Desktop/FastQC

 

Then have a look at the PATH (echo $PATH) before and after the following command

 

$ export PATH=$PATH:$PWD

 

You will see that your fastqc folder has been added to the default search PATH that Linux is using to find commands/applications/software.

You can now call fastqc from any folder/directory. However, this only holds for the terminal in which you typed

 

$ export PATH=$PATH:$PWD
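The effect of the export line can be tried out safely without touching FastQC. The sketch below is only an illustration: it works in a throwaway folder, and demo_tool is a made-up stand-in for the fastqc file.

```shell
# Work in a throwaway folder so nothing on the real system is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# demo_tool is a hypothetical stand-in for the fastqc launcher.
printf '#!/bin/sh\necho "demo_tool ran"\n' > demo_tool
chmod 755 demo_tool

# Before the export: the bare name is not found on the PATH.
command -v demo_tool || echo "demo_tool: not found on the PATH yet"

# Append the current folder to the PATH, as done for FastQC above.
export PATH=$PATH:$PWD

# After the export: the bare name works from any other folder.
cd /
demo_tool
```

Opening a second terminal and typing demo_tool there would fail again, because the changed PATH lives only in the terminal where export was typed.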

 

 

 

 

 

Permanent solution (different options, one is mentioned here):

 

In a terminal type

$ sudo ln -s /home/rudy/Desktop/FastQC/fastqc /usr/local/bin/fastqc

 

Provide your password.

The word “rudy” in the above command needs to be replaced with your personal username. The command will make a symbolic link (“sudo ln -s”) to the fastqc file in your folder FastQC (“/home/rudy/Desktop/FastQC/fastqc” – which specifies which file you want to link to), place the link into the folder “/usr/local/bin” and name it “fastqc” = /usr/local/bin/fastqc. Here, you could give the link any name; for example, instead of “/usr/local/bin/fastqc” you could have specified “/usr/local/bin/cqtsaf”.

“sudo” tells Linux to run the command with administrator rights. This works if your user account is allowed to use sudo and you provide the correct password.

If you want to find out more about the “ln” command (used to make links) type in a terminal

 

$ ln --help

 

which will show you how the command is used and what it does, as well as the parameters you can provide (in the above example we used “ln -s”).

 

Now, having done this, you can run the fastqc application from any directory and from any terminal. In case you named the link differently from “fastqc”, you’ll need to call the fastqc application with the name you gave it, for example (using the alternative name from above) by typing

 

$ cqtsaf
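How a symbolic link behaves can be tried out without sudo. The sketch below is only an illustration under stated assumptions: two throwaway folders stand in for “~/Desktop/FastQC” and “/usr/local/bin”, and the small script created in it is a made-up stand-in for the real fastqc file.

```shell
# Throwaway folders stand in for ~/Desktop/FastQC and /usr/local/bin,
# so no sudo is needed and nothing on the real system changes.
tool_dir=$(mktemp -d)
bin_dir=$(mktemp -d)

# A hypothetical stand-in for the real fastqc launcher.
printf '#!/bin/sh\necho "launcher ran"\n' > "$tool_dir/fastqc"
chmod 755 "$tool_dir/fastqc"

# Same pattern as above: ln -s TARGET LINK_NAME
ln -s "$tool_dir/fastqc" "$bin_dir/cqtsaf"

ls -l "$bin_dir"              # the link is shown with an arrow to its target
readlink "$bin_dir/cqtsaf"    # prints the file the link points to
"$bin_dir/cqtsaf"             # running the link runs the target
```

The link itself is only a pointer: if the FastQC folder is later moved or deleted, the link stays behind but no longer works and has to be recreated.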