NGS workshop: demystifying data analysis
Many of us are sending off samples for ‘next-generation DNA sequencing’ (NGS), and getting the data back. This workshop is for biologists who know they should learn how to analyse the resulting data, and who are complete beginners (probably with a strong aversion to the command line and to statistics, the two basic requirements). We intend to demystify the process in three steps:
(i) You will be given instructions that enable you to set up and run Linux/Ubuntu on your own (Windows) laptop, and to grab some RNA-seq data files. This should take less than 45 min working on your own. You must bring this computer to both workshops.
(ii) Workshop 1 (~2 h). This will introduce you to the command line, some applications (e.g., fastqc for quality control, bbduk for trimming, tophat/bowtie2 for mapping, and htseq for counting mapped reads), and some file formats (fastq, fasta, bam/sam, gtf).
(iii) Workshop 2 (~2 h). This will introduce you to statistical analysis (e.g., using R/Bioconductor, and the DESeq2 package).
We hope this will give you the confidence to try to analyse your own data productively.
Files for download:
VirtualBoxPRC2.pptx (this should be viewed in PowerPoint, and it tells you how to set up Linux/Ubuntu)
fast_shell.sh (you will run this script from the command line to load various additional files you will use)
DESeq2.R (an R script; you will use it, with the DESeq2 package, to analyse the data)
285_1_short.fastq.gz (a gzip-compressed file containing some RNA-seq data)
285_2_short.fastq.gz (a gzip-compressed file containing some RNA-seq data)
CheatSheet (a Word file that contains some exercises)
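
If you are curious what a fastq file looks like before Workshop 1, the sketch below may help. It is not part of the workshop materials (which use the command line and R); it is a small Python illustration that assumes the common four-lines-per-record fastq layout: a header starting with @, the sequence, a + separator, and a quality string. The function names read_fastq and summarize are made up for this example.

```python
import gzip
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record fastq stream.

    Assumes each record occupies exactly four lines (no wrapped sequences).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the loop
            break
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError("fastq header should start with '@': " + header)
        yield header[1:], sequence, quality


def summarize(handle):
    """Return (number of reads, total number of bases) for a fastq stream."""
    reads = 0
    bases = 0
    for _name, sequence, _quality in read_fastq(handle):
        reads += 1
        bases += len(sequence)
    return reads, bases


# A tiny made-up example, held in memory so the sketch runs without any files.
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCCA\n+\nIIIII\n"
count, bases = summarize(io.StringIO(example))
print(count, "reads,", bases, "bases")

# To try it on a downloaded file (assumed to be in the current folder):
# with gzip.open("285_1_short.fastq.gz", "rt") as fh:
#     print(summarize(fh))
```

Opening the file with gzip.open in text mode ("rt") reads the compressed data directly, so there is no need to unzip it first.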