NGS workshop: demystifying data analysis

Many of us are sending off samples for ‘next-generation DNA sequencing’ (NGS) and getting the data back. This workshop is for biologists who know they should learn to analyse the resulting data but are complete beginners (probably with strong aversions to the command line and to statistics, the two basic requirements). We intend to demystify the process in three steps:
(i) You will be given instructions that enable you to mount and run Linux/Ubuntu on your own (Windows) laptop, and to grab some RNA-seq data files. This should take <45 min working on your own. You must bring this laptop to both workshops.
(ii) Workshop 1 (~2 h). This will introduce you to the command line, some applications (e.g., fastqc for quality control, bbduk for trimming, tophat/bowtie2 for mapping, and htseq for counting mapped reads), and some file formats (fastq, fasta, bam/sam, gtf).
(iii) Workshop 2 (~2 h). This will introduce you to statistical analysis (e.g., using R/Bioconductor, and the DESeq2 package).
We hope this will give you the confidence to try to analyse your own data productively.

Files for download:
VirtualBoxPRC2.pptx (this should be viewed in PowerPoint, and it tells you how to mount Linux/Ubuntu)
fast_shell.sh (you will run this script from the command line to load various additional files you will use)
DESeq2.R (you will use this R script to analyse the data)
285_1_short.fastq.gz (a gzipped file containing some RNA-seq data)
285_2_short.fastq.gz (a gzipped file containing some RNA-seq data)
CheatSheet (a Word file that contains some exercises)
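
Once downloaded, the two read files can be given a quick sanity check. What follows is a minimal Python sketch and not part of the workshop materials: the function name summarise_fastq is hypothetical, and the code assumes the usual four-line FASTQ layout (header, sequence, ‘+’, qualities) with no wrapped sequence lines.

```python
import gzip


def summarise_fastq(path):
    """Return (number of reads, mean read length) for a gzipped FASTQ file.

    Assumes every record occupies exactly four lines:
    header, sequence, '+' separator, quality string.
    """
    n_reads = 0
    total_bases = 0
    # "rt" opens the gzip-compressed file as text, decompressing on the fly.
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of each four-line record.
            if line_number % 4 == 1:
                n_reads += 1
                total_bases += len(line.strip())
    mean_length = total_bases / n_reads if n_reads else 0.0
    return n_reads, mean_length
```

Calling summarise_fastq("285_1_short.fastq.gz") from the folder that holds the download returns the read count and mean read length for that file; running it on both files is an easy way to confirm that they arrived intact.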

	Nuclear Structure and Function Research Group
	Peter R Cook's group at The Sir William Dunn School of Pathology, University of Oxford
	Maintained by Peter Cook
