Read preprocessing

1. Open a Terminal

2. Look at your fastq file

cd /home/user/ngs-tutorial/1-quality_control_fastqc/1-raw_data/
ls -l
head solid.fastq
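
A FASTQ record spans four lines (identifier, sequence, "+" separator, qualities), so the number of reads can be derived from the line count. The sketch below assumes an uncompressed file whose records are not wrapped over extra lines; count_fastq_reads is only an illustrative helper name, not part of any toolkit.

```shell
# Count reads in an uncompressed FASTQ file, assuming exactly four lines per record.
# count_fastq_reads is a hypothetical helper name.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Example (run from the 1-raw_data folder):
#   count_fastq_reads solid.fastq
```

Running the same helper on the trimmed or filtered files later in this section gives a quick check of how many reads each step kept.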

3. Generate reports for this fastq file

fastqc solid.fastq -o /home/user/ngs-tutorial/1-quality_control_fastqc/2-fastqc_results/

4. Results have been saved in the results folder:

 ngs-tutorial > 1-quality_control_fastqc > 2-fastqc_results

These reads seem to need further processing…

5. Trim low-quality bases from the end of each read, using a minimum quality threshold of 20:

fastq_quality_trimmer -t 20 -i solid.fastq -o solid_t20.fastq

6. Remove the reads in which fewer than 90% of the bases have a quality of at least 20:

fastq_quality_filter -q 20 -p 90 -i solid.fastq -o solid_q20p90.fastq
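
To make the -q and -p options concrete, here is a rough awk sketch of the same rule: keep a read only if at least a given percentage of its bases reach a given quality. It is an illustration, not the FASTX-Toolkit implementation. It assumes four-line records, and the quality offset (33 by default here, pass 64 as the fourth argument if your file uses that encoding) is an assumption you need to match to your data. filter_by_quality is a hypothetical helper name.

```shell
# Illustration only: keep reads in which at least MIN_PCT percent of the bases
# have quality >= MIN_Q. Assumes four-line FASTQ records.
# Usage: filter_by_quality FILE MIN_Q MIN_PCT [OFFSET]   (OFFSET defaults to 33)
# filter_by_quality is a hypothetical helper, not part of the FASTX-Toolkit.
filter_by_quality() {
    awk -v min_q="$2" -v min_pct="$3" -v off="${4:-33}" '
        # Build a character -> ASCII code lookup table for printable characters.
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        { rec[NR % 4] = $0 }              # buffer the four lines of the record
        NR % 4 == 0 {                     # fourth line holds the quality string
            n = length($0); good = 0
            for (i = 1; i <= n; i++)
                if (ord[substr($0, i, 1)] - off >= min_q) good++
            if (n > 0 && 100 * good / n >= min_pct)
                print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
        }
    ' "$1"
}

# Example:
#   filter_by_quality solid.fastq 20 90 > solid_q20p90_sketch.fastq
```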

7. Now rerun the quality report on the processed files to check the effect of trimming and filtering

Mapping

Create index

Create an index folder and build the BWA index in it

cd /home/user/ngs-tutorial/2-mapping/1-input_files/
mkdir index
bwa index -p index/hsapiens_chr20 Homo_sapiens.GRCh37.70.dna.chromosome.20.fa
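
bwa index writes five files next to the chosen prefix (.amb, .ann, .bwt, .pac and .sa). A small check such as the one below can confirm that indexing finished before you start aligning; bwa_index_complete is a hypothetical helper name used only for this sketch.

```shell
# Check that the five files written by `bwa index` exist for a given prefix.
# bwa_index_complete is a hypothetical helper name.
bwa_index_complete() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing index file: $prefix.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Example:
#   bwa_index_complete index/hsapiens_chr20 && echo "index looks complete"
```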

Align

1. Paired-end mapping in two steps: first create one sai file per read file, then combine them into the sam file

bwa aln -t 2 -l 40 -k 2 -f ../2-results_bwa/test_1_pe.sai index/hsapiens_chr20 test_1.fq
bwa aln -t 2 -l 40 -k 2 -f ../2-results_bwa/test_2_pe.sai index/hsapiens_chr20 test_2.fq

bwa sampe -n 1 -f ../2-results_bwa/test_pe.sam index/hsapiens_chr20 ../2-results_bwa/test_1_pe.sai ../2-results_bwa/test_2_pe.sai test_1.fq test_2.fq
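
Before converting the SAM file, it can be useful to see how many records were actually mapped. In SAM, the second column is a bitwise FLAG and bit 0x4 marks an unmapped record. The sketch below counts both cases with awk; sam_mapping_summary is a hypothetical helper name, and the counts are per alignment record, not per read pair.

```shell
# Summarise a SAM file: count mapped and unmapped records using FLAG bit 0x4.
# sam_mapping_summary is a hypothetical helper name.
sam_mapping_summary() {
    awk -F '\t' '
        /^@/ { next }                              # skip header lines
        int($2 / 4) % 2 == 1 { unmapped++; next }  # bit 0x4 set: unmapped
        { mapped++ }
        END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }
    ' "$1"
}

# Example:
#   sam_mapping_summary ../2-results_bwa/test_pe.sam
```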

2. Create a BAM file with samtools and sort it

cd /home/user/ngs-tutorial/2-mapping/2-results_bwa
samtools view -b -S -o test_pe.bam test_pe.sam
samtools sort test_pe.bam test_pe_sorted

Quality control of the alignment

qualimap &

IGV visualization

samtools index test_pe_sorted.bam
igv &

Variant calling

An easy way to run variant calling

cd /home/user/ngs-tutorial/3-variant_calling/
./variant_calling

The script runs the following steps:

1. Remove duplicates

java -jar MarkDuplicates.jar INPUT=aligned_withDup.bam OUTPUT=aligned.bam 

2. Count reads

java -jar GenomeAnalysisTK.jar -T CountReads -R f000_reference.fa -I aligned.bam

3. Realignment

java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R f000_reference.fa -I aligned.bam -o f060_intervals2realign_aligned.list

java -jar GenomeAnalysisTK.jar -T IndelRealigner -R f000_reference.fa -I aligned.bam -targetIntervals f060_intervals2realign_aligned.list -o f070_realigned_aligned.bam

4. Variant calling

java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R f000_reference.fa -I f070_realigned_aligned.bam -glm SNP -o f080_snp_variants.vcf

java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R f000_reference.fa -I f070_realigned_aligned.bam -glm INDEL -o f080_indel_variants.vcf

5. Variant filtering

java -jar GenomeAnalysisTK.jar -T VariantFiltration -R f000_reference.fa -V f080_snp_variants.vcf --filterExpression "QD < 12.0" --filterName "LowConf" -o f090_snp_filtered.vcf
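
After filtering, each variant record carries either PASS or the filter name (here LowConf) in the FILTER column, the seventh field of the VCF. A quick tally shows how many SNPs the QD threshold flagged. This is a minimal sketch; vcf_filter_counts is a hypothetical helper name.

```shell
# Tally the FILTER column (7th field) of a VCF file, one line per value.
# vcf_filter_counts is a hypothetical helper name.
vcf_filter_counts() {
    awk -F '\t' '
        /^#/ { next }                  # skip meta-information and header lines
        { count[$7]++ }
        END { for (f in count) print f "\t" count[f] }
    ' "$1" | sort
}

# Example:
#   vcf_filter_counts f090_snp_filtered.vcf
```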