===== Read preprocessing =====

1. Open a Terminal.

2. Look at your fastq file:

  cd /home/user/ngs-tutorial/1-quality_control_fastqc/1-raw_data/
  ll
  head solid.fastq

3. Generate reports for this fastq file:

  fastqc solid.fastq -o /home/user/ngs-tutorial/1-quality_control_fastqc/2-fastqc_results/

4. The results have been saved in the results folder: ngs-tutorial > 1-quality_control_fastqc > 2-fastqc_results

These reads seem to need further processing...

5. Trim your sample based on its quality, with a minimum quality threshold of 20:

  fastq_quality_trimmer -t 20 -i solid.fastq -o solid_t20.fastq

6. Remove the reads in which fewer than 90% of the bases have a quality of at least 20:

  fastq_quality_filter -q 20 -p 90 -i solid.fastq -o solid_q20p90.fastq

7. Now we could rerun the quality report on the processed files.

===== Mapping =====

==== Create index ====

Create an index folder and build the BWA index for chromosome 20:

  cd /home/user/ngs-tutorial/2-mapping/1-input_files/
  mkdir index
  bwa index -p index/hsapiens_chr20 Homo_sapiens.GRCh37.70.dna.chromosome.20.fa

==== Align ====

1. Paired-end mapping takes 2 steps: first create a sai file for each read file, then create the sam file:

  bwa aln -t 2 -l 40 -k 2 -f ../2-results_bwa/test_1_pe.sai index/hsapiens_chr20 test_1.fq
  bwa aln -t 2 -l 40 -k 2 -f ../2-results_bwa/test_2_pe.sai index/hsapiens_chr20 test_2.fq
  bwa sampe -n 1 -f ../2-results_bwa/test_pe.sam index/hsapiens_chr20 ../2-results_bwa/test_1_pe.sai ../2-results_bwa/test_2_pe.sai test_1.fq test_2.fq

2. Create a BAM file with samtools and sort it:

  cd /home/user/ngs-tutorial/2-mapping/2-results_bwa
  samtools view -b -S -o test_pe.bam test_pe.sam
  samtools sort test_pe.bam test_pe_sorted

==== Quality control of the alignment ====

Launch Qualimap:

  qualimap &

==== IGV visualization ====

Index the sorted BAM file, then launch IGV:

  samtools index test_pe_sorted.bam
  igv &

===== Variant calling =====

An easy way to run variant calling:

  cd /home/user/ngs-tutorial/3-variant_calling/
  ./variant_calling

What we are doing is:

1. Remove duplicates:

  java -jar MarkDuplicates.jar INPUT=aligned_withDup.bam OUTPUT=aligned.bam

2. Count reads:

  java -jar GenomeAnalysisTK.jar -T CountReads -R f000_reference.fa -I aligned.bam

3. Realignment: first find the intervals that need realigning, then realign the reads in them:

  java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R f000_reference.fa -I aligned.bam -o f060_intervals2realign_aligned.list
  java -jar GenomeAnalysisTK.jar -T IndelRealigner -R f000_reference.fa -I f050_aligned.bam -targetIntervals f060_intervals2realign_aligned.list -o f070_realigned_aligned.bam

4. Variant calling, once for SNPs and once for indels:

  java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R f000_reference.fa -I f070_realigned_aligned.bam -glm SNP -o f080_snp_variants.vcf
  java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R f000_reference.fa -I f070_realigned_aligned.bam -glm INDEL -o f080_indel_variants.vcf

5. Variant filtering: SNPs with QD < 12.0 are flagged as "LowConf" in the FILTER column:

  java -jar GenomeAnalysisTK.jar -T VariantFiltration -R f000_reference.fa -V f080_snp_variants.vcf --filterExpression "QD < 12.0" --filterName "LowConf" -o f090_snp_filtered.vcf
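
Once the filtered VCF has been written, it can be useful to check how many SNPs passed and how many were flagged as LowConf. The following is a minimal sketch, not part of the tutorial's variant_calling script: the function name summarise_vcf_filters is hypothetical, and it assumes a plain-text, tab-separated VCF in which the FILTER value is the 7th column, as the VCF format defines it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the tutorial scripts):
# tally the values found in the FILTER column of an uncompressed VCF file.
summarise_vcf_filters() {
    awk -F '\t' '
        /^#/    { next }              # skip meta-information and column header lines
        NF >= 7 { counts[$7]++ }      # FILTER is the 7th tab-separated column
        END     { for (f in counts) printf "%s\t%d\n", f, counts[f] }
    ' "$1" | sort                     # sort so the output order is stable
}
```

Calling summarise_vcf_filters f090_snp_filtered.vcf prints one line per FILTER value followed by its count. Records that passed every filter carry the value PASS.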