1. Open a Terminal
2. Look at your fastq file
cd /home/user/ngs-tutorial/1-quality_control_fastqc/1-raw_data/
ll
head solid.fastq
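Beyond `head`, it can help to know how many reads the file holds. A minimal sketch, assuming the usual four-lines-per-record FASTQ layout (no wrapped sequence lines); `count_fastq_reads` is a hypothetical helper name, not part of any toolkit:

```shell
# Hypothetical helper: count the reads in a FASTQ file.
# Assumes each record spans exactly four lines.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Only runs if the tutorial file is present in the current directory.
if [ -f solid.fastq ]; then
    count_fastq_reads solid.fastq
fi
```

Comparing this number before and after trimming or filtering gives a quick idea of how many reads were lost.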
3. Generate reports for this fastq file
fastqc solid.fastq -o /home/user/ngs-tutorial/1-quality_control_fastqc/2-fastqc_results/
4. Results have been saved in the results folder:
ngs-tutorial > 1-quality_control_fastqc > 2-fastqc_results
These reads seem to need further processing…
5. Trim your sample based on its quality with a minimum quality threshold of 20:
fastq_quality_trimmer -t 20 -i solid.fastq -o solid_t20.fastq
6. Remove reads in which fewer than 90% of the bases have a quality of at least 20:
fastq_quality_filter -q 20 -p 90 -i solid.fastq -o solid_q20p90.fastq
7. Now we could rerun the quality report
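To make the `-q 20 -p 90` criterion from step 6 concrete, here is a minimal awk sketch that counts how many reads would pass it. It assumes Phred+33 quality encoding and four-line records, and `count_passing_reads` is a hypothetical name. It only illustrates the rule and is not a replacement for `fastq_quality_filter`.

```shell
# Hypothetical helper: count reads in which at least P percent of the bases
# have quality >= Q. Assumes Phred+33 encoding and four-line FASTQ records.
# Usage: count_passing_reads FILE Q P
count_passing_reads() {
    awk -v q="$2" -v p="$3" '
        BEGIN {
            # Lookup table from printable character to ASCII code
            for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
        }
        NR % 4 == 0 {                      # every fourth line holds the qualities
            n = length($0); ok = 0
            for (i = 1; i <= n; i++)
                if (ord[substr($0, i, 1)] - 33 >= q) ok++
            total++
            if (n > 0 && 100 * ok / n >= p) kept++
        }
        END { printf "%d of %d reads pass\n", kept, total }
    ' "$1"
}

# Only runs if the tutorial file is present in the current directory.
if [ -f solid.fastq ]; then
    count_passing_reads solid.fastq 20 90
fi
```

If the counts look implausible (for example, nothing passes), the quality encoding assumption is the first thing to check.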
Index: create an index folder and build the index
cd /home/user/ngs-tutorial/2-mapping/1-input_files/
mkdir index
bwa index -p index/hsapiens_chr20 Homo_sapiens.GRCh37.70.dna.chromosome.20.fa
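`bwa index` writes several files that share the `-p` prefix. A quick check that they are all present before mapping can save a confusing error later. This is a sketch: the five extensions listed are the ones BWA's index command writes, and `bwa_index_complete` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if every BWA index file for PREFIX
# exists and is non-empty.
bwa_index_complete() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Example with the prefix used above (reports the first missing file, if any).
bwa_index_complete index/hsapiens_chr20 || echo "index incomplete; rerun bwa index"
```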
1. Paired-end mapping in two steps: first create the .sai files, then the SAM file
bwa aln -t 2 -l 40 -k 2 -f ../2-results_bwa/test_1_pe.sai index/hsapiens_chr20 test_1.fq
bwa aln -t 2 -l 40 -k 2 -f ../2-results_bwa/test_2_pe.sai index/hsapiens_chr20 test_2.fq
bwa sampe -n 1 -f ../2-results_bwa/test_pe.sam index/hsapiens_chr20 ../2-results_bwa/test_1_pe.sai ../2-results_bwa/test_2_pe.sai test_1.fq test_2.fq
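Before converting the SAM file, a rough mapping summary can be obtained from the FLAG column alone. A sketch, assuming a standard tab-separated SAM file in which bit 0x4 of column 2 marks an unmapped read; `sam_mapping_summary` is a hypothetical name:

```shell
# Hypothetical helper: count mapped and unmapped records in a SAM file
# using bit 0x4 of the FLAG column.
sam_mapping_summary() {
    awk -F '\t' '
        /^@/ { next }                       # skip header lines
        {
            total++
            # bit 0x4 set means the read is unmapped
            if (int($2 / 4) % 2 == 1) unmapped++
        }
        END { printf "%d mapped, %d unmapped\n", total - unmapped, unmapped }
    ' "$1"
}

# Only runs if the SAM file from the step above exists.
if [ -f ../2-results_bwa/test_pe.sam ]; then
    sam_mapping_summary ../2-results_bwa/test_pe.sam
fi
```

The counts are per SAM record, so each mate of a pair is counted separately.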
2. Create a BAM file with samtools and sort it
cd /home/user/ngs-tutorial/2-mapping/2-results_bwa
samtools view -S test_pe.sam -b -o test_pe.bam
samtools sort test_pe.bam test_pe_sorted
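The `samtools sort test_pe.bam test_pe_sorted` form is the older, prefix-style syntax. Newer samtools releases (1.3 onward, as far as the release notes indicate) expect the output file to be given with `-o` instead. The sketch below prints the form that matches a given version string. `sort_command` is a hypothetical helper, and the 1.3 cutoff is an assumption worth checking against the `samtools sort` usage message on your system.

```shell
# Hypothetical helper: print the samtools sort command suited to a version.
# Assumption: the "in.bam out_prefix" form was dropped in samtools 1.3.
# Usage: sort_command VERSION IN_BAM OUT_PREFIX
sort_command() {
    version=$1; in_bam=$2; prefix=$3
    major=${version%%.*}                 # text before the first dot
    rest=${version#*.}                   # text after the first dot
    minor=${rest%%.*}                    # second version component
    minor=${minor%%[!0-9]*}              # drop suffixes such as "-dirty"
    if [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -lt 3 ]; }; then
        echo "samtools sort $in_bam $prefix"
    else
        echo "samtools sort -o $prefix.bam $in_bam"
    fi
}

sort_command 0.1.19 test_pe.bam test_pe_sorted
sort_command 1.9 test_pe.bam test_pe_sorted
```

Either form produces `test_pe_sorted.bam`, which is the file the later steps expect.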
qualimap &
samtools index test_pe_sorted.bam
igv &
An easy way to run variant calling
cd /home/user/ngs-tutorial/3-variant_calling/
./variant_calling
What we are doing is:
1. Remove duplicates
java -jar MarkDuplicates.jar INPUT=aligned_withDup.bam OUTPUT=aligned.bam
2. Count reads
java -jar GenomeAnalysisTK.jar -T CountReads -R f000_reference.fa -I aligned.bam
3. Realignment
java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R f000_reference.fa -I aligned.bam -o f060_intervals2realign_aligned.list
java -jar GenomeAnalysisTK.jar -T IndelRealigner -R f000_reference.fa -I aligned.bam -targetIntervals f060_intervals2realign_aligned.list -o f070_realigned_aligned.bam
4. Variant calling
java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R f000_reference.fa -I f070_realigned_aligned.bam -glm SNP -o f080_snp_variants.vcf
java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R f000_reference.fa -I f070_realigned_aligned.bam -glm INDEL -o f080_indel_variants.vcf
5. Variant filtering
java -jar GenomeAnalysisTK.jar -T VariantFiltration -R f000_reference.fa -V f080_snp_variants.vcf --filterExpression "QD < 12.0" --filterName "LowConf" -o f090_snp_filtered.vcf
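The filter expression above flags SNPs whose QD annotation is below 12.0. To see how many records that affects, the INFO column of the unfiltered VCF can be scanned directly. A sketch, assuming a tab-separated VCF with a `QD=` entry in column 8; `count_low_qd` is a hypothetical helper, and records without a QD annotation are simply not counted:

```shell
# Hypothetical helper: count VCF records whose QD value is below a threshold.
# Usage: count_low_qd FILE THRESHOLD
count_low_qd() {
    awk -F '\t' -v t="$2" '
        /^#/ { next }                       # skip meta and header lines
        {
            n = split($8, info, ";")        # INFO is column 8
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^QD=/) {
                    qd = substr(info[i], 4) + 0
                    if (qd < t) low++
                }
        }
        END { print low + 0 }
    ' "$1"
}

# Only runs if the SNP calls from step 4 are present.
if [ -f f080_snp_variants.vcf ]; then
    count_low_qd f080_snp_variants.vcf 12.0
fi
```

The number should match the count of records tagged `LowConf` in `f090_snp_filtered.vcf`, provided every record carries a QD value.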