====== 2.1 Primary Analysis - Sequence processing ======

After sequencing, reads are mapped to the human genome reference (GRCh37). Several filtering steps then reduce the number of reads before variant calling. Here, we describe how this reduction occurs in each pipeline.

==== BIER's pipeline ====

In this pipeline, there are four stages where the number of reads decreases: mapping, filtering by mapping quality, duplicate removal and interval realignment. The table below shows the reads remaining after each stage. From the mapping quality filter onwards, counts refer to individual reads rather than read pairs, and their percentages are relative to the total number of initial reads (forward plus reverse).

  * **N_reads_forward and reverse**: initial number of forward and reverse reads obtained in the exome sequencing process
  * **N_mapped_read_pairs**: number of read pairs mapped to the human genome reference
  * **%_mapped_read_pairs**: percentage of initial read pairs mapped to the human genome reference
  * **N_mapped_reads_mapq>10**: number of mapped reads whose mapping quality (mapq) is higher than 10
  * **%_mapped_reads_mapq>10**: percentage of initial reads that are mapped with a mapping quality (mapq) higher than 10
  * **N_reads_single_hit**: number of reads uniquely mapped to the human genome reference after removing duplicates
  * **%_reads_single_hit**: percentage of initial reads uniquely mapped to the human genome reference after removing duplicates
  * **N_reads_single_hit_realigned**: number of reads located in the exome capture kit targets that were realigned
  * **%_reads_single_hit_realigned**: percentage of initial reads located in the exome capture kit targets that were realigned

^ Sample ^ N_reads forward ^ N_reads reverse ^ N_mapped read_pairs ^ %_mapped read_pairs ^ N_mapped reads mapq>10 ^ %_mapped reads mapq>10 ^ N_reads single_hit ^ %_reads single_hit ^ N_reads single_hit realigned ^ %_reads single_hit realigned ^
| SGT038 | 31471997 | 31471997 | 31304563 | 99.47 | 32844926 | 52.18 | 28115216 | 44.67 | 22724178 | 36.10 |
| SGT077 | 27308034 | 27308034 | 27166800 | 99.48 | 28443473 | 52.08 | 24680760 | 45.19 | 20024427 | 36.66 |
| SGT161 | 27566691 | 27566691 | 27429566 | 99.50 | 28775620 | 52.19 | 24997317 | 45.34 | 20213242 | 36.66 |
| SGT187 | 29730857 | 29730857 | 29593092 | 99.54 | 31001122 | 52.14 | 26579044 | 44.70 | 21530503 | 36.21 |
| SGT230 | 30415770 | 30415770 | 30296046 | 99.61 | 31712462 | 52.13 | 27084895 | 44.52 | 22377265 | 36.79 |
| SGT238 | 29472514 | 29472514 | 29333890 | 99.53 | 30770469 | 52.20 | 26406037 | 44.80 | 21543680 | 36.55 |
| SGT241 | 29513223 | 29513223 | 29365739 | 99.50 | 30803708 | 52.19 | 26174909 | 44.34 | 21579194 | 36.56 |
| SGT274 | 30394832 | 30394832 | 30268625 | 99.58 | 31699515 | 52.15 | 27122226 | 44.62 | 22387721 | 36.83 |

==== CNAG's pipeline ====

In contrast with BIER's pipeline, there are only two stages where the number of reads decreases: mapping and duplicate removal.

  * **N_reads_forward and reverse**: initial number of forward and reverse reads obtained in the exome sequencing process
  * **N_mapped_read_pairs**: number of read pairs mapped to the human genome reference
  * **%_mapped_read_pairs**: percentage of initial read pairs mapped to the human genome reference
  * **N_read_pairs_single_hit**: number of read pairs uniquely mapped to the human genome reference remaining after removing duplicates
  * **%_read_pairs_single_hit**: percentage of initial read pairs uniquely mapped to the human genome reference remaining after removing duplicates

^ Sample ^ N_reads forward ^ N_reads reverse ^ N_mapped read_pairs ^ %_mapped read_pairs ^ N_read_pairs single_hit ^ %_read_pairs single_hit ^
| SGT038 | 31471997 | 31471997 | 27098380 | 86.10 | 26533415 | 84.31 |
| SGT077 | 27308034 | 27308034 | 23450991 | 85.88 | 22904031 | 83.87 |
| SGT161 | 27566691 | 27566691 | 23668265 | 85.86 | 23170780 | 84.05 |
| SGT187 | 29730857 | 29730857 | 25554609 | 85.95 | 24894597 | 83.73 |
| SGT230 | 30415770 | 30415770 | 26257368 | 86.33 | 25639411 | 84.30 |
| SGT238 | 29472514 | 29472514 | 25386147 | 86.13 | 24773292 | 84.06 |
| SGT241 | 29513223 | 29513223 | 25365246 | 85.95 | 24539542 | 83.15 |
| SGT274 | 30394832 | 30394832 | 26242677 | 86.34 | 25693406 | 84.53 |
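
The percentage columns in both tables can be recomputed from the counts. The following is a minimal sketch for sample SGT038, not part of either pipeline. It assumes, based on the values in the tables, that pair-level columns are divided by the number of initial read pairs, and that read-level columns (BIER's mapq, single_hit and realigned columns) are divided by the total number of initial reads, forward plus reverse. The helper ''pct'' and the dictionary names are hypothetical and chosen for illustration only.

```python
def pct(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# SGT038 counts copied from the tables above.
n_forward = 31471997
n_reverse = 31471997

# Assumption: one read pair per forward read; read-level totals sum both mates.
initial_pairs = n_forward
initial_reads = n_forward + n_reverse

bier_counts = {
    "mapped_read_pairs": 31304563,           # pair-level count
    "mapped_reads_mapq>10": 32844926,        # read-level count
    "reads_single_hit": 28115216,            # read-level count
    "reads_single_hit_realigned": 22724178,  # read-level count
}
cnag_counts = {
    "mapped_read_pairs": 27098380,           # pair-level count
    "read_pairs_single_hit": 26533415,       # pair-level count
}

# BIER: only the first column is pair-level; the rest use the read-level total.
bier_pct = {
    name: pct(count, initial_pairs if name == "mapped_read_pairs" else initial_reads)
    for name, count in bier_counts.items()
}

# CNAG: both columns are pair-level.
cnag_pct = {name: pct(count, initial_pairs) for name, count in cnag_counts.items()}

for pipeline, values in (("BIER", bier_pct), ("CNAG", cnag_pct)):
    for name, value in values.items():
        print(f"{pipeline} %_{name}: {value:.2f}")
```

Under these assumptions, the computed values can be compared directly with the SGT038 rows. The same check applies to the other samples by substituting their counts.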