2.1 Primary Analysis - Sequence processing

After sequencing, reads are mapped to the human reference genome (GRCh37). Several filtering steps then reduce the number of reads before variant calling. Here, we describe how this reduction occurs in each pipeline.

BIER's pipeline

In this pipeline, the number of reads decreases at four stages: mapping, filtering by mapping quality, duplicate removal and interval realignment. The table below shows the reads remaining after each stage.

  • N_reads_forward and N_reads_reverse: initial number of forward and reverse reads obtained from exome sequencing
  • N_mapped_read_pairs: number of read pairs mapped to the human reference genome
  • %_mapped_read_pairs: percentage of initial read pairs mapped to the human reference genome
  • N_mapped_reads_mapq>10: number of mapped reads whose mapping quality (mapq) is higher than 10
  • %_mapped_reads_mapq>10: percentage of initial reads (forward plus reverse) that are mapped with a mapping quality (mapq) higher than 10
  • N_reads_single_hit: number of reads uniquely mapped to the human reference genome that remain after duplicate removal
  • %_reads_single_hit: percentage of initial reads uniquely mapped to the human reference genome that remain after duplicate removal
  • N_reads_single_hit_realigned: number of reads located in the exome capture kit targets that have been realigned
  • %_reads_single_hit_realigned: percentage of initial reads located in the exome capture kit targets that have been realigned

Sample | N_reads_forward | N_reads_reverse | N_mapped_read_pairs | %_mapped_read_pairs | N_mapped_reads_mapq>10 | %_mapped_reads_mapq>10 | N_reads_single_hit | %_reads_single_hit | N_reads_single_hit_realigned | %_reads_single_hit_realigned
SGT038 | 31471997 | 31471997 | 31304563 | 99.47 | 32844926 | 52.18 | 28115216 | 44.67 | 22724178 | 36.10
SGT077 | 27308034 | 27308034 | 27166800 | 99.48 | 28443473 | 52.08 | 24680760 | 45.19 | 20024427 | 36.66
SGT161 | 27566691 | 27566691 | 27429566 | 99.50 | 28775620 | 52.19 | 24997317 | 45.34 | 20213242 | 36.66
SGT187 | 29730857 | 29730857 | 29593092 | 99.54 | 31001122 | 52.14 | 26579044 | 44.70 | 21530503 | 36.21
SGT230 | 30415770 | 30415770 | 30296046 | 99.61 | 31712462 | 52.13 | 27084895 | 44.52 | 22377265 | 36.79
SGT238 | 29472514 | 29472514 | 29333890 | 99.53 | 30770469 | 52.20 | 26406037 | 44.80 | 21543680 | 36.55
SGT241 | 29513223 | 29513223 | 29365739 | 99.50 | 30803708 | 52.19 | 26174909 | 44.34 | 21579194 | 36.56
SGT274 | 30394832 | 30394832 | 30268625 | 99.58 | 31699515 | 52.15 | 27122226 | 44.62 | 22387721 | 36.83
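
The percentage columns can be recomputed from the raw counts. The sketch below is illustrative only and is not part of BIER's pipeline; the function name `bier_percentages` and its parameter names are hypothetical. It assumes that the read-pair percentage is taken relative to the number of forward reads (one forward read per pair) and that the read-based percentages are taken relative to forward plus reverse reads, which is consistent with the values reported for SGT038.

```python
def bier_percentages(n_forward, n_reverse, n_mapped_pairs,
                     n_mapq, n_single_hit, n_realigned):
    """Recompute the percentage columns for one sample from its raw counts.

    Assumption: pair-based columns use the number of forward reads as the
    denominator; read-based columns use forward + reverse reads.
    """
    total_pairs = n_forward                # one forward read per read pair
    total_reads = n_forward + n_reverse    # all initial reads

    def pct(count, total):
        return round(100 * count / total, 2)

    return {
        "%_mapped_read_pairs": pct(n_mapped_pairs, total_pairs),
        "%_mapped_reads_mapq>10": pct(n_mapq, total_reads),
        "%_reads_single_hit": pct(n_single_hit, total_reads),
        "%_reads_single_hit_realigned": pct(n_realigned, total_reads),
    }


# Counts for sample SGT038, copied from the table above.
sgt038 = bier_percentages(31471997, 31471997, 31304563,
                          32844926, 28115216, 22724178)
for column, value in sgt038.items():
    print(f"{column}: {value:.2f}")
```

Because the first stage is counted in read pairs and the later stages in individual reads, the percentages of consecutive columns are only comparable once the change of denominator is taken into account.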

CNAG's pipeline

In contrast with BIER's pipeline, the number of reads decreases at only two stages: mapping and duplicate removal. The table below shows the read pairs remaining after each stage.

  • N_reads_forward and N_reads_reverse: initial number of forward and reverse reads obtained from exome sequencing
  • N_mapped_read_pairs: number of read pairs mapped to the human genome reference
  • %_mapped_read_pairs: percentage of initial read pairs mapped to the human genome reference
  • N_read_pairs_single_hit: number of read pairs uniquely mapped to the human genome reference remaining after removing duplicates
  • %_read_pairs_single_hit: percentage of initial read pairs uniquely mapped to the human genome reference remaining after removing duplicates

Sample | N_reads_forward | N_reads_reverse | N_mapped_read_pairs | %_mapped_read_pairs | N_read_pairs_single_hit | %_read_pairs_single_hit
SGT038 | 31471997 | 31471997 | 27098380 | 86.10 | 26533415 | 84.31
SGT077 | 27308034 | 27308034 | 23450991 | 85.88 | 22904031 | 83.87
SGT161 | 27566691 | 27566691 | 23668265 | 85.86 | 23170780 | 84.05
SGT187 | 29730857 | 29730857 | 25554609 | 85.95 | 24894597 | 83.73
SGT230 | 30415770 | 30415770 | 26257368 | 86.33 | 25639411 | 84.30
SGT238 | 29472514 | 29472514 | 25386147 | 86.13 | 24773292 | 84.06
SGT241 | 29513223 | 29513223 | 25365246 | 85.95 | 24539542 | 83.15
SGT274 | 30394832 | 30394832 | 26242677 | 86.34 | 25693406 | 84.53

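
Since every CNAG column is counted in read pairs, the loss at each stage can be read directly as the difference between consecutive counts. The following sketch is an illustration, not part of CNAG's pipeline; the function name `stage_losses` and the stage labels are hypothetical, and it assumes the initial number of read pairs equals the number of forward reads.

```python
def stage_losses(counts):
    """Return (stage, pairs_lost, percent_of_initial) for each stage.

    `counts` is an ordered list of (stage_name, read_pairs_remaining),
    starting with the initial number of read pairs.
    """
    initial = counts[0][1]
    losses = []
    # Walk over consecutive stages and record how many pairs were dropped.
    for (_, before), (stage, after) in zip(counts, counts[1:]):
        lost = before - after
        losses.append((stage, lost, round(100 * lost / initial, 2)))
    return losses


# Read-pair counts for sample SGT038, copied from the table above.
cnag_sgt038 = [
    ("sequenced", 31471997),       # assumption: pairs = forward reads
    ("mapped", 27098380),
    ("deduplicated", 26533415),
]
for stage, lost, percent in stage_losses(cnag_sgt038):
    print(f"{stage}: {lost} read pairs lost ({percent:.2f}% of initial)")
```

For this sample, most of the reduction happens at the mapping stage, and duplicate removal accounts for a much smaller share.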
espinos/results.primary.analysis.txt · Last modified: 2017/05/24 13:50 (external edit)