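The data I have been working with recently all come from targeted sequencing or whole-exome sequencing, so evaluating coverage depth and target capture efficiency has become an essential part of data quality control.
Until now I ran samtools depth to get the per-base depth and then used a Perl script to calculate depth and capture efficiency. Today I happened to come across bamdst (https://github.com/shiquan/bamdst), which is very convenient to use. Following its GitHub page, I am recording the usage here.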
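For comparison, the older samtools depth workflow can be sketched in a few lines of shell and awk instead of Perl. This is only a minimal sketch: depth_summary is a hypothetical helper name, and it assumes the input is the usual three-column samtools depth output (chromosome, position, depth). The target length is passed explicitly because positions with zero depth may be missing from that output.

```shell
# Summarize per-base depth read from stdin in `samtools depth` format
# (three tab-separated columns: chromosome, position, depth).
# depth_summary is a hypothetical helper, not part of samtools or bamdst.
#   $1 = total length of the target regions in bp
#   $2 = depth cutoff for the coverage ratio (default 1, i.e. depth > 0)
depth_summary() {
    awk -v len="$1" -v cutoff="${2:-1}" '
        { sum += $3; if ($3 >= cutoff) covered++ }
        END {
            if (len <= 0) {
                print "target length must be positive" > "/dev/stderr"
                exit 1
            }
            # average depth = bases on target / target length
            printf "average_depth\t%.2f\n", sum / len
            # fraction of target bases at or above the cutoff
            printf "coverage_ge_%dx\t%.4f\n", cutoff, covered / len
        }'
}

# Typical use (assumes samtools is installed; file names are placeholders):
#   samtools depth -b target.bed sample.bam | depth_summary 1000 10
```

Passing the target length as an argument keeps the helper independent of how the BED file is summed up, at the cost of one extra step for the caller.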
Download and install: after downloading and unpacking the archive, run:
cd ./bamdst-master
make
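Once the build has finished, prepare a .bed file and a .bam file. Using the MT-RNR1.bed and test.bam shipped with the software as an example (the paths are relative to the directory that contains bamdst-master):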
./bamdst-master/bamdst -p ./bamdst-master/example/MT-RNR1.bed -o ./test ./bamdst-master/example/test.bam
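When the command above is run on many samples, a small wrapper that checks the inputs first can save some confusion. The following is a minimal sketch, not part of bamdst: run_bamdst and the BAMDST variable are hypothetical names, and only the -p and -o options already used above are passed through. I am not sure whether bamdst creates the output directory on its own, so the sketch creates it beforehand as a precaution.

```shell
# run_bamdst is a hypothetical wrapper around the command shown above.
#   $1 = BED file, $2 = output directory, $3 = BAM file
# BAMDST may point at the bamdst binary (defaults to "bamdst" on PATH).
run_bamdst() {
    bed=$1 outdir=$2 bam=$3
    # Fail early with a clear message if an input file is not readable.
    for f in "$bed" "$bam"; do
        if [ ! -r "$f" ]; then
            echo "cannot read input file: $f" >&2
            return 1
        fi
    done
    # Precaution (assumption): make sure the output directory exists.
    mkdir -p "$outdir" || return 1
    "${BAMDST:-bamdst}" -p "$bed" -o "$outdir" "$bam"
}

# Example, mirroring the command above:
#   BAMDST=./bamdst-master/bamdst run_bamdst \
#       ./bamdst-master/example/MT-RNR1.bed ./test ./bamdst-master/example/test.bam
```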
The output consists of 7 files:
- coverage.report
- cumu.plot
- insert.plot
- chromosome.report
- region.tsv.gz
- depth.tsv.gz
- uncover.bed
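Of the files listed above, region.tsv.gz is handy for spotting poorly covered target regions. The sketch below assumes that, once decompressed, it is a tab-separated table whose first four columns are chromosome, start, stop and average depth, with header lines starting with "#". That layout is an assumption on my part, so check the header of your own file first. low_depth_regions is a hypothetical helper name.

```shell
# low_depth_regions is a hypothetical helper, not part of bamdst.
# ASSUMPTION: stdin is the decompressed region.tsv.gz, tab-separated, with
# chromosome, start, stop and average depth in columns 1 to 4.
#   $1 = minimum acceptable average depth
low_depth_regions() {
    awk -F'\t' -v min="$1" '
        /^#/ { next }    # skip header or comment lines
        ($4 + 0) < (min + 0) { print $1 "\t" $2 "\t" $3 "\t" $4 }'
}

# Example (the path follows the -o ./test used above):
#   gzip -dc ./test/region.tsv.gz | low_depth_regions 30
```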
Of these, coverage.report carries the most information. Its fields are described as follows:
[Total] Raw Reads (All reads) // All reads in the bam file(s).
[Total] QC Fail reads // Number of reads that failed QC. This flag is set by other software, such as bwa. See the flag field in the BAM structure.
[Total] Raw Data(Mb) // Total read data in the bam file(s).
[Total] Paired Reads // Number of paired reads.
[Total] Mapped Reads // Number of mapped reads.
[Total] Fraction of Mapped Reads // Ratio of mapped reads to raw reads.
[Total] Mapped Data(Mb) // Mapped data in the bam file(s).
[Total] Fraction of Mapped Data(Mb) // Ratio of mapped data to raw data.
[Total] Properly paired // Paired reads with a proper insert size. See the BAM format specification for details.
[Total] Fraction of Properly paired // Ratio of properly paired reads to mapped reads.
[Total] Read and mate paired // Read (read1) and mate read (read2) paired.
[Total] Fraction of Read and mate paired // Ratio of read-and-mate-paired reads to mapped reads.
[Total] Singletons // Read mapped but mate read unmapped, and vice versa.
[Total] Read and mate map to diff chr // Read and mate read mapped to different chromosomes, usually because of mapping errors or structural variants.
[Total] Read1 // First reads in mate-paired sequencing.
[Total] Read2 // Mate reads.
[Total] Read1(rmdup) // First reads after removing duplicates.
[Total] Read2(rmdup) // Mate reads after removing duplicates.
[Total] forward strand reads // Number of forward strand reads.
[Total] backward strand reads // Number of backward strand reads.
[Total] PCR duplicate reads // Number of PCR duplicates.
[Total] Fraction of PCR duplicate reads // Ratio of PCR duplicates.
[Total] Map quality cutoff value // Cutoff for the mapping quality score. It can be set with -q. The default is 20, because some variant callers such as GATK only consider high-quality reads.
[Total] MapQuality above cutoff reads // Number of reads with a quality score greater than or equal to the cutoff value.
[Total] Fraction of MapQ reads in all reads // Ratio of reads with a Q score greater than or equal to the cutoff to raw reads.
[Total] Fraction of MapQ reads in mapped reads // Ratio of reads with a Q score greater than or equal to the cutoff to mapped reads.
[Target] Target Reads // Number of reads covering the target region (specified by the bed file).
[Target] Fraction of Target Reads in all reads // Ratio of target reads to raw reads.
[Target] Fraction of Target Reads in mapped reads // Ratio of target reads to mapped reads.
[Target] Target Data(Mb) // Total bases covering the target region. If a read covers the target region only partly, only the covering bases are counted.
[Target] Target Data Rmdup(Mb) // Total bases covering the target region after removing PCR duplicates.
[Target] Fraction of Target Data in all data // Ratio of target bases to raw bases.
[Target] Fraction of Target Data in mapped data // Ratio of target bases to mapped bases.
[Target] Len of region // The length of the target regions.
[Target] Average depth // Average depth of the target regions. Calculated as "target bases / length of regions".
[Target] Average depth(rmdup) // Average depth of the target regions after removing PCR duplicates.
[Target] Coverage (>0x) // Ratio of bases with depth greater than 0x in the target regions, which is also the ratio of covered regions in the target regions.
[Target] Coverage (>=4x) // Ratio of bases with depth greater than or equal to 4x in the target regions.
[Target] Coverage (>=10x) // Ratio of bases with depth greater than or equal to 10x in the target regions.
[Target] Coverage (>=30x) // Ratio of bases with depth greater than or equal to 30x in the target regions.
[Target] Coverage (>=100x) // Ratio of bases with depth greater than or equal to 100x in the target regions.
[Target] Coverage (>=Nx) // An additional line for a user-defined cutoff value, see --cutoffdepth.
[Target] Target Region Count // Number of target regions. In normal practice, it is the total number of exons.
[Target] Region covered > 0x // Number of regions with an average depth greater than 0x.
[Target] Fraction Region covered > 0x // Ratio of regions with an average depth greater than 0x.
[Target] Fraction Region covered >= 4x // Ratio of regions with an average depth greater than or equal to 4x.
[Target] Fraction Region covered >= 10x // Ratio of regions with an average depth greater than or equal to 10x.
[Target] Fraction Region covered >= 30x // Ratio of regions with an average depth greater than or equal to 30x.
[Target] Fraction Region covered >= 100x // Ratio of regions with an average depth greater than or equal to 100x.
[flank] flank size // The flank size to be counted, 200 bp by default. Oligos can also capture the regions near the target regions.
[flank] Len of region (not include target region) // The length of the flank regions (target regions are not counted).
[flank] Average depth // Average depth of the flank regions.
[flank] flank Reads // Total number of reads covering the flank regions. Note: reads that cover the edge of a target region are also counted in the flank regions.
[flank] Fraction of flank Reads in all reads // Ratio of reads covering the flank regions to raw reads.
[flank] Fraction of flank Reads in mapped reads // Ratio of reads covering the flank regions to mapped reads.
[flank] flank Data(Mb) // Total bases in the flank regions.
[flank] Fraction of flank Data in all data // Ratio of total bases in the flank regions to raw data.
[flank] Fraction of flank Data in mapped data // Ratio of total bases in the flank regions to mapped data.
[flank] Coverage (>0x) // Ratio of flank bases with depth greater than 0x.
[flank] Coverage (>=4x) // Ratio of flank bases with depth greater than or equal to 4x.
[flank] Coverage (>=10x) // Ratio of flank bases with depth greater than or equal to 10x.
[flank] Coverage (>=30x) // Ratio of flank bases with depth greater than or equal to 30x.
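For routine QC it is often enough to pull a handful of these fields out of coverage.report. The sketch below shows one way to do that. It assumes each data line has the form "field name, TAB, value", possibly padded with spaces, and that lines starting with "#" are comments. That layout is my assumption about the file, and report_value is a hypothetical helper name.

```shell
# report_value is a hypothetical helper, not part of bamdst.
# ASSUMPTION: coverage.report lines look like "<field name><TAB><value>",
# possibly padded with spaces; lines starting with "#" are comments.
#   $1 = exact field name, e.g. "[Target] Average depth"
#   stdin = contents of coverage.report
# Prints the value and returns 0, or returns 1 if the field is missing.
report_value() {
    awk -F'\t' -v key="$1" '
        /^#/ { next }
        {
            name = $1; value = $2
            gsub(/^[ ]+|[ ]+$/, "", name)     # trim padding around the name
            gsub(/^[ ]+|[ ]+$/, "", value)    # trim padding around the value
            if (name == key) { print value; found = 1; exit }
        }
        END { exit !found }'
}

# Example (the path follows the -o ./test used above):
#   report_value "[Target] Average depth" < ./test/coverage.report
```

The non-zero return code for a missing field makes it easy to stop a pipeline when the report does not look as expected.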