dRep: rapid comparison and dereplication of genomes


MrOlm/drep: Rapid comparison and dereplication of genomes

Installing with conda is recommended:

conda create -n drep
conda activate drep
conda install -c conda-forge -c bioconda drep

pip also works, but it only installs dRep itself; some of the dependencies may have to be installed by hand:

pip install drep


Installed version: v3.2.2

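Related tutorial (in Chinese): "dRep: fast dereplication of microbial genomes - paper walkthrough + help documentation + hands-on tutorial"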


1. dRep depends on several external programs

Run:
$ dRep check_dependencies
mash.................................... !!! ERROR !!!   (location = None)
nucmer.................................. !!! ERROR !!!   (location = None)
checkm.................................. all good        (location = 
ANIcalculator........................... !!! ERROR !!!   (location = None)
prodigal................................ all good        (location = /usr/bin/prodigal)
centrifuge.............................. !!! ERROR !!!   (location = None)
nsimscan................................ !!! ERROR !!!   (location = None)
fastANI................................. !!! ERROR !!!   (location = None)
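The report above can also be checked mechanically. This is a minimal sketch, assuming the line format shown above (program name, a run of dots, then the status) stays the same across dRep versions; `missing_deps` and `required_ok` are hypothetical helper names, and treating mash and nucmer as the required pair follows this post.

```bash
# missing_deps: read the text printed by `dRep check_dependencies` on stdin
# and print the name of every program flagged with "ERROR".
missing_deps() {
  awk '/ERROR/ {
         name = $1
         sub(/\.+$/, "", name)   # strip the trailing dot leader
         print name
       }'
}

# required_ok: succeed only if neither of the required tools
# (mash, nucmer) is among the missing programs.
required_ok() {
  local missing tool
  missing=$(missing_deps)
  for tool in mash nucmer; do
    if grep -qx "$tool" <<<"$missing"; then
      echo "required dependency missing: $tool" >&2
      return 1
    fi
  done
  return 0
}
```

Usage: `dRep check_dependencies 2>&1 | missing_deps`. Redirecting stderr is a precaution, since it is not certain which stream the report is written to.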


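Of these, mash and nucmer are required.

You can install them separately (see below) or let conda install them.

For each tool, either of the two commands below should work: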
conda install -c bioconda mash
conda install -c bioconda/label/cf201901 mash
conda install -c bioconda mummer
conda install -c bioconda/label/cf201901 mummer

Installing mash

Mash: fast genome and metagenome distance estimation using MinHash | Genome Biology | Full Text
marbl/Mash: Fast genome and metagenome distance estimation using MinHash
Release Mash v2.3 · marbl/Mash


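Download a release (pre-compiled Linux and macOS binaries are provided), unpack it and put the mash executable on your PATH, and that's it.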
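Putting an unpacked binary on PATH and confirming that the shell really picks it up can be sketched as below. `add_tool_dir` is a hypothetical helper, and the unpack directory you pass it is whatever the release archive extracts to (for Linux the archive is something like mash-Linux64-v2.3.tar; check the release page for the exact file name).

```bash
# add_tool_dir DIR TOOL
# Prepend DIR to PATH (current shell only) and check that TOOL now resolves to DIR/TOOL.
add_tool_dir() {
  local dir tool=$2
  # Resolve DIR to an absolute path so the comparison below is reliable.
  dir=$(CDPATH= cd -- "$1" 2>/dev/null && pwd) || { echo "no such directory: $1" >&2; return 1; }
  if [ ! -x "$dir/$tool" ]; then
    echo "no executable '$tool' in $dir" >&2
    return 1
  fi
  export PATH="$dir:$PATH"
  # command -v prints the path the shell will actually run.
  [ "$(command -v "$tool")" = "$dir/$tool" ]
}
```

Usage (hypothetical path): `add_tool_dir "$HOME/software/mash-Linux64-v2.3" mash && mash --version`. The PATH change only lasts for the current shell; to make it permanent, add the corresponding `export PATH=...` line to `~/.bashrc`.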

Installing nucmer

The MUMmer Home Page

mummer4/mummer: Mummer alignment tool

mummer/INSTALL.md at master · mummer4/mummer

Build it from the release source as described in INSTALL.md (./configure, make, make install).

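The other programs listed by check_dependencies are optional (note, though, that dRep runs CheckM by default to estimate genome completeness and contamination).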

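When I installed centrifuge, my version was apparently too new and not compatible.

You don't have to install everything up front: skip what you don't need, and install a tool only once dRep complains that it is missing.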

2. Hands-on

Try1

## Simulated data from Liu Yongxin
(drep) chenl 16:32:14 ~/drep_try/fa
$ ls
B4018L.2.fa  K4093L.5.fa  K4096L.2.fa  L4105L.2.fa  W4194L.3.fa  W4194L.6.fa
$ dRep dereplicate out1 -g ./fa/*.fa
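The glob `./fa/*.fa` is relative to the working directory. If nothing matches, bash passes the literal pattern to dRep. A small pre-flight check, sketched with a hypothetical `list_genomes` helper, catches this before a long CheckM run:

```bash
# list_genomes DIR
# Print the *.fa files in DIR, one per line; fail if there are none.
list_genomes() {
  local dir=$1 restore
  local -a files
  restore=$(shopt -p nullglob) || true   # remember the caller's setting
  shopt -s nullglob                      # unmatched globs expand to nothing
  files=("$dir"/*.fa)
  eval "$restore"
  if [ "${#files[@]}" -eq 0 ]; then
    echo "no *.fa files in $dir (check the working directory)" >&2
    return 1
  fi
  printf '%s\n' "${files[@]}"
}
```

Usage: `list_genomes ./fa >/dev/null && dRep dereplicate out1 -g ./fa/*.fa`. Recent dRep versions should also accept a text file of genome paths for `-g` (check `dRep dereplicate -h`), in which case `list_genomes ./fa > genomes.txt` produces that file directly.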


The CheckM step takes quite a while, and then the run simply finishes.

Success :)

Try2

dRep dereplicate ./ -g bin/*.fa -sa 0.95 -nc 0.30 -p 24 -comp 50 -con 10
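 -sa S_ANI, --S_ani S_ANI
                       ANI threshold to form secondary clusters (default:
                       0.99, i.e. 99%; Try2 uses 0.95)
 -nc COV_THRESH, --cov_thresh COV_THRESH
                       Minimum level of overlap between genomes when doing
                       secondary comparisons (default: 0.1, i.e. 10%; Try2 uses 0.30)
-p      number of threads/processes (default: 6)
-comp   minimum genome completeness, % (default: 75)
-con    maximum genome contamination, % (default: 25)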
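To make the Try2 thresholds concrete, here is a simplified illustration with the values copied from the command above. It only mimics the two checks. dRep itself filters genomes on CheckM completeness/contamination and then clusters them hierarchically (average linkage by default), so a pair passing `same_secondary` is not guaranteed to end up in the same secondary cluster, and boundary handling (`>=` vs `>`) is an assumption. The helper names are hypothetical. The 0.95 value corresponds to 95% ANI, a commonly used species-level cut-off.

```bash
# Values from the Try2 command.
COMP_MIN=50      # -comp: minimum completeness (%)
CON_MAX=10       # -con: maximum contamination (%)
S_ANI=0.95       # -sa: secondary-cluster ANI threshold
COV_MIN=0.30     # -nc: minimum aligned fraction

# passes_quality COMPLETENESS CONTAMINATION -> exit 0 if the genome would be kept
passes_quality() {
  awk -v c="$1" -v x="$2" -v cmin="$COMP_MIN" -v xmax="$CON_MAX" \
    'BEGIN { exit !(c >= cmin && x <= xmax) }'
}

# same_secondary ANI COVERAGE -> exit 0 if the pair clears both thresholds
same_secondary() {
  awk -v a="$1" -v cov="$2" -v s="$S_ANI" -v cmin="$COV_MIN" \
    'BEGIN { exit !(a >= s && cov >= cmin) }'
}
```

For example, `passes_quality 45 2` fails (completeness below 50%), and `same_secondary 0.97 0.20` fails even though the ANI is high, because only 20% of the genomes were aligned.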


Try3

[Screenshot: Try3 run]

3. Results

dRep writes the following figures to the figures/ folder of the output directory:

Cluster_scoring
Clustering_scatterplots
Primary_clustering_dendrogram
Secondary_clustering_dendrograms
Secondary_clustering_MDS
Winning_genomes
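Besides the figures, the same information is available as tables in `data_tables/`, and the winning genomes are copied into `dereplicated_genomes/`. The sketch below counts genomes per secondary cluster from `Cdb.csv`. It assumes the file is a plain comma-separated table (no quoted commas) whose header contains a `secondary_cluster` column; check the header of your own file. `cluster_sizes` is a hypothetical helper name.

```bash
# cluster_sizes CDB_CSV
# Print "secondary_cluster<TAB>number_of_genomes", sorted by cluster name.
cluster_sizes() (
  set -o pipefail   # subshell body, so this does not leak into the caller
  awk -F, '
    NR == 1 {
      # find the column by name instead of relying on its position
      for (i = 1; i <= NF; i++) if ($i == "secondary_cluster") col = i
      if (!col) { print "no secondary_cluster column" > "/dev/stderr"; exit 1 }
      next
    }
    { n[$col]++ }
    END { for (k in n) print k "\t" n[k] }
  ' "$1" | LC_ALL=C sort
)
```

Usage: `cluster_sizes out1/data_tables/Cdb.csv`. The number of winners (one per secondary cluster) can be cross-checked with `ls out1/dereplicated_genomes | wc -l`.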


Author: Cling
Copyright: unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Cling when reposting!