
KarmaDock: a deep learning paradigm for ultra-large library docking with fast speed and high accuracy

Overview

Ligand docking is one of the core technologies in structure-based virtual screening for drug discovery. However, conventional docking tools and existing deep learning tools may suffer from limited performance in terms of speed, pose quality and binding affinity accuracy. Here we propose KarmaDock, a deep learning approach for ligand docking that integrates the functions of docking acceleration, binding pose generation and correction, and binding strength estimation. The three-stage model consists of the following components: (1) encoders for the protein and ligand to learn the representations of intramolecular interactions; (2) E(n) equivariant graph neural networks with self-attention to update the ligand pose based on both protein–ligand and intramolecular interactions, followed by post-processing to ensure chemically plausible structures; (3) a mixture density network for scoring the binding strength. KarmaDock was validated on four benchmark datasets and tested in a real-world virtual screening project that successfully identified experiment-validated active inhibitors of leukocyte tyrosine kinase (LTK).
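To make the third component more concrete, here is a minimal sketch of how a mixture density can turn distances into a score. It assumes that the network outputs, for each protein-ligand atom pair, the weights, means and standard deviations of a 1-D Gaussian mixture over the pair distance, and that a pose is scored by summing the log-densities of its observed distances. This is a common formulation for mixture density scoring and is only an assumption here; the function names are hypothetical and this is not KarmaDock's own code.

```python
import math


def mixture_log_density(distance, weights, means, sigmas):
    """Log-density of `distance` under a 1-D Gaussian mixture (log-sum-exp for stability)."""
    log_terms = []
    for w, mu, sigma in zip(weights, means, sigmas):
        log_norm = -0.5 * math.log(2.0 * math.pi) - math.log(sigma)
        log_terms.append(math.log(w) + log_norm - 0.5 * ((distance - mu) / sigma) ** 2)
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def pose_score(pair_distances, pair_mixtures):
    """Sum of per-pair log-densities; a higher value means the distances are more likely.

    `pair_mixtures` holds one (weights, means, sigmas) tuple per atom pair
    (hypothetical layout, chosen for illustration only).
    """
    return sum(
        mixture_log_density(d, *mixture)
        for d, mixture in zip(pair_distances, pair_mixtures)
    )
```

Under these assumptions, a pose whose pair distances sit near the predicted mixture means receives a higher score than one whose distances fall in the tails.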

If you find it useful, please cite:

Zhang, Xujun#; Zhang, Odin#; Shen, Chao; Qu, Wanglin; Chen, Shicheng; Cao, Hanqun; Kang, Yu; Wang, Zhe; Wang, Ercheng; Zhang, Jintu; Deng, Yafeng; Liu, Furui; Wang, Tianyue; Du, Hongyan; Wang, Langcheng; Pan, Peichen*; Chen, Guangyong*; Hsieh, Chang-Yu*; Hou, Tingjun*. Efficient and accurate large library ligand docking with KarmaDock. Nature Computational Science, 2023, Vol. 3, No. 9, pp. 789-804. DOI: 10.1038/s43588-023-00511-5

Software Requirements

OS Requirements

The development version of the package has been tested on Linux (Ubuntu 18.04).

Python Dependencies

Dependencies for KarmaDock:

pytorch
pyg
rdkit=2022.09.1 (important!!!)
mdanalysis
prody 
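A quick way to confirm that the environment provides these dependencies is to check that each one can be located for import. The sketch below uses only the standard library. The import names in `KARMADOCK_MODULES` are assumptions based on how these packages are usually imported (for example, `pytorch` is imported as `torch` and `pyg` as `torch_geometric`), and the helper name is hypothetical.

```python
import importlib.util

# Assumed import names for the packages listed above; package names used by
# conda/pip differ from the names used in `import` statements.
KARMADOCK_MODULES = ["torch", "torch_geometric", "rdkit", "MDAnalysis", "prody"]


def find_missing_modules(module_names):
    """Return the names in `module_names` that cannot be located for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(KARMADOCK_MODULES)
    print("missing modules:", missing or "none")
```

This only checks that the modules are present. It does not verify the pinned rdkit version, which still needs to be checked separately.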

Installation Guide

Download this repo

git clone https://github.com/schrojunzhang/KarmaDock.git

Install karmadock_env

You can install the environment from the yaml file:

cd KarmaDock
conda env create -f karmadock_env.yaml

Alternatively, you can download the conda-packed file and unpack it into ${ANACONDA_DIR}/anaconda3/envs, where ${ANACONDA_DIR} is the directory in which Anaconda is installed. For me, ANACONDA_DIR=/root.

mkdir -p ${ANACONDA_DIR}/anaconda3/envs/karmadock
tar -xzvf karmadock.tar.gz -C ${ANACONDA_DIR}/anaconda3/envs/karmadock
conda activate karmadock

Demo 1: ligand docking on the PDBBind core set

Assume that the project is at /root and therefore the project path is /root/KarmaDock.

1. Download PDBBind dataset

You can either download the unprocessed PDBBind 2020 core set from the PDBBind website, or download the version in which the protein files were prepared by Schrodinger:

cd /root/KarmaDock
wget -O pdbbind2020_core_set.zip "https://zenodo.org/record/7788083/files/pdbbind2020_core_set.zip?download=1"
unzip -q pdbbind2020_core_set.zip

2. Preprocess PDBBind data

This step identifies the residues that lie within a 12 Å radius of any ligand atom and uses them as the protein pocket. The pocket file (xxx_pocket_ligH12A.pdb) is saved in complex_file_dir.

cd /root/KarmaDock/utils 
python -u pre_processing.py --complex_file_dir ~/your/PDBBindDataset/path

e.g.,

cd /root/KarmaDock/utils 
python -u pre_processing.py --complex_file_dir /root/KarmaDock/pdbbind2020_core_set
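The pocket definition used in this step can be sketched in a few lines of plain Python. The sketch assumes that a residue belongs to the pocket as soon as any one of its atoms lies within the cutoff of any ligand atom; the README does not state the exact criterion that pre_processing.py applies, and the `Atom` type and function name are hypothetical.

```python
import math
from collections import namedtuple

# Hypothetical minimal atom record: a residue identifier plus (x, y, z) in angstroms.
Atom = namedtuple("Atom", ["residue_id", "xyz"])


def select_pocket_residues(protein_atoms, ligand_coords, cutoff=12.0):
    """Return the ids of residues with at least one atom within `cutoff` of a ligand atom."""
    pocket = set()
    for atom in protein_atoms:
        if atom.residue_id in pocket:
            continue  # residue already selected through another atom
        if any(math.dist(atom.xyz, lig) <= cutoff for lig in ligand_coords):
            pocket.add(atom.residue_id)
    return pocket
```

For real structures a spatial index (for example a k-d tree) would avoid the all-pairs distance loop, but the brute-force version keeps the criterion easy to read.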

3. Generate graphs based on protein-ligand complexes

This step will generate graphs for protein-ligand complexes and save them (*.dgl) to graph_file_dir.

cd /root/KarmaDock/utils 
python -u generate_graph.py \
--complex_file_dir ~/your/PDBBindDataset/path \
--graph_file_dir ~/the/directory/for/saving/graph

e.g.,

cd /root/KarmaDock/utils 
python -u generate_graph.py --complex_file_dir /root/KarmaDock/pdbbind2020_core_set --graph_file_dir /root/KarmaDock/pdbbind_graph 
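After graph generation it can be useful to confirm that graph files were actually written before starting docking. The helper below is a small sketch that lists the `*.dgl` files in a directory; the function name is hypothetical, and it relies only on the `.dgl` suffix mentioned above, not on any particular file naming scheme.

```python
from pathlib import Path


def list_graph_files(graph_file_dir, suffix=".dgl"):
    """Return the sorted graph files (by default *.dgl) found directly in `graph_file_dir`."""
    return sorted(
        p for p in Path(graph_file_dir).iterdir()
        if p.is_file() and p.suffix == suffix
    )
```

For example, `len(list_graph_files("/root/KarmaDock/pdbbind_graph"))` gives the number of graphs available for docking.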

4. Ligand docking

This step performs ligand docking (predicting binding poses and binding strengths) based on the graphs. It finishes in about 0.5 min.

cd /root/KarmaDock/utils 
python -u ligand_docking.py \
--graph_file_dir ~/the/directory/for/saving/graph \
--model_file ~/path/of/trained/model/parameters \
--out_dir ~/path/for/recording/BindingPosesAndDockingScores \
--docking True \
--scoring True \
--correct True \
--batch_size 64 \
--random_seed 2023

--docking (True/False): whether to generate binding poses
--scoring (True/False): whether to predict binding affinities
--correct (True/False): whether to correct the predicted binding poses

e.g.,

cd /root/KarmaDock/utils 
python -u ligand_docking.py --graph_file_dir /root/KarmaDock/pdbbind_graph --model_file /root/KarmaDock/trained_models/karmadock_screening.pkl --out_dir /root/KarmaDock/pdbbind_result --docking True --scoring True --correct True --batch_size 64 --random_seed 2023
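When the docking step is launched from another Python script, building the command as an argument list avoids shell quoting problems with paths. The sketch below assembles the same flags as the example above; the function name is hypothetical, and only the flags shown in this README are used.

```python
import sys


def build_docking_command(graph_file_dir, model_file, out_dir,
                          docking=True, scoring=True, correct=True,
                          batch_size=64, random_seed=2023):
    """Build the argument list for ligand_docking.py using the flags shown in this README."""
    return [
        sys.executable, "-u", "ligand_docking.py",
        "--graph_file_dir", str(graph_file_dir),
        "--model_file", str(model_file),
        "--out_dir", str(out_dir),
        "--docking", str(docking),    # passed as the literal text "True" or "False"
        "--scoring", str(scoring),
        "--correct", str(correct),
        "--batch_size", str(batch_size),
        "--random_seed", str(random_seed),
    ]

# Possible usage (not executed here):
#   import subprocess
#   cmd = build_docking_command("/root/KarmaDock/pdbbind_graph",
#                               "/root/KarmaDock/trained_models/karmadock_screening.pkl",
#                               "/root/KarmaDock/pdbbind_result")
#   subprocess.run(cmd, cwd="/root/KarmaDock/utils", check=True)
```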

Demo 2: virtual screening on DEKOIS 2.0

Assume that the project is at /root and therefore the project path is /root/KarmaDock.

1. Download DEKOIS dataset

You can either download the unprocessed DEKOIS 2.0 dataset from the DEKOIS website, or download the version in which the protein files were prepared by Schrodinger and Glide-docked poses are provided:

cd /root/KarmaDock
wget -O DEKOIS2.zip "https://zenodo.org/record/8131256/files/DEKOIS2.zip?download=1"
unzip -q DEKOIS2.zip

2. Virtual screening

This step performs virtual screening for a specific target, PDK1 (predicting binding poses and binding strengths).

(1) CPU and GPU machines (faster):

First, generate the ligand graphs in advance on CPUs, before performing virtual screening:

cd /root/KarmaDock/utils
python -u virtual_screening_pipeline.py \
--mode generate_graph \
--ligand_smi ~/the/directory/for/ligand/library/smi \
--protein_file ~/the/directory/for/target/protein/pdb \
--crystal_ligand_file ~/the/directory/for/crystal/ligand/mol2/for/binding/pocket \
--graph_dir ~/the/directory/for/saving/ligand/graphs \
--random_seed 2023

e.g.,

cd /root/KarmaDock/utils 
python -u virtual_screening_pipeline.py --mode generate_graph --ligand_smi /root/KarmaDock/DEKOIS2/pdk1/active_decoys.smi --protein_file /root/KarmaDock/DEKOIS2/pdk1/protein/pdk1_protein.pdb --crystal_ligand_file /root/KarmaDock/DEKOIS2/pdk1/protein/pdk1_ligand.mol2 --graph_dir /root/KarmaDock/DEKOIS2/pdk1/karmadock_liggraph --random_seed 2023

Then run the following command on GPUs to perform virtual screening (predicting binding poses and binding strengths):

cd /root/KarmaDock/utils
python -u virtual_screening_pipeline.py \
--mode vs \
--protein_file ~/the/directory/for/target/protein/pdb \
--crystal_ligand_file ~/the/directory/for/crystal/ligand/mol2/for/binding/pocket \
--graph_dir ~/the/directory/for/saving/ligand/graphs \
--out_dir ~/path/for/recording/BindingPosesAndDockingScores \
--score_threshold 50 \
--batch_size 64 \
--random_seed 2023 \
--out_uncoorected \
--out_corrected

e.g.,

cd /root/KarmaDock/utils 
python -u virtual_screening_pipeline.py --mode vs --protein_file /root/KarmaDock/DEKOIS2/pdk1/protein/pdk1_protein.pdb --crystal_ligand_file /root/KarmaDock/DEKOIS2/pdk1/protein/pdk1_ligand.mol2 --graph_dir /root/KarmaDock/DEKOIS2/pdk1/karmadock_liggraph --out_dir /root/KarmaDock/DEKOIS2/pdk1/karmadocked --score_threshold 50 --batch_size 64 --random_seed 2023 --out_uncoorected --out_corrected
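
The two stages of this workflow (graph generation on CPUs, then screening on GPUs) can also be chained from a small Python driver. The sketch below builds both commands with the `--mode` values and flags from the templates in this section and runs them in order, stopping if the first stage fails. The function names are hypothetical, and running both stages on one machine is an assumption made for illustration.

```python
import subprocess
import sys

SCRIPT = "virtual_screening_pipeline.py"


def build_pipeline_commands(ligand_smi, protein_file, crystal_ligand_file,
                            graph_dir, out_dir, score_threshold=50,
                            batch_size=64, random_seed=2023):
    """Return (generate_graph_cmd, vs_cmd) as argument lists for the two stages."""
    shared = [
        "--protein_file", str(protein_file),
        "--crystal_ligand_file", str(crystal_ligand_file),
        "--graph_dir", str(graph_dir),
        "--random_seed", str(random_seed),
    ]
    generate = [sys.executable, "-u", SCRIPT, "--mode", "generate_graph",
                "--ligand_smi", str(ligand_smi), *shared]
    screen = [sys.executable, "-u", SCRIPT, "--mode", "vs", *shared,
              "--out_dir", str(out_dir),
              "--score_threshold", str(score_threshold),
              "--batch_size", str(batch_size),
              "--out_uncoorected", "--out_corrected"]  # flag spelling as used in this README
    return generate, screen


def run_pipeline(commands, cwd):
    """Run the stages in order; check=True raises if a stage exits with an error."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)
```

With the demo paths above, `run_pipeline(build_pipeline_commands(...), cwd="/root/KarmaDock/utils")` would run graph generation followed by screening.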

(2) GPU machines (slower but more convenient):

On GPU-only machines, you can run the following command to perform virtual screening, with graphs generated on the fly:

cd /root/KarmaDock/utils 
python -u virtual_screening.py \
--ligand_smi ~/the/directory/for/ligand/library/smi \
--protein_file ~/the/directory/for/target/protein/pdb \
--crystal_ligand_file ~/the/directory/for/crystal/ligand/mol2/for/binding/pocket \
--out_dir ~/path/for/recording/BindingPosesAndDockingScores \
--score_threshold 50 \
--batch_size 64 \
--random_seed 2023 \
--out_uncoorected \
--out_corrected

e.g.,

cd /root/KarmaDock/utils 
python -u virtual_screening.py --ligand_smi /root/KarmaDock/DEKOIS2/pdk1/active_decoys.smi --protein_file /root/KarmaDock/DEKOIS2/pdk1/protein/pdk1_protein.pdb --crystal_ligand_file /root/KarmaDock/DEKOIS2/pdk1/protein/pdk1_ligand.mol2 --out_dir /root/KarmaDock/DEKOIS2/pdk1/karmadocked --score_threshold 50 --batch_size 64 --random_seed 2023 --out_uncoorected --out_corrected

MIT License

Copyright (c) 2023 XujunZhang

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
