
TCMSP-Spider

TCMSP-Spider is a Python tool for extracting data from the TCMSP (Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform) website. It lets you search for a specific drug and retrieve its related ingredients, targets, and diseases. You can also download "all" data for drugs, ingredients, targets, and diseases. The tool can be configured to query and download a whole list of drugs, and it obtains the token parameter itself, so you do not need to pass it manually.

Installation

  1. Clone the repository and navigate to the project directory:
git clone https://github.com/shujuecn/TCMSP-Spider.git
cd TCMSP-Spider
  2. Install the required dependencies:
pip3 install -r requirements.txt

Usage

Searching data by drug name

  1. Add the names of the drugs you want to search for in herb_list.txt. You can add multiple drugs, and the names can be written in Chinese, Pinyin or Latin, for example:
麻黄
Baizhu
Citrus Reticulata
  2. Run the following command to start the search process:
python3 src/search_save_herbs.py
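
As a rough illustration of the input format, a file such as herb_list.txt can be read one name per line. This is only a sketch under assumptions: the function name `read_herb_names` is hypothetical, and skipping blank lines and duplicates is an assumed convenience, not necessarily what src/search_save_herbs.py does.

```python
from pathlib import Path


def read_herb_names(path):
    """Return the non-empty names in the file, in order, without duplicates.

    Hypothetical helper: the real script may parse the file differently.
    """
    names = []
    seen = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()  # drop surrounding whitespace and line endings
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names
```

Reading the file as UTF-8 matters here because the list may mix Chinese, Pinyin, and Latin names.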

The program automatically obtains the token value and queries every drug listed in herb_list.txt. A single Chinese or Pinyin name may correspond to multiple drugs, so the program downloads the ingredients, targets, and diseases of each matching drug and saves them in an Excel (.xlsx) file in the data/spider_data folder. For example:

麻黄 -> 麻黄、麻黄根
fuzi -> Baifuzi, Difuzi, Fuzi, Laifuzi
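
The one-to-many behaviour shown above is consistent with a case-insensitive substring match. The sketch below only illustrates that idea: the matching itself is done by the TCMSP site, and both the function `match_herbs` and the small `catalog` list are hypothetical.

```python
def match_herbs(query, catalog):
    """Return every catalog name that contains the query, ignoring case.

    Assumption: substring matching approximates the site's search behaviour.
    """
    needle = query.casefold()
    return [name for name in catalog if needle in name.casefold()]


# Hypothetical catalog assembled from the names in the examples above.
catalog = ["Baifuzi", "Difuzi", "Fuzi", "Laifuzi", "Baizhu", "麻黄", "麻黄根"]

print(match_herbs("fuzi", catalog))  # ['Baifuzi', 'Difuzi', 'Fuzi', 'Laifuzi']
print(match_herbs("麻黄", catalog))   # ['麻黄', '麻黄根']
```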

Downloading "all" data

On the TCMSP Browse Database page, the website provides four types of data: "all" drugs, ingredients, targets, and diseases. You can use the following command to download these data and save them in an Excel (.xlsx) file in the data/sample_data folder.

python3 src/get_all_data.py

Querying relationships

Using the data downloaded with "Get all data," you can use the program to query the relationships between drugs, ingredients, targets, and diseases. For example:

Target ID: TAR00006

Related diseases: Chronic inflammatory diseases...
Related ingredients: cyanidol...
Related herbs: Asteris Radix Et Rhizoma...

This feature is not yet implemented in the current version. A future update may use the data downloaded with "Get all data" to query relationships between different elements, such as finding all the ingredients related to a certain disease or target.
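
A relationship lookup of this kind could be sketched with plain dictionaries. Everything here is an assumption made for illustration: the function names are hypothetical, and the sample rows simply reuse the TAR00006 example above instead of loading the downloaded .xlsx files, whose layout this README does not describe.

```python
from collections import defaultdict


def build_index(pairs):
    """Map each left-hand key to the set of right-hand values paired with it."""
    index = defaultdict(set)
    for left, right in pairs:
        index[left].add(right)
    return index


# Illustrative rows taken from the example above (not real query results).
ingredient_target = [("cyanidol", "TAR00006")]
target_disease = [("TAR00006", "Chronic inflammatory diseases")]

targets_by_ingredient = build_index(ingredient_target)
diseases_by_target = build_index(target_disease)
# Reversing the pairs gives the lookup in the other direction.
ingredients_by_target = build_index((t, i) for i, t in ingredient_target)
targets_by_disease = build_index((d, t) for t, d in target_disease)


def diseases_for_ingredient(ingredient):
    """Follow ingredient -> target -> disease and return the sorted diseases."""
    found = set()
    for target in targets_by_ingredient.get(ingredient, ()):
        found |= diseases_by_target.get(target, set())
    return sorted(found)


def ingredients_for_disease(disease):
    """Follow disease -> target -> ingredient and return the sorted ingredients."""
    found = set()
    for target in targets_by_disease.get(disease, ()):
        found |= ingredients_by_target.get(target, set())
    return sorted(found)


print(diseases_for_ingredient("cyanidol"))
print(ingredients_for_disease("Chronic inflammatory diseases"))
```

Using `.get` on the indexes keeps unknown names from adding empty entries, so a lookup for a missing ingredient or disease just returns an empty list.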

LICENSE

This project is released under the MIT open source license. If you have any suggestions or feedback, please feel free to submit an issue or pull request.

Changelog

  • 2023/02/09: Initial commit. Completed the search function and data download function.
  • 2023/02/10: Refactored the project structure and added the "download all data" function.

MIT License

Copyright (c) 2023 shujuecn

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
