embedding_doc.py 698 Bytes
ArkShin committed on 2023-07-16 12:41 . Upload New File
from sentence_transformers import SentenceTransformer as ST
import os
import json

# Load settings; config['model'] names the sentence-transformers model to use.
with open("./config.json", "r", encoding="utf-8") as f:
    config = json.load(f)
model = ST(config['model'])
path = "docs_path"
embeddings = {}  # file name -> [url, embedding]; avoids shadowing the built-in dict

# embedding
files = os.listdir(path)
for file_name in files:
    with open(os.path.join(path, file_name), "r", encoding="UTF-8") as f:
        print(file_name)
        url = f.readline().strip()  # first line: URL
        sentence = f.read()  # remaining lines: document text
    sentence_embedding = model.encode(sentence).tolist()
    embeddings[file_name] = [url, sentence_embedding]

# convert to json
json_str = json.dumps(embeddings, indent=4)
with open('json_filepath', 'w', encoding="utf-8") as json_file:
    json_file.write(json_str)
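
The loop in this script treats the first line of each document as its URL and the rest as the text to embed. One detail worth checking when reading files this way: `readlines()` consumes the entire stream, so a `read()` issued afterwards returns an empty string, whereas `readline()` followed by `read()` yields the URL and then the remaining text. The sketch below illustrates the difference with in-memory streams; the helper name `split_url_and_body` and the sample URL are hypothetical and not part of the original file.

```python
import io


def split_url_and_body(handle):
    """Return (url, body) from a text stream whose first line is a URL.

    Hypothetical helper for illustration only.
    """
    url = handle.readline().strip()  # consume only the first line
    body = handle.read()             # everything after the first line
    return url, body


# Made-up sample document: URL on line 1, text on the following lines.
sample = io.StringIO("https://example.com/page\nfirst paragraph\nsecond paragraph\n")
url, body = split_url_and_body(sample)

# For contrast: readlines() reads to end of stream, leaving nothing for read().
exhausted = io.StringIO("https://example.com/page\nfirst paragraph\n")
first_line = exhausted.readlines()[0]
leftover = exhausted.read()
```

With the first pattern, `body` holds the document text that would be passed to `model.encode`; with the second, `leftover` is empty, so the model would be encoding an empty string.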