# fanbiyang committed on 2024-04-11 08:46: update main code.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import cosine_similarity
# Load data
data = pd.read_csv('ml-100k/u.data', sep='\t', header=None)
data.columns = ['user_id', 'item_id', 'rating', 'timestamp']
# Split data into train and test sets
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)
# Create user-item matrix
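# Size the matrix by the largest ID so that (id - 1) is always a valid index,
# even if some IDs are missing from the data
num_users = data['user_id'].max()
num_items = data['item_id'].max()
user_item_matrix = np.zeros((num_users, num_items))
for row in train_data.itertuples():
    user_item_matrix[row.user_id - 1, row.item_id - 1] = row.rating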
# Calculate item-item similarity matrix
item_sim_matrix = cosine_similarity(user_item_matrix.T)
# Predict ratings for test set
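test_data = test_data.copy()  # work on an explicit copy of the test split
test_data['predicted_rating'] = 0.0  # float column to hold fractional predictions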
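global_mean = train_data['rating'].mean()  # fallback when no weighted estimate exists
for row in test_data.itertuples():
    user_id = row[1] - 1
    item_id = row[2] - 1
    user_items = user_item_matrix[user_id]
    item_similarities = item_sim_matrix[item_id]
    relevant_items = np.where(user_items > 0)[0]  # items this user rated in the training set
    sim_sum = np.sum(item_similarities[relevant_items])
    if sim_sum > 0:
        # Similarity-weighted average of the user's known ratings
        predicted_rating = np.dot(user_items[relevant_items], item_similarities[relevant_items]) / sim_sum
    else:
        # No rated items, or none similar to the target item: use the global mean rating
        predicted_rating = global_mean
    test_data.at[row[0], 'predicted_rating'] = predicted_rating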
# Evaluate performance
mse = np.mean((test_data['rating'] - test_data['predicted_rating'])**2)
print('Mean squared error:', mse)
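# The prediction step above is a similarity-weighted average of the ratings a
# user has already given. The sketch below isolates that step on a tiny made-up
# matrix so it can be checked by hand. The helper name `predict_rating`, the
# `toy_*` variables, and the toy ratings are illustrative assumptions; they are
# not part of the original script or of the MovieLens data.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def predict_rating(user_item, item_sim, user, item, fallback):
    """Similarity-weighted average of the ratings `user` has already given."""
    rated = np.flatnonzero(user_item[user] > 0)  # indices of items the user rated
    weights = item_sim[item, rated]              # similarity of each to the target item
    total = weights.sum()
    if rated.size == 0 or total <= 0:
        return fallback                          # nothing usable to average over
    return float(np.dot(user_item[user, rated], weights) / total)


# Toy 3-user x 3-item matrix (made-up ratings; 0 means "not rated").
toy = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 1.0],
    [1.0, 0.0, 5.0],
])
toy_sim = cosine_similarity(toy.T)   # item-item similarity, as in the script above
toy_mean = toy[toy > 0].mean()       # mean of the observed ratings only

pred = predict_rating(toy, toy_sim, user=0, item=2, fallback=toy_mean)
print('Predicted rating for user 0, item 2:', round(pred, 3))
```

# Because the weights are non-negative, the prediction always lies between the
# lowest and highest rating the user has given (here, between 4 and 5).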