在本篇文章中,我会对知识图谱进行介绍,并且会演示如何搭建一个知识图谱,这样你也可以跟着这篇博客搭建知识图谱。本文是全系列中的第3/3篇
知识图谱介绍
知识图谱是一种语义化的知识表示形式,它可以将实体、属性和实体之间的关系以图谱的形式呈现出来。知识图谱可以用于自然语言理解、信息检索、智能推荐等领域,是人工智能技术的重要组成部分。
在知识图谱中,实体通常指现实世界中的事物,如人、地点、组织等,而属性则是实体的特征或属性,如人的年龄、地点的经纬度等。实体之间的关系可以分为不同类型,如属于、工作于、是朋友等。通过建立知识图谱,可以将这些实体和关系以结构化的方式进行表达,从而更好地理解和利用这些知识。
总之,知识图谱是一种强大的工具,可以帮助我们更好地理解和利用现实世界中的知识,从而实现更智能化的应用和服务。
知识图谱的构建流程
构建一个知识图谱需要经过以下步骤:
知识抽取
知识抽取是指从非结构化数据中提取出实体、属性和实体间的关系的过程。这一步通常需要使用自然语言处理技术,如实体识别、关系抽取等。
知识表示
知识表示是将抽取出的知识以一定的格式进行表示的过程。常用的表示方式有三元组和RDF等。
知识存储
知识存储是将知识表示存储到数据库中的过程。常用的数据库有图数据库、关系型数据库和文档数据库等。
知识推理
知识推理是指根据已有的知识推导出新的知识的过程。这一步通常需要使用逻辑推理、规则推理等技术。
知识应用
知识应用是指将知识图谱应用到具体的应用场景中的过程。如搜索引擎、智能问答、智能客服等。
知识图谱的搭建大致就是这几个流程,在一个知识图谱项目中,不一定每个步骤都有。比如我接下来的例子中,数据已经以结构化的形式存在数据库中了,不需要做知识抽取的过程,同时知识表示其实就是构建模型的过程。
搭建一个演员电影知识图谱
项目使用 python + py2neo + neo4j + mysql 进行搭建,项目代码
1.数据来源于从零开始构建影视类知识图谱(一)半结构化数据的获取 - 知乎 (zhihu.com),在我的github项目里面也有。
2.数据建模,这里就借用了结构化数据到RDF文件的概念,table对应class,一条记录对应一个实体,记录中的字段对应属性。简化一点,那我们在neo4j数据库中就只有两个类,电影和演员。
mysql使用orm框架sqlalchemy
下面是mysql实体类
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
|
"""实体类文件."""
from sqlalchemy import String, Text, Integer
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.orm import Mapped
from sqlalchemy.orm import mapped_column
class Base(DeclarativeBase):
pass
class Actor(Base):
__tablename__ = 'actor'
actor_id: Mapped[int] = mapped_column(primary_key=True)
actor_bio: Mapped[str] = mapped_column(Text)
actor_chName: Mapped[str] = mapped_column(String(100))
actor_foreName: Mapped[str] = mapped_column(String(100))
actor_nationality: Mapped[str] = mapped_column(String(100))
actor_constellation: Mapped[str] = mapped_column(String(100))
actor_birthPlace: Mapped[str] = mapped_column(String(100))
actor_birthDay: Mapped[str] = mapped_column(String(100))
actor_repWorks: Mapped[str] = mapped_column(String(100))
actor_achiem: Mapped[str] = mapped_column(Text)
actor_brokerage: Mapped[str] = mapped_column(String(100))
class Movie(Base):
__tablename__ = 'movie'
movie_id: Mapped[int] = mapped_column(primary_key=True)
movie_bio: Mapped[str] = mapped_column(Text)
movie_chName: Mapped[str] = mapped_column(String(100))
movie_foreName: Mapped[str] = mapped_column(String(100))
movie_prodTime: Mapped[str] = mapped_column(String(100))
movie_prodCompany: Mapped[str] = mapped_column(String(100))
movie_director: Mapped[str] = mapped_column(String(100))
movie_screenwriter: Mapped[str] = mapped_column(String(100))
movie_genre: Mapped[str] = mapped_column(String(100))
movie_star: Mapped[str] = mapped_column(Text)
movie_length: Mapped[str] = mapped_column(String(100))
movie_rekeaseTime: Mapped[str] = mapped_column(String(100))
movie_length: Mapped[str] = mapped_column(String(100))
movie_achiem: Mapped[str] = mapped_column(Text)
class ActorToMovie(Base):
__tablename__ = 'actor_movie_id'
actor_movie_id: Mapped[int] = mapped_column(primary_key=True)
actor_id: Mapped[int] = mapped_column(Integer)
movie_id: Mapped[int] = mapped_column(Integer)
class Genre(Base):
__tablename__ = 'genre'
genre_id: Mapped[int] = mapped_column(primary_key=True)
genre_name: Mapped[str] = mapped_column(String(100))
class MovieToGenre(Base):
__tablename__ = 'movie_genre_id'
movie_genre_id: Mapped[int] = mapped_column(primary_key=True)
movie_id: Mapped[int] = mapped_column(primary_key=True)
genre_id: Mapped[int] = mapped_column(primary_key=True)
|
neo4j使用ogm框架py2neo
neo4j实体类:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
|
from py2neo.ogm import Model, Property, RelatedFrom, RelatedTo
class Movie(Model):
__primarylable__ = 'Movie'
movie_id = Property()
movie_bio = Property()
movie_chName = Property()
movie_foreName = Property()
movie_prodTime = Property()
movie_prodCompany = Property()
movie_director = Property()
movie_screenwriter = Property()
movie_genre = Property()
movie_star = Property()
movie_length = Property()
movie_rekeaseTime = Property()
movie_length = Property()
movie_achiem = Property()
actors = RelatedFrom("Actor", "ACTED_IN")
class Actor(Model):
# 标签
__primarylable__ = "Actor"
# 属性
actor_id = Property()
actor_bio = Property()
actor_chName = Property()
actor_foreName = Property()
actor_nationality = Property()
actor_constellation = Property()
actor_birthPlace = Property()
actor_birthDay = Property()
actor_repWorks = Property()
actor_achiem = Property()
actor_brokerage = Property()
acted_in = RelatedTo(Movie)
|
图谱构建
从mysql中查询数据并存到neo4j中:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
|
from py2neo import Relationship
from py2neo.ogm import Repository
from sqlalchemy import create_engine
from sqlalchemy import select
from sqlalchemy.orm import Session
from custom_model import neo4j_model, mysql_model
def covertor_actor(mysql_actor: mysql_model.Actor):
neo4j_actor = neo4j_model.Actor()
neo4j_actor.actor_id = mysql_actor.actor_id
neo4j_actor.actor_bio = mysql_actor.actor_bio
neo4j_actor.actor_chName = mysql_actor.actor_chName
neo4j_actor.actor_foreName = mysql_actor.actor_foreName
neo4j_actor.actor_nationality = mysql_actor.actor_nationality
neo4j_actor.actor_constellation = mysql_actor.actor_constellation
neo4j_actor.actor_birthPlace = mysql_actor.actor_birthPlace
neo4j_actor.actor_birthDay = mysql_actor.actor_birthDay
neo4j_actor.actor_repWorks = mysql_actor.actor_repWorks
neo4j_actor.actor_achiem = mysql_actor.actor_achiem
neo4j_actor.actor_brokerage = mysql_actor.actor_brokerage
return neo4j_actor
def covertor_movie(actor_movie: mysql_model.Movie):
neo4j_movie = neo4j_model.Movie()
neo4j_movie.movie_id = actor_movie.movie_id
neo4j_movie.movie_bio = actor_movie.movie_bio
neo4j_movie.movie_chName = actor_movie.movie_chName
neo4j_movie.movie_foreName = actor_movie.movie_foreName
neo4j_movie.movie_prodTime = actor_movie.movie_prodTime
neo4j_movie.movie_prodCompany = actor_movie.movie_prodCompany
neo4j_movie.movie_director = actor_movie.movie_director
neo4j_movie.movie_screenwriter = actor_movie.movie_screenwriter
neo4j_movie.movie_genre = actor_movie.movie_genre
neo4j_movie.movie_star = actor_movie.movie_star
neo4j_movie.movie_length = actor_movie.movie_length
neo4j_movie.movie_rekeaseTime = actor_movie.movie_rekeaseTime
neo4j_movie.movie_length = actor_movie.movie_length
neo4j_movie.movie_achiem = actor_movie.movie_achiem
return neo4j_movie
class ActorMovieKG:
def __init__(self):
self.repo = Repository("bolt://neo4j@127.0.0.1:7687", password="123456")
self.engine = create_engine('mysql+pymysql://root:123456@127.0.0.1:3306/kg')
self.session = Session(self.engine)
def build_graph(self):
self.build_actor()
self.build_movie()
self.build_rel()
def build_movie(self):
stmt_movie = select(mysql_model.Movie)
for actor_movie in self.session.scalars(stmt_movie):
neo4j_movie = covertor_movie(actor_movie)
self.repo.create(neo4j_movie)
def build_actor(self):
stmt_actor = select(mysql_model.Actor)
for actor_mysql in self.session.scalars(stmt_actor):
neo4j_actor = covertor_actor(actor_mysql)
self.repo.create(neo4j_actor)
def build_rel(self):
stmt = select(mysql_model.ActorToMovie)
for element in self.session.scalars(stmt):
actor = self.repo.match(neo4j_model.Actor).where(id=element.actor_id).first()
movie = self.repo.match(neo4j_model.Movie).where(id=element.movie_id).first()
relation_ship = Relationship(actor, "ACTED_IN", movie)
self.repo.create(relation_ship)
if __name__ == '__main__':
kg = ActorMovieKG()
kg.build_graph()
print("建立知识图谱完成")
|
知识图谱使用
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
|
from py2neo.ogm import Repository
from custom_model.neo4j_model import Actor
class KGClient:
def __init__(self):
self.repo = Repository("bolt://neo4j@127.0.0.1:7687", password="123456")
def build_graph(self):
cypher = """CALL gds.graph.project(
'ActorMovieGraph',
['Actor', 'Movie'],
'ACTED_IN'
);
"""
self.repo.graph.run(cypher)
def query_similarity(self, name: str):
"""找出演员中的紧密程度"""
cypher = f"""CALL gds.nodeSimilarity.stream('ActorMovieGraph')
YIELD node1, node2, similarity
WITH gds.util.asNode(node1) AS actor1, gds.util.asNode(node2) AS actor2, similarity
WHERE actor1.actor_chName = '{name}'
RETURN actor1.actor_chName, actor2.actor_chName, similarity
ORDER BY similarity DESCENDING
"""
result = self.repo.graph.run(cypher).data()
print(result)
def query_all_movies(self, name: str):
actor = self.repo.match(Actor).where(
actor_chName=name
).first()
if actor is None:
raise ValueError('input error!')
for movie in actor.acted_in:
print(movie.movie_chName)
if __name__ == '__main__':
kg_client = KGClient()
# kg_client.build_graph()
# kg_client.query_all_movies("张家辉")
kg_client.query_similarity("鲍方")
|
结果展示
图谱展示:
查询和鲍方合作的明星,目前actor_to_movie表中的数据太少了,建立的联系不多,只查询出来2个

代码地址
ActorMovieKG
参考资料