Does KAG extract entities and relations strictly according to the schema definition? #93

jerryHo123 · 2024-12-04T08:50:14Z

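After defining a custom schema and committing it, the extraction results contain entity types and relation types that are not defined in the schema.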

thundax-lyp · 2024-12-04T08:58:33Z

See
https://github.com/OpenSPG/KAG/blob/master/kag/examples/musique/builder/prompt/ner.py
In parse_response you can filter out any types that do not match your own schema.
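
A minimal sketch of that kind of filter, not the code from the linked ner.py. It assumes each parsed entity is a dict with "entity" and "category" keys; the helper name filter_by_schema and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List


def filter_by_schema(
    entities: List[Dict[str, str]], allowed_types: Iterable[str]
) -> List[Dict[str, str]]:
    """Keep only entities whose "category" is one of the schema's types."""
    allowed = set(allowed_types)
    return [e for e in entities if e.get("category") in allowed]


# Hypothetical model output and schema types, for illustration only.
raw_entities = [
    {"entity": "Alan Turing", "category": "Person"},
    {"entity": "Enigma", "category": "Gadget"},  # type not in the schema
]
kept = filter_by_schema(raw_entities, ["Person", "Organization"])
print(kept)
```

Because the check is a set membership test, nothing has to be enumerated by hand: any category outside the allowed set is dropped, whatever the model invents.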

jerryHo123 · 2024-12-04T09:22:48Z

Doesn't that only work by enumerating types? Can it filter out every type that doesn't match the schema?

jerryHo123 · 2024-12-04T10:00:16Z

Also, this filters the prompt response. What I need is filtering at graph construction time.

thundax-lyp · 2024-12-04T10:00:18Z

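    def __init__(
            self, language: Optional[str] = "en", **kwargs
    ):
        super().__init__(language, **kwargs)
        # Fetch the types already committed to OpenSPG for this project.
        self.schema = SchemaClient(project_id=self.project_id).extract_types()
        # Inject those types into the prompt template's $schema placeholder.
        self.template = Template(self.template).safe_substitute(schema=self.schema)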

self.schema holds the types previously committed to OpenSPG.
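
To illustrate what the safe_substitute call does, here is a small standalone sketch using only Python's standard string.Template. The prompt text and the type list are made up for the example; the real template lives in ner.py, and the actual return shape of extract_types() is not shown in this thread.

```python
from string import Template

# Hypothetical prompt text and type list, for illustration only.
prompt_template = "Extract entities of these types: $schema\nInput: $input"
schema_types = ["Person", "Organization"]

# safe_substitute fills $schema and leaves the unknown $input placeholder
# untouched instead of raising KeyError, so it can be filled in later.
prompt = Template(prompt_template).safe_substitute(schema=schema_types)
print(prompt)
```

The same list that goes into the prompt can also serve as the allowed set when filtering the model's response, so the two stay in sync with whatever was committed.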

jerryHo123 · 2024-12-05T03:01:00Z

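OK, I'll take a look. One more question: what are the prompt directory under builder and the ner.py and std.py files in it for? I read the source but couldn't find where they are called.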

thundax-lyp · 2024-12-05T05:01:23Z

They are loaded dynamically via PromptOp.load() in KAGExtractor.__init__, in /kag/builder/component/extractor/kag_extractor.py.

        self.ner_prompt = PromptOp.load(self.biz_scene, "ner")(
            language=self.language, project_id=self.project_id
        )

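In KAGExtractor.invoke you can see that named_entity_recognition first uses the ner prompt to extract entities. The entity names extracted at this stage may be ambiguous, so named_entity_standardization then uses the std prompt to disambiguate them, and the resulting official_name becomes the entity's final name.

Likewise, the solver applies the same extract -> disambiguate processing to the question.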
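
The two-stage flow can be sketched roughly as follows. This is a simplified illustration, not KAGExtractor's actual code: the function extract_then_standardize, the entity dict shape, and the two stand-in callables are all hypothetical, and no model is called.

```python
from typing import Callable, Dict, List

Entity = Dict[str, str]


def extract_then_standardize(
    text: str,
    recognize: Callable[[str], List[Entity]],
    standardize: Callable[[str, List[Entity]], Dict[str, str]],
) -> List[Entity]:
    """Run recognition first, then attach an official_name to each entity."""
    entities = recognize(text)
    official = standardize(text, entities)
    # Fall back to the raw name when the second stage has no answer.
    return [
        {**e, "official_name": official.get(e["entity"], e["entity"])}
        for e in entities
    ]


# Stand-ins for the two LLM-backed steps (hypothetical).
def fake_ner(text: str) -> List[Entity]:
    return [{"entity": "NYC", "category": "City"}]


def fake_std(text: str, entities: List[Entity]) -> Dict[str, str]:
    return {"NYC": "New York City"}


result = extract_then_standardize("I live in NYC.", fake_ner, fake_std)
print(result)
```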

thundax-lyp · 2024-12-05T05:04:15Z

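By the way, judging from the prompts, KAG relies on the model's internal knowledge to fill in official_name. For domain-specific terms the model may not know the official_name; in that case you can force a replacement with a dictionary, or use a domain model that has been fine-tuned.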
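
A rough sketch of the dictionary approach, applied as a post-processing step after standardization. The dictionary contents, the helper name apply_dictionary, and the entity dict shape are assumptions for illustration, not part of KAG.

```python
from typing import Dict, List

# Hypothetical domain dictionary: surface form -> official name.
DOMAIN_DICT = {"MI": "myocardial infarction"}


def apply_dictionary(
    entities: List[Dict[str, str]], mapping: Dict[str, str]
) -> List[Dict[str, str]]:
    """Override official_name for every entity found in the dictionary."""
    return [
        {**e, "official_name": mapping[e["entity"]]} if e["entity"] in mapping else e
        for e in entities
    ]


entities = [
    {"entity": "MI", "official_name": "Michigan"},  # model guessed wrong
    {"entity": "aspirin", "official_name": "aspirin"},
]
fixed = apply_dictionary(entities, DOMAIN_DICT)
print(fixed)
```

Entities that are not in the dictionary pass through unchanged, so the model's answer is only overridden where you have an authoritative term.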

jerryHo123 · 2024-12-05T06:28:22Z

So if I create a new example project, do I also need to create a prompt directory with custom ner and std files for extraction and disambiguation?

thundax-lyp · 2024-12-05T06:55:03Z

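No, the defaults are usually enough. You only need to write your own when the defaults fall short. You can also skip the extractor entirely and pull relations straight from another KG system.
The architecture diagram at https://github.com/OpenSPG/openspg shows that the builder only loads data into OpenSPG. Where the data comes from is up to you: define it to suit your own data, assemble it into the structure KGWriter expects, and push it to the writer.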

pecanjk · 2024-12-06T03:30:28Z

Is it possible to skip the std.py step?
