Research on Patent Classification Method Based on Pre-trained Language Model and TRIZ Inventive Principle

Affiliation:

Civil Aviation University of China

CLC Number:

TP391.1

Fund Project:

Fundamental Research Funds for the Central Universities (3122022052)

    Abstract:

    To fully exploit the existing solutions and technical knowledge contained in patent texts, this study draws on the Theory of Inventive Problem Solving (TRIZ) and proposes a method based on pre-trained language models for classifying Chinese patents by TRIZ inventive principle. Because no open patent database classified by inventive principle is available, a patent dataset labeled by TRIZ inventive principle is first constructed. Second, using Whole Word Masking (WWM), the Chinese RoBERTa model is further pre-trained on patent corpora of different sizes (each sample consisting of a patent title and abstract), yielding two patent-domain models, RoBERTa_patent1.0 and RoBERTa_patent2.0. A fully connected layer is then added on top of each model to build three patent classification models based on RoBERTa, RoBERTa_patent1.0, and RoBERTa_patent2.0. These three models are trained and tested on the constructed dataset. The experimental results show that RoBERTa_patent2.0_IP achieves higher accuracy, macro-precision (macro_P), macro-recall (macro_R), and macro-F1 (macro_F1) than the other two models, reaching 96%, 95.69%, 94%, and 94.84%, respectively. The method thus enables automatic classification of Chinese patent texts by TRIZ inventive principle and can help designers understand and apply the inventive principles in innovative product design.
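    The four metrics reported above can be illustrated with a minimal sketch. The abstract does not state how macro_F1 is computed; the version below assumes it is the harmonic mean of macro_P and macro_R, which happens to be consistent with the reported figures (95.69% and 94% give 94.84%). The function names `macro_metrics` and `harmonic_mean` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def harmonic_mean(p, r):
    """Harmonic mean of precision p and recall r (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro_P, macro_R, macro_F1) for single-label predictions.

    Assumption: macro_F1 is the harmonic mean of macro_P and macro_R.
    Another common convention averages the per-class F1 scores instead.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[t] += 1   # class t was missed

    precisions, recalls = [], []
    for c in labels:
        predicted_c = tp[c] + fp[c]
        actual_c = tp[c] + fn[c]
        precisions.append(tp[c] / predicted_c if predicted_c else 0.0)
        recalls.append(tp[c] / actual_c if actual_c else 0.0)

    macro_p = sum(precisions) / len(labels)   # unweighted mean over classes
    macro_r = sum(recalls) / len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, macro_p, macro_r, harmonic_mean(macro_p, macro_r)


if __name__ == "__main__":
    # Toy labels standing in for inventive-principle classes (illustrative only).
    y_true = ["a", "a", "b", "b"]
    y_pred = ["a", "b", "b", "b"]
    print(macro_metrics(y_true, y_pred))
    # Consistency check against the figures reported in the abstract.
    print(round(harmonic_mean(95.69, 94.0), 2))
```

    Macro averaging weights every inventive-principle class equally regardless of how many patents it contains, so it is less dominated by large classes than accuracy alone.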

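Cite this article

Jia Lizhen, Bai Xiaolei. Research on Patent Classification Method Based on Pre-trained Language Model and TRIZ Inventive Principle [J]. Science Technology and Engineering, , ():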

History
  • Received: 2023-11-11
  • Revised: 2024-03-27
  • Accepted: 2024-04-01