Full-ticket Structural Recognition of VAT Invoice

CLC number: TP391.1

Fund project: Supported by the Science and Technology Project of the Meizhou Tobacco Monopoly Bureau (Company) (2023441400240048)

    Abstract:

    The item names, specifications and models, and other fields in the goods-detail section of a VAT invoice vary widely in format and content, and the section lacks complete table lines to separate its fields. Existing methods for full-ticket structured recognition of VAT invoices still suffer from low element recognition rates and high computational complexity. To address this, a full-ticket structured recognition method based on computer morphology is proposed. The method first uses morphological operations to detect the invoice's table lines, then crops the different regions of the invoice and recognizes their text. It then exploits the implicit layout rules of the goods-detail region, together with the connected text regions obtained by morphological operations, to construct a complete table structure. Finally, text detection and recognition are performed with DBNet and CRNN. Tested on a dataset of 49 VAT invoices in three layouts, the method achieved element recognition rates of 99.9%, 97.4%, and 98.8%, with average running times per invoice of 0.90 s, 0.47 s, and 0.82 s, respectively. Its full-ticket structured recognition performance exceeds that of several baseline table recognition models and of methods reported in the literature.
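
The abstract does not give implementation details for the table-line detection step. As a rough illustration of the general idea only, the sketch below applies a morphological opening (erosion followed by dilation) with a long, thin structuring element to a tiny binary image: long straight runs of foreground pixels survive, while short text strokes are removed. Everything here is an assumption made for illustration, not the authors' code: the function names (`erode_row`, `dilate_row`, `open_horizontal`, `open_vertical`, `line_rows`), the toy `page` image, and the kernel lengths are hypothetical, and a real pipeline would more likely use an image-processing library on a binarized scan.

```python
# Minimal pure-Python sketch of line detection by morphological opening.
# Assumption: the image is a binary grid (list of rows), 1 = ink, 0 = background.

def erode_row(row, k):
    """Erosion with a 1 x k element anchored at its left end:
    a pixel stays 1 only if the k pixels starting at it are all 1."""
    n = len(row)
    return [1 if x + k <= n and all(row[x:x + k]) else 0 for x in range(n)]

def dilate_row(row, k):
    """Dilation with the reflected element: a pixel becomes 1 if any of
    the k pixels ending at it is 1, which restores the surviving runs."""
    return [1 if any(row[max(0, x - k + 1):x + 1]) else 0
            for x in range(len(row))]

def open_horizontal(img, k):
    """Opening (erosion then dilation) keeps only horizontal runs of length >= k."""
    return [dilate_row(erode_row(row, k), k) for row in img]

def open_vertical(img, k):
    """Same operation on columns: transpose, open horizontally, transpose back."""
    transposed = [list(col) for col in zip(*img)]
    return [list(col) for col in zip(*open_horizontal(transposed, k))]

def line_rows(mask):
    """Row indices that still contain foreground pixels after the opening."""
    return [y for y, row in enumerate(mask) if any(row)]

# Toy example: row 1 is a full-width table line, rows 2-3 are short text strokes.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

horizontal_mask = open_horizontal(page, 5)  # kernel longer than any text stroke
print(line_rows(horizontal_mask))
```

The kernel length is the main design choice in a sketch like this: it has to be longer than the widest character stroke but shorter than the shortest table line, so in practice it would probably be derived from the image width rather than fixed.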

Cite this article

He Feng, Zhang Wei, Yang Yuyan, et al. Full-ticket Structural Recognition of VAT Invoice[J]. Science Technology and Engineering, 2025, 25(9): 3788-3794.

History
  • Received: 2024-04-03
  • Last revised: 2025-03-17
  • Accepted: 2024-07-09
  • Published online: 2025-04-01