Develop a comprehensive dataset of images of Chinese bills (receipts and invoices). The dataset aims to advance Optical Character Recognition (OCR) models tailored to financial documents in China, support expense-tracking apps, and facilitate tax compliance automation.
Compile images of different types of bills, including restaurant receipts, shopping invoices, and utility bills. Each bill is annotated with its categories, total amount, date, and, where applicable, itemized details.
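As an illustration, one annotated bill could be represented along the following lines. This is a minimal sketch, not the dataset's actual schema: the class names, field names, category strings, file path, and sample values are all hypothetical and chosen only to mirror the fields listed above (categories, total amount, date, itemized details).

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import List, Optional


@dataclass
class LineItem:
    """One itemized entry on a bill (hypothetical structure)."""
    description: str
    quantity: int
    unit_price: Decimal

    @property
    def subtotal(self) -> Decimal:
        return self.unit_price * self.quantity


@dataclass
class BillAnnotation:
    """Annotation for a single bill image (hypothetical field names)."""
    image_path: str
    categories: List[str]
    total_amount: Decimal
    bill_date: date
    items: List[LineItem] = field(default_factory=list)

    def items_match_total(self) -> Optional[bool]:
        """Check that item subtotals add up to the annotated total.

        Returns None when the bill has no itemized details.
        """
        if not self.items:
            return None
        item_sum = sum((item.subtotal for item in self.items), Decimal("0"))
        return item_sum == self.total_amount


# Made-up example record, for illustration only.
record = BillAnnotation(
    image_path="bills/restaurant_0001.jpg",
    categories=["restaurant_receipt"],
    total_amount=Decimal("86.00"),
    bill_date=date(2023, 5, 17),
    items=[
        LineItem("宫保鸡丁", 1, Decimal("38.00")),
        LineItem("米饭", 2, Decimal("4.00")),
        LineItem("酸菜鱼", 1, Decimal("40.00")),
    ],
)
```

Amounts are stored as `Decimal` rather than `float` so that currency arithmetic stays exact, and the optional item list reflects that itemized details are recorded only where they apply.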
Automated OCR Verification: Early-stage OCR models extract data from each bill, and the extracted values are checked against the annotations.
Peer Review: A secondary set of annotators inspects a subset of the bills for consistency and accuracy.
Inter-annotator Agreement: Certain bills are annotated by multiple reviewers to ensure agreement and consistency in data extraction.
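Two of the checks above lend themselves to a short sketch. The first function flags annotated fields that an OCR pass did not reproduce; the second scores agreement between two annotators with Cohen's kappa. Both are assumptions made for illustration: the source does not say which comparison rule or agreement metric is used, the function names and field names are hypothetical, and exact string matching is a deliberately simple stand-in for whatever tolerance a real pipeline would apply.

```python
from collections import Counter
from typing import Dict, List, Sequence


def mismatched_fields(annotation: Dict[str, str], ocr_output: Dict[str, str]) -> List[str]:
    """Return annotated field names whose OCR value is missing or different.

    Assumes exact string equality after trimming whitespace.
    """
    return [
        name
        for name, value in annotation.items()
        if ocr_output.get(name, "").strip() != value.strip()
    ]


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same bills."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of bills on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up labels for four bills, for illustration only.
annotator_1 = ["restaurant", "shopping", "utility", "restaurant"]
annotator_2 = ["restaurant", "shopping", "utility", "shopping"]
kappa = cohen_kappa(annotator_1, annotator_2)

flagged = mismatched_fields(
    {"total_amount": "86.00", "date": "2023-05-17"},
    {"total_amount": "86.00", "date": "2023-05-11"},
)
```

Kappa is used here instead of raw percent agreement because it discounts matches that would occur by chance when a few bill categories dominate. Bills with flagged fields or low agreement would be natural candidates for the peer-review pass.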
The Chinese Bill Dataset provides a robust foundation for models and apps targeting financial document recognition and data extraction in China. With its extensive coverage of various bill types and meticulous annotations, this dataset serves as a catalyst for technological innovations in personal finance, business expense management, and regulatory compliance.
For a detailed estimate of requirements, please contact us.