FAQ: How does Zudello check for duplicate documents?

Zudello runs checks to help prevent the same document from being processed more than once. Duplicate checking primarily occurs in two ways:

  1. File Hash Check (On Upload):

    • When a file is first uploaded (manually or via email), Zudello calculates a unique fingerprint (a cryptographic hash, like SHA-256) based on the exact content of the file.
    • It checks if another file with the identical hash already exists within your team.
    • If an identical file hash is found, the newly uploaded document is immediately marked with a Duplicate status and typically requires manual review or deletion.
    • Limitation: This only catches exact file duplicates. Even a minor change (such as different metadata or resaving the PDF) produces a different hash, so the file bypasses this initial check.
  2. Data-Based Check (During Processing/Viewing):

    • After a document has been processed and its data extracted, Zudello checks key data fields whenever you open or process the document.
    • This check typically compares:
      • Document Number: For example, document_number for invoices/credits or po_number for POs.
      • Supplier/Customer: The linked Supplier or Customer record (supplier_uuid or customer_uuid).
      • Document Type: The Module and Submodule (e.g., PURCHASING/INVOICE).
      • (Potentially) Amount/Date: Some checks might include total amount or date within a certain tolerance, although Document Number + Supplier/Customer + Type is the most common combination.
    • If Zudello finds another existing document (not already Deleted/Archived) with the same combination of these key fields, it will display a Duplicate Warning Banner on the document you are viewing.
    • This check helps catch duplicates even if the original files weren't identical (e.g., the same invoice scanned twice slightly differently).
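
The data-based check described above can be sketched roughly as follows. This is a minimal illustration, not Zudello's actual implementation: the Doc class, the find_duplicates function, and the status strings are hypothetical names chosen for the example, and only the most common field combination (Document Number + Supplier + Type) is compared. Skipping documents with a missing number or supplier is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    # Hypothetical record; field names loosely mirror those mentioned above.
    uuid: str
    module: str
    submodule: str
    document_number: Optional[str]
    supplier_uuid: Optional[str]
    status: str = "READY"


# Assumed status names for documents that the check ignores.
IGNORED_STATUSES = {"DELETED", "ARCHIVED"}


def find_duplicates(doc: Doc, existing: List[Doc]) -> List[Doc]:
    """Return existing documents that share doc's key fields."""
    # Assumption: without an extracted number or a matched supplier
    # there is nothing reliable to compare, so no warning is raised.
    if not doc.document_number or not doc.supplier_uuid:
        return []
    key = (doc.module, doc.submodule, doc.supplier_uuid, doc.document_number)
    return [
        other
        for other in existing
        if other.uuid != doc.uuid
        and other.status not in IGNORED_STATUSES
        and (other.module, other.submodule,
             other.supplier_uuid, other.document_number) == key
    ]


first_scan = Doc("a1", "PURCHASING", "INVOICE", "INV-001", "sup-1")
second_scan = Doc("b2", "PURCHASING", "INVOICE", "INV-001", "sup-1")
deleted_copy = Doc("c3", "PURCHASING", "INVOICE", "INV-001", "sup-1", "DELETED")

matches = find_duplicates(second_scan, [first_scan, second_scan, deleted_copy])
print([m.uuid for m in matches])  # ['a1']
```

In this sketch a non-empty result would correspond to showing the Duplicate Warning Banner. The document is only flagged; nothing is deleted.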

Key Points:

  • The file hash check is immediate but only catches identical files.
  • The data-based check is more robust for catching functional duplicates but relies on accurate data extraction and supplier/customer matching.
  • Duplicate warnings require user review – Zudello flags potential duplicates but doesn't automatically delete them based on the data check alone.
  • You can use the View Duplicate link in the warning banner to compare the documents side-by-side.
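
To illustrate the first key point, here is a small sketch of a file hash check using Python's standard hashlib module. It assumes SHA-256, which the description above gives only as an example of the hash used. The function names and sample bytes are hypothetical.

```python
import hashlib
from typing import Set


def file_fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the exact file content."""
    return hashlib.sha256(data).hexdigest()


def is_exact_duplicate(data: bytes, known_hashes: Set[str]) -> bool:
    """True only if a byte-for-byte identical file was seen before."""
    return file_fingerprint(data) in known_hashes


# Stand-in bytes for an uploaded file and a "resaved" copy of it.
original = b"%PDF-1.7 invoice INV-001"
resaved = original + b"\n% resaved by another tool"

known_hashes = {file_fingerprint(original)}

print(is_exact_duplicate(original, known_hashes))  # True
print(is_exact_duplicate(resaved, known_hashes))   # False
```

The resaved copy represents the same invoice, but its bytes differ, so only the data-based check could flag it.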

For next steps after a duplicate is flagged, see "What should I do if I find a duplicate?".