FAQ: How does Zudello check for duplicate documents?
Zudello employs checks to help prevent accidental processing of the same document multiple times. Duplicate checking primarily occurs in two ways:
- File Hash Check (On Upload):
- When a file is first uploaded (manually or via email), Zudello calculates a unique fingerprint (a cryptographic hash, like SHA-256) based on the exact content of the file.
- It checks if another file with the identical hash already exists within your team.
- If an identical file hash is found, the newly uploaded document is immediately marked with a Duplicate status and typically requires manual review or deletion.
- Limitation: This only catches exact file duplicates. Even minor changes (such as different metadata or a resaved PDF) produce a different hash and bypass this initial check.
- Data-Based Check (During Processing/Viewing):
- After a document has been processed and data extracted, Zudello performs checks based on key data fields when you open or process the document.
- This check typically compares:
  - Document Number: (e.g., document_number for invoices/credits, po_number for POs)
  - Supplier/Customer: The linked Supplier or Customer record (supplier_uuid or customer_uuid).
  - Document Type: The Module and Submodule (e.g., PURCHASING/INVOICE).
  - (Potentially) Amount/Date: Some checks might include total amount or date within a certain tolerance, although Document Number + Supplier/Customer + Type is the most common combination.
- If Zudello finds another existing document (not already Deleted/Archived) with the same combination of these key fields, it will display a Duplicate Warning Banner on the document you are viewing.
- This check helps catch duplicates even if the original files weren't identical (e.g., the same invoice scanned twice slightly differently).
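The two checks described above can be illustrated with a minimal Python sketch. It is not Zudello's implementation. The Document class, the function names, and the "DELETED"/"ARCHIVED" status strings are hypothetical and exist only for illustration. The field names document_number and supplier_uuid come from the list above, and SHA-256 is used because it is the example hash mentioned there.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Document:
    # Hypothetical record for illustration; not Zudello's actual data model.
    uuid: str
    file_bytes: bytes
    document_number: str
    supplier_uuid: str
    module: str
    submodule: str
    status: str = "ACTIVE"  # assumed status values: ACTIVE, DELETED, ARCHIVED


def file_hash(content: bytes) -> str:
    """Return the SHA-256 fingerprint of the exact file content."""
    return hashlib.sha256(content).hexdigest()


def find_hash_duplicate(new_bytes: bytes, existing: List[Document]) -> Optional[Document]:
    """File hash check: match only when the bytes are identical."""
    new_hash = file_hash(new_bytes)
    for doc in existing:
        if file_hash(doc.file_bytes) == new_hash:
            return doc
    return None


def data_key(doc: Document) -> Tuple[str, str, str, str]:
    """Document Number + Supplier + Type, the most common combination."""
    return (doc.document_number, doc.supplier_uuid, doc.module, doc.submodule)


def find_data_duplicates(doc: Document, existing: List[Document]) -> List[Document]:
    """Data-based check: flag other live documents that share the same key fields."""
    return [
        other
        for other in existing
        if other.uuid != doc.uuid
        and other.status not in {"DELETED", "ARCHIVED"}
        and data_key(other) == data_key(doc)
    ]
```

In this sketch, a rescan of the same invoice that differs by even one byte is missed by find_hash_duplicate but is still returned by find_data_duplicates, provided the extracted document number, supplier, and type agree. The second function only returns matches for review and deletes nothing.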
Key Points:
- The file hash check is immediate but only catches identical files.
- The data-based check is more robust for catching functional duplicates but relies on accurate data extraction and supplier/customer matching.
- Duplicate warnings require user review – Zudello flags potential duplicates but doesn't automatically delete them based on the data check alone.
- You can use the View Duplicate link in the warning banner to compare the documents side-by-side.
If you encounter problems, see "What should I do if I find a duplicate?".