TL;DR: Put QR code labels with a unique serial number on your paper documents before scanning, so that each paper original can be matched to its digital version in paperless-ngx.

Updates

What can a workflow for handling paper documents with paperless-ngx look like?

paperless-ngx is a pretty good tool for handling digital and scanned documents, which is under active development by the paperless community. Recently, some cool new features regarding the handling of paper originals were added.

The problem with scanned originals is that you now have two “sources of truth”: the document in paperless-ngx with all the attached metadata, correctly categorized and easily accessible, and the “analog” representation you do not want to throw away, without any metadata attached. How do you link those two and keep an overview?

That’s where the ASN (Archive Serial Number) from paperless-ngx comes in: each document with a corresponding physical paper original in your archive gets a unique serial number in your paperless instance. You (hopefully) keep your paper originals ordered by their ASN (e.g. in a case binder), so you can quickly find the physical original of any document from paperless.

What does my workflow look like?

I’ve got a Brother ADS-1700W scanner, configured to push scanned pages as PDF via email to my paperless instance. Paperless is configured to look for barcodes while importing the PDF.

In general, there are two types of paper documents: those whose original I throw away, and those whose original I keep. If I don’t want to keep the original, it is easy: just scan it and throw it away.

If I scan multiple non-keep documents in a single batch, I insert a PATCH-T sheet after every document, so paperless can split them up.

However, if I want to keep the document, it gets a bit more complicated: I have a sheet of pre-printed ASN label stickers lying next to the scanner, and the first sheet of every document gets the next sticker. Paperless also splits the batch whenever it finds a new ASN sticker, and writes the ASN (if it is not a duplicate) into the document’s metadata.
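The splitting behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the actual paperless-ngx implementation: the function name `split_batch` and the per-page input format are hypothetical, and `PATCHT` is assumed to be the value encoded on the separator sheet.

```python
from typing import List, Optional

SEPARATOR = "PATCHT"  # assumed value encoded on the separator sheet
ASN_PREFIX = "ASN"    # prefix printed on the ASN labels


def split_batch(page_barcodes: List[Optional[str]]) -> List[List[int]]:
    """Group the page indices of one scan batch into documents.

    page_barcodes[i] is the barcode value found on page i, or None.
    Separator pages are dropped; a page with an ASN label starts a new document.
    """
    documents: List[List[int]] = []
    current: List[int] = []
    for index, value in enumerate(page_barcodes):
        if value == SEPARATOR:
            # Close the running document and discard the separator page itself
            if current:
                documents.append(current)
            current = []
            continue
        if value is not None and value.startswith(ASN_PREFIX) and current:
            # An ASN label marks the first page of the next document
            documents.append(current)
            current = []
        current.append(index)
    if current:
        documents.append(current)
    return documents


# Two loose pages, a separator sheet, then two labelled documents
print(split_batch([None, None, "PATCHT", "ASN00190", None, "ASN00191"]))
```

In this sketch the separator page is thrown away, while the page carrying the ASN label stays in the document as its first page.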

Bunch of ASN Labels in front of a brother scanner

After scanning, the documents are inserted into a case binder, ordered by ASN number.

Generating and Printing Labels

But how to generate the labels? I decided to go with Avery 4731 sheets, which hold 189 labels each: the labels are small enough to fit on nearly every letter, but large enough to hold the ASN as a QR code together with the human-readable value.

Update November 2023

Because generating the barcode sheets was quite a hassle, jcgruenhage created a Python-based CLI tool (Git Link).

Currently, it supports creating full sheets of Avery 4731 labels, and can be run e.g. using pipx:

~ pipx run paperless-asn-qr-codes
usage: paperless-asn-qr-codes [-h] start_asn output_file
paperless-asn-qr-codes: error: the following arguments are required: start_asn, output_file
~ pipx run paperless-asn-qr-codes 1 barcode.pdf

There is also a discussion about whether upstream paperless should support label generation natively.

An example file (Avery 4731, DIN A4) can be found here for your first steps. There is also a variant with borders, which might help to align your printer settings.

Original Guide

I found this reportlab-based Python script, which generates a PDF matching the label sizes and comes with a convenient wrapper.

You just need to add the 4731 layout to the labelInfo dict:

labelInfo = {
  ...
  4731: (7, 27, (25.4*mm, 10*mm), (0.25*cm, 0), (0.85*cm, 1.35*cm) )
}
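As a quick sanity check, the numbers in this entry can be verified against an A4 page. The sketch below assumes the tuple fields mean (labels across, labels down, label size, gutter between labels, left/top page margin); that reading is an assumption based on the values, and the variable names are made up for illustration.

```python
# Layout values copied from the 4731 entry above, converted to millimetres
ACROSS, DOWN = 7, 27
LABEL_W, LABEL_H = 25.4, 10.0
GUTTER_X, GUTTER_Y = 2.5, 0.0
MARGIN_LEFT, MARGIN_TOP = 8.5, 13.5
A4_W, A4_H = 210.0, 297.0


def used_extent(count, size, gutter, margin):
    """Distance from the page edge to the far edge of the last label."""
    return margin + count * size + (count - 1) * gutter


labels_per_sheet = ACROSS * DOWN
right_margin = A4_W - used_extent(ACROSS, LABEL_W, GUTTER_X, MARGIN_LEFT)
bottom_margin = A4_H - used_extent(DOWN, LABEL_H, GUTTER_Y, MARGIN_TOP)

print(labels_per_sheet)
print(round(right_margin, 1), round(bottom_margin, 1))
```

Under that reading, the grid gives 189 labels and leaves roughly 8.7 mm on the right and 13.5 mm at the bottom, so it sits almost centered on the page.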

To create your labels, just use the class and fill every label with the QR Code and value for an ASN:

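import AveryLabels
from reportlab.lib.units import mm
from reportlab_qrcode import QRCodeImage

# First ASN to print on this sheet
startASN = 190

def render(c, width, height):
    # Called once per label with the canvas and (as used here) the label's width and height
    global startASN
    barcode_value = f"ASN{startASN:05d}"
    startASN = startASN + 1

    # QR code on the left, scaled to 90% of the label height
    qr = QRCodeImage(barcode_value, size=height*0.9)
    qr.drawOn(c, 1*mm, height*0.05)
    # Human-readable value to the right of the QR code, vertically centered
    c.setFont("Helvetica", 2*mm)
    c.drawString(height, (height-2*mm)/2, barcode_value)


# Fill one full sheet (7 x 27 = 189 labels)
label = AveryLabels.AveryLabel(4731)
label.open("labels4731.pdf")
label.render(render, 189)
label.close()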

If not already done, you need to manually install the dependencies (reportlab and reportlab_qrcode) using pip.

After a few hours of fighting printer accuracy, I got a few sheets of ASN stickers correctly aligned to the label cutouts:

Label Sheets with ASN Barcode

For the next sheet, I just needed to increase the startASN variable, and rerun the script.
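Since every sheet holds 189 labels, the start value for each sheet follows from simple arithmetic. A minimal sketch, assuming numbering begins at ASN 1 (the helper name `sheet_range` is made up for illustration):

```python
LABELS_PER_SHEET = 189  # 7 columns x 27 rows on an Avery 4731 sheet


def sheet_range(first_asn: int, sheet_index: int) -> tuple:
    """Return (first, last) ASN printed on the given sheet (0-based index)."""
    start = first_asn + sheet_index * LABELS_PER_SHEET
    return start, start + LABELS_PER_SHEET - 1


for sheet in range(3):
    first, last = sheet_range(1, sheet)
    print(f"sheet {sheet + 1}: ASN{first:05d} to ASN{last:05d}")
```

With numbering starting at 1, the second sheet begins at 190, which matches the startASN value used in the script.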

I now have scanned originals with a barcode sticker on top:

Example Letter with ASN Barcode

Preparing Paperless

During my manual tests with a zxing-based QR code reader, reading the printed QR codes went well. However, it turned out that the barcode reader used by paperless-ngx (pyzbar) did not recognize the small QR code in the scan, while zxing did.

So I’ve patched paperless with optional support for zxing, which is able to recognize the QR-Codes in the scans without problems.

This is available from paperless-ngx v1.14.0 onwards, currently only on x86-64 platforms!

Additionally, paperless needs some config options set:

PAPERLESS_CONSUMER_ENABLE_BARCODES=true # enable search for barcodes
PAPERLESS_CONSUMER_ENABLE_ASN_BARCODE=true # enable setting ASN by ASN barcodes
PAPERLESS_CONSUMER_BARCODE_SCANNER=ZXING # switch from pyzbar to zxing for better recognition

Now paperless does the magic: when it detects a QR code whose value is prefixed with “ASN”, it splits the document so that the QR code is on the first page, and copies the numeric part of the value into the ASN field.
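The mapping from label value to ASN field is straightforward. The following is only an illustrative sketch, not the parsing code used by paperless-ngx; the helper name `parse_asn` and the strict pattern (prefix followed by digits only) are assumptions.

```python
import re
from typing import Optional

# "ASN" prefix followed by digits only, e.g. "ASN00058"
ASN_PATTERN = re.compile(r"^ASN(\d+)$")


def parse_asn(barcode_value: str) -> Optional[int]:
    """Return the numeric ASN encoded in a label value, or None if it is no ASN label."""
    match = ASN_PATTERN.match(barcode_value.strip())
    return int(match.group(1)) if match else None


print(parse_asn("ASN00058"))
print(parse_asn("PATCHT"))
```

The leading zeros on the label only pad the printed value to a fixed width; the stored ASN is the plain integer.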

This allows you to easily scan and archive a batch of paper documents.

Conclusion and Outlook

Scanning the document from above, I now have this example letter in my paperless inbox (waiting for Tags etc.), with the Archive Serial Number of 58 already set.

Screenshot of the example letter inside paperless. The Inbox tag and ASN 58 are set.

Overall, this workflow works really well (especially with the other paperless features), and enables organizing even years of paper trail with relatively low effort.

To work with this successfully, you need a hardware document scanner (see paperless wiki), which, in my opinion, is well worth the expense.

An issue still open for me is the “catastrophic failure” scenario: If the “originals get destroyed”, e.g. due to fire/water, I’m covered by using paperless with an off-site backup.

But what if the digital variant gets destroyed (or is just broken when needed)? I have a “chaotic” originals storage, with the sorting/mapping done entirely inside paperless, so I’d have to spend a lot of time searching manually. To solve this, printing a Doc → ASN mapping from time to time, or reserving different blocks of ASNs with well-known meanings, might be an option, but that’s a problem for future me.
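A printable Doc → ASN index could look roughly like the sketch below. It assumes the document list has already been exported from paperless (for example via its REST API) as dictionaries with `title` and `archive_serial_number` keys; the helper name `format_asn_index` and the sample entries are hypothetical.

```python
from typing import Dict, List


def format_asn_index(documents: List[Dict]) -> str:
    """Build a plain-text index of documents with a paper original, sorted by ASN."""
    rows = [d for d in documents if d.get("archive_serial_number") is not None]
    rows.sort(key=lambda d: d["archive_serial_number"])
    return "\n".join(f"{d['archive_serial_number']:>5}  {d['title']}" for d in rows)


# Hypothetical sample data; documents without an ASN have no paper original
sample = [
    {"title": "Insurance policy", "archive_serial_number": 190},
    {"title": "Scanned receipt", "archive_serial_number": None},
    {"title": "Example letter", "archive_serial_number": 58},
]
print(format_asn_index(sample))
```

Because the binder is ordered by ASN, a list sorted the same way can be filed at the front and replaced whenever a new one is printed.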