Extract Key-Values from PDF

The key_values_from_pdf() function extracts potential key-value pairs from a PDF document. These key-value pairs can then be used to populate templates with dynamic content.

Parameters:

  • filename: The name of the PDF file. This will be used to identify the document.
  • file: The file object of the PDF to extract key-value pairs from. Open the file in binary read mode ("rb").

Returns:

  • text: A dictionary containing potential key-value pairs extracted from the PDF document. The keys represent the extracted keys, and the values represent the extracted values. If a value cannot be extracted for a key, it will be set to None.
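Because missing values come back as None, it can help to separate the keys that were filled from those that were not before using the result. The following is a minimal sketch in plain Python; split_extracted is a hypothetical helper, not part of Blipper, and the sample dictionary only mirrors the shape of the example output shown on this page.

```python
from typing import Any, Optional


def split_extracted(pairs: dict[str, Optional[Any]]) -> tuple[dict[str, Any], list[str]]:
    """Separate keys with extracted values from keys whose value is None.

    Hypothetical helper for illustration; it is not part of the Blipper API.
    """
    found = {key: value for key, value in pairs.items() if value is not None}
    missing = [key for key, value in pairs.items() if value is None]
    return found, missing


# Sample dictionary in the shape of this page's example output.
sample = {"name": "John Doe", "age": 26, "country": None}
found, missing = split_extracted(sample)
print(found)    # {'name': 'John Doe', 'age': 26}
print(missing)  # ['country']
```

The missing list can then drive a follow-up step, such as asking the user to supply those fields manually.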

Usage:

Open the PDF file in binary read mode ("rb"), then call the key_values_from_pdf function on the Blipper object, passing the filename and the file object as parameters. The function returns a dictionary containing the key-value pairs extracted from the PDF document.

Example:

from blipper import Blipper

blipper = Blipper(api_key="my_api_key")

with open("files/user.pdf", "rb") as file:
    text: dict = blipper.key_values_from_pdf(filename="user.pdf", file=file)

print(text)  # {"name": "John Doe", "age": 26, "country": None}

In this example, key-value pairs are extracted from the PDF document user.pdf. The filename parameter is the name under which the file will be identified in Blipper, and the file parameter is the file object opened in binary read mode ("rb"). The result is a dictionary that maps each extracted key to its value, or to None when no value could be extracted, as with "country" in the output above.

Next Steps: Populate Template with Extracted Values

Once you have extracted the key-value pairs from the PDF document with key_values_from_pdf, the next step is to populate a template with them. The next section covers how to fill a template with the values from the dictionary.
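As a rough illustration of the general idea only, the sketch below fills a plain-text template using Python's standard-library string.Template. This is not Blipper's template feature, which this page does not describe; fill_template and the $placeholder template text are assumptions made for the example.

```python
from string import Template


def fill_template(template_text: str, pairs: dict) -> str:
    """Fill $placeholders from an extracted dictionary.

    Hypothetical helper for illustration; it is not part of the Blipper API.
    """
    # Drop None values so missing fields stay as visible placeholders
    # instead of being rendered as the string "None".
    available = {key: str(value) for key, value in pairs.items() if value is not None}
    # safe_substitute leaves unknown placeholders untouched rather than raising.
    return Template(template_text).safe_substitute(available)


sample = {"name": "John Doe", "age": 26, "country": None}
print(fill_template("Name: $name, Age: $age, Country: $country", sample))
# Name: John Doe, Age: 26, Country: $country
```

Leaving unfilled placeholders in the output makes it easy to spot which values the extraction did not find.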