How to Read PDF Files as Tables in Spotfire

Product: Spotfire

Keywords: Spotfire, PDF, table extraction, Python, DataFrame, TERR, R, read_pdf, data import

Description:
This article explains how to import tables from PDF files into Spotfire using Python data functions. Similar functionality is available through TERR/R data functions with R packages.

Resolution:

Using Python Data Functions:

To extract tables from a PDF file and import them as a DataFrame in Spotfire, you can use the tabula-py package. Below is a minimal example of the Python code:

from tabula import read_pdf
import pandas as pd

# Specify the path to your PDF file
pdf_path = r"C:\Path\To\Your\File.pdf"

# Reads tables from the PDF file and returns a list of DataFrames
df_list = read_pdf(pdf_path, pages="all", multiple_tables=True)

# Combine all tables into a single DataFrame
Final_df = pd.concat(df_list, ignore_index=True)

This code reads the tables from the PDF file and combines them into a single DataFrame. When Final_df is defined as a Table output parameter of the data function, Spotfire loads it as a data table.
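The concatenation step in the example assumes that at least one table was found, since pd.concat raises a ValueError on an empty list. The following is a minimal sketch of a more defensive combine step. The helper name combine_tables and the source_table column are hypothetical, chosen only for illustration, and the sample DataFrames stand in for the list that read_pdf would return.

```python
import pandas as pd


def combine_tables(df_list):
    """Combine extracted tables into one DataFrame (hypothetical helper)."""
    # Skip empty results and record which extracted table each row came from
    # ("source_table" is an illustrative column name, not part of tabula-py)
    tagged = [
        df.assign(source_table=i)
        for i, df in enumerate(df_list)
        if df is not None and not df.empty
    ]

    # pd.concat raises ValueError on an empty list, so return an empty frame
    if not tagged:
        return pd.DataFrame()

    return pd.concat(tagged, ignore_index=True)


# Stand-ins for the list of DataFrames that read_pdf would return
sample = [
    pd.DataFrame({"Region": ["East"], "Sales": [100]}),
    pd.DataFrame({"Region": ["West"], "Sales": [250]}),
]
Final_df = combine_tables(sample)
```

In a data function, df_list from read_pdf could be passed to this helper in place of the direct pd.concat call. Tables with different column names are still combined, with missing values where a column does not exist in a given table.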

Using TERR/R Data Functions:

Alternatively, you can achieve similar results using TERR/R data functions in Spotfire with R packages such as pdftools or tabulizer. Below is a simple example using tabulizer:

library(tabulizer)
library(dplyr)

# Specify the path to your PDF file
pdf_path <- "C:/Path/To/Your/File.pdf"

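# Extract tables from the PDF (by default, returns a list of character matrices)
df_list <- extract_tables(pdf_path)

# Convert each matrix to a data frame, keeping text as character rather than factor,
# then combine all tables into a single data frame
Final_df <- bind_rows(lapply(df_list, as.data.frame, stringsAsFactors = FALSE))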

This R code reads the tables from the PDF file and combines them into a single data frame. When Final_df is defined as a Table output parameter of the data function, Spotfire loads it as a data table.

Note: Ensure the required Python or R packages (tabula-py, pdftools, tabulizer, etc.) are installed in your Spotfire environment. Both tabula-py and tabulizer depend on Java, so a Java runtime must also be available.
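As a quick way to confirm the Python side of this requirement, the following sketch can be run in the same interpreter that Spotfire uses for data functions. It relies only on the standard library. The helper name missing_modules is hypothetical, and the Java check assumes the runtime is located through the PATH environment variable.

```python
import importlib.util
import shutil


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this interpreter."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# "tabula" is the import name of the tabula-py package
missing = missing_modules(["tabula", "pandas"])
if missing:
    print("Missing Python packages:", ", ".join(missing))

# tabula-py wraps a Java library, so a Java runtime must also be available
if shutil.which("java") is None:
    print("No 'java' executable found on PATH")
```

If nothing is printed, the modules and a Java executable were found.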

Additional Note: The attached dashboard (DXP file) and sample PDFs demonstrate the Python data function approach.

PDF_Reader.dxp (400 KB)
Test.pdf (500 KB)
Financial Sample - Sheet1.pdf (20 KB)
