
How to Load a PDF File into SAP HANA

This tutorial builds on the setup explained in the following tutorials:

Creating tunnel to SAP HANA Cloud Platform

Creating ODBC datasource to SAP HANA Cloud Platform

Python to Load Binary Documents to SAP HANA


We will now upload the unstructured data into SAP HANA.

Create table in SAP HANA

First create the table in SAP HANA. Replace <SCHEMA_NAME> with your schema name.

CREATE COLUMN TABLE "<SCHEMA_NAME>"."PDFTEST"
(
    ID INTEGER PRIMARY KEY,
    STRING BLOB
);


Right-click the table and choose Open Content. The table contains no data yet. Now we will see how to load the PDF data with Python code.


Copy the following Python code.

Replace the SERVERNODE, SERVERDB, UID and PWD values with your server node, server database, username and password.

Replace the schema name with your own and give the location of your PDF file.

import pyodbc

# Open a connection to HANA (replace SERVERNODE, SERVERDB, UID and PWD with your own values)
conn = pyodbc.connect('DRIVER={HDBODBC};SERVERNODE=localhost:30115;SERVERDB=hanadbs;UID=ITSTUFFS;PWD=ITSTUFFS')
# Open a cursor
cur = conn.cursor()
# Open the file in read-only binary mode and read its content; the file is closed automatically
with open('c:/pdftest.pdf', 'rb') as pdf_file:
    content = pdf_file.read()
# Insert the content into the table; the ID is passed as an integer to match the column type
cur.execute("INSERT INTO NEO_90PWY22OTOLA96KL8HTRM3EJ5.PDFTEST VALUES (?, ?)", (1, content))
# Commit the transaction
conn.commit()
# Close the cursor
cur.close()
# Close the connection
conn.close()
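
The script hard-codes the connection details, the table name and the file path. As a minimal sketch (not part of the original tutorial, and assuming Python 3), the same steps can be wrapped in two small functions. The names build_connection_string and load_pdf are hypothetical helpers, and the check for the %PDF- header is an added safeguard. load_pdf is assumed to receive any DB-API connection that uses ? placeholders, such as one returned by pyodbc.connect.

```python
def build_connection_string(servernode, serverdb, uid, pwd, driver="HDBODBC"):
    """Assemble the ODBC connection string in the form used by the tutorial script."""
    return "DRIVER={%s};SERVERNODE=%s;SERVERDB=%s;UID=%s;PWD=%s" % (
        driver, servernode, serverdb, uid, pwd)


def load_pdf(conn, table, doc_id, path):
    """Insert the file at `path` into `table` as row `doc_id` and return its size in bytes.

    `conn` is assumed to be a DB-API connection that uses `?` placeholders
    (for example, one returned by pyodbc.connect). `table` is placed into the
    SQL text as-is, so it must come from a trusted source.
    """
    # Read the whole file in binary mode; the file is closed automatically.
    with open(path, "rb") as pdf_file:
        content = pdf_file.read()
    # Added safeguard (an assumption, not in the original script):
    # PDF files normally begin with the marker "%PDF-".
    if not content.startswith(b"%PDF-"):
        raise ValueError("%s does not look like a PDF file" % path)
    cur = conn.cursor()
    try:
        # Same two-column insert as the tutorial: integer ID, then the binary content.
        cur.execute("INSERT INTO %s VALUES (?, ?)" % table, (doc_id, content))
        conn.commit()
    finally:
        cur.close()
    return len(content)
```

With pyodbc installed, this could be called as conn = pyodbc.connect(build_connection_string('localhost:30115', 'hanadbs', uid, pwd)) followed by load_pdf(conn, '<SCHEMA_NAME>.PDFTEST', 1, 'c:/pdftest.pdf'). Taking the connection as a parameter keeps the function independent of the driver, so it can be tried against any DB-API database first.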


Copy the program into Notepad and save it as pdftest.py (the file must have the .py extension).

Right-click the file and open it with the Python editor (IDLE).


Click Run > Run Module to execute the script in the Python shell.


After execution, the Python shell returns to the prompt. The script prints nothing, so if no error is shown, the insert succeeded.


Now go to SAP HANA and refresh the table PDFTEST. You will find the uploaded PDF data in the table.
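
The upload can also be checked from Python by reading the row back and comparing it with the file on disk. The following is a minimal sketch under the same assumptions as the tutorial's table (columns ID and STRING). The function names fetch_pdf and matches_file are hypothetical, and conn is assumed to be a DB-API connection with ? placeholders, such as a pyodbc connection.

```python
def fetch_pdf(conn, table, doc_id):
    """Return the stored BLOB for `doc_id` as bytes, or None if the row does not exist."""
    cur = conn.cursor()
    try:
        # STRING is the BLOB column name used in the tutorial's CREATE TABLE statement.
        cur.execute("SELECT STRING FROM %s WHERE ID = ?" % table, (doc_id,))
        row = cur.fetchone()
    finally:
        cur.close()
    return None if row is None else bytes(row[0])


def matches_file(conn, table, doc_id, path):
    """Return True if the stored BLOB is byte-for-byte identical to the file at `path`."""
    stored = fetch_pdf(conn, table, doc_id)
    if stored is None:
        return False
    with open(path, "rb") as pdf_file:
        return stored == pdf_file.read()
```

For the row inserted in this tutorial, matches_file(conn, '<SCHEMA_NAME>.PDFTEST', 1, 'c:/pdftest.pdf') should return True when the upload was complete.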



We can also perform further analysis on this unstructured data with the Text Analysis feature in SAP HANA. I will cover Text Analysis, full-text indexes and fuzzy search in upcoming tutorials.
