PDFiD is a Python script, written by Didier Stevens, that can analyze and sanitize PDF files. PDF files can carry embedded malicious code that runs on the user’s system, e.g. JavaScript. We are going to analyze a simple PDF file, a malicious PDF file, and a normal exe file that has been given a .pdf extension. This tool can be very helpful in checking whether a PDF file is likely to be malicious.


PDF (Portable Document Format) Terminology:

AA : an additional actions dictionary defining a field’s behavior in response to trigger events

AcroForm : a PDF file's interactive form dictionary

endobj : specifies the end of an object in a PDF file

endstream : the end marker of a stream object in a PDF file

JavaScript : JavaScript dictionary containing JavaScript scripts

JBIG2Decode : decompresses data encoded using the JBIG2 standard

JS : a text string or stream containing JavaScript that will be executed when the action is triggered

Launch : launch an application which usually opens a file

obj : the beginning of an object in a PDF file

ObjStm : object stream

OpenAction : destination that shall be displayed or action that will be performed when PDF is opened

RichMedia : interactive PDF elements

startxref : follows the trailer and is itself followed by the byte offset of the most recent cross-reference section

stream : the beginning marker of a stream object in a PDF file

trailer : provides a method to quickly find a cross-reference table and certain special objects

xref : marks the start of a cross-reference section in a PDF file
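To make the terms above concrete, here is a minimal Python sketch that counts how often each keyword appears in the raw bytes of a PDF. It only illustrates the idea of keyword counting and is not PDFiD's actual code: the names KEYWORDS and count_keywords are made up for this example, and unlike PDFiD it does not try to handle obfuscated names (such as hex-escaped characters) or look inside compressed streams.

```python
import re

# Keywords from the glossary above. Structural keywords appear bare in the
# file; dictionary names carry a leading slash.
KEYWORDS = [
    "obj", "endobj", "stream", "endstream", "xref", "trailer", "startxref",
    "/AA", "/AcroForm", "/JavaScript", "/JBIG2Decode", "/JS", "/Launch",
    "/ObjStm", "/OpenAction", "/RichMedia",
]


def count_keywords(data: bytes) -> dict:
    """Count whole-word occurrences of each keyword in raw PDF bytes."""
    counts = {}
    for kw in KEYWORDS:
        # Bare keywords must not be the tail of a longer word, so that "obj"
        # is not counted inside "endobj". Names may directly follow another
        # token (e.g. "/S/JavaScript"), so they get no such restriction.
        prefix = rb"" if kw.startswith("/") else rb"(?<![A-Za-z])"
        # No keyword may continue into a longer name (e.g. "/JS" vs "/JSx").
        pattern = prefix + re.escape(kw.encode("ascii")) + rb"(?![A-Za-z0-9])"
        counts[kw] = len(re.findall(pattern, data))
    return counts
```

To try it on a file, read the file in binary mode, for example open("pdf_black.pdf", "rb").read(), and pass the bytes to count_keywords.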


How to create a malicious PDF:

I have included a folder called make PDF containing scripts for building PDF files that carry embedded JavaScript or an embedded file:

The following commands can be used:

make-pdf-javascript.py allows one to create a simple PDF document with embedded JavaScript that will execute upon opening of the PDF document.

  • make-pdf-javascript.py [options] pdf-file

make-pdf-embedded.py creates a PDF file with an embedded file.

  • make-pdf-embedded.py [options] file-to-embed pdf-file
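For readers who want to see what such a test file looks like on the inside, the following Python sketch assembles a tiny one-page PDF whose OpenAction points at a JavaScript action. It is a hand-written illustration under my own assumptions, not the code of make-pdf-javascript.py; the function name build_pdf_with_javascript is hypothetical, and the script passed in should be something harmless such as an alert box.

```python
def build_pdf_with_javascript(script: str) -> bytes:
    """Return the bytes of a minimal PDF that runs `script` when opened."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = (
        script.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    )
    # Object 1 is the catalog; its /OpenAction refers to object 4.
    objects = [
        "<< /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        "/Resources << >> >>",
        "<< /Type /Action /S /JavaScript /JS (" + escaped + ") >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += f"{number} 0 obj\n{body}\nendobj\n".encode("latin-1")

    # Cross-reference table: one 20-byte entry per object, plus entry 0.
    xref_offset = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")

    # Trailer, then startxref followed by the offset of the xref table.
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n".encode("ascii")
    out += f"startxref\n{xref_offset}\n%%EOF\n".encode("ascii")
    return bytes(out)
```

Writing the result of build_pdf_with_javascript('app.alert("test");') to a file in binary mode should give a sample containing one JS, one JavaScript, and one OpenAction entry, which makes it a convenient input for the analysis tools.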

PDFiD Analysis:

I have taken a simple PDF document off the internet to test the tool. pdf_white has not been edited in any way, but pdf_black has a malicious script embedded in it.

The PDFs can be analyzed by running the script:

  • pdfid.py pdf_black.pdf
  • pdfid.py pdf_white.pdf

pdf_white: has no malicious script

pdf_black: has 1 JS, 1 JavaScript, and 1 OpenAction entry
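The way these counts are read can be sketched in a few lines of Python. The function below takes a dictionary of keyword counts and returns the entries that deserve a closer look. The names SUSPICIOUS and triage are made up for this example, the choice of keywords is my own selection from the terminology list, and a nonzero count is only a hint that the file needs inspection, not proof that it is malicious.

```python
# Keywords that point to active content or automatic actions.
# This selection is an assumption for illustration, not PDFiD's own rule set.
SUSPICIOUS = ("/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch")


def triage(counts: dict) -> list:
    """Return the suspicious keywords that occur at least once."""
    return [name for name in SUSPICIOUS if counts.get(name, 0) > 0]


# Counts as reported above for pdf_black.
black = {"/JS": 1, "/JavaScript": 1, "/OpenAction": 1}
# For pdf_white the zeros are assumed, since no script was found in it.
white = {"/JS": 0, "/JavaScript": 0, "/OpenAction": 0}

print(triage(black))
print(triage(white))
```

With these inputs, every reported keyword is flagged for pdf_black and nothing is flagged for pdf_white.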


PDF Parser:

This tool will parse a PDF document to identify the fundamental elements used in the analyzed file. It will not render a PDF document.


  • pdf-parser.py --stats [pdf]
  • pdf-parser.py --search javascript [pdf]
  • pdf-parser.py --search javascript --raw [pdf]
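The idea behind searching a PDF's objects for a term can be illustrated with a short Python sketch. It splits the raw bytes into indirect objects (everything between "N G obj" and "endobj") and reports the ones whose body contains the search term, ignoring case. This is a simplification written for this walkthrough and not pdf-parser.py's implementation: the names OBJ_RE and find_objects are hypothetical, and the sketch does not decode streams or cope with binary data that happens to contain the word endobj.

```python
import re

# Matches "<number> <generation> obj ... endobj" and captures the body.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def find_objects(data: bytes, term: str) -> list:
    """Return (number, generation, body) for objects containing `term`."""
    needle = term.lower().encode("latin-1")
    hits = []
    for match in OBJ_RE.finditer(data):
        body = match.group(3)
        # Case-insensitive comparison, so "javascript" finds /JavaScript.
        if needle in body.lower():
            hits.append(
                (int(match.group(1)), int(match.group(2)), body.strip())
            )
    return hits
```

Calling find_objects(data, "javascript") on the bytes of a PDF returns the object numbers to look at first, which is usually where a manual inspection of a suspicious file starts.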

Provided for the sample challenge are the two PDFs, black and white, that resemble what was used in the PowerPoint. The following Python scripts are also included:

- Create a PDF with embedded JavaScript that executes when the PDF is opened (make-pdf-javascript.py)

- Create a PDF with an embedded file (make-pdf-embedded.py)

- Run PDFiD to analyze potentially malicious PDFs (pdfid.py)

- Run PDF Parser to identify the fundamental elements used in the analyzed file (pdf-parser.py)

make-pdf-embedded.py(4.78 KB) meghancaiazzo, Jun 4 2013, 12:17 AM
make-pdf-javascript.py(3.05 KB) meghancaiazzo, Jun 4 2013, 12:17 AM
pdf-parser.py(40.01 KB) meghancaiazzo, Jun 4 2013, 12:18 AM
pdf_black.pdf(4.97 KB) meghancaiazzo, Jun 4 2013, 12:02 AM
pdf_white.pdf(207.23 KB) meghancaiazzo, Jun 4 2013, 12:02 AM
pdfid.py(28.56 KB) meghancaiazzo, Jun 4 2013, 12:17 AM
