Evil JavaScript: Webpage- and PDF-Threat

Attackers are exploiting older security vulnerabilities to target users of Adobe Reader and Acrobat Professional versions before 8.1.2. The vast majority of today's PDF exploit samples still target an old buffer overflow vulnerability, which the Common Vulnerabilities and Exposures project lists as CVE-2007-5659, but exploits targeting an input validation weakness described in CVE-2008-2641 are also likely to appear soon. In short: update your Adobe Reader and Acrobat Professional as soon as possible!

To exploit one of these vulnerabilities, the attacker only needs to get malicious JavaScript code, embedded in a PDF document, executed on the JavaScript virtual machine (VM) of the reader that interprets it. Adobe introduced JavaScript as part of the PDF document specification with PDF 1.3 and equipped the Reader with event handlers that can invoke JavaScript functions embedded in PDF documents. The malicious JavaScript code inside the PDF is usually triggered by the OpenAction event handler, so that it runs immediately when the document is opened.

The OpenAction event handler is often used in malicious PDF files. Usually it follows right after the PDF Header.
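As a rough illustration of how such a trigger can be spotted, the following minimal Python sketch counts a few name tokens in the raw bytes of a file. The function name, the token list and the sample document are assumptions made for this example, not part of any Avira component. The check is also deliberately naive: it misses names hidden inside compressed objects or written with hex escapes.

```python
import re

# Name tokens that often accompany auto-running scripts.
# Illustrative list chosen for this sketch, not an exhaustive rule set.
TRIGGER_NAMES = (b"/OpenAction", b"/JavaScript", b"/JS")


def find_script_triggers(pdf_bytes: bytes) -> dict:
    """Count how often each trigger name appears in the raw file bytes."""
    counts = {}
    for name in TRIGGER_NAMES:
        # The lookahead rejects longer names that merely start with the token.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    return counts


# Hypothetical, harmless sample: the catalog points OpenAction at a script object.
sample = (
    b"%PDF-1.3\n"
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert\\(1\\);) >>\nendobj\n"
)
triggers = find_script_triggers(sample)
```

A file in which both /OpenAction and /JS appear is not necessarily malicious, but it is a reasonable candidate for closer inspection.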

By design, PDF documents should be platform independent, so PDF is not a pure binary format. A PDF document consists of objects that are referenced by an index at the end of the file, the cross-reference table. The objects themselves describe what they are and how the data between their tags should be interpreted. Adobe has published the specification for the PDF document format; version 1.7 can be obtained from Adobe’s PDF developer center.
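To make the role of the cross-reference table more concrete, here is a small Python sketch that reads the offset stored after the startxref keyword near the end of a file and checks that a classic table begins there. The helper name and the tiny hand-built document are assumptions for this example. Files that use cross-reference streams instead of a classic table (possible since PDF 1.5) are not covered.

```python
import re


def find_xref_offset(pdf_bytes: bytes):
    """Return the byte offset stored after the last 'startxref', or None."""
    matches = re.findall(rb"startxref\s+(\d+)", pdf_bytes)
    return int(matches[-1]) if matches else None


# Hand-built minimal document (hypothetical): one object plus a classic table.
body = b"%PDF-1.3\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_table = (
    b"xref\n0 2\n"
    b"0000000000 65535 f \n"   # entry 0: head of the free list
    b"0000000009 00000 n \n"   # entry 1: object 1 starts at byte 9
)
trailer = (
    b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    b"startxref\n" + str(len(body)).encode("ascii") + b"\n%%EOF\n"
)
tiny_pdf = body + xref_table + trailer

offset = find_xref_offset(tiny_pdf)
# The table should begin exactly where the trailer says it does.
starts_with_xref = tiny_pdf[offset:offset + 4] == b"xref"
```

A parser that follows the table can jump directly to each object; a scanner that distrusts the table may instead search the whole file for object markers.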

In May ‘08 most malicious PDFs made use of the PDF format’s own security mechanisms to deliver their embedded payload. Today’s PDF exploits have abandoned this technique and instead use obfuscation techniques usually known from web-based malware. Script packers as well as custom obfuscation functions have made their way into malicious PDF documents. The challenge for AV vendors is now to extract the JavaScript object and to detect an exploit attempt in (sometimes) highly obfuscated script code. The similarity between malicious JavaScript code in PDFs and malicious JavaScript code found on malicious websites makes it possible to counter this kind of threat with similar generic/heuristic rules.
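The idea of applying the same kind of rules to scripts from both sources can be illustrated with a toy indicator check in Python. The indicator names and patterns below are assumptions invented for this example and are not Avira's actual rules; real heuristics are considerably more elaborate.

```python
import re

# Toy indicators often seen in packed or shellcode-building scripts.
# Invented for this sketch; not the rules of any real engine.
INDICATORS = {
    "eval_call": re.compile(r"\beval\s*\("),
    "unescape_call": re.compile(r"\bunescape\s*\("),
    "unicode_escape_run": re.compile(r"(?:%u[0-9a-fA-F]{4}){4,}"),
    "from_char_code": re.compile(r"String\.fromCharCode"),
}


def find_indicators(script: str) -> list:
    """Return the names of all indicators that match the script text."""
    return [name for name, rx in INDICATORS.items() if rx.search(script)]


# The same function is applied whether the text came from a web page or a PDF.
suspicious = 'var s = unescape("%u9090%u9090%u9090%u9090"); eval(p);'
benign = 'app.alert("hello");'

suspicious_hits = find_indicators(suspicious)
benign_hits = find_indicators(benign)
```

Because the function only sees script text, it does not matter which container the script was extracted from.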

Thanks to the modular architecture of the engine, scanning scripts “hidden” in PDFs is as easy for Avira as scanning scripts from webpages. One of the advantages of the new AV8 engine design is the close cooperation of the modules: prior to the AV8 engine, our HTML heuristics were only able to detect malicious code in websites. Now any file can be dissected into its smallest parts, ready for extensive examination by the engine's specialized components, the modules.

As we explained before, PDF files consist of objects like images, texts and scripts. They serve as a simple example of this new engine capability and its advantages.

  • First, a file is passed to our scan engine for analysis. As it is a PDF file, it may be dangerous.
  • After a first quick scan with patterns of known malicious software, the central control decides to forward the PDF file to a part of the engine that is able to extract the individual parts (like texts, pictures and scripts) from the PDF.
  • Very often the scripts are ZLIB compressed. In these cases the module also decompresses the scripts.

    Zlib-compressed JavaScript object of an exploit PDF from the ElFiesta web-exploitation toolkit.

    The AV8-engine decompresses that object which contains a slightly obfuscated JavaScript exploit.

  • All the extracted parts of the PDF get returned to the central control which is now able to decide how to further process these parts.
  • The scripts are passed to the standard generic detection.
  • After that they are forwarded to the HTML and script heuristics.
  • The script heuristics were written to detect exploits in web pages and to find drive-by-download attacks before they reach the browser on the user's PC. They are, of course, also able to process the script embedded in the PDF and to judge how likely it is to be malicious.
  • The script heuristics identify a JavaScript, analyse it and recognize shellcode being built as well as an exploit.
  • This is sufficient data to report the file as HTML/Shellcode.Gen.
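The extraction and decompression steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the AV8 engine's implementation: the function names and the sample object are hypothetical, every stream is simply tried with zlib instead of parsing its /Filter entry, and other filters or filter chains are ignored.

```python
import re
import zlib

# Match the data between 'stream' and 'endstream'. The lookbehind keeps
# the tail of 'endstream' from being mistaken for a new 'stream' keyword.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\nendstream", re.DOTALL)


def inflate(raw: bytes):
    """Return zlib-decompressed data, or None if raw is not zlib data."""
    try:
        # decompressobj tolerates trailing bytes such as a stray '\r'.
        return zlib.decompressobj().decompress(raw)
    except zlib.error:
        return None


def extract_streams(pdf_bytes: bytes) -> list:
    """Return every stream body, decompressed where zlib succeeds."""
    parts = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        inflated = inflate(raw)
        parts.append(inflated if inflated is not None else raw)
    return parts


# Hypothetical sample: one compressed script object, as in the steps above.
script = b'var sc = unescape("%u9090%u9090");'
packed = zlib.compress(script)
sample_pdf = (
    b"%PDF-1.3\n3 0 obj\n<< /Length " + str(len(packed)).encode("ascii")
    + b" /Filter /FlateDecode >>\nstream\n" + packed
    + b"\nendstream\nendobj\n"
)
extracted = extract_streams(sample_pdf)
```

The recovered script text could then be handed to whatever generic and heuristic checks are already used for scripts from web pages, which is the point of the modular design.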

Other PDFs containing malware are detected as “HEUR/HTML.Malware”, “HTML/Rce.Gen”, “HTML/Shellcode.Gen” or “HTML/Crypted.Gen”, depending on the exact method the malicious scripts use to hijack the victim's computer.

A more detailed look at the HTML malware heuristics will be the subject of a future article.

Emanuel Somosan
Thorsten Sick
Web-based Malware Team