In 2020, as a pandemic project, I rewrote my older QuickSand tool from C into Python and incorporated the features of PDFExaminer into it, dropping the XOR analysis that has become less common since the early days of document malware. QuickSand lets you scan within the streams and embedded content of a document or PDF using Yara rules. It can also generate similarity hashes for the elements that make up a document, a kind of structural hash.
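To make the idea of a structural hash concrete, here is a minimal sketch. It is not QuickSand's actual hashing code. It assumes a zip-based (OOXML) document and treats the sorted list of zip member names as "the elements that make up a document", so files built from the same template hash alike even when their content differs. The functions `structural_hash` and `make_doc` are hypothetical helpers written for illustration only.

```python
import hashlib
import io
import zipfile


def structural_hash(data: bytes) -> str:
    """Hash the layout of an OOXML (zip-based) document, ignoring content.

    Hypothetical illustration: the "elements" are the sorted zip member
    names, so the file contents themselves never affect the hash.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = sorted(zf.namelist())
    # Use a newline separator so adjacent names don't run together
    # (e.g. "a" + "bc" vs "ab" + "c").
    return hashlib.md5("\n".join(names).encode("utf-8")).hexdigest()


def make_doc(parts: dict) -> bytes:
    """Build a tiny in-memory zip standing in for a .docx file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, body in parts.items():
            zf.writestr(name, body)
    return buf.getvalue()


doc_a = make_doc({"[Content_Types].xml": "<a/>", "word/document.xml": "hello"})
doc_b = make_doc({"[Content_Types].xml": "<b/>", "word/document.xml": "goodbye"})
doc_c = make_doc({"[Content_Types].xml": "<a/>", "word/vbaProject.bin": "macro"})

print(structural_hash(doc_a) == structural_hash(doc_b))  # True: same structure
print(structural_hash(doc_a) == structural_hash(doc_c))  # False: different parts
```

MD5 is used here only as a compact fingerprint, not for security; a real tool might hash richer structural features such as stream types or object ordering.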
You can use the online demo at scan.tylabs.com, which runs as a serverless AWS Lambda container, download the Python code below, or install the quicksand command-line tool locally:
pip3 install quicksand
QuickSand combines two tools I originally created in 2009, Cryptam and PDFExaminer, which were designed as PHP web-based scanners for detecting malware in phishing documents. At the time, commercial AV had very low detection rates for malicious Office documents and PDFs. Command-line versions of those tools remain available in a mostly unmaintained form. QuickSand was designed with more of an analyst focus: it automates the extraction of all the streams for further digging and adds a formal scoring system that presents a risk determination to less technical users.
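As a rough illustration of extracting streams for further digging, the sketch below pulls every `stream ... endstream` body out of raw PDF bytes, tries to inflate it with zlib (FlateDecode), and falls back to the raw bytes otherwise. It is a deliberately naive sketch, not QuickSand's parser: it ignores `/Length` and `/Filter`, and it uses plain regular expressions as a stand-in for Yara rules. The names `extract_streams`, `scan`, and `RULES` are hypothetical.

```python
import re
import zlib

# Naive match for a PDF stream body; allows CRLF or LF around the data.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)

# Hypothetical rules: regexes standing in for real Yara signatures.
RULES = {
    "js_eval": rb"eval\s*\(",
    "heap_spray": rb"%u9090",
}


def inflate(raw: bytes) -> bytes:
    """Try FlateDecode; keep the raw bytes if the data isn't zlib."""
    try:
        # decompressobj tolerates trailing junk, common in malformed PDFs.
        return zlib.decompressobj().decompress(raw)
    except zlib.error:
        return raw


def extract_streams(pdf: bytes) -> list:
    """Return every stream body found in the file, inflated when possible."""
    return [inflate(m.group(1)) for m in STREAM_RE.finditer(pdf)]


def scan(streams: list, rules: dict) -> list:
    """Return (stream index, rule name) pairs for every rule that matches."""
    hits = []
    for i, body in enumerate(streams):
        for name, pattern in rules.items():
            if re.search(pattern, body):
                hits.append((i, name))
    return hits


# A toy two-object "PDF": one compressed JavaScript stream, one plain one.
payload = zlib.compress(b"app.alert('hi'); eval(unescape('%u9090'))")
sample = (
    b"%PDF-1.4\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\r\n"
    + payload
    + b"\r\nendstream\nendobj\n"
    + b"2 0 obj\n<< >>\nstream\r\nBT (hello) Tj ET\r\nendstream\nendobj\n"
)

print(scan(extract_streams(sample), RULES))
# [(0, 'js_eval'), (0, 'heap_spray')]
```

Scanning the decoded streams rather than the file as stored is the point of the exercise: the compressed JavaScript in object 1 would not match either pattern until it has been inflated.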