Linux Malware Analysis—Why Homebrew Encryption is Bad
by Jacob Pimental
Linux is one of my favorite operating systems, but you seldom see malware for it, so I was pretty interested when my honeypot caught a piece of Linux malware. This article is my analysis of the sample, particularly the decryption function used throughout it. It’s a good example of why rolling your own encryption algorithm isn’t very secure.
Like with any analysis, I first toss the file into VirusTotal to see what is going on:
We can see here that only 34 out of 59 vendors identified this malware, which is not very surprising given that it’s a Linux binary. The binary is not packed and there isn’t anything extremely interesting in the VirusTotal analysis. The next thing to do is run it through rabin2 to get some basic information about it.
arch x86
binsz 646674
bintype elf
bits 32
canary false
class ELF32
crypto false
endian little
havecode true
lang c
linenum true
lsyms true
machine Intel 80386
maxopsz 16
minopsz 1
nx true
os linux
pcalign 0
pic false
relocs true
rpath NONE
static true
stripped false
subsys linux
va true
Again, nothing too interesting. This is a Linux binary that was coded in the C programming language. It starts to get interesting once you analyze the exports using the -E flag in rabin2.
[Exports]
956 0x00020070 0x08068070 GLOBAL FUNC 1365 __vsyslog_chk
957 0x00073c00 0x080bbc00 GLOBAL OBJECT 36 _nl_C_LC_CTYPE
959 0x00021790 0x08069790 GLOBAL FUNC 8 __stack_chk_fail_local
965 0x0008b9e8 0x080d49e8 GLOBAL OBJECT 4 __morecore
966 0x0001fc80 0x08067c80 GLOBAL FUNC 41 __getdtablesize
967 0x00014f80 0x0805cf80 GLOBAL FUNC 40 _IO_remove_marker
969 0x00009090 0x08051090 GLOBAL FUNC 291 __libc_sigaction
970 0x00051ef0 0x08099ef0 GLOBAL FUNC 69 __isnanl
971 0x00042ae0 0x0808aae0 GLOBAL FUNC 170 __libc_pread
974 0x0001db60 0x08065b60 GLOBAL FUNC 34 strcpy
975 0x0003c460 0x08084460 GLOBAL FUNC 200 _IO_wdefault_xsgetn
976 0x00012080 0x0805a080 GLOBAL FUNC 9 __fcloseall
977 0x00020630 0x08068630 GLOBAL FUNC 44 __syslog
978 0x00000ba3 0x08048ba3 GLOBAL FUNC 74 V8ULRand
979 0x0000e600 0x08056600 GLOBAL FUNC 234 __setstate_r
980 0x0006ce50 0x080b4e50 GLOBAL FUNC 213 _dl_vsym
You can see that all of the object and function names are defined for us, which matches the "stripped false" line in the rabin2 output. The symbols were probably left in while the author was debugging their malware. We won’t have to deal with radare2 inventing names for functions, since it already knows what they are all called. Now we can pop the binary into radare2 and seek to the main function.
One of the first things I noticed was this snippet of code:
Not only does it show in the comments that the string is called strHost, it also shows that the name of the function that strHost is being passed into is called DecryptData. We can guess now that strHost probably holds some sort of hostname or address that the malware might connect to, and that DecryptData will decrypt this string so it will be usable, as opposed to what it is now, ‘yy123-e4213-mfs’. Next we can jump into DecryptData and see what that looks like.
To start off, we can see that the value 0 is moved into a counter variable. (Keep in mind that I renamed all of the variables in this function using the afvn command, which makes it a lot easier for me to analyze.) The loop then starts by checking whether the counter is less than the string length that was provided as an argument. If it is, we go to the main logic of the loop.
This may seem pretty daunting for newer analysts, but I’ll walk through what is going on. The program copies the current index into the string to another counter variable (do not ask me why, because this could have been done without it). It then loads the hex value 0x55555556. This value is what is called a “magic number”, and it is used to speed up division by replacing it with a multiplication. This specific magic number is 1431655766 in decimal. If we multiply a number by it and then shift the 64-bit product right by 32 bits, we get the same result as dividing by 3! This holds for the non-negative values a string index can take. I will do my best to explain it; this is the first I have heard of the trick, but it is actually very interesting.
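The example we will use for this is 6 divided by 3, which we all know is 2. In this case we would do 6 * 1431655766, which comes out to 8589934596, or in binary (two high bits followed by the low 32 bits):
10 00000000 00000000 00000000 00000100
Now if we shift all of the bits to the right 32 places, the low 32 bits fall away and we get:
10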
Which comes out to 2! It is a very interesting trick and teaches you a lot about how computer architecture works. It is dividing without having to divide! For more info on this I suggest reading this article.
So, in essence, the application takes the counter and sets a variable to 0, 1, or 2 based on the calculation made with the magic number; the value cycles 0, 1, 2, 0, 1, 2 as the counter increases. If the variable is 1, execution jumps to one part of the code; if it is not, execution jumps to another. We will look at these two branches next.
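To sanity-check the magic-number trick and the 0, 1, 2 value it leads to, here is a small Python sketch. The function names are mine, and the idea that the binary derives the value as counter minus 3 times the quotient is an assumption based on the pattern of values, not something copied from the disassembly.

```python
MAGIC = 0x55555556  # 1431655766, roughly 2**32 / 3


def div3(n):
    """Divide a non-negative value below 2**31 by 3 without a division."""
    # Multiply, then keep only the bits above the low 32 (shift right by 32).
    return (n * MAGIC) >> 32


def selector(counter):
    """Assumed use of the quotient: the remainder of counter / 3."""
    return counter - 3 * div3(counter)


if __name__ == "__main__":
    print(div3(6))                           # worked example from the text: 2
    print([selector(i) for i in range(6)])   # [0, 1, 2, 0, 1, 2]
```

The multiplication is exact here only because Python integers do not overflow; in the binary the same effect comes from taking the upper half of a 64-bit product.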
Both branches do nearly the same thing, with one minor difference. To start off, they both check whether the character at the current index has a value less than or equal to 0x20, the space character. If it does, we leave that character alone, increase our counter, and move to the next iteration. If it does not, the code checks whether the value is equal to 0x7f, the DELETE character in ASCII. If it is, we again do nothing and continue to the next iteration. If not, we continue with decryption. This simply limits decryption to printable ASCII characters, those strictly between 0x20 and 0x7f.
If the magic variable we calculated earlier is equal to 1, we take the left branch, which, after the ASCII bounds check, subtracts one from the character to decrypt it. This means that B becomes A, D becomes C, and so on. If the magic variable does not equal 1, we add one to the character instead: A becomes B, C becomes D, and so on. That is all this algorithm does, which is not too frightening and makes it easy for us to decrypt the strings. Let us see what the strHost string (‘yy123-e4213-mfs’) becomes after running through this function.
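The bounds check and the two branches described above can be modelled for a single character as follows. This is a sketch of my reading of the disassembly rather than the malware's actual source; decrypt_char and its parameter names are made up for illustration.

```python
def decrypt_char(ch, selector):
    """Model one loop iteration of DecryptData for a single character.

    `selector` is the 0, 1 or 2 value derived from the character's index.
    """
    code = ord(ch)
    # Bounds check: space (and anything below it) and DELETE are left alone.
    if code <= 0x20 or code == 0x7F:
        return ch
    if selector == 1:
        return chr(code - 1)  # left branch: subtract one
    return chr(code + 1)      # other branch: add one


if __name__ == "__main__":
    for ch, sel in [("B", 1), ("A", 0), (" ", 2)]:
        print(repr(ch), sel, "->", repr(decrypt_char(ch, sel)))
```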
First, I am going to assign each character in the string a value of 0, 1, or 2. This mimics the calculations done using that magic number.
y y 1 2 3 - e 4 2 1 3 - m f s
0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
Now, for every character whose value is not 1, I will add 1 to its ASCII value, and for every character whose value is 1, I will subtract 1 from its ASCII value.
So the malware most likely connects to the host zx232.f3322.net, which is probably the command and control center.
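I made a quick Python script to decrypt any other strings I may come across.
def decrypt_string(string, length):
    """Decrypt `length` characters of `string` the way DecryptData does."""
    counter = 0   # local_4h: index into the string
    local_18 = 0  # cycles 0, 1, 2, like the value derived from the magic number
    new_string = ''
    while counter < length:
        letter = string[counter]
        if ord(letter) <= 0x20 or ord(letter) == 0x7f:
            # Outside the printable range: leave the character untouched
            new_string += letter
        elif local_18 == 1:
            new_string += chr(ord(letter) - 1)
        else:
            new_string += chr(ord(letter) + 1)
        counter += 1
        local_18 = (local_18 + 1) % 3
    return new_string

print(decrypt_string('yy123-e4213-mfs', 15))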
And here is a list of the other encrypted strings and their decrypted counterparts:
The rest of the sample is fairly straightforward. It copies itself to /etc/.zl, /tmp/.lz, and /etc/init.d/.zl. It then adds entries to rc2.d, rc3.d, rc4.d, and rc5.d and symbolically links them to the .zl file in /etc/init.d. These entries ensure persistence on the system, since files in these folders are run on startup. Noticing these files on a system is a host-based indicator of the malware. It then tries to contact the server at zx232.f3322.net on port 54188 and waits for some sort of response, which gives us a network-based indicator. I have not seen the malware actually get a response yet, which could have several causes: the server may no longer be in use, or the owner may not have been sending out commands at the time. I assume, though, that the program receives a command from the server and executes it.
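The host-based indicators above could be checked with something like the following Python sketch. It rests on a few assumptions of mine: that the rcN.d directories sit directly under /etc, and that any symlink in them pointing at init.d/.zl is suspicious, since the link names themselves are not known to me. The root parameter only exists so the check can be pointed at a mounted disk image or a test directory.

```python
import os
from pathlib import Path

# Paths named in the analysis, relative to the filesystem root.
DROPPED_FILES = ["etc/.zl", "tmp/.lz", "etc/init.d/.zl"]
# Assumption: the rcN.d directories live directly under /etc.
RC_DIRS = ["etc/rc2.d", "etc/rc3.d", "etc/rc4.d", "etc/rc5.d"]


def find_indicators(root="/"):
    """Return a sorted list of paths under `root` that match the indicators."""
    root = Path(root)
    hits = []
    for rel in DROPPED_FILES:
        candidate = root / rel
        if candidate.exists() or candidate.is_symlink():
            hits.append(str(candidate))
    for rel in RC_DIRS:
        rc_dir = root / rel
        if not rc_dir.is_dir():
            continue
        for entry in rc_dir.iterdir():
            # Flag any symlink whose target ends in init.d/.zl
            if entry.is_symlink() and os.readlink(entry).endswith("init.d/.zl"):
                hits.append(str(entry))
    return sorted(hits)


if __name__ == "__main__":
    for path in find_indicators():
        print("possible indicator:", path)
```

An empty result does not prove a system is clean; it only means these specific paths were not found.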
The interesting part of this malware was mostly the decryption function, so this article was a little lesson on why homebrewed cryptography is a bad idea. In this case it allowed us to decrypt the strings and find out more about what the malware is doing. Hopefully this was an interesting article for everyone, as it was really fun analyzing this sample. As always, I am still a novice at malware analysis, so if there is something I can improve on, please tell me. You can contact me on Twitter and LinkedIn.
Thanks for reading, and happy reversing!
Tags: Reverse Engineering - Radare2 - Malware Analysis - Malware - Linux