Linux Malware Analysis—Why Homebrew Encryption is Bad

02 February 2018

By Jacob Pimental


Linux is one of my favorite operating systems, but you seldom see malware for it, so I was pretty interested when a piece of Linux malware was caught by my honeypot. This article is my analysis of the sample, particularly the decryption function that is used throughout it. It’s a good example of why rolling your own encryption algorithm isn’t very secure.

Like with any analysis, I first toss the file into VirusTotal to see what is going on:

ELF Binary on VirusTotal

We can see here that only 34 out of 59 vendors identified this malware, which is not very surprising given that it’s a Linux binary. The binary is not packed and there isn’t anything extremely interesting in the VirusTotal analysis. The next thing to do is run it through rabin2 to get some basic information about it.

arch     x86
binsz    646674
bintype  elf
bits     32
canary   false
class    ELF32
crypto   false
endian   little
havecode true
lang     c
linenum  true
lsyms    true
machine  Intel 80386
maxopsz  16
minopsz  1
nx       true
os       linux
pcalign  0
pic      false
relocs   true
rpath    NONE
static   true
stripped false
subsys   linux
va       true

Again, nothing too interesting. This is a Linux binary that was coded in the C programming language. It starts to get interesting once you analyze the exports using the -E flag in rabin2.

[Exports]
956 0x00020070 0x08068070 GLOBAL   FUNC 1365 __vsyslog_chk
957 0x00073c00 0x080bbc00 GLOBAL OBJECT   36 _nl_C_LC_CTYPE
959 0x00021790 0x08069790 GLOBAL   FUNC    8 __stack_chk_fail_local
965 0x0008b9e8 0x080d49e8 GLOBAL OBJECT    4 __morecore
966 0x0001fc80 0x08067c80 GLOBAL   FUNC   41 __getdtablesize
967 0x00014f80 0x0805cf80 GLOBAL   FUNC   40 _IO_remove_marker
969 0x00009090 0x08051090 GLOBAL   FUNC  291 __libc_sigaction
970 0x00051ef0 0x08099ef0 GLOBAL   FUNC   69 __isnanl
971 0x00042ae0 0x0808aae0 GLOBAL   FUNC  170 __libc_pread
974 0x0001db60 0x08065b60 GLOBAL   FUNC   34 strcpy
975 0x0003c460 0x08084460 GLOBAL   FUNC  200 _IO_wdefault_xsgetn
976 0x00012080 0x0805a080 GLOBAL   FUNC    9 __fcloseall
977 0x00020630 0x08068630 GLOBAL   FUNC   44 __syslog
978 0x00000ba3 0x08048ba3 GLOBAL   FUNC   74 V8ULRand
979 0x0000e600 0x08056600 GLOBAL   FUNC  234 __setstate_r
980 0x0006ce50 0x080b4e50 GLOBAL   FUNC  213 _dl_vsym

You can see that all of the object and function names are defined for us, because the binary was not stripped. The symbols were probably left in from when the author was debugging their malware. This means we won’t have to deal with radare2 making up names for functions, since it already knows what they all are. Now we can pop the binary into radare2 and seek to the main function.

One of the first things I noticed was this snippet of code:

DecryptData function being called

Not only does it show in the comments that the string is called strHost, it also shows that the name of the function that strHost is being passed into is called DecryptData. We can guess now that strHost probably holds some sort of hostname or address that the malware might connect to, and that DecryptData will decrypt this string so it will be usable, as opposed to what it is now, ‘yy123-e4213-mfs’. Next we can jump into DecryptData and see what that looks like.

Inside DecryptData function

To start off, we can see that the value 0 is moved into a counter variable. (Keep in mind that I renamed all of the variables in this function using the afvn command, which makes it a lot easier for me to analyze.) The function then starts the loop by checking whether the counter variable is less than the string length that was provided as an argument. If it is, then we go to the main logic of our loop.

Main logic of loop

This may seem pretty daunting for newer analysts, but I’ll walk through what is going on. The program copies the current index of the string into another counter variable (do not ask me why, because this could have been done without it). It then loads the hex value 0x55555556, which is 1431655766 in decimal. This value is what is called a “magic number”, and compilers use it to replace a slow division instruction with a cheaper multiplication. If we multiply a non-negative number (such as an index) by this value and then shift the 64-bit result right by 32 bits, in other words keep only the upper half of the product, we get the same result as dividing by 3! This works because 0x55555556 is (2^32 + 2) / 3, so the multiplication is roughly a multiplication by 2^32 / 3. I will do my best to explain this, as it is the first I have heard of it, but it is actually very interesting.

The example we will use for this is 6 divided by 3 which we all know is 2. Now in this case we would do 6 * 1431655766 which comes out to 8589934596, or in binary:

1000000000000000000000000000000100

Now if we shift all of the bits to the right 32 spaces we get:

0000000000000000000000000000000010

Which comes out to 2! It is a very interesting trick and teaches you a lot about how computer architecture works. It is dividing without having to divide! For more info on this I suggest reading this article.
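To convince yourself that the trick holds for more than one example, here is a minimal Python sketch of the multiply-and-shift. The names MAGIC, div3, and mod3 are my own and do not appear in the binary, and the sketch only covers non-negative values that fit in a signed 32-bit integer, which is all a string index needs.

```python
MAGIC = 0x55555556  # 1431655766, which equals (2**32 + 2) // 3

def div3(n):
    """Divide a non-negative 32-bit integer by 3 using multiply-and-shift."""
    return (n * MAGIC) >> 32

def mod3(n):
    """Remainder modulo 3, derived from the quotient above."""
    return n - 3 * div3(n)

print(div3(6))                      # 2
print([mod3(i) for i in range(6)])  # [0, 1, 2, 0, 1, 2]
```

Subtracting three times the quotient from the original number leaves the remainder, which is how a division by 3 can end up producing a value that cycles through 0, 1, and 2.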

So what the application is doing, in essence, is taking the counter and setting a variable to either 0, 1, or 2 based on the calculations made with the magic number. In other words, the variable is the current index modulo 3, and the division by 3 is just the first step in computing that remainder. If the variable is 1, the code jumps to one branch; if it is not, it jumps to another. We will look at these two branches next.

Branch analysis

Both branches do nearly the same thing, with one minor difference. To start off, they both check whether the character at the index has a value less than or equal to 0x20, which is the space character. If it does, then we do not touch that character; we increase our counter and move on to the next iteration. If it does not, the code checks whether the value is equal to 0x7f, the DELETE character in ASCII. If it is, then again we do nothing and continue to the next iteration. If not, we continue with decryption. This just keeps the transformation within the printable ASCII range: spaces, control characters, and DELETE are left alone.

So if the magic variable that we calculated earlier is equal to 1, we take the left branch, which, after the ASCII bounds check, subtracts one from our character to decrypt it. This means that B becomes A, D becomes C, and so on. If the magic variable does not equal 1, we add one to our character instead: A becomes B, C becomes D, and so on. That is all this algorithm does, which is not too frightening and makes it easy for us to decrypt the strings. Let us see what that strHost string (‘yy123-e4213-mfs’) becomes after running through this function.

First, I am going to assign each character in the string a value of 0, 1, or 2 based on its position. This mimics the calculations done using that magic number.

y y 1 2 3 - e 4 2 1 3 - m f s
0 1 2 0 1 2 0 1 2 0 1 2 0 1 2

Now for every character whose value is not 1, I will add 1 to its ASCII value, and for every character whose value is 1, I will subtract 1 from its ASCII value.

zx232.f3322.net

So the malware most likely connects to the host zx232.f3322.net, which is probably the command and control center.

I made a quick Python script to decrypt any other strings I may come across.

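def decrypt_string(string, length):
    """Python version of DecryptData: shift each character by one, keyed on index % 3."""
    new_string = ''
    counter = 0   # local_4h: index into the string
    local_18 = 0  # counter % 3, the value derived from the magic number
    while counter < length:
        letter = string[counter]
        if ord(letter) <= 0x20 or ord(letter) == 0x7f:
            # Space, control characters, and DEL are copied unchanged
            new_string += letter
        elif local_18 == 1:
            new_string += chr(ord(letter) - 1)
        else:
            new_string += chr(ord(letter) + 1)
        counter += 1
        local_18 = (local_18 + 1) % 3
    return new_string

print(decrypt_string('yy123-e4213-mfs', 15))  # zx232.f3322.net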
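Part of what makes this scheme so weak is that it has no key: anyone who knows the rule can run it in either direction. The sketch below is my own illustration of that point, not code from the sample. The transform function and its decrypt flag are hypothetical names, and I am assuming the author's encoder simply applied the opposite shifts.

```python
def transform(text, decrypt=True):
    """Apply the +1/-1 shift keyed on index % 3; decrypt=False reverses it."""
    out = []
    for i, ch in enumerate(text):
        code = ord(ch)
        if code <= 0x20 or code == 0x7f:
            out.append(ch)  # skipped by the bounds check
            continue
        step = -1 if i % 3 == 1 else 1
        if not decrypt:
            step = -step
        out.append(chr(code + step))
    return ''.join(out)

host = 'zx232.f3322.net'
encoded = transform(host, decrypt=False)
print(encoded)                     # yy123-e4213-mfs
print(transform(encoded) == host)  # True
```

Note that the round trip is not exact for characters right at the edge of the range (for example, a ‘!’ that gets shifted down to a space would then be skipped), which is one more sign that this was never a carefully designed cipher.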

And here is a list of the other encrypted strings and their decrypted counterparts:

Obfuscated vs deobfuscated strings

The rest of the sample is fairly straightforward. It copies itself into the files /etc/.zl, /tmp/.lz, and /etc/init.d/.zl. It then creates entries in rc2.d, rc3.d, rc4.d, and rc5.d that are symbolically linked to the .zl file in /etc/init.d. Making these clones ensures persistence on the system, as files in these folders are run on startup. These files can serve as a host-based indicator of the malware if you notice them on a system. It then contacts the server at zx232.f3322.net on port 54188 and waits for some sort of response. This can serve as a network-based indicator of the malware. I have not seen the malware actually get a response yet, which could have several causes. For example, the server may no longer be in use, or the owner may not have been sending out commands at the time. I assume, though, that the program receives a command from the server and executes it.
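If you want to check a machine for the host-based indicators mentioned above, something like the following minimal sketch would do. The paths are the three listed in this article; IOC_PATHS and find_indicators are names I made up for illustration, and a clean result here does not prove a system is uninfected.

```python
import os

# File paths reported in the analysis above.
IOC_PATHS = ['/etc/.zl', '/tmp/.lz', '/etc/init.d/.zl']

def find_indicators(paths=IOC_PATHS):
    """Return the paths that exist; lexists also catches dangling symlinks."""
    return [p for p in paths if os.path.lexists(p)]

if __name__ == '__main__':
    hits = find_indicators()
    if hits:
        print('Possible infection, found:', ', '.join(hits))
    else:
        print('None of the known file indicators are present.')
```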

The interesting part of this malware was mostly the decryption function, so this article was a little lesson on why homebrewed cryptography is a bad idea. In this case it allowed us to decrypt the strings and find out more about what the malware is doing. Hopefully this was an interesting article for everyone, as it was really fun analyzing this sample. As always, I am still a novice at malware analysis, so if there is something I can improve on, please tell me. You can contact me on Twitter and LinkedIn.

Thanks for reading, and happy reversing!
