Josh alerted me to the possibility of a compromise of his system sometime on the morning of November 9th. By that point, he and a coworker had already taken steps to limit the traffic to and from the server. Because several parts of the system had already been touched, no substantial forensic analysis could be done to determine the nature of the attack. The systemd journal hadn't caught anything out of the ordinary aside from the remote root logins that were responsible for installing the malware. As expected, /root/.bash_history had been cleared. wtmp showed root logins from foreign IP addresses as far back as October 20th, so in all likelihood the attacks started at or around the time a new series of malware had begun to resurface in spite of the FBI's efforts to dismantle the botnet.
This is in spite of the arrests of several individuals believed to be responsible for writing and distributing the malware in the first place.
The net effect of the attack was a binary, dropped into /tmp, that included an embedded copy of nginx and was listening on an unprivileged (>1024) TCP port. The binary itself contained configuration information suggesting that infected systems were being used as proxies for the command and control network of a pretty ugly botnet. Hence the need to cut off outside routes to the server to verify that the malware had been stopped completely.
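For what it's worth, a listener like that is easy to spot if you go looking. Here's a rough sketch of the sort of check I mean--the "expected" port set below is purely illustrative--that reads /proc/net/tcp and flags anything listening on an unprivileged port:

```python
# Rough sketch: print TCP sockets in the LISTEN state on unprivileged ports
# that we don't expect. The EXPECTED set below is purely illustrative.
EXPECTED = {8080}  # whatever high ports the host legitimately uses

def listening_ports(path):
    """Return the set of local ports in LISTEN state from a /proc/net table."""
    ports = set()
    with open(path) as fh:
        next(fh)                              # skip the header line
        for line in fh:
            fields = line.split()
            local_addr, state = fields[1], fields[3]
            if state == "0A":                 # 0A is TCP_LISTEN
                ports.add(int(local_addr.split(":")[1], 16))
    return ports

ports = listening_ports("/proc/net/tcp") | listening_ports("/proc/net/tcp6")
for port in sorted(ports):
    if port > 1024 and port not in EXPECTED:
        print("unexpected listener on port", port)
```

Tying each port back to its owning process takes a bit more work (walking /proc/*/fd), but even this much would have flagged the malware's listener.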
Because of the lack of useful forensics from the effort to stop the malware, and the lack of auditing measures to determine precisely what had been done to the system, most of what follows is pure conjecture (albeit educated guesses). There are two scenarios that could have resulted in the infection of the goon host: a password brute force via SSH (although I believe remote root logins were disabled, which would have stopped that) or--most likely--a WordPress exploit that allowed the execution of arbitrary code and ultimately led to root on the system.
After some research, I'm increasingly convinced the latter scenario is the one that unfolded. There are known exploits affecting WordPress versions up to 3.2.3 that allow remote attackers to upload and execute arbitrary files, and I suspect the domain (giverofbeatings.com), which was running WordPress version 2.9.2, was responsible. I can't prove it because there are no obvious indications, but attackers who gain root access to a system via "webshells" may cover their tracks by removing the shell once they no longer need it.
Here's the likely sequence of events:
- The attacker uploaded a webshell via an exploit in WordPress that allowed arbitrary code to be placed in web-accessible locations.
- This webshell then allowed them to upload and run any additional code they wanted with the privileges of the WordPress user.
- The attacker was then able to gain root via a local exploit either in the kernel or a vulnerable setuid-flagged binary.
- From this point, the attacker was able to use the system as a C&C proxy in their botnet.
I'm inclined to believe a kernel exploit was the most likely source of root access, as the machine had not been updated for at least 6 months--possibly up to a year.
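As for the setuid angle, those binaries are at least cheap to enumerate. A minimal sketch (the starting paths are just examples) that hunts for setuid/setgid files looks like this; comparing its output against a known-good baseline or the package manager's records is what makes it useful:

```python
import os
import stat

# Illustrative starting points; a real sweep would cover every local filesystem.
ROOTS = ["/usr", "/bin", "/sbin", "/opt"]

def find_setuid(roots):
    """Yield files with the setuid or setgid bit set."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)
                except OSError:
                    continue
                if st.st_mode & (stat.S_ISUID | stat.S_ISGID):
                    yield path

for path in find_setuid(ROOTS):
    print(path)
```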
So, here's the kicker: The attackers likely had access to the system for up to 3 weeks without being detected. The attack was almost certainly automated and is unlikely to have touched other parts of the system. However, because of the possibility that they accessed the forum database--no matter how unlikely--we encourage you to change your password. If you used the same password elsewhere, we advise that you review those accounts and update your passwords where necessary. Anyone who did access the database would also have the email address on record for your forum account.
Note: This doesn't mean they can read your email; it simply means they could have associated your username, email address, and--if they crack it--your forum password. What that means and what effect this has on your privacy and personal security is left as an exercise for the reader.
With the exception of old, abandoned Minecraft worlds, there was no other data stored on the system outside the forums. Admittedly, we don't even store access logs and haven't for a few years, so we have no records of when you may have visited the forums.
Recent versions of phpBB make use of bcrypt, but the current incantation we're running does not. Instead, it uses phpBB's older "portable" hashing scheme: essentially a salted MD5 applied over and over (the 9 in the stored hashes is the cost setting). While this is somewhat stronger than a single pass of MD5, it still yields password hashes that can be brute-forced in short order. It's generally accepted that passwords between 6-8 characters hashed with a single MD5 pass can be brute-forced in a month or two on relatively modest hardware. Longer passwords take correspondingly longer to crack, and passwords over 16 characters are still likely cost-prohibitive for anyone who is not a state actor. Anyone who isn't a state actor but somehow both obtained the forum data and has the hardware to attack lengthy passwords is much more likely to be spending that hardware mining bitcoins. (Sorry, you're not that important.)
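For the curious, here's a minimal sketch of how that older portable scheme works, assuming the $H$ format phpBB 3.0 borrowed from phpass; the function name is mine, and the final encoding is simplified to hex rather than phpass's custom base-64:

```python
import hashlib

# phpass/phpBB "portable" hashes use this custom base-64 alphabet; the
# character after "$H$" is an index into it giving log2 of the round count.
ITOA64 = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def portable_hash(password: bytes, setting: str) -> str:
    """Sketch of a phpass-style hash; setting is e.g. '$H$9IQRaTwmf'."""
    rounds = 1 << ITOA64.index(setting[3])    # '9' -> 11 -> 2048 rounds
    salt = setting[4:12].encode()             # 8-character salt

    digest = hashlib.md5(salt + password).digest()
    for _ in range(rounds):
        digest = hashlib.md5(digest + password).digest()

    # Real phpBB re-encodes the digest with ITOA64; hex keeps the sketch short.
    return setting[:12] + digest.hex()

print(portable_hash(b"hunter2", "$H$9IQRaTwmf"))
```

A couple of thousand rounds of MD5 barely slows down modern cracking hardware, hence the push toward bcrypt.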
We'll be making plans to migrate to newer versions of phpBB in the coming weeks, and you'll be encouraged to change your passwords at that time.
So that brings us to mitigation strategies.
At present, I've been experimenting with Linux containers to provide some modicum of isolation so that we can limit the impact of such problems in the future. I'm not overly pleased with the current state of Linux containers, because they come in two flavors (privileged and unprivileged) and one doesn't work quite as well as the other. Privileged containers run as root, and while they offer process isolation, they can be circumvented; when a privileged container is escaped, the attacker has root access on the host. Unprivileged containers offer an order of magnitude more protection: Should an attacker successfully escape the container, they must still find a way to gain root access on the host. Obviously, neither is especially ideal, but known container escape exploits are relatively uncommon and are patched quickly. (Unfortunately, a few such patches require a kernel update.) Moreover, since containers can be composed of a much narrower subset of software, a container's attack surface can be greatly reduced compared with that of a general-purpose host on which everything is installed.
I haven't yet decided whether to stick with systemd-nspawn containers or switch to LXC. The current host (my VPS) is running systemd-nspawn containers. Unfortunately, I have yet to figure out how to get an instance of Arch Linux running effectively in an unprivileged container (some systemd services fail, rendering the container useless), and the same goes for Ubuntu releases that have moved from upstart to systemd. Basically, anything with systemd seems to fail in an unprivileged container, and there's no obvious workaround at this time since the technology is relatively new and unexplored. C'est la vie.
Once we containerize the services, the WordPress installations will likely be running in their own isolated instances (I consider this a "time out" for being naughty), and the forums will either exist in a separate instance with other services like TeamSpeak or in a solitary silo.
I've always preferred using key-based authentication for SSH to avoid password-based attacks. We'll be migrating to that.
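While we're at it, it's worth double-checking that the sshd configuration actually reflects that policy. Here's a tiny sketch of the sort of audit I mean (the "wanted" values are my assumptions about our target policy, not anything sshd enforces on its own):

```python
import re
from pathlib import Path

# Target policy once key-based auth is in place (my assumption; adjust to taste).
WANTED = {
    "passwordauthentication": "no",
    "permitrootlogin": "no",   # or "without-password" if root needs key logins
}

def audit_sshd_config(path="/etc/ssh/sshd_config"):
    seen = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r"(\S+)\s+(.+)", line)
        if match:
            seen[match.group(1).lower()] = match.group(2).strip().lower()

    for key, wanted in WANTED.items():
        actual = seen.get(key, "(unset, using sshd default)")
        flag = "ok   " if actual == wanted else "CHECK"
        print(flag, key, "=", actual)

if __name__ == "__main__":
    audit_sshd_config()
```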
I'll be taking additional precautions with the nginx configuration to tighten down security as best I can, but the structure of WordPress doesn't lend itself very well to this sort of thing out of the box. It's almost hilarious. You basically have two choices: Enable easy one-click updates, which necessitates giving ownership of the WordPress sources to the same user that can upload anything, or disable easy updates, requiring manual intervention, and tighten down who can upload what (and where). It might be possible to segregate this into two separate users, thus keeping the easy update feature, but it's not something I've actually tried.
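In the meantime, the thing I actually care about is knowing which parts of the tree the web-server user can write to. A rough sketch of that audit (the install path and user name below are placeholders):

```python
import os
import pwd
import stat

# Placeholders: point these at the actual install path and PHP/web-server user.
WP_ROOT = "/srv/http/wordpress"
WEB_USER = "http"

def writable_by(uid, gid, st):
    """Rough check: could this uid/gid write to a file with this mode/ownership?"""
    if st.st_uid == uid and st.st_mode & stat.S_IWUSR:
        return True
    if st.st_gid == gid and st.st_mode & stat.S_IWGRP:
        return True
    return bool(st.st_mode & stat.S_IWOTH)

def audit(root, user):
    pw = pwd.getpwnam(user)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if writable_by(pw.pw_uid, pw.pw_gid, st):
                print(path)

if __name__ == "__main__":
    audit(WP_ROOT, WEB_USER)
```

Ideally the only things that show up are the uploads directory and whatever the update mechanism genuinely needs.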
I'll also ask Josh whether I should take a more active role in helping monitor his system once he gets a chance to reinstall it. I didn't do that this time, nor did I keep track of pending software updates, suggest upgrade strategies, or keep a close eye on things at large. I should have been more aggressive in pursuing updates, certainly, but I admit I've been distracted with my own systems and services instead. It's not much of an excuse, but I'll be writing some automation software to alleviate this in the future, since I have a few WordPress sites of my own that will need similar monitoring and protection.
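As a first step toward that automation, something as simple as comparing each site's installed WordPress version against the current release would have caught this months earlier. A minimal sketch, assuming the standard wp-includes/version.php layout and the WordPress.org version-check API (the install path is a placeholder):

```python
import json
import re
import urllib.request
from pathlib import Path

# Placeholder path; wp-includes/version.php defines $wp_version for the install.
WP_ROOT = Path("/srv/http/wordpress")

def installed_version(root):
    text = (root / "wp-includes" / "version.php").read_text()
    return re.search(r"\$wp_version\s*=\s*'([^']+)'", text).group(1)

def latest_version():
    # WordPress.org publishes the current release via its version-check API.
    with urllib.request.urlopen(
        "https://api.wordpress.org/core/version-check/1.7/"
    ) as resp:
        data = json.load(resp)
    return data["offers"][0]["version"]

if __name__ == "__main__":
    local, latest = installed_version(WP_ROOT), latest_version()
    if local != latest:
        print("WordPress", local, "is installed;", latest, "is current -- update!")
    else:
        print("WordPress", local, "is current.")
```

Wire that into cron (or a systemd timer) with an email on mismatch and at least the "running a five-year-old WordPress" failure mode goes away.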
Here's where we stand right now: The goon services are mostly back up and running. TeamSpeak is available again (and has been for quite some time), and the goon main site is up, but some of the larger media (videos) are not yet available. I haven't taken the time to upload them in their entirety, and I may wait until we decide what we're going to do. Fortunately, migrating the goon services will be relatively easy; we'll just need a couple of hours to make the changes and wait for DNS to propagate.
Thanks everyone, and sorry about the disaster.