By now, you've probably heard that LinkedIn's passwords have been allegedly compromised. I first heard about this from a Norwegian website earlier today. Here is what we know now:
- LinkedIn has not confirmed the leak and currently doesn't understand how the hack could have happened, but there is a 271 MB file of alleged SHA-1 hashes floating around with LinkedIn's name on it.
- The hash digests are unsalted SHA-1 hashes.
Technical mumbo jumbo is to follow, if you're just worried about your password you should simply change your password. Don't use any website that offers to check if your password has been leaked by typing your password into their website. These may be completely legit, but they're probably trying to ride on your fear and steal your password. You should also change your password on any other site that you used that password on.
If you'd like to see the list of hashes for research purposes you can download it here.
This is particularly interesting to for a number of reasons. Right now I'm in Sofia, Bulgaria teaching a room full of developers secure coding best practices and just yesterday we had talked about proper handling of passwords and other sensitive data. We walked through the spectrum of poor password handling practices.
To understand the risk let's walk through a quick rundown of what hashing is and how LinkedIn is storing your passwords.
Hashing is a one-way mathematic function that can take any input and map it to a smaller output (digest). The nice thing about hashing is that if you use the same input you are guaranteed to get the same output or digest. If the input changes, even in the slightest bit, your digest will change drastically. Another nice thing about hashing is that the digest doesn't give you any information about the input, so it is not possible to reverse a hash.
We can use hashing algorithms to make it easier to safer to store passwords. Instead of storing the plaintext password in the database, which everybody agrees is bad, right? We can store the digest of the password in the database. Then when you type in your password I'll calculate the hash of what you typed in and compare it with what I have on file for you. If they match then you're in, if they don't I'll kick you out!
There are multiple ways, or algorithms, to "hash" text, some are better than others, only "cryptographic" hash functions should be used for anything security related, so I'll only talk about those here. Again, we have a spectrum of worst to best (hint, if you're thinking about using a hashing algorithm that isn't on this list... don't):
MD5 <-> SHA-1 <-> SHA-256224 <-> SHA-512/384
The first two are broken and should not be used. Note, LinkedIn used one of the broken ones.
Hopefully as you're reading this you're thinking "Joe! You said earlier that if give only the output of a hash you can't get back to the original text, so if LinkedIn was only storing the hash digest of my password I should be safe, right!?"
Ah true! However, that's assuming I attack the hashes by trying to break the hash algorithm directly. I've got a few quick computers, and a bit of time. I also know a bit about how people choose passwords, and how bad they are at it. So instead of trying to break the hashing algorithm, I'm going to simply get a list of every password I can get my hands on and calculate the output for each one. I'll keep track of the password on the left and the output I generate on the right. Next I'll simply lookup each digest from the list in the massive table I just generated and get the passwords from there.
This type of attack is called a Rainbow Table attack and it theoretically will work for any password, it can take a lot of work for really long and/or really random passwords because there are so many combinations. This is why I mentioned I'll just calculate the digests for the top passwords I already know about. This way I don't have to do an exhaustive search.
OK, so now maybe you're getting nervous, because your password of LvBieber isn't looking so great right now. Since you used upper and lower case letters there are 52 combinations in each of the 8 locations above. That means there are 528 or 53,459,728,531,456 combinations. "Wow," you think "that's a lot of combinations!" Slow down there, tiger. On my, mostly, regular computer I can calculate about 680 million possibilities per second. I can crack your password (and all the other 52 trillion passwords in about 22 hours (Before lunch, if I get a few friends to help).
What could LinkedIn have done to protect you from your own poor password choice? Well, they could have required a Password Policy, but everybody seems to hate those. They could have also added Salt. No, notthat salt, this Salt.
In software we call a chunk of random data that we add to passwords "salt." Since your password is so easily guessable it's likely it already exists in somebody's Rainbow table so the lookup would be really quick and easy. We want to make them work for it. So for each user I generate, say, 10 extra random characters to add to each password. This means I generate some random characters "7%bKeVm!fN" and add that to your password turning it into LvBieber7%bKeVm!fN If I do this for every user the hacker has to generate a rainbow table for each user independently. Since I have to store the salt in plaintext along side your password hash, since I'm the only one that knows it and I have to use it to generate your digest to validate your password. Well that's better than plaintext or just plain hashing, right? I bet we can do better though, right? Of course we can.
There are a couple of algorithms, PBKDF2 and bcrypt, that add an element of "work" to calculating a digest. The idea is that it's not such a big deal for a website to spend half a second calculating your password digest each time you login, but if an attacker can only calculate two passwords per second that makes any rainbow table attack infeasible. PBKDF2 does this by hashing the output of the previous digest thousands of times, bcrypt uses some fancy cryptography to do this a bit more elegantly, but the result is about the same. Both have been well implemented in many languages.
One thing to note. Don't ever try to implement your own cryptographic solutions. Never, ever, ever, ever, should you attempt this. Cryptographers are smarter than you and me and everybody I know put together. They spend all day thinking about these things, and even then it takes a team years to create new secure algorithms that is then peer reviewed and proven to be secure. It does not happen with a clever insight after four cups of coffee and an espresso.