ELI5 – Format Preserving Encryption

Most block ciphers work on the bytes and bits of data. It doesn’t matter to them if the data is a video, a phone number, a credit card number or a meme of Patrick Star. And that’s good. It means that a generic block cipher can handle almost all the traffic encryption over TLS while we’re busy browsing Reddit with YouTube playing music in another tab. And since the underlying network handles binary data just as well and efficiently, no one complains.

And it is generally a bad practice to write domain-specific encryption algorithms. That is the reason you’ve never heard of AES Image Encryptors. But sometimes, for very specific use cases, it becomes necessary to have something like that. Something like an AES for images, so to speak.

The Problem

What if a legacy application needs to integrate encryption of some of its data but not change any of the existing data structures? AES works on 128-bit blocks of data, DES on 64, so it won’t work for phone numbers or credit card numbers. At least, not without changing the underlying data structures required to store the encrypted data. And suppose we cannot change the underlying architecture for various reasons, one of them simply because we do not control some of the machines passing our data. Yes, that’s where we need format preserving encryption (FPE).

Format Preserving Encryption

While researching this topic, I came across this beautiful construct by cryptographers John Black and Phillip Rogaway. The construct is simple and the best part is that it uses an ordinary block cipher as its building block, thus inheriting all the goodies of the underlying block cipher. (One caveat: the cipher has to remain a permutation on the smaller bit size so that decryption is possible. Simply truncating the output of a larger block cipher such as AES to the N least significant bits would not be reversible, so in practice the block cipher is used as the round function of a narrower construction such as a Feistel network.) Let’s look at a brief working of this method.

Let the message space be M. In case of phone numbers, that’s from 0 to 9,999,999,999 (that’s for India, and while the actual message space is much smaller than that, no harm in assuming the entire range). The number of bits required to store this information is log2(10^10) = ~33.2, rounded up to 34. So we can fit the ciphertext in 34 bits assuming no padding or integrity checks. Now imagine two sets, X and Y. Let X be a superset of Y. In this construct, X represents the set of all possible ciphertexts that we can get on encrypting each Mi with our block cipher, that is, every 34-bit value. Y represents the set of allowed ciphertexts, that is, ciphertexts that are equal to or less than the max value of our message space M (which is 9999999999 in our example).

Now, when you encrypt a phone number with a block cipher, there’s a good probability that the value would be less than or equal to the largest phone number (10 digits, assuming our cipher works on 34-bit blocks). If that’s the case, that’s our answer. If not, encrypt this ciphertext and check again. Continue this until you reach a number that can fit in 10 decimal digits. Decryption walks the same chain backwards: decrypt, and keep decrypting until the result is a valid 10-digit number.

Now while some of you might think (I certainly did) this would result in a long loop, it would not (with high probability). Each iteration lands inside Y with probability 10^10 / 2^34, which is about 58%, so on average the answer is found in fewer than 2 iterations. That’s pretty cool if you’d ask me!

In an ideal world, you’d want to rewrite the logic and underlying data structures such that native AES is possible. In this one, format-preserving encryption would work just fine. Thank you for reading.

ELI5 – Key Derivation Function

We’ve heard that AES and other block ciphers require specific key sizes; for AES that’s 128, 192 or 256 bits. But I don’t ever remember having to calculate my password length based on the underlying key size. Never have I read on a website “passwords need to be of 16 ASCII characters, 1 byte each, to make a total of 128 bits of key material”. So what lies between me entering an arbitrarily sized password and the encryption algorithm receiving a nicely sized 128/256 bit key? Let’s find that out in this ELI5.

Key Derivation Function

A Key Derivation Function (wait for it…) derives cryptographic key(s) from a password. Generally speaking, the passwords we humans come up with are something like “MyAwesomeDog007” which, while long and easy to remember, just don’t have the size or the random-looking structure that cryptographic applications expect. On the other hand, a key derived from such a simple password, something like “ml6xU*dwGS5rvE!dcIg6509w$$” (that’s not a real key, a real key would in most cases be binary), looks complex and random. This is the first purpose a KDF serves: to turn a password into key material that is suitable for use in other algorithms such as AES. (Strictly speaking, a KDF cannot add entropy that wasn’t in the password to begin with; it stretches the password into a key of the right size and makes every guess expensive.)

The second purpose that KDFs serve is that they make brute forcing infeasible. Due to the high computational costs of running a good KDF, brute forcing is typically not achievable for any half decent password. Of course, it won’t protect a user from a dictionary attack if she selects a password such as “password123”.

Working

A KDF takes an arbitrarily sized input that has low entropy (a user-supplied password, for example), runs some hash-based algorithm on it many times over, and outputs a random looking fixed sized cryptographic key (which later becomes the input key to encryption and MACing algorithms). A KDF can be thought of as a pseudo-random function (PRF) which maps an input password to an output key. As a PRF, the input and output mappings should look completely random to an attacker and in no circumstance should he be able to get the original password from a cryptographic key (that is, the function should be one way). The high iteration count makes computing a KDF an expensive affair. This is acceptable for a legitimate user, who computes it once, but it slows down anyone trying to brute force the password.

Typically, key derivation functions employ keyed hash algorithms or HMAC. Cryptographic salt is used to prevent rainbow table attacks (precomputed hash lookups). The number of iterations (in the order of tens to hundreds of thousands) of the hash function is selected to slow down bruteforce attacks.

Implementations

A simple key derivation function is Password Based Key Derivation Function 2, PBKDF2. It takes as input a pseudo-random function (such as HMAC-SHA-256), the user supplied password, a salt (64+ bits), the number of iterations and the length of the output key, and outputs a key of the specified length.

Although PBKDF2 is still used and recommended, modern alternatives such as Scrypt and Argon2 offer much better resistance to bruteforce.
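As a rough illustration, Python’s standard library exposes PBKDF2 as hashlib.pbkdf2_hmac. The sketch below is only that, a sketch: the helper name derive_key, the 16-byte salt and the iteration count of 200,000 are assumptions picked for the example, not recommendations from this article.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes,
               iterations: int = 200_000, length: int = 32) -> bytes:
    """Stretch a password into `length` bytes of key material with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256",                  # hash used inside HMAC, the pseudo-random function
        password.encode("utf-8"),  # arbitrarily sized user-supplied password
        salt,                      # random salt, stored next to the result
        iterations,                # the cost knob that slows down brute forcing
        dklen=length,              # 32 bytes = a 256-bit key
    )


salt = os.urandom(16)              # 128-bit random salt (assumed size)
key = derive_key("MyAwesomeDog007", salt)
```

The same password with the same salt and iteration count always gives the same key, which is why the salt and the iteration count have to be stored alongside whatever the key protects. A different salt gives an unrelated key, which is what defeats precomputed tables.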

ELI5 – Message Authentication Code

You need some urgent cash to buy today’s lunch. You throw a paper chit at your colleague, “Hey, I need you to transfer 100 bucks in my account number 10022, urgent”. Eve, a bad actor in your office, intercepts the chit, changes the 10022 to 10033, which is her account number, and forwards it to your friend. Your friend, intending to help you, transfers the amount and you both get duped!

The Problem

The above is not an overly rare event, far from it. Such attacks happen all the time on the internet, and the reason is the lack of (cryptographic) authenticity built into core internet protocols. We learned in Authenticated Encryption that confidentiality alone doesn’t mean anything if the attacker can perform active attacks on your communication channel (just like Eve could). We need something better. We need MACs.

Message Authentication Code

As the name gives away, a MAC is an authentication code associated with a message which verifies the integrity of the message and, assuming that the key is only known to you and the message’s sender, its authenticity. Just like with encryption, you give a MAC algorithm a message and a key, and it gives you a tag. This tag is specific to your message and key pair, and an attacker shouldn’t be able to forge a valid tag for any new message of his choice even if he’s given as many message-tag pairs as he likes to analyze.


From Wikipedia’s MAC page

In concept, a MAC is similar to a hash function: given an arbitrary sized input, you get a fixed-sized output (digest), and this can be reproduced (‘verified’) on other machines as long as one can find an implementation of the same hash function. This is how your download manager ensures that the file it has downloaded from the internet is not broken, by calculating the hash digest and comparing it with the one the website claims. A MAC differs from a traditional hash function in that along with a message input, it also takes a key, and as such, knowledge of the key as well as the underlying MAC algorithm is needed to verify a tag (or create a new one).

In fact, one of the most popular MAC algorithms is based on hash functions. The algorithm is called HMAC for Hash-based Message Authentication Code. It works by hashing key material with the message while taking preventive measures for popular attacks on hash functions such as length extension attacks. Any reasonable hash function can be used for the purpose of MAC’ing, including SHA-1 and SHA-256 (MD5 isn’t recommended).
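Going back to the paper chit from the introduction, here is a small sketch of what tagging and verifying a message looks like with Python’s standard hmac module. The function names make_tag and verify_tag and the sample key are assumptions for illustration only.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"a key only you and your colleague know"  # hypothetical key
chit = b"transfer 100 bucks to account number 10022"
tag = make_tag(shared_key, chit)
```

If Eve changes 10022 to 10033 on the chit, verify_tag returns False for the old tag, and without the key she has no way of producing a matching one.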

Encryption of the underlying data is not a prerequisite for using MAC, and they can be used irrespective of whether the data being MAC’d needs confidentiality or not. Use MACs whenever data integrity is needed. One caveat to look out for; MAC algorithms by themselves do not prevent replay attacks.

Aside on Replay attacks: A replay attack may happen when, say, you owe Eve some money. You send a note with Eve for your bank saying, “Please give Eve Rs.100 from my account, Signed: Bob”. Now there’s nothing preventing Eve from being greedy and using that same note again some days later. This is prevented in the real world by making cheques unique and one-time use only. Similarly, ciphertexts must embed information (such as packet number, timestamp, session counter etc) that will expire once received and not let Eve re-send it at a later time.
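One simple way to picture this is a receiver that remembers the highest message counter it has accepted so far. The sketch below assumes a counter that the sender increases with every message and that is covered by the MAC; the class and function names are made up for the example.

```python
import hashlib
import hmac


def sign(key: bytes, counter: int, message: bytes) -> bytes:
    """MAC the counter together with the message so neither can be changed."""
    return hmac.new(key, counter.to_bytes(8, "big") + message, hashlib.sha256).digest()


class Receiver:
    """Accepts each (counter, message) pair at most once, in increasing order."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.last_counter = -1     # nothing accepted yet

    def accept(self, counter: int, message: bytes, tag: bytes) -> bool:
        if not hmac.compare_digest(sign(self.key, counter, message), tag):
            return False           # forged or tampered
        if counter <= self.last_counter:
            return False           # valid tag, but we have seen this one: a replay
        self.last_counter = counter
        return True
```

The first time Eve presents the note with counter 0 it is accepted. When she presents the very same note and tag again, the tag still verifies but the counter is stale, so the bank refuses it.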

Thank you for reading.

ELI5 – Authenticated Encryption

The core goals of cryptography and any application of cryptography are confidentiality, integrity, and authenticity. Let’s begin with a short one liner on each:

  • Confidentiality: No one should be able to read the contents of the message except the intended recipient.
  • Integrity: No one should be able to tamper with the message without being noticed.
  • Authenticity: The recipient should be able to confirm that the message indeed came from the sender.

There are other goals that we do not need to touch upon in this article, such as non-repudiation and plausible deniability.

The Problem

Now the problem with using just an encryption algorithm like AES with a non-authenticating mode like CBC is that anyone can change the ciphertext during transmission. And while you might think, “but the modified ciphertext, with high probability, will decrypt to something gibberish”, this isn’t the right argument because the recipient will have no way of knowing for sure, which is a problem, a huge one.

Secondly, there’s also no way of knowing if the message was sent by a person you’re expecting it from. It might have come from any middleman intercepting your network and you wouldn’t be able to tell a difference. And for this reason, encryption without authentication and integrity completely destroys the purpose of encryption. An example of this in the real world is when you see an error such as the following:


https://support.mozilla.org/en-US/kb/what-does-your-connection-is-not-secure-mean

While this can mean that the encryption mode used by the website is weak, more often than not, this means that the browser was able to establish a secure connection but the identity of the website is unknown. This defeats the purpose of encryption: even if the connection is secure, you don’t know whether you’re receiving the message from the sender you expect or whether it has been tampered with.

Enter Authenticated Encryption

Authenticated encryption solves this problem by introducing authentication and integrity as freebies that you get when you use an authenticated encryption mode along with an encryption cipher such as AES. Examples of authenticated encryption modes include GCM and CCM. In fact, if you check the connection info of the site you’re reading this on (click the green icon and then select more info or something similar on chrome and firefox) and check the technical details part, you’ll see something like this, depending on your browser.


Yes, I’m the most active visitor of my blog

Here, AES_128_GCM is used for symmetric encryption of the content you exchange with the server with AES providing confidentiality and GCM providing authentication and integrity. SHA256 is used to authenticate the initial handshake and as a pseudo-random function (PRF).

In a nutshell, these authenticated encryption schemes usually take a message, encrypt it, then MAC the ciphertext (and IV) and then append the MAC to the ciphertext. This is called Encrypt-then-MAC. Now if the ciphertext is changed, the MAC won’t match and the receiver can easily discard such messages without having to touch the contents of the ciphertext. There are other variations to this method, namely MAC-then-Encrypt and Encrypt-and-MAC, each with its own trade-offs, although most experts recommend doing Encrypt-then-MAC.



From wikipedia page on authenticated encryption. This is Encrypt-then-MAC
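The flow can be sketched in a few lines of Python. A loud caveat first: Python’s standard library has no AES, so the “cipher” below is a toy keystream built from HMAC-SHA256 in a counter-like fashion, used purely as a stand-in so the example runs on its own. The names seal and unseal, the 16-byte nonce and the layout nonce + ciphertext + tag are all assumptions for illustration. For anything real, use a ready-made mode such as GCM from a proper library.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16
TAG_LEN = 32                       # size of an HMAC-SHA256 tag


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (stand-in for a real cipher such as AES in CTR mode)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt first, then MAC the nonce and ciphertext, then append the tag."""
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def unseal(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Check the tag before touching the ciphertext; decrypt only if it matches."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("message too short")
    nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

Note the two separate keys, one for encryption and one for the MAC, and that unseal raises before decrypting anything if even a single bit of the message was flipped in transit.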

As you can imagine, this can be easily done manually (and until some years ago, it was mostly done by developers). But since it is easier (and much more secure) to standardize such modes and leave the secure implementation part to the experts, these ‘readymade’ modes have picked up wide adoption and as you saw, you’re currently using GCM to ensure confidentiality, integrity, and authenticity of this very line. Thank you for reading!

Advice From An Old Programmer – Zed Shaw

The first article on my site is about me starting with Python. That was four years ago and around that time I had read this book called ‘Learn Python The Hard Way’. I really, really liked it. It was amongst one of those earlier pieces of memories that I’d probably never forget for my entire life, similar to the kind of impact reading The Hacker Manifesto and Sir Eric Raymond’s ‘How To Become A Hacker’ had on me.

I decided to give the Python 3 edition of Learn Python The Hard Way a read. The last section of the book is titled ‘Advice from an old programmer’ and in that, Zed Shaw shares with us some of his zoomed-out thoughts on programming and the career one makes out of it. Although it is very subjective and very blunt, just like the rest of the book (and I really like the rawness in his writing), for me personally it refreshed the old memories associated with the book.

I had read this exact chapter in the previous edition, but this time it made so much more sense. And not just this chapter, but in the entire book, the subtle pieces of well targeted humor and strong opinions held by the author were something of a delight to read even if you didn’t believe in the exact same thing.

I’m copy pasting the section of that book that I think I’ll come back to re-read again and again. I think many of you will appreciate it as well.

Advice From An Old Programmer

You’ve finished this book and have decided to continue with programming. Maybe it will be a career for you, or maybe it will be a hobby. You’ll need some advice to make sure you continue on the right path and get the most enjoyment out of your newly chosen activity.

I’ve been programming for a very long time. So long that it’s incredibly boring to me. At the time that I wrote this book, I knew about 20 programming languages and could learn new ones in about a day to a week depending on how weird they were. Eventually, though, this just became boring and couldn’t hold my interest anymore. This doesn’t mean I think programming is boring, or that you will think it’s boring, only that I find it uninteresting at this point in my journey.

What I discovered after this journey of learning is that it’s not the languages that matter but what you do
with them. Actually, I always knew that, but I’d get distracted by the languages and forget it periodically.
Now I never forget it, and neither should you.

Which programming language you learn and use doesn’t matter. Do not get sucked into the religion
surrounding programming languages as that will only blind you to their true purpose of being your tool
for doing interesting things.

Programming as an intellectual activity is the only art form that allows you to create interactive art. You
can create projects that other people can play with, and you can talk to them indirectly. No other art form
is quite this interactive. Movies flow to the audience in one direction. Paintings do not move. Code goes
both ways.

Programming as a profession is only moderately interesting. It can be a good job, but you could make
about the same money and be happier running a fast food joint. You’re much better off using code as
your secret weapon in another profession.

People who can code in the world of technology companies are a dime a dozen and get no respect.
People who can code in biology, medicine, government, sociology, physics, history, and mathematics
are respected and can do amazing things to advance those disciplines.

Of course, all of this advice is pointless. If you liked learning to write software with this book, you should try
to use it to improve your life any way you can. Go out and explore this weird, wonderful, new intellectual
pursuit that barely anyone in the last 50 years has been able to explore. Might as well enjoy it while you
can.

Finally, I’ll say that learning to create software changes you and makes you different. Not better or
worse, just different. You may find that people treat you harshly because you can create software, maybe
using words like “nerd.” Maybe you’ll find that because you can dissect their logic they hate arguing
with you. You may even find that simply knowing how a computer works makes you annoying and weird
to them.

To this I have just one piece of advice: they can go to hell. The world needs more weird people who know
how things work and who love to figure it all out. When they treat you like this, just remember that this is
your journey, not theirs. Being different is not a crime, and people who tell you it is are just jealous that
you’ve picked up a skill they never in their wildest dreams could acquire.
You can code.

They cannot. That is pretty damn cool.

Beautiful, isn’t it? Thank you for reading!

Blog Anniversary – Four Amazing Years

So it is the time of the year when my domain registrar pings notifying me of domain expiration and reminds me of all the adventures I’ve had with this domain, this blog. And after contemplating for some time, the inner writer whispers, ‘iss awsar par ek post toh banta hai’.

Before this one, I had a couple of other blogs. The goal with those blogs was making money (and learning and knowledge sharing, but honestly, I wanted to taste money). There’s a difference between writing for yourself and writing for a wider, more general audience while thinking about SEO and praying to the gods at Google to increase your PR. After a while, finding no success and realizing that I wasn’t enjoying it that much (you know you’re obsessed with traffic and pageviews when you open Google Analytics every 30 minutes!), I gave up on that and started this personal blog, convincing myself that traffic oriented blogging isn’t my piece of cake. Four years ago, in that article, I wrote this, not knowing how it will turn out.

So, you may ask, what is this thing? The thing you are reading this on! Isn’t it a blog?

Yes it surely it, but I feel it is more than a blog. It is a diary. I don’t really care if no one reads this blog, and as a result, if you look at the source code, there is no analytics code installed. It simply means I never know how many people visit this blog, if at all they do. All I know is I enjoy writing here far more that what I did writing on a blog where people actually came to read stuff.

I’m happy that it was a successful experiment. Writing without the lure of traffic and affiliate commission is much more liberating, and you can be extremely honest about whatever you like. Most importantly, this place has become sort of a hangout for me, a diary and a place to reflect. I sometimes simply read the old articles to find spelling and grammatical errors, haha. There are so many of them, but I correct none. Correcting them would be slowly erasing the past me, so let that stay as it is. It gives me a nice perspective on how my interests, my hobbies and simpler things like the way I construct sentences, are changing. And change is good.

Thank you for reading.

Time & Hash Based One Time Passwords

Ever wondered how two factor authentication apps work? I certainly did. One could just guess how SMS based tokens work, that’s simple (although they shouldn’t be used as per guidelines from NIST). But what about TOTP or Time based One Time Password, the ones in which you scan a QR code and the OTP generator app (like freeOTP and andOTP) gives you a new six digit token every 30 seconds or so?

I was, for quite some time, under the (very misinformed) impression that web servers which implement this method of 2FA expose an API and, by means of the QR code, give you the endpoint with some token and then the OTP generator app polls the server and gets a new ephemeral password which the user enters in the application. Straight forward, but plain wrong.

The belief was challenged when I noticed that the OTP generator app works irrespective of network connection, even if both devices are offline (that is, when working on localhost server). HOW? I dug further and I learned some very interesting things, some of which I wanted to write here.

The Name, Dude

Time based One Time Password, the name itself gives enough clue to guess that it uses time, and as such is independent of inter-communication as long as the two systems are in time-sync. But an adversary can be assumed to be in time-sync as well, right? Yes, that’s where we bring in the secret (a randomly generated token) which is embedded in the QR code that you scan with your smartphone app. So we have the time which is in sync and we’ve established a way of transferring the secret from the server to the client. Turns out, that’s all the data we need to keep generating one time passwords independently on the client and server side, completely offline once the initial secret sharing happens.

Basic OTP Algorithm

  function generate_otp(secret, counter) {
    // counter is encoded as an 8-byte big-endian integer
    h = hmac(key=secret, message=counter, algorithm=sha1)
    offset = get_last_four_bits(h)  // a number from 0 to 15
    pre_otp = get_4_bytes_starting_from_offset(h, offset) & 0x7fffffff
    otp = pre_otp mod 10^N  // N decimal digits, padded with leading zeros
    return otp
  }

  function get_totp(secret) {
    counter = floor(epoch / 30)
    return generate_otp(secret, counter)
  }

  global counter; // get from database
  function get_hotp(secret) {
    otp = generate_otp(secret, counter)
    counter = counter + 1 // store back in the database
    return otp
  }

The basic OTP algorithm (both the time based and the HMAC based variant, TOTP and HOTP) accepts a secret and a counter value. In the time based variant, combining the current time and the secret, a new 6 (or N) digit token is generated every 30 (or Ti) seconds. The two differ only in the counter value supplied to the algorithm.

  • TOTP: take the number of times the interval Ti can be fitted in the total number of seconds since epoch. Which is just a weird way of saying that the counter is the quotient when you divide the seconds_since_epoch number by the interval duration (Ti).
  • HOTP: take the current counter stored persistently and use that. After use, increment the counter in the database.

After establishing the counter value, the rest of the steps remain the same in both the cases.

  • Compute the HMAC of the counter (as the message, encoded as an 8-byte big-endian integer) with the secret as the key, and get the hex digest
  • The last 4 bits (last digit in hex) are stored as offset, a number from 0 to 15
  • Starting from the offset-th byte, take 4 bytes (32 bits, 8 hex digits) and discard the first bit (AND with 7fffffff. This works because f = 1111 and 7 = 0111, so &’ing with 7 (0111) is equal to switching off the first bit)
  • Convert the 32-bit hex to int, then take the least significant 6 decimal digits (or N depending on requirement), that is, the value modulo 10^6 padded with leading zeros, and that is your OTP
  • ?? Profit!

Security Consideration

The entire security of the OTP lies in the secrecy of the initial secret. If that’s compromised, an attacker can easily generate as many OTPs as she wishes. Also, given the small keyspace of the OTP (a million possibilities for six digits), and also that many servers are designed to accept counter+1 and counter-1 OTPs, securing the system against bruteforce is a must.

One important aspect of these 2FA mechanisms is that losing your 2FA device means losing your account. This is especially the case with services like ProtonMail where the password is used to decrypt client data.

Naive Python Implementation

Given how simple this algorithm was, I tried to implement it. The core algorithm was literally less than five lines of python code. Here’s a naive implementation of the same, and while it seems to work, I’d not use it anywhere.
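The embedded snippet isn’t reproduced here, so what follows is a stand-in rather than the original code: a minimal sketch that follows the steps described above using only Python’s standard library. The function names mirror the pseudocode, and the defaults (SHA-1, 6 digits, 30 second interval) are assumptions taken from that description. Like the original, it is for learning only.

```python
import hashlib
import hmac
import struct
import time


def generate_otp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP core: HMAC-SHA1 over the counter, dynamic truncation, then modulo."""
    # The counter goes in as an 8-byte big-endian integer.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Last 4 bits of the digest pick where to start reading (0 to 15).
    offset = digest[-1] & 0x0F
    # Take 4 bytes from that offset and switch off the first bit.
    chunk = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep the last `digits` decimal digits, padded with leading zeros.
    return str(chunk % 10**digits).zfill(digits)


def get_totp(secret: bytes, interval: int = 30, now=None) -> str:
    """TOTP: the counter is the number of whole intervals since the epoch."""
    if now is None:
        now = time.time()
    return generate_otp(secret, int(now) // interval)
```

With the test secret b"12345678901234567890" from the HOTP specification, generate_otp(secret, 0) gives "755224" and generate_otp(secret, 1) gives "287082", which match the published test vectors. Note that authenticator apps usually hand you the secret as a base32 string, which would need decoding (for example with base64.b32decode) before being passed in.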

Thank you for reading! PS Python noob, please don’t judge! 😛

GDPR Humor

Curating a list of humorous content related to GDPR.

xkcd #1998

GDPR Hall Of Shame: https://gdprhallofshame.com/

In the midst of an outburst of privacy policy updates from random companies reminding you of how much they care about your privacy, Mozilla dropped this gem.

Ghostery sent 500 users an email updating them about changes in its privacy policy and forgot to use the BCC feature.

Found this on reddit r/Europe

This on r/ProgrammerHumor

Thank you for reading!

Reusing An Old Laptop’s LCD Panel

A month ago, my manager from LaughGuru gave me his old non-functional laptop. It was a six year old Dell Inspiron 15R 5520 notebook computer with 2GB ram, third-gen Intel i5 and a 500GB hard disk, and it weighed almost as much as Misty and my new Thinkpad combined. I couldn’t get the motherboard to boot up so I decided to take it apart (and given the condition of the laptop it made little sense to repair it).

I’ve found some really useful things inside the laptop. The two gig ram chip and the 500 gigs hard disk now sit in another water damaged laptop that my friend gave me a week ago and power it flawlessly. The CD drive will come in handy someday as an external disk drive. I planned to use the display as an external monitor for my laptop, and that turned out to be a DIY project in itself.

Enter LVDS Connectors

So I isolated the LCD and looked very carefully for any hints on what sort of connector was dangling from behind the panel. It had the word LVDS on it. Some Internet research later, things became clear. As alien as it sounds, LVDS or Low-voltage differential signaling is a standard that is used for high-speed transfers using very low power. For me, it simply meant that there’s no straightforward way of plugging the HDMI or VGA cable from my laptop into the bare LCD panel and start using it.

Unfortunately, there’s also no easy way of using the laptop’s motherboard logic to make the LCD panel work. Searching for a solution made it clear that an LCD controller kit is what needs to be used. It is important that the exact spec of the LCD panel is known, as the kits are only compatible with a small range of panels. It might be difficult if the LCD was never working in your possession, as in my case, but this nice website called Panelook makes it easy to get the detailed spec of the panel from just the serial number. Things to look at are the resolution and the backlight type. The resolution needs to be an exact match and the backlight type is needed to judge if you’ll need an inverter with your controller kit. Mine was a WLED panel so no external power source or inverter was needed. The next step is searching for the serial number on ebay and other local hobby sites. I found a nice kit on Banggood and decided to order it.

Putting It All Together

Interfacing was simple, and there are nice videos on the topic on youtube. There’s the LVDS connector that goes into your LCD panel. Then there’s the controls board that needs to be plugged into the main board. The controls board also has an IR receiver for remote control and can be used to control brightness, contrast, sharpness etc of the panel.

The board itself supported inputs through AV connector, HDMI, and VGA. It operates on 12 volts and 4 amps and luckily I had a 5 amps supply with me, so no extra expenditure there. I had to borrow a VGA cable from a friend though, as my laptop only has VGA and no HDMI.

I used an old Tupperware tiffin box as the LCD’s stand to keep the delicate LVDS cables safe and away from physical contact with anything. To avoid physical damage to the panel, I’ve used the stock display cover of the laptop (the top half) as it was. The added benefit of using the stock plastic cover was that I could drill holes and fix the controller board on the back of the panel like an all in one PC (I could’ve literally made it an all-in-one PC by docking my raspberry pi back there as well, haha). Overall, I was very happy with the result, and as I write this, I have my editor on my primary screen and the browser on the extended display. Dual monitors at home achievement unlocked!




Hope you found this article useful. Thank you for reading!

Banggood’s India Direct Mail Shipping

Here’s some great news for all you hobby electronics enthusiasts who drool at the sight of cheap stuff on Banggood and AliExpress, but upon rethinking about the time it takes them to ship something, give up on the prospect of buying it. I’m not a hobby electronics person, but after shopping some three-four times in the past year through sites like AliExpress and Banggood, I was convinced that the wait is definitely not worth the discount that you get, because the shipping time is usually around 40-50 days. Yes, nearly two months it took for my speaker amplifier to reach me. And then some others that never reached.

But it seems like those are the days of the past because two weeks ago when I reluctantly surfed Banggood for an LCD controller board that I couldn’t find on Ebay or other Indian sites, I saw a new method of shipping called ‘India Direct Mail’ (not really new, it has been around since the last quarter of 2017), and it promised to ship in 8-16 days at almost no additional cost.

I really found it hard to believe but decided to take the risk and order as I didn’t have much to lose (except for some 1200 rupees). It turned out to be true. Today, I received my LCD controller board, just 12 days after ordering. I feel this is reasonable, especially given that you won’t get it for less than double that price here in India, if at all you do get it. This is great stuff and I’ll definitely be using this more in the future.