Designing for critical privacy and data protection systems (I.5) – homomorphic secret sharing

Last time, I talked about some of the factors that influenced the evolution of privacy-preserving technologies.

I wanted to touch on some of the technologies emerging from the impact of these factors and talk about some of the challenges they come with.

After a discussion about ε-differential privacy, I promised you a little discussion about homomorphic encryption.

There is a small detour that I find myself obligated to take, due to the latest circumstances of the SARS-CoV-2 outbreak: I want to split this discussion in two parts and start with a little discussion about homomorphic secret sharing before I go into sharing my experience with adopting homomorphic encryption.

What?! Why?

In the last article, I argued that one of the drivers for adopting new privacy mechanisms is “the digitalization of the citizen-facing services of nation-states (stuff like e-voting, which I really advocate against)”.

Well, sometime after SARS-CoV-2 is gone (a long time from today), I foresee a future where this kind of service will be more and more widely adopted. One of the areas where citizen-facing services of nation-states will be digitalized is e-voting – e-voting within the parliament, for democratic elections, etc. I briefly mentioned last time that I am really against this, at least for now, given the status quo of the research in this area.

Let me explain a little bit of the trouble with e-voting.

Starting with a question: Why do you trust the people counting your vote?

[…annoying pause…]

A good answer here could be:

  • Because all the parties having a stake in the elections have people counting. The counting is not done by a single ‘neutral’ authority.
  • Because, given the above, I can see my vote from the moment I printed it to the moment I cast it
  • Because your vote must be a secret, so that you cannot be blackmailed or paid to vote in a certain way – and there are some mechanisms for that
  • Because there is little or no corruption in your country, and you don’t have a Dragnea (a politician convicted of election fraud) pushing any buttons

You can see that in an electronic environment, this is hardly the case. Here, in an electronic environment, if you have a Dragnea, you are dead and buried. Here, in an electronic environment, you:

  • Cannot see your vote from the moment you printed it (or pushed the button) to the moment of casting – while anyone else could see it
  • Cannot easily make sure that your vote is a secret. Once you act upon your vote and it is encrypted somehow, you have no way of knowing what you voted – it became a secret. So there is the trouble with that.
    Furthermore, assuming conventional encryption, there are master keys that can easily be compromised by an evil Dragnea.
  • Auditing such a system involves an extremely high and particular level of expertise, and any of the parties having a stake in the election would have real trouble finding people willing to take the risk of doing that for them. This is an extremely sensitive matter.

There is a research area concerned with tackling these issues. It is called “End-To-End Verifiable Voting Systems”.

End-To-End Verifiable Voting Systems

Basically, tackling these problems for e-voting systems means transforming an electronic voting environment in such a manner that it can at least meet the standards of non-e-voting systems, then adding some specific electronic mumbo-jumbo to it, and making it available in a ‘pandemic environment’. [Oh my God, I’ve just said that, pandemic environment…]

The main transformation is: I, as a voter, must be able to keep my vote secret from the moment I make it up to casting it, and to make sure my vote is properly accounted for.

Homomorphic secret sharing

It would be wonderful if, while addressing the trust-in-vote-counting problem, we had a way of casting an encrypted vote but could still count it even while it is encrypted. Well, this can be done.

To my knowledge, the most effective and advanced technology that can be used here today is homomorphic encryption and, more precisely, a small subset of HE called homomorphic secret sharing.

Homomorphic secret sharing is a secret sharing scheme with homomorphic properties: computations can be carried out directly on the shares of the secret. In a nutshell, homomorphic encryption is a type of encryption where you can do computations on the ciphertext – that is, compute stuff directly on encrypted data, with no prior decryption. For example: in some HE schemes, an encryption of a 5 plus an encryption of a 2 is an encryption of a 7. Hooray.
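To make that 5 + 2 = 7 trick concrete, here is a minimal sketch of the Paillier cryptosystem, a classic additively homomorphic scheme. Everything about it is toy-sized and insecure on purpose (modern HE libraries use lattice-based schemes with serious parameters); it exists only to show two ciphertexts being added without any decryption:

```python
# Toy Paillier: additively homomorphic encryption (insecure, demo-sized keys).
import random
from math import gcd

p, q = 293, 433            # toy primes -- never use sizes like this for real
n = p * q
n2 = n * n
g = n + 1                  # standard simple choice of generator
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
mu = pow(lam, -1, n)       # valid shortcut because g = n + 1

def encrypt(m):
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

c5, c2 = encrypt(5), encrypt(2)
c7 = (c5 * c2) % n2        # multiplying ciphertexts ADDS the plaintexts
assert decrypt(c7) == 7    # an encrypted 5 plus an encrypted 2 is an encrypted 7
```

The point is entirely in the last two lines: whoever performs that multiplication never sees 5, 2, or 7.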

Bear in mind, the mathematics behind all this is pretty complex. I would not call it scary, but close enough. However, there are smart people working on, and providing, out-of-the-box libraries that software developers can use to embed HE in their products. I would like to mention just two here: Microsoft SEAL and PALISADE (backed by DARPA). Don’t get me wrong: today you still have to know some mathematical tricks if you want to embed HE in your software, but the really heavy part is done by the heroes providing these libraries.

Decentralized voting protocols using homomorphic secret sharing

In the next article I will talk about the challenges you will face if you try to embed HE in your product, but until then, if you want a glimpse of the complexity, I will go ahead and detail a decentralized voting protocol that uses homomorphic secret sharing (a toy code sketch follows the steps below).

  • Assume you have a simple vote (yes/no) – no overkill for now
  • Assume you have some authorities that will ‘count’ the votes – the number of authorities is denoted A
  • Assume you have N voters
  1. Each authority generates and publishes a public key: a number, Xa
  2. Each voter encodes his vote in a polynomial Pn of degree A-1 (number of authorities minus one), with the constant term an encoding of the vote (for this case, +1 for yes and -1 for no); all other coefficients are random.
  3. Each voter computes the value of his polynomial Pn – and thus his vote – at each authority’s public key: Pn(Xa).
    ◦ A points are produced; they are the pieces (shares) of the vote.
    ◦ Only if you know all the points can you reconstruct Pn, and thus the vote. This is the decentralization part.
  4. Each voter sends each authority only the value computed using that authority’s key
  5. Thus, each authority finds it impossible to determine how each voter voted, as it does not have enough of the computed values – it only has one.
  6. After all votes have been cast, each authority computes and publishes the sum (Sa) of the values it received.
  7. Thus, a new polynomial is born (its values at the public keys are the sums Sa), with the constant term being the sum of all votes. If it is positive, the result is yes, and vice versa.
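And here is the promised toy sketch of the protocol, in Python. Assumed simplifications (mine, not part of any standard): the authorities’ public keys are just the points 1..A, arithmetic is done over a prime field, and the tally is recovered by Lagrange interpolation at zero. A real system needs far more, starting with authenticated channels and proofs of vote validity:

```python
# Toy decentralized yes/no vote via homomorphic secret sharing.
import random

P = 2_147_483_647                 # prime modulus, large enough for a toy tally

def share_vote(vote, xs):
    """Split one +1/-1 vote into len(xs) shares: evaluations of a random
    polynomial of degree len(xs)-1 whose constant term encodes the vote."""
    coeffs = [vote % P] + [random.randrange(P) for _ in range(len(xs) - 1)]
    return [sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P for x in xs]

def constant_term(xs, ys):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    total = 0
    for xj, yj in zip(xs, ys):
        num, den = 1, 1
        for xm in xs:
            if xm != xj:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

xs = [1, 2, 3]                           # three authorities (their "keys" Xa)
votes = [+1, +1, -1, +1, -1, +1]         # six voters
sums = [0] * len(xs)                     # the sums Sa, one per authority
for v in votes:
    for a, share in enumerate(share_vote(v, xs)):
        sums[a] = (sums[a] + share) % P  # an authority only ever sees shares

tally = constant_term(xs, sums)
tally = tally - P if tally > P // 2 else tally   # map back to a signed sum
print(tally)   # 2 -> 'yes' leads by two votes
```

Note how the sum of the voters’ polynomials is itself a polynomial whose constant term is the vote total – that single fact is the entire ‘homomorphic’ part.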

If you had trouble following the secret sharing algorithm, don’t worry, you’re not alone. Here’s a helper illustration:

[Figure: vote-splitting illustration – source: the Wikipedia article on homomorphic secret sharing]

However, there are still problems:

  • Still, the voter cannot be sure that his/her vote is properly cast and counted
  • The authorities cannot be sure that a malicious voter did not compute his polynomial with a -100 constant term, such that a single cast would count as 100 negative votes.
  • Homomorphic secret sharing does not even touch the other problems of voting systems; only the secrecy and the trust in counting are tackled.

The challenges

See, you still have to know a little bit about polynomials and interpolation to be able to use this in your software.

The crazy part is that, in homomorphic encryption terms, homomorphic secret sharing is one of the simplest challenges.

Don’t worry though: in my next article I will show you a neat library (Microsoft SEAL), share my experience with you, and give you some tips and tricks for the moment when you try to adopt this.

Until next time, remember: don’t take anything for granted.

Pragmatic Homomorphic Encryption

Hi again!

Since you’re here, I believe you have a general idea of what homomorphic encryption is. If, however, you are a little confused, here it is in a nutshell: you can do data processing directly on encrypted data. E.g., an encryption of a 5 multiplied by an encryption of a 2 is an encryption of a 10. Tadaaa!
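If you want to see the multiplicative flavor with your own eyes, textbook RSA happens to be multiplicatively homomorphic. A deliberately insecure, toy-sized sketch (the lattice schemes used by real HE libraries look nothing like this internally, but the ciphertext arithmetic feels the same):

```python
# Toy textbook RSA: multiplying ciphertexts multiplies the plaintexts.
p, q = 61, 53
n = p * q                          # 3233 -- demo-sized, utterly insecure
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

encrypt = lambda m: pow(m, e, n)
decrypt = lambda c: pow(c, d, n)

c10 = (encrypt(5) * encrypt(2)) % n   # multiply the two ciphertexts...
assert decrypt(c10) == 10             # ...and the plaintexts multiplied too
```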

This is pure magic for privacy, especially with the hype that is happening now with all the data leaks, and new privacy regulation, and old privacy regulation, and s**t. Essentially, what you can do with this is very close to the holy grail of privacy: complete confidential computing – processing data that is already encrypted, without the decryption key (assuming data protection in transit is already done). See the picture below:

Quick note here: most of the homomorphic schemes (BFV/CKKS/etc.) use a public/private key scheme for the encryption/decryption of data.

Now, I have been fortunate enough to work, over the past year, on a side project involving a lot of homomorphic encryption. I was using Microsoft SEAL and it was great. I am not going to talk about the math behind this type of encryption, not going to talk about the Microsoft SEAL library (although I consider it excellent), and not going to talk about the noise-propagation problem in this kind of encryption.

I am, however, going to talk about a common pitfall that I have seen, and that is worrying. This pitfall concerns the integrity of the processing results. Or, to be more precise, attacks on the integrity of the expected result of the processing.

An Example

Let me give you an example: assume you have an IoT solution that is monitoring some oil rigs. The IoT devices encrypt the data they collect, then send it to a central service for statistical analysis. The central service does the processing and provides an API for some other clients used by top management.

(This is just an example. I am not saying I did exactly this. It would be $tupid to break an NDA and be so open about it.)

If I, as an attacker, compromise the service that is doing the statistical analysis, I cannot see the real data sent by the sensors. However, I could mess with it a little. I could, for instance, make sure that the statistical analysis returned by the API is rigged – that it shows whatever I want it to show.

I am not saying that I am able to change the input data. After all, as an attacker I do not have the key used for encryption, so I am not able to encrypt new data into the series. I just go ahead and alter the result.
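In fact, with a homomorphic scheme I do not even need any key to tamper meaningfully: the homomorphism itself is malleability by design. Reusing the toy Paillier sketch from the first part of this series (g, n2, encrypt, and decrypt as defined there):

```python
# Integrity attack sketch: shift an encrypted value WITHOUT any key.
c = encrypt(42)                        # an honest, encrypted value (input or result)
forged = (c * pow(g, 100, n2)) % n2    # homomorphically "add" 100 to it, keyless
assert decrypt(forged) == 142          # confidentiality held; integrity did not
```

Encryption here buys confidentiality, not integrity – which is exactly why the protections below are not optional.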

It seems obvious that you should protect such a system against impersonation/MitM/spoofing attacks. Well, apparently, it is not that obvious.

The Trouble

While implementing this project, I got in touch with various teams that were working with homomorphic encryption, and there was a recurring issue. The team implementing such a solution is usually made up of (at least) experienced developers with a solid knowledge of math/cryptography. But it is not their role to handle the overall security of the system.

The team that is responsible for the overall security of the system is, unfortunately, often decoupled from the details of a project that is under development. What do they “know” about the project? Homomorphic encryption? Well, that is cool; data integrity is handled by encryption, so why put any extra effort into that?

Please, please, do not overlook basic security just because some pretty neat researchers made a breakthrough regarding the efficiency of implementing a revolutionary encryption scheme. Revolutionary does not mean lazy. And FYI, a fully homomorphic encryption scheme had been theorized as far back as 1978.

To be fair, I want to mention another library that is good at doing homomorphic encryption: PALISADE. I only have production experience with Microsoft SEAL, and thus, I prefer it 😊

Be safe!

Wiping off some encraption using Managed Service Identities in Azure

I think it was 2008 when I was studying the second edition of “Writing Secure Code” by David LeBlanc and Michael Howard. It was there that I ran into the term “encraption” for the first time. Encraption refers, generally, to the bad practice of using poor encryption: old and deprecated algorithms, poor key complexity, storing the keys in plain sight, and other crap like that. Some of these bad practices can be fixed within a software development team by educating the members to ask for expertise in cryptography instead of just making assumptions. I have talked about this before, and yes, this aspect is critical.

I want to talk about a specific aspect of encraption: storing the keys in plain sight. I still encounter this situation so often that I get physically ill when I see it. If you claim to have never done it – stored keys/sensitive information in plain sight – please think again, this time carefully. In fact, there is a secure coding principle stating that the source code of your application, in any configuration, should not be considered sensitive information. This, of course, has a few very narrow exceptions.

Disclaimer! This article is just a summary of the reasons why you should consider using Azure Managed Service Identities; it does not describe the inner plumbing of Managed Service Identities in Azure, and it offers only a minimal sketch by way of example. It would be stupid for me to claim that I can give more appropriate documentation than Microsoft already does on their platform. At the end of the article you will find links pointing you to those resources.

Let us imagine a classic situation here: for example, a shared service that is required to store sensitive information generated by each of its users, and later make that data available ONLY to the user that generated it. Something like this:

Now, the trick is that no bug/quirk that may appear inside the app must allow one user to obtain a key belonging to any other user. The cryptography behind this is not in the scope of this article. Also, handling sensitive data while it is in memory is not the purpose of this article. So, in the end, how would you tackle this? Many of you would say, right away, that key storage should be delegated to a dedicated service – let’s call it an HSM – like so:

Now, the problem is that you need an authenticated and secure communication channel with the HSM. If you are trying to accomplish this in your own infrastructure, then good luck: while it is possible, it is most certainly expensive. If you are going to buy the HSM service from an external provider, then you will most certainly have to obey a lot of security standards before obtaining it. Most of these so-called Key Vault services allow access based on secret-based authentication (app id and secret) that your application, in turn, will need to securely store. I hope you have spotted the loophole. The weakest link here is that your application will still have to manage a secret – a set of characters that is very, very sensitive information:

Of course, most teams will have a “rock-solid-secure-rocketscience-devops” process that will protect this red key in the configuration management of the production environment. Someone will obey a very state-of-the-art process to obtain and store the red key only in the production environment, and no one will ever get it. This is sarcasm, in case you missed it. But why is it sarcasm? you may ask. Here are my reasons:

  • In 15 years and probably hundreds of projects with a lot of people involved, I have maybe seen 3 – three – projects that have done this properly. Typical mistakes include:
    1. the red key changed, no one knew that it could change, production halted, and a SWAT team rushed in to save the day, using their own machines to change the key
    2. the development team has access to production
    3. the “rock-solid-secure-rocketscience-devops” process is ignored by the “experienced” developer who hard-codes the key
    4. no one really cares about that key – it is just “infrastructure”
  • An attack can still target this red key. It is, in fact, just another attack surface. Even assuming, absurdly, that all other parts of the system are secure, a system holding such a key will have at least two points of attack:
    1. The identity that has access to the production environment (this is very hard to eliminate)
    2. The red key itself

Why not eliminate this red key altogether? Why not go to a cloud provider where you can create an application (or any resource) and a Key Vault, and then create a trust relationship between those two that needs no other secrets in order to function? This way you will not have to manage yet another secret.

Please bear in mind that if you are using an HSM or Key Vault, you can isolate all your sensitive information and keys at that level, so that you will never have to bother with the secure storage of those items ever again. No more hard-coding, no more cumbersome dev-ops processes!

Managed Service Identities

This is possible in Azure, using Managed Service Identities. As I have said before, the purpose of this article is not to replace the awesome documentation that Micro$oft has in place for MSI, but to encourage you to read it and use this magic feature in your applications, current or future.
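Still, to give you a taste of how little ceremony is left once the trust relationship exists, here is a minimal sketch in Python – assuming the azure-identity and azure-keyvault-secrets packages, an app running on an Azure resource with a managed identity enabled, and that identity granted read access to the vault (vault and secret names are hypothetical):

```python
# No app id, no app secret, no connection string: the "red key" is gone.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# On an Azure resource with a managed identity, DefaultAzureCredential obtains
# a token from the local identity endpoint; the app stores nothing sensitive.
credential = DefaultAzureCredential()
client = SecretClient(
    vault_url="https://my-vault.vault.azure.net",  # hypothetical vault name
    credential=credential,
)

db_password = client.get_secret("db-password").value  # hypothetical secret name
```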

Here is the overview of Managed Service Identities:

https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview

And here is an example of using an Azure App Service together with a Key Vault:

https://docs.microsoft.com/en-us/azure/app-service/overview-managed-identity

Stay safe!