Breaking through passports

A fairly long time ago, at Black Hat (as far as I remember), somebody came on stage with an implementation of a proposed electronic passport design and demonstrated its extremely worrying security flaws. I’ll leave the pleasure of researching the event to you.
Back then (maybe around 2001-2003?), the spirit of “blackhat” (or whatever event it was) was free, and if you were following that kind of security event you were consistently exposed to serious security researchers challenging the disastrous security implementations of either mainstream technology providers (such as Microsoft 😉 ) or of government digitalization.

What does this have to do with passports today? Well, they are a bit more secure thanks to that kind of involvement. Secure for their users as well as for their governments.
And even if they were not so secure, the biggest loser of a flawed normal passport is its government. The projected user impact is fairly low: identity theft by means of passport forgery is not a huge phenomenon.

Legitimate concerns for any passport users

From the user security perspective, let’s explore:

  • Who does the passport say that I am?
  • How do I know I am carrying a valid (authentic and integral) passport at any given time?
  • What does scanning the document say about me?
    • e.g. it was scanned while passing through customs, ergo anybody able to see a record of that scan will know that I was there at a specific point in time

Extrapolating to COVID passes

Well. As far as the ethical concerns go, I’ll leave those up to each and every one of you.

But as far as security goes, I have a few very worrying concerns:

  1. The implementation details are not widely available.
  2. There are a lot of security incidents with these documents, and they are not and cannot be hidden, even in the mainstream media (https://www.dw.com/en/security-flaws-uncovered-in-eu-vaccination-passport/a-58129016).
  3. How do I, as an owner and user of the pass, know WHO views or verifies my scans, and when?

    I am not going to continue the list; it is already embarrassing.

This last one is a bit concerning. Here’s why: with a normal passport, once it is scanned, a person who gains or has access to a list of those scans knows that you crossed some border at some point in time.
With COVID passes, however, the same actor who has or gains access to your scans knows where you are more or less right now. Remember, you’ll scan your pass even to get into a restaurant.

What do I want?

As a COVID pass user I want the certainty that no scans of my pass are stored anywhere. And I do not want anybody except the scanner to be able to see my identity, because the moment they do, they’ll know where I am and what I’m doing. And this is unacceptable, mainly because the scanning frequency of such a pass can be daily for some of us.

So, can I get what I want?

I have tried to get the implementation details, both directly (by formally asking for them) and through the security community (academic and professional). The result? No result. The public debate(s) prior to implementing and adopting these passes were a complete joke.
How long until XXXX gets a hold of a list of my scans and then follows me around, or worse? Seems far-fetched? Take a look at the latest data leaks 😉

But as the philosophers Jagger and Richards once said: “You can’t always get what you want!”

( https://www.youtube.com/watch?v=Ef9QnZVpVd8&ab_channel=ABKCOVEVO )

Some concerning technical and ethical reviews

A lot has been going on lately. So much so that I do not even know where to start reviewing it.

I’ll just go ahead and speak about some technical projects and topics that I’ve been briefly involved in and that are giving me a fair amount of concern.

Issue number x: Citizen-facing services of nation states

A while back, I made a “prediction”: the digitalization of citizen-facing services would become ever more present, especially as the pandemic situation panned out (here) and (here). I was right.
Well, to be completely honest, it was not really a prediction, as I had two side projects (as a freelancer) that involved exactly this. So I had a small and limited view from the inside.

Those projects ended, successfully delivered, and then came the opportunity for more. I kindly declined. Partly because I’m trying to raise a child with my wife, and there’s only so much time in the universe, and partly because I have deep ethical issues with what is happening.

I am not allowed to even mention anything remotely linked to the projects I’ve been involved in, but I will give you a parallel and thus unrelated example, hoping you connect the dots. Unrelated in the sense that I was not even remotely involved in the implementation of the example I’m bringing forward.

The example: the Romanian STS (Special Telecommunications Service) introduced blockchain technology into the process of centralizing and counting citizen votes in the national and regional elections taking place in Romania. You can read more about it here, and connect the dots for yourselves. You’ll also need to know a fair amount about Romanian election law, but you’re smart people.

The Issue?

Flinging the blockchain concept at the public in such a way that the public misunderstands it. Creating an image of more security than there actually is. Creating a security illusion. Creating the illusion of decentralized control while implementing the EXACT opposite. I’m not saying this is intentional, oh, no, it is just opportunistic: it happened because of the fast adoption.
Why is it the opposite? Blockchain is supposed to bring decentralization, and what the STS implementation does is the EXACT opposite: it consolidates centralization.

While I have no link with what happened in Romania, I know for a fact that similar things have happened elsewhere. This is bad.

I do not think that this is happening with any ill intention. I simply think there is A HUGE AMOUNT of opportunistic implementation going on, SIMPLY because of the political pressure to satisfy PR needs and, maybe, just maybe, give people the opportunity to simplify their lives. But the implementations are opportunistic, and from a security perspective, this is unacceptable!

Ethically

I think that while we, as a society, tend to focus on the ethics of using AI and whatnot, we are completely forgetting about ethics in terms of our increased dependency on IT&C in general. I strongly believe that we have missed a link here. In the security landscape, this is going to cost us. Big time.

Designing for critical privacy and data protection systems (I)

Mind the factors

Lately, I’ve been doing some work in the area of cryptography and enterprise-scale data protection and privacy. And it hit me: things are a lot different than they used to be, and they are changing fast. It seems that things are changing towards a more secure environment, with stronger data protection and privacy requirements, and it also seems that these changes are being widely adopted. Somehow, I am happy about it. Somehow, I am worried.

Before I go a little deeper into the topic of how to design for critical privacy and DP systems, let me just enumerate three of the factors that are responsible for generating the changes that we are witnessing:

  • The evolving worldwide regulation and technology adoption started by EU 2016/679 regulation (a.k.a. GDPR)
  • The unimaginable progress we are making in big data analysis and ML
  • The digitalization of the citizen-facing services of nation states (stuff like e-voting, which I really advocate against)

I don’t want to cover in depth the way I see each factor influencing the privacy and DP landscape, but as we go on, I just want you to keep these three factors in mind. Mind the factors.

Emerging technologies

Talking about every concept and technology that is gaining momentum in this context is absolutely impossible. So I choose to talk about two of the most challenging ones, or at least the ones I perceive as the most challenging: this is going to be a two-episode series about Differential Privacy and Homomorphic Encryption.

Differential Privacy. ε-Differential Privacy.

Differential Privacy, in a nutshell, seen from a space-station view, is a mathematical way of ensuring that reconstruction attacks are not possible, at present or at any future time.

Mathematical what? Reconstruct what? Time what? Let me give you a textbook example:

Assume we know the following about a group of people:

  • There are 7 people, with a median age of 30 and a mean age of 38.
  • 4 are female, with a median age of 30 and a mean age of 33.5.
  • 4 love sugar, with a median age of 51 and a mean age of 48.5.
  • 3 of the sugar lovers are female, with a median age of 36 and a mean age of 36.6.

Challenge: give me the age, sex, sugar preference, and marital status of each individual.

Solution:

1. 8, female, sugar, not married
2. 18, male, no sugar, not married
3. 24, female, no sugar, not married
4. 30, male, no sugar, married
5. 36, female, sugar, married
6. 66, female, sugar, married
7. 84, male, sugar, married
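
If you want to double-check that this solution is consistent with the published statistics, here is a quick sanity check in plain Python (the layout of the `people` tuples is mine, just for illustration):

```python
# Sanity check: recompute the published medians and means from the solution.
from statistics import mean, median

# (age, sex, loves_sugar, married) for each of the 7 individuals
people = [
    (8,  "F", True,  False),
    (18, "M", False, False),
    (24, "F", False, False),
    (30, "M", False, True),
    (36, "F", True,  True),
    (66, "F", True,  True),
    (84, "M", True,  True),
]

ages         = [age for age, *_ in people]
female_ages  = [age for age, sex, *_ in people if sex == "F"]
sugar_ages   = [age for age, _, sugar, _ in people if sugar]
f_sugar_ages = [age for age, sex, sugar, _ in people if sex == "F" and sugar]

print(median(ages), mean(ages))                  # median 30, mean 38
print(median(female_ages), mean(female_ages))    # median 30, mean 33.5
print(median(sugar_ages), mean(sugar_ages))      # median 51, mean 48.5
print(median(f_sugar_ages), mean(f_sugar_ages))  # median 36, mean 36.66... (the 36.6 above)
```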

Basically, a reconstruction attack in such a scenario involves finding peaks of plausibility in a plausibility-versus-possibility plot. It goes something like this:

You can start brute-forcing all the combinations for the seven participants. Considering all the features except age (so gender, sugar preference, marital status), each participant can take one of 2^3 = 8 feature combinations, giving 8^7 = 2,097,152 possibilities, but all of them have roughly the same plausibility. So a possibility/plausibility plot looks something like this:

See, there do not seem to be any peaks in plausibility. But once we factor in the age, things change. For example, although it is possible for a person to be 150 years old, it is very implausible. Furthermore, it is more plausible for an older individual to be married than a younger one, and so on. So, if we factor in age plausibility, the graph looks more like this:

See, there’s a peak of plausibility. That is most likely our solution. Now, what if our published statistics are a little skewed? Say we introduce just enough noise into them that the impact on science is minimal, and we eliminate the unnecessary ones (where this can be done); then a reconstruction attack becomes almost impossible. The purpose is to flatten the graph above as much as possible.

Now, to be fair, in our stretched-out textbook example there is no need for the brute-force plausibility plot. Because the mean and median are published for each subset of results, you can simply write a deterministic system of equations and solve for the actual solution.

Imagine that you, as an attacker, possess some external knowledge about your target. This external source may be a historical publication over the same data set, or a different data source altogether. It makes your reconstruction job even easier.

ε-Differential Privacy systems have a way of defining privacy loss (i.e., a quantitative measure of the increase in the plausibility peaks). They also define a privacy budget, and this is one of the real treasures of this math: you can make sure that, over time, you are not making reconstruction attacks any easier.
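
To make those two notions concrete, here is a minimal sketch of the classic way of getting there, the Laplace mechanism, together with a budget ledger. The names (`PrivacyAccountant`, `noisy_mean`) are mine for illustration, not from some standard library; the math, though, is the standard one: a query whose result can change by at most Δ between two neighboring datasets, published with Laplace(Δ/ε) noise added, is ε-differentially private, and sequential queries simply add their ε’s to the total privacy loss.

```python
# Minimal sketch: Laplace mechanism + privacy budget under basic composition.
import numpy as np

class PrivacyAccountant:
    """Keeps a ledger of the cumulative privacy loss (sum of epsilons)."""
    def __init__(self, budget: float):
        self.budget = budget  # total epsilon we are willing to spend, ever
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        if self.spent + epsilon > self.budget:
            raise RuntimeError("privacy budget exhausted -- refuse the query")
        self.spent += epsilon

def noisy_mean(values, epsilon, accountant, lo, hi):
    """epsilon-DP mean of values clamped to [lo, hi].

    Clamping bounds each individual's influence, so the sensitivity of the
    mean is (hi - lo) / n, and Laplace(sensitivity / epsilon) noise suffices.
    """
    accountant.charge(epsilon)
    clamped = np.clip(values, lo, hi)
    sensitivity = (hi - lo) / len(clamped)
    return float(np.mean(clamped) + np.random.laplace(0.0, sensitivity / epsilon))

ages = [8, 18, 24, 30, 36, 66, 84]
acct = PrivacyAccountant(budget=1.0)
print(noisy_mean(ages, 0.5, acct, lo=0, hi=120))  # noisy version of 38
print(noisy_mean(ages, 0.5, acct, lo=0, hi=120))  # second query, budget now spent
# A third query at epsilon=0.5 would raise: the accountant refuses it.
```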

This stuff gained momentum when the US Census Bureau got the word out that they are using it, and also encouraged people to ask the enterprises that hold their data to use it.

So, as a software architect, how do I get ready for this?

First, at the moment, there are no out-of-the-box solutions that can give you ε-Differential Privacy for your data. If this is a requirement for you, you are most probably going to work with some data scientists / math graduates who will tell you exactly what the measure of privacy loss will be for each feature in your data. At least, that is what I did 😊 Once those are defined, you have to be ready to implement them.

There is a common pattern you can adopt. A proxy, a privacy guard:

You are smart enough to realize that CLEAN data means data into which some acceptable amount of noise has been introduced, such that the privacy budget is not greatly, if at all, impacted.
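
Here is roughly what that boundary can look like in code, reusing the `noisy_mean` and `PrivacyAccountant` pieces from the earlier sketch (again, all the names, including the `read_column` raw-store call, are hypothetical):

```python
# Minimal sketch of the privacy guard proxy: the guard is the ONLY component
# that touches raw rows; everything behind the API gets noised, budgeted data.
class PrivacyGuard:
    def __init__(self, raw_store, accountant):
        self._raw = raw_store    # raw data never leaves this object
        self._acct = accountant  # every release is charged against the budget

    def clean_mean(self, column, epsilon, lo, hi):
        values = self._raw.read_column(column)  # hypothetical raw-store call
        return noisy_mean(values, epsilon, self._acct, lo, hi)

    def raw_rows(self):
        raise PermissionError("raw data is never released past the guard")
```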

Challenges

If it were easy, everyone would do it. But it’s not, so suck it.

First, you and your team must be ready to understand what a highly trained mathematician is talking about. Get resources for that.

Second, as an architect, you have to be careful to have formal definitions throughout your applications for the two concepts enumerated above: privacy budget and privacy loss.

Third, in both my experience and the ongoing textbook research, the database must contain the absolute raw data, including historic data if needed. This poses another security challenge: you don’t want to get fancy with complicated math to protect your data while staying vulnerable to a direct database attack. Something stupid like an injection attack has no place here. You can see now that the diagram above is oversimplified: it lacks a ton of proxies, security controls, DMZs and whatnot. And don’t make the same mistake I did by trying to hide some data from the privacy guard; your life will become a misery.

Fourth, be extremely careful about documenting this. It is not rare for software ecosystems to change purpose, and they tend to get used where they are not supposed to. It may happen that such an ecosystem, with time, ends up being used directly for scientific research from behind the privacy guard. That might not be acceptable: you know, scientists don’t like noisy data. Or so I’ve heard; I’m not a scientist.

That’s all for now.

In the second part we’re going to talk a little bit about the time I used Homomorphic Encryption. A mother****ing monster for me.

Stay safe!

Pragmatic Homomorphic Encryption

Hi again!

Since you’re here, I believe you have a general idea of what homomorphic encryption is. If, however, you are a little confused, here it is in a nutshell: you can do data processing directly on encrypted data. E.g., an encryption of a 5 multiplied by an encryption of a 2 is an encryption of a 10. Tadaaa!
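
If you want to see that property with your own eyes, here is a toy demonstration. I am using textbook RSA only because it happens to be multiplicatively homomorphic and fits in a dozen lines; the schemes you would actually use (BFV, CKKS) work over polynomial rings and are far more involved, and textbook RSA with tiny primes is, of course, hopelessly insecure:

```python
# Toy multiplicative homomorphism with textbook RSA: E(a) * E(b) = E(a * b).
# Tiny, insecure parameters -- strictly for illustrating the property.
p, q = 61, 53
n = p * q                  # modulus (3233)
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

def enc(m): return pow(m, e, n)
def dec(c): return pow(c, d, n)

c5, c2 = enc(5), enc(2)
c_prod = (c5 * c2) % n     # multiply the two ciphertexts...
print(dec(c_prod))         # ...decrypting yields 10. Tadaaa!
```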

This is pure magic for privacy, especially with the hype happening now with all the data leaks, and new privacy regulation, and old privacy regulation, and s**t. Essentially, what you can do with this is very close to the holy grail of privacy, complete confidential computing: processing data that is already encrypted, without ever holding the decryption key. Assuming data protection in transit is already done. See the picture below:

Quick note here: most of the homomorphic schemes (BFV/CKKS/blabla…) use a public/private key pair for the encryption/decryption of data.

Now, I have been fortunate enough to work, in the past year, on a side project involving a lot of homomorphic encryption. I was using Microsoft SEAL, and it was great. I am not going to talk about the math behind this type of encryption, nor about the Microsoft SEAL library (although I consider it excellent), nor about the noise-propagation problem in this kind of encryption.

I am, however, going to talk about a common pitfall that I have seen, and that is worrying. This pitfall concerns the integrity of the processing results. Or, to be more precise, attacks on the integrity of the expected results of the processing.

An Example

Let me give you an example. Assume you have an IoT solution that is monitoring some oil rigs. The IoT devices encrypt the data they collect, then send it to a central service for statistical analysis. The central service does the processing and provides an API for some other clients used by top management.

(This is just an example. I am not saying I did exactly this. It would be $tupid to break an NDA and be so open about it.)

If I, as an attacker, compromise the service that is doing the statistical analysis, I cannot see the real data sent by the sensors. However, I can mess with it a little. I could, for instance, make sure that the statistical analysis returned by the API is rigged, that it shows whatever I want it to show.

I am not saying that I am able to change the input data. After all, as an attacker I do not have the key used for encryption, so I am not able to inject newly encrypted data into the series. I just go ahead and alter the result.

It seems obvious that you should protect such a system against impersonation/MitM/spoofing attacks. Well. Apparently, it is not that obvious.
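
For the input half of the problem, the missing control is embarrassingly small. Here is a minimal sketch assuming a per-device symmetric MAC key (all names and the message layout are mine; in a real deployment you would rather use per-device asymmetric signatures, and the API’s responses back to the management clients need the same treatment):

```python
# Minimal sketch: authenticate each ciphertext end-to-end so a compromised
# analysis service cannot forge or swap sensor inputs unnoticed.
import hmac, hashlib

MAC_KEY = b"per-device-secret-provisioned-at-setup"  # illustrative

def seal_message(device_id: str, seq: int, ciphertext: bytes) -> dict:
    # The sequence number is covered by the MAC, which also blocks replays
    # if the receiver checks it is strictly increasing per device.
    payload = device_id.encode() + seq.to_bytes(8, "big") + ciphertext
    tag = hmac.new(MAC_KEY, payload, hashlib.sha256).hexdigest()
    return {"device": device_id, "seq": seq, "ct": ciphertext.hex(), "tag": tag}

def verify_message(msg: dict) -> bytes:
    ciphertext = bytes.fromhex(msg["ct"])
    payload = msg["device"].encode() + msg["seq"].to_bytes(8, "big") + ciphertext
    expected = hmac.new(MAC_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, msg["tag"]):
        raise ValueError("tampered or forged ciphertext -- drop it")
    return ciphertext  # only now feed it to the homomorphic pipeline
```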

The Trouble

While implementing this project, I got in touch with various teams that were working with homomorphic encryption, and it seems there is a recurring issue. The team implementing such a solution is usually made up of (at least) experienced developers with solid knowledge of math and cryptography. But it is not their role to handle the overall security of the system.

The team that is responsible for the overall security of the system is, unfortunately, often decoupled from the details of a project that is under development. What do they “know” about the project? Homomorphic encryption? Well, that is cool, data integrity is handled by encryption, so why put any extra effort into it?

Please, please, do not overlook basic security just because some pretty neat researchers made a breakthrough in efficiently implementing a revolutionary encryption scheme. Revolutionary does not mean lazy. And FYI, a fully homomorphic encryption scheme had already been theorized back in 1978.

To be fair, I want to mention another library that is good at homomorphic encryption: PALISADE. But I only have production experience with Microsoft SEAL, and thus I prefer it 😊

Be safe!

Avoidable privacy happenings

Last time, I tried to brief you on some of the steps you need to cover before starting to choose tools that will help you achieve compliance. Let’s dig a little deeper into that by using some real-life negative examples that I ran into.

Case: The insufficiently authenticated channel.

Disclosure disclaimer: the following examples are real. I have chosen to anonymize the bank in this article, although I have no obligation whatsoever to do so. I could disclose the full information to you on request.

At one point, an e-mail from a bank landed in my inbox. I was not, am not, and hopefully never will be a client of that particular bank. The e-mail seemed (from the subject line) to inform me about new prices for the services the bank provided. It was not marked as spam, and so it intrigued me. I ran some checks (traces, headers, signatures, specific backtracking magic) and got to the conclusion that it was not spam, so I opened it. Surprise: it was directly addressed to me, my full name appeared somewhere inside. Oh, and of course thanking ME for choosing to be their client. Well. Here’s a snippet (it is in Romanian, but you’ll get it):

Of course I complained to the bank, asking them to inform me how they got my personal data, to delete it, and so on. Boring.

About four-plus months later (not even close to a compliant response time), a response popped up:

Let me brief it for you. It said that I am a client of the bank, that I have a current account open, and where the account was opened. Oh, but that is not all. They also gave me a copy of the original contract I had supposedly signed, and a copy of the personal data processing document I had also supposedly signed and provided to them. With the full-blown personal data. I mean full-blown: name, national ID numbers, home address, etc. One problem though: that data was not mine. It belonged to some other guy who had one additional middle name. And thus, a miracle data leak was born. It is small, but it can grow if you nurture it right…

What went wrong?

Well, in short, the guy filled in my e-mail address and nobody checked it: not him, not the bank, nobody. You can imagine the rest.

Here’s what I am wondering.

  1. Now, in the 21st century, is it so hard to authenticate a channel of communication with a person? Is it difficult to implement e-mail confirmation based on some contract id? Is it really? We could do it for you, bank. Really. We’ll integrate it with whatever systems you have. Just please, do it yourselves or ask for some help. (A minimal sketch of the idea follows this list.)
  2. Obviously, privacy was 100% absent from the process of answering my complaint, even though I made a privacy complaint 🙂 Is privacy totally absent from all your processes?
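
And just to show how small that missing piece is, here is a minimal sketch of a contract-bound e-mail confirmation (all names are illustrative; this is the concept, not a product):

```python
# Minimal sketch: confirm an e-mail channel before binding it to a contract.
# The customer gets a time-limited link; only a valid click-through binds
# the address to the contract id.
import hmac, hashlib, time

SECRET = b"server-side-secret"  # illustrative

def issue_token(contract_id: str, email: str, ttl: int = 86400) -> str:
    expires = int(time.time()) + ttl
    msg = f"{contract_id}|{email}|{expires}".encode()
    tag = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{contract_id}|{email}|{expires}|{tag}"  # goes into the e-mailed link

def confirm_token(token: str) -> bool:
    contract_id, email, expires, tag = token.rsplit("|", 3)
    msg = f"{contract_id}|{email}|{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag) and time.time() < int(expires)
```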

In the end, this is a great example of poor legislative compliance with zero security involved, I mean ZERO security. They have some thin legal compliance: there is a separate document asking for personal data and for permission to process it. The document was retained, and it was accessible (OK, it was too accessible). They did answer my complaint, even though not in a timely, compliant manner, and I received no justification for the delay.

Conclusions?

  1. Have a good privacy program. A global one.
  2. Have exquisite security. OK, not exquisite, but have some information security in place.
  3. When you choose tools, make sure they can support your privacy program.
  4. Don’t be afraid to customize the process, or the tools. I (and, to be honest, anybody in the business) could easily give you a quote for an authentication / authorization solution for your communication channels with any type of client.

I am sure you can already see for yourself how this is useful in the context of choosing tools that will help you organize your conference event while still maintaining privacy compliance.