Designing for critical privacy and data protection systems (I)

Mind the factors

Lately, I’ve been doing some work in the area of cryptography and enterprise scale data protection and privacy. And so, it hit me: things are a lot different than they used to be, and they are changing fast. It seems that things are changing towards a more secure environment, with stronger DP and privacy requirements and it also seems that these changes are widely adopted. Somehow, I am happy about it. Somehow, I am worried.

Before I go a little deeper into the topic of how to design for critical privacy and DP systems, let me just enumerate three of the factors that are responsible for generating the changes that we are witnessing:

  • The evolving worldwide regulation and technology adoption started by EU 2016/679 regulation (a.k.a. GDPR)
  • The unimaginable progress we are covering in terms of big data analysis and ML
  • The digitalization of the citizen facing services of nation-states. (stuff like e-voting, that I really advocate against)

I don’t want to cover in-depth the way I see each factor influencing the privacy and DP landscape, but, as we go on, I just want you to have these three factors in mind. Mind the factors.

Emerging technologies

Talking about each concept and technology that is gaining momentum in this context is absolutely impossible. So, I choose to talk about two of the most challenging ones. Or, at least the ones that I perceive as being the most challenging: this is going to be a two episodes series about Differential Privacy and Homomorphic Encryption.

Differential privacy. e-Differential Privacy.

Differential Privacy, in a nutshell, from a space-station view, is a mathematical way of ensuring that reconstruction attacks are not possible at the present or future time.

Mathematical what? Reconstruct what? Time what? Let me give you a textbook example:

Assume we know the following about a group of people:

There are 7 people with the median age of 30 and the mean of 38.

4 are females with the median of 30 and the mean of 33.5

4 love sugar with the median of 51 and a mean of 48.5

3 sugar lovers are females with the median 36 and the mean 36.6

Challenge: give me the: age, sex, sugar preference and marital status of each individual.

Solution:

1. 8, female, sugar, not married

2. 18, male, no sugar, not married

3. 24, female, no sugar, not married

4. 30, male, no sugar, married

5. 36, female, sugar, married

6, 66, female, sugar, married

7. 84, male, sugar, married

Basically, a reconstruction attack for such a scenario involves finding peaks of plausibility in a plausibility versus probability plot. It goes something like this:

You can start brute forcing all the combinations of the seven participants. Considering all the features except age (so, gender, sugar preference, marital) you have 7^8= 5764801 possibilities, but all have roughly the same plausibility. So a possibility / plausibility plot, looks something like this

See, there does not seem to be any peaks in plausibility. But, once we factor in the age, well, things change. For example, although possible to have a 150 years old person, it is very implausible. Furthermore, it is more plausible to have an older individual married than a younger one, and so on. So, if we factor in age plausibility, a graph looks more like this:

See, there’s a peak of plausibility. That is most likely our solution.  Now, if our published statistics are a little skewed. Say, we introduce just enough noise into them such that the impact is minimum for science, and we eliminate the unnecessary ones (if this can be done) then, a reconstruction attack is almost impossible. The purpose is to flatten, as much as possible, the graph above.

Now, to be fair, in our stretched-out textbook example, there’s no need to do the brute-force-assumption plausibility plot. Because the Mean and Median are published for each subset of results, you can simply write a deterministic equation system and solve for the actual solution.

Imagine you, as an attacker possess some external knowledge about your target from an external source. This external source may be an historical publishing over the same set of data, or a different data source altogether. This makes your reconstruction job easier.

e-Differential Privacy systems have a way of defining a privacy loss (i.e. a quantitative measure of the increase in the plausibility plot) Also, these systems define a privacy budget. And this is one of the real treasures of this math. You can make sure that, over time, you are not making the reconstruction attacks easier.

This stuff gained momentum as the US census bureau got the word out the they are using it, and also encouraged people to ask enterprises that own their data to use it.

So, as a software architect, how do I get ready for this?

First, at the moment, there are no out-of-the box solutions that can give you e-Differential Privacy for your data. If this is a requirement for you, you are most probably going to work with some data scientists / math degree that are going to tell you exactly what will be a measure a privacy loss for the features in your data. At least that is what I did 😊 Once they are defined you have to be ready to implement those.

There is a common pattern you can adopt. A proxy, a privacy guard:

You are smart enough to realize that CLEAN data means that some acceptable noise is introduced, such that the privacy budget is not greatly, if at all, impacted.

Challenges

If it was easy, everyone would do it, but it’s not, so suck it.

First, you and your team must be ready to understand what a highly trained math scientist is talking about. Get resources for that.

Second, you have to be careful, as an architect, to have formal definitions throughout your applications for the two concepts enumerated above: privacy budget, and privacy loss.

Third, in both my experience and in the textbook ongoing research, the database must contain the absolute raw data, including historic data if needed. This poses another security challenge: you don’t want to be fancy about using complicated math to protect your data while being vulnerable to a direct database attack. Something stupid like injection attacks have no place here. You can see now that the diagram above is oversimplified. It lacks a ton of proxies, security controls, DMZs and whatnot. Don’t make the same mistake I did and try to hide some data from the privacy guard, your life will be a misery.

Fourth, Be extremely careful about documenting this. It is not rare that software ecosystems change purpose, and the tend to be used where they are not supposed to. It may happen that such an ecosystem, with time, gets to be directly used for scientific research, from behind the privacy guard. That might not be acceptable. You know, scientists don’t like to have noisy data. So I’ve heard, I’m not a scientist.

That’s all for now.

In the second part we’re going to talk a little bit about the time I used Homomorphic Encryption. A mother****ing monster for me.

Stay safe!

Ordering the ubiquitous

Recently, I was about to tell a joke that started like this:

“Two guys walk into a bar, the first one says… “

Someone went all anti-climactic on the storyteller (me), stopping him (me), asking: how do you know which one is the first one?

He was not, at all, trained in mathematics. He does not know that, when order is not important, order can, and often must, be imposed.

I did not bother to explain this, I just rephrased:

“Two guys walk into a bar, one of them says… “

I am a good storyteller, after all.

This got me thinking:

A random first story

I have talked about how, back in 2012-2013, I had had implemented, from scratch, a mechanism for single execution of a mission critical operation. This was before blockchain was KBOOM. Blockchain existed, but I only knew a single and remote person that knew something about blockchain.  Our team chosen to not use this relatively new technology at that point.

Last year, 2018, I was requested by my client to lead the rework of this system, but now, using blockchain. O yeah! We’ve got it done some months ago. This turned out to be the last side-project I took. For this year at least. I have my first child being born soon, and I need a break from these challenges. Anyway, I divert, back on track: single execution, blockchain, and a challenge accompanied by a frustration. We were a team of six people implementing this. Before starting, I wanted to make sure that everyone is on the same page in terms of what we do, and what blockchain is. Turns out, that while explaining bits of the blockchain, some people had troubles understanding the most basic concept: consensus and consensus algorithms. I went all haywire trying to explain concepts like: termination, integrity, agreement and fault tolerant consensus. I immediately remembered that when I was young, some very smart guy, forced me to read a paper called: „Time, Clocks, and the Ordering of Events in a Distributed System” – by Leslie Lamport. I mandated that this shall be read prior to my explanation. Everything went smooth from there on.

Be lucky. It’s your responsibility now.

Part of my daily job is working with the cloud. Large scale, small scale, simple, complicated, doing cloud design, develop cloud solutions, teaching, consulting and whatnot. While doing all this for some time now, I realized that a certain generation of technical, and otherwise very well-trained people, mostly in the software development area, has a very big problem in working with distributed systems over the cloud.

I could not understand this for years. It just did not compute. Why was this happening?

Well, turns out, some people were just not fortunate enough to have a smart guy around while “growing up”, that was able to teach them the basics of distributed computing.

I have repeated the experiment from the first story 9 additional times. Each time the same result. It’s like a veil was unveiled for some people.

So, I dare you, I double dare you, If you feel you don’t quite grasp software developing solutions in a distributed environment, have a read on the same paper. It is a masterpiece from 1976-1977, I promise you.

Cloud is dead

Yes. Cloud is dead. We now have ubiquitous computing. And, to be honest, this is close to be dead soon enough. A non-distributed environment does not exist anymore. I am not sure it ever did.

Be good!

Pragmatic Homomorphic Encryption

Hi again!

Since you’re here, I believe that you have a general idea about what homomorphic encryption is. If, however you are a little confused, here it is in a nutshell: you can do data processing directly on encrypted data. E.G. An encryption of a 5 multiplied by an encryption of a 2 is an encryption of a 10. Tadaaa!

This is pure magic for privacy. Especially with this hype that is happening now with all the data leaks, and new privacy regulation, and old privacy regulation, and s**t. Essentially, what you can do with this is very close to the holy grail of privacy: complete confidential computing, process data that is already encrypted, without the decryption key. Assuming data protection in transit is already done. See picture bellow:

Quick note here, most of the homomorphic schemes (BFV/CKKS/blabla..) use a public/private scheme for encrypting/decryption of data.

Now, I have been fortunate enough to work, in the past year, on a side project involving a lot of homomorphic encryption. I was using Microsoft SEAL and it was great. I am not going to talk about the math behind this type of encryption, not going to talk about the Microsoft SEAL library (although I consider it excellent), not going to talk about the noise-propagation problem in this kind of encryption.

I am, however, going to talk about a common pitfall that I have seen, and that is worrying. This pitfall is concerning the integrity of the result processing. Or, to be more precise, attacking the integrity of the expected result of processing.

Some Example

Let me give you an example: Assume you have an IoT solution that is monitoring some oil rigs. The IoT devices encrypt the data that is collected by them, then sends it to a central service for statistical analysis. The central service does the processing and provides an API for some other clients used by top management.

(This is just an example. I am not saying I did exactly this. It would be $tupid to break an NDA and be so open about it.)

If I, as an attacker compromise the service that is doing the statistical analysis, I cannot see the real data sent by the sensors. However, I could mess with it a little. I could, for instance, make sure that the statistical analysis returned by the API is rigged, that it shows whatever I want it to show.

I am not saying that I am able to change the input data. After all, I as an attacker do not have the key used for encryption, so that I am not able to encrypt new data in the series. I just go ahead and alter the result.

It seems obvious that you should protect such a system against impersonation/MitM/spoofing attacks. Well. Apparently, it is not that obvious.

The Trouble

While implementing this project, I got in touch with various teams that were working with homomorphic encryption, and it seems that there was a recurring issue. The problem is that the team that is implementing such a solution, usually is made up of experienced (at least) developers that have a solid knowledge of math / cryptography. But it is not their role to handle the overall security of the system.

The team that is responsible for the overall security of the system, is unfortunately, often decoupled with the details of a project that is under development. What do they “know” about the project? Homomorphic encryption? Well, that is cool, data integrity is handled by encryption, so why put any extra effort into that?

Please, please, do not overlook basic security just because some pretty neat researchers made a breakthrough regarding the efficiently of implementing a revolutionary encryption scheme. Revolutionary does not mean lazy. And FYI, a Full Homomorphic Encryption Scheme has been theorized since 1978.

To be fair play, I want to mention another library that is good at doing homomorphic encryption, PALISADE. I only have some production experience with Microsoft SEAL, and thus, I prefer it 😊

Be safe!

Wiping off some encraption using Managed Service Identities in Azure

I think it was 2008 when I was studying the second edition of “Writing Secure Code” by David LeBlanc and Michael Howard. It was there that I ran into the term “encraption” for the first time. Encraption refers, generally, to the bad practice of using poor encryption: old and deprecated algorithms, poor key complexities, storing the keys in plain sight, and other crap like that. Some of these bad practices can be fixed within a software development team by educating the members to ask for expertise in cryptography instead of just making assumptions. I have talked about this before, and yes, this aspect is critical.

I want to talk about a specific aspect of encraption: storing the keys in plain sight. I still encounter this situation so often, that I get physically ill when I am seeing it. If you claim to have never done it – store keys/sensitive information in plain sight – please think again, this time carefully. In fact, there is a secure coding principle stating that the source code of your application, in any configuration, should not be considered sensitive information. This, of course, has some very little exceptions.

Disclaimer! This article is just a summary of the reasons why you should consider using Azure Managed Service Identities, and does not describe the inner pluming of Managed Service Identities in Azure, nor does it provide examples. It would be stupid for me to claim that I can give more appropriate documentation than Microsoft already does on their platform. At the end of the article you will find links pointing you to those resources.

Let us imagine a classic situation here, for example a shared service that is required to store sensitive information that is generated by each of its users, and later make that data available ONLY to the user that generated it. Something like this:

Now, the trick is that any bug/quark that may appear inside the app will not allow for a user to obtain any key belonging to any other user. The cryptography behind this is not the scope of this article. Also, handling sensitive data while they are in memory is not the purpose of this article. So, in the end, how would you tackle this? Many of you would say, right away that key storage should be delegated to a dedicated service, let’s call it a HSM, like so:

Now, the problem is that you need to have an authenticated and secure communication channel with the HSM. If you are trying to accomplish this in your own infrastructure, then good luck. While it is possible to do it, it is most certainly expensive. If you are going to buy the HSM service from an external provider, then, most certainly you will have to obey a lot of security standards before obtaining it. Most of these so-called Key Vault services will allow access to them based on secret-based authentication (app id and secret) that your application, in turn will need to securely store. I hope you have spotted the loophole. The weakest link here is that your application will still have to manage a secret, a set of characters that is very, very sensitive information:

Of course, most teams will have a “rock-solid-secure-rocketscience-devops” that will protect this red key in the configuration management of the production environment. Someone will obey a very state of the art process to obtain and store the red key only in the production environment, and no one will ever get it. This is sarcasm in case you missed it. But why is it sarcasm? you may ask. Here are my reasons.

  • In 15 years and probably hundreds of projects with a lot of people involved, I have maybe seen 3 – three – projects that have done this properly. Typical mistakes are ranging between
    1. the red key has changed, no one knew that it can change, the production halted, a SWAT team rushed to save the day and used their own machines for changing the key
    2. development team has access to production
    3. the “rock-solid-secure-rocketscience-devops” is ignored by the “experienced” developer that hard-codes the key
    4. no one really cares about that key. It is just “infrastructure”
  • An attack can still target this red key. It is, in fact just another attack surface. Assuming the absurdity that all other parts of the system are secure, a system holding such a key will have at least two points of attack.
    1. The identity that has access to the production environment (this is very hard to eliminate)
    2. The red key itself

Why not eliminating this red key altogether? Why not going to a cloud provider where you can create an application (or any resource) and a Key Vault and then create atrust relationshipbetween those two that needs no other secrets in order to function. This way you will not have to manage yet another secret.

Please bear in mind, that if using an HSM or Key Vault, you can isolate all your sensitive information and keys at that level, so that you will never have to bother about the secure storage of those item ever. No more hard-coding, no more cumbersome dev-ops processes!

Managed Service Identities

This is possible in azure, using Managed Service Identities. As I have said before, the purpose of this article is not to replace the awesome documentation that Micro$oft has in place for MSI, but to encourage you to read it and use this magic feature in your applications. Current or future

Here is about Managed Service Identities:

https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview

Here is an example where you can use an Azure App Service together with an Key Vault

https://docs.microsoft.com/en-us/azure/app-service/overview-managed-identity

Stay safe!

Probabilistic sudoku using Infer.NET #2

Probably hello again! We started covering the probabilistic solution to the sudoku puzzle, and I have promised a series of articles that will guide you through the journey I took to tackle this topic. Ultimately, we’ll also have a little working example.

Firsts things first. In the previous article I have promised you that I will show you how probabilities are to be included in the graphical model. In order to tacke this please consider the following questions:
What problems does a probabilistic model solve for the sudoku puzzle? How do you imagine a probabilistic sudoku model? When is it appropriate to use a probabilistic model? Now, before going on to read the rest of the content, please spend some time to formulate your own answers to these questions. Anyway, I will provide some useful answers below, because, as you figured, these are key questions in understanding this topic 😊

(Anoying pause … )

Let’s go:
When would probabilities be apropriate for tackling sudoku ? If any sudoku puzzle has one single solution (well-formed sudoku puzzle), then the usage of probabilities can be usefull, but without too much of a benefit. If, however, multiple solutions are possible for a sudoku puzzle, then, using probabilities to solve a puzzle can bring alot of benefits. If you want to find out more about the interesting math behind sudoku, feel free to check this out:
http://pi.math.cornell.edu/~mec/Summer2009/Mahmood/More.html

How do you imagine a probabilistic model for the sudoku game ? Would it be cool to haave, given a sudoku puzzle, and a certain random value (say, 3) a probability distribution for each cell and 3 as a possible value? Check out the image bellow:

Now, I used a 4×4 sudoku as an example. It is pretty dificult to come with a non well-formed 4×4 sudoku, but I made an effort 🙂

white means 0 probability (easy), light shade is low (whatever that means) probability, darker means higher. Same shade means same probability. If you imagine a non well-formed 12×12 puzzle you can see where this probability game comes in handy 🙂

Good, now, remember, in the last article we have established that we are going to use a graphical model (using graph theory) for tackling the probabilistic sudoku challenge. I promised I will show you how probabilities can fit into this model, so here it is:

Each solution/value node in the graph will hold a vector describing probability distribution for each possible value. e.g. for cell number 2 (r1,c2) the probability vector is (0, 50%, 0, 50%). Indexes the vector are 1 through N (4). 4 Because we have a 4×4 puzzle for this example :). Yes, they are not zero based 🙂 This is it. That’s all I wanted to tell you for now.

In order for me give you content that is as engaging as possible, I will do my best to write the next article so that it involves as little as possible in regard to the rest of the mathematics involved in this solution, and go directly to the Inver.NET implementation. Belive me, trying to formulate an article that omits a hard definition of marginal distribution and Belief Propagation (sum-product message passing) and goes directly to using these concepts in code will be dificult.

But I trust that we can do it next time! Probably!

Probabilistic sudoku using Infer.NET #1

Model-based machine learning and the probabilistic programming paradigm is coming to the .NET world. If, by any chance, you are unfamiliar with these topics, please feel free to check the links provided above 🙂

Given this happy situation, I figured that it would be nice to build a probabilistic sudoku solver / generator using the model-based machine learning principles and Infer.NET.

I’ve done it! but trust me, it was a trip worth sharing. I have decided to share this journey with everybody in a series of articles / talks / gatherings.

First, does it make sense to have a probabilistic approach for the sudoku puzzle ? Well, yes it does. It is an NP-complete proven back in 2003 -problem, so it makes sense to optimize the solution as much as possible.

Second, building the probabilistic model for the generic sudoku puzzle turned out to be a significant challenge. This small article is dedicated to just a fraction of the “probabilistic model for the sudoku puzzke” problem, namely the reason for choosing a Probabilistical Graphical Model – a probabilistical model based on graphs. I hope I will be able to write an entire series of articles that will take you through the entire journey I took. Doing this in just one article is impossible, and, let me tell you, extenuating to read and understand. So let’s begin.

Imagine a simple 4×4 sudoku puzzle

Fig 1. 4×4 sudoku puzzle

Trying to drill down the components of a sudoku puzzle, we can come up with the following:

  1. The Solution elements. The elements of the solution (numbers inside). Let’s call such an element Sn, In a row-scan order, for the sample above: (1,2,4,3,4,3,1,2,2,1,3,4,3,4,2,1)
  2. The constraints. You know the rules of sudoku, right? Then, constraints are the groups in which solution elements live. For the 4×4 sudoku puzzle, there are 12 of them. Please see the Fig 2. bellow.
Fig 2. Constraints for the sudoku puzzle

The first constraint (C1) is the first row. You get the idea. (C9) is the first subgrid constraint.

Now the relationship between the constraint and solution nodes (Cm and Sn) can be a bipartite graph as follows

Fig 3. The graphical model

Again, you get the idea. The information held by the bipartite graph is: what constraint applies to what set of values.

That’s enough for now. Trust me! In the next article I will try to show you how probabilities can fit into this model.

If, by any chance you want to skip-forward to the Infer.NET solution, make sure you read and understand the following paper that I used to create the probabilistic model for the sudoku puzzle, and snip out some pictures for this article.

Until the next article in the series, remember: the world is beautiful! Probably.

Four security dimensions of software development

It’s not my definite characteristic to write boilerplate articles about obvious challenges, but I had a fairly recent experience (December 2018). I was doing some security work for an old client of mine and found that it was facing the absolute same basic problems that I tackled many times before. So, I remembered that more than 1.5 years ago I summed those problems up into the following material:

Originally published [here]

Having a job that requires deep technical involvement in a prolific forest of software projects certainly has its challenges. I don’t really want to emphasize the challenges, as I want to talk about one of its advantages: being exposed to issues regarding secure software development in our current era.

Understanding these four basic dimensions of developing secure software is key to starting building security into the software development lifecycle.

Dimension Zero: Speaking the same language

The top repetitive problem that I found in my experience, regardless of the maturity of the software development team, is the heterogeneous understanding of security. This happens at all levels of a software development team: from stakeholders, project managers to developers, testers and ultimately users.

It’s not that there is a different understanding of security between those groups. That would be easy to fix. It’s that inside each group there are different understandings of the same key concepts about security.

As you can expect, this cannot be good. You cannot even start talking about a secure product if everybody has a different idea of what that means.

So how can a team move within this uncertain Dimension Zero? As complicated as this might seem, the solution is straightforward: build expertise inside the team and train the team in security.

How should a final resolution look like at the end of this dimension? You should have put in place a framework for security that lives besides your development lifecycle, like Security Development Lifecycle (SDL) from Microsoft for example. Microsoft SDL is a pretty good resource to start with while keeping the learning loop active during the development process.

Dimension One: Keeping everybody involved.

Let’s assume that a minor security issue appears during implementation of some feature. One of the developers finds a possible flaw. She may go ahead and resolve it, consider it as part of her job, and never tell anyone about it. After all, she has already been trained to do it.

Well… no!

Why would you ask, right!? This looks counterintuitive, especially because “build expertise inside the team and train the team in security” was one of the “dimension zero”’s to go with advice.

Primarily because that is how you start losing the homogeneity you got when tackling Dimension Zero. Furthermore, there will always be poles of security expertise, especially in large teams, you want to have the best expertise when solving a security issue.

Dimension Two: Technical

Here’s a funny fact: we can’t take the developers out of the equation. No matter how hard we try. Security training for developers must include a lot of technical details, and you must never forget about:

  • Basics of secure coding. 
    (E.g. never do stack/buffer overflows, understand privilege separation, sandboxing, cryptography, and …unfortunately many more topics)
  • Know your platform. Always stay connected with the security aspects of the platform you are developing on and for.
    (E.g. if you are a .NET developer, always know its vulnerabilities)
  • Know the security aspects of your environment.
    (E.g. if you develop a web application, you should be no stranger of XSRF)

This list can go forever, but the important aspect is never to forget about the technical knowledge that the developers need to be expsosed on.

Dimension Three: Don’t freak out.

You will conclude that you cannot have a secure solution within the budget you have. This can happen multiple times during a project’s development. That is usually a sign that you got the threat model wrong. Probably you assumed an omnipresent and omnipotent attacker. [We all know you can’t protect from the “Chupacabra”, so you shouldn’t pay a home visit.]

This kind of an attacker doesn’t exist… yet. So, don’t worry too much about it, focus on the critical aspects that need to be secured, and you’ll restore the balance with the budget in no time.

Instead of a sum up of the 4 security dimensions of software development, I wish you happy secure coding and leave you a short-but-important reading list:

Be safe!

Pragmatic steps for cybersecurity consolidation

At the end of last year, I had some time to review and get up-to-date with some of the most important security incidents of 2018. Some of these incidents are wide-spread knowledge, some of them are particular to the activity that I do. While doing this, I figured that I could draw some pragmatic conclusions about what basic protection is against “a generic 2018 cybersecurity threat”. I have great friends and colleagues, and so one thing leads to another and we get to publish a small eBook on this topic.

This small eBook is designed for decision makers to gain a high-level overview of topics, as well as for IT professionals responsible for security steps to be implemented.

All things considered, we hope that everyone who will read the eBook and will implement some recommendation to their current strategy / development / infrastructure / design / testing practices will improve their overall products’ or services’ security.

You can download it here. Of course, this is free. If you want to get it directly from me, drop me an e-mail please, I’ll make sure to reply with the proper attachment :).

I am the author, and my colleagues

Tudor Damian – Technical curator

Diana Tataran – General curator

Noemi Bokor – Visual Identity

Avaelgo – Sponsored some time to make this possible

Are the ones who made this possible.

Cheers to you to.

Singular Execution of Mission-Critical Operations

Something happened this month with Romania’s ING Bank. I’m sure you’re probably aware of it. They managed to execute a several (well, maybe more than just a several) transactions more than once.  Well, shit happens, I guess. They  have eventually fixed it. At least they say so. I choose believe them.

This unfortunate happening triggered a memory of  my first time working in a mission-critical environment where certain operations were supposed to be executed exactly, absolutely, only once. It was for a german company. back in 2013. I am not allowed to mention or make any refference to them or the project, so let’s anonymously call them Weltschmerz Inc. It went something like this (oversimplified diagram):

I don’t claim that ING’s systems can be oversimplified to this level, but for the sake of the argument, and the protection I assumed for the so-called Weltschmerz Inc. let’s go with the banking example.

Trusted actor is me, when using a payment instrument that allows me to innitiate a transaction. (can be me using my card, or me being authenticated in any of their systems)

Trusted application is the innitial endpoint where I place my input describing my transaction (can be a POS, can be an e-banking application, anything)

The Mission-Critical Operation is the magic. Somehow, the application (be it POS, e-banking, whatsoever) knows how to construct such a dangerous operation.

Trick is, that whoever handles the execution of this operation must do it exactly, absolutely, only once. If the trusted application has a bug /attack/misfortune and generates two consecutive identical operations, one of them will never get executed. If I make a dubious mistake and somehow am allowed to quickly press twice a button, or if the e-banking / POS undergoes an attack, the second operation will be invalid. If anyone tries to pull a replay attack, it will still not work.

How to tackle this? Well, there are alot of solutions for this problem. Most of them gravitate around cryptography and efficient searching, here’s the approach we took back then:

Digitally signing the operation: necesarry in order to obtain a trusted fingerprint of the operation. the perfect unique identifier of the operation.
I understand, it is not easy to accomodate a digital signature ecosystem inside your infrastructure, there’s a lot of trust, PKI + certificates, guns, doors, locks, bieurocracy and shit to handle. It is expensive, but that’s life, no other way around it unfortunately.

Storing and partitioning: this signed version is stored wherever. However its signed hash must be partitioned based on variable that derrive from the business itself. If we are to consider banking, and if we speculate, we could come up to: time of the operation, identified recipient, innitiator, requested value, actual value, soo many more possibilities….  This partition is needed because, well, theory and practice tells us that  “unicity has no value unless confined” If you are a very young developer, keep that in mind, it will cut you some slack later in your life.

Storing this hash uniquely inside a partition is easy now, it is ultimately just a carefull comparrison of the hashes inside a partition and the new operation which is a candidate for execution.

Hint: be carefull in including time in your partition. Time should not only be a part inside the signed operation, but also a separate, synchronised, independent, clock. I’m sure you already know this.

If you do this partitioning and time handling by the book, no replay attack will ever work.

Execution: Goes in all partitions that have something inside of them, gets the operations, does the magic. Magic does not include deleting the operation hash in the partition afterwards. It includes some other magic maker. I choosed my words carefully here :). #ACID.

There’s a lot more to it: 

  • signed hashes should be considered highly sensitive secrets, tough an encryption mechanism must be employed. Key management in this case is an issue. That’s why you will probably need an HSM or some sort of simmilar vault for the keys, and key derivates
  • choose your algorithms carefully. If you have no real expertise in cryptography, please call someone that does. Never assume anything here unless you really know how to validate your assumptions
  • maintaining such an infrastructure comes with a cost. It’s not such a deal breaker, but it is to be considered.

Again, I am not claiming that ING Romania did anything less than the best in order to ensure the singular execution, this article is not related directly to them. It is just a kind reminder, that it is possible to design such a mission-critical environment, for singular execution of certain operations.

As for my experience, it was not in banking, but rather a more open environment. #Marine, #Navigation.

Cheers to us all.

Avoidable privacy happenings

Last time, I tried to brief some of the steps you need to cover before starting to choose tools that will help you achieve compliance. Let’s dig a little deeper into that by using some real life negative examples that I ran into.

Case: The insufficiently authenticated channel.

Disclosure disclaimer: following examples are real. I have chosen to anonymize the data about the bank in this article, although I have no obligation whatsoever to do so. I could disclose the full information to you per request.

At one point, I received an e-mail from a bank in my inbox. I was not, am not, and hopefully will not be a client of that particular bank. Ever. The e-mail seemed (from the subject line) to inform me about some new prices of the services the bank provided. It was not marked as spam, and so it intrigued me. I ran some checks (traces, headers, signatures, specific backtracking magic), got to the conclusion that it is not spam, so I opened it. Surprise, it was directly addressed to me, my full name appeared somewhere inside. Oh’ and of course thanking ME that I chose to be their client. Well. Here’s a snippet (it is in Romanian, but you’ll get it):

Of course I complained to the bank. I was asking then to inform me how they’ve got my personal data, asking them to delete it, and so on. Boring.

About four+ months later (not even close to a compliant time) a response popped up:

Let me brief it for you: It said that I am a client of the bank, that I have a current account opened, where the account was opened. Oh but that is not all. They have also given me a copy of the original contract I supposedly signed. And a copy of the personal data processing document that I also signed and provided to them. Will the full blown personal data. I mean full blown: name, national id numbers, personal address etc. One problem tough: That data was not mine, it was some other guy’s data that had one additional middle name. And thus, a miracle data leak was born. It is small, but it can grow if you nurture it right…

What went wrong?

Well, in short, the guy filled in my e-mail address and nobody checked it, not him, not the bank, nobody. You imagine the rest.

Here’s what I am wondering.

  1. Now, in the 21st century, is it so hard to authenticate a channel of communication with a person? it difficult to implement a solution for e-mail confirmation based on some contract id? Is it really? We could do it for you, bank. Really. We’ll make it integrated with whatever systems you have. Just please, do it yourselves or ask for some help.
  2. Obviously privacy was 100% absent from the process of answering my complaint. Even though I made a privacy complaint 🙂 Is privacy totally absent from all your processes?

In the end, this is a great example of poor legislative compliance, with zero security involved, I mean ZERO security. They have some poor legal compliance: there is a separate document asking for personal data and asking for permission to process it. The document was held, and it was accessible (ok, it was too accessible). They have answered my complaint even though it was not in a timely compliant manner, and I had not received any justification for the delay.

Conclusions?

  1. Have a good privacy program. A global one.
  2. Have exquisite security. OK, not exquisite, but have some information security in place.
  3. When you choose tools, make sure they can support your privacy program.
  4. Don’t be afraid to customize the process, or the tools. Me (and, to be honest, anybody in the business) could easily give you a quote for an authentication / authorization solution of your communication channels with any type of client.

I am sure you can already see for yourself how this is useful in the context of choosing tools that will help you organize your conference event, and still maintain its privacy compliance.