Probabilistic sudoku using Infer.NET #1

Model-based machine learning and the probabilistic programming paradigm are coming to the .NET world. If, by any chance, you are unfamiliar with these topics, please feel free to check the links provided above 🙂

Given this happy situation, I figured that it would be nice to build a probabilistic sudoku solver / generator using model-based machine learning principles and Infer.NET.

I’ve done it! But trust me, it was a trip worth sharing, so I have decided to share this journey with everybody in a series of articles / talks / gatherings.

First, does it make sense to take a probabilistic approach to the sudoku puzzle? Well, yes, it does. Sudoku is an NP-complete problem (proven back in 2003), so it makes sense to optimize the search for a solution as much as possible.

Second, building the probabilistic model for the generic sudoku puzzle turned out to be a significant challenge. This short article is dedicated to just a fraction of the “probabilistic model for the sudoku puzzle” problem, namely the reason for choosing a Probabilistic Graphical Model, i.e. a probabilistic model based on graphs. I hope to write an entire series of articles that will take you through the whole journey I took. Doing this in just one article is impossible and, let me tell you, it would be exhausting to read and understand. So let’s begin.

Imagine a simple 4×4 sudoku puzzle:

Fig 1. 4×4 sudoku puzzle

Drilling down into the components of a sudoku puzzle, we can come up with the following:

  1. The solution elements, i.e. the numbers inside the grid. Let’s call such an element Sn. In row-scan order, for the sample above: (1,2,4,3,4,3,1,2,2,1,3,4,3,4,2,1)
  2. The constraints. You know the rules of sudoku, right? Constraints are the groups in which the solution elements live, and each group must contain every value exactly once. For the 4×4 sudoku puzzle, there are 12 of them: 4 rows, 4 columns and 4 subgrids. Please see Fig 2. below.
Fig 2. Constraints for the sudoku puzzle

The first constraint (C1) is the first row, and you get the idea; C9 is the first subgrid constraint.
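To make the constraint list concrete, here is a minimal Python sketch. It is only an illustration, since the series itself builds the solution in Infer.NET. The sketch assumes that cells S1 to S16 are numbered in row-scan order. It also assumes the constraints are ordered rows first (C1 to C4), then columns (C5 to C8), then subgrids (C9 to C12), which is consistent with C1 being the first row and C9 the first subgrid. The helper names are hypothetical. The code builds the 12 groups and checks that the sample solution puts each value from 1 to 4 exactly once in every group.

```python
N = 4  # grid size
B = 2  # subgrid size

def build_constraints(n=N, b=B):
    """Return the 3*n constraints as lists of cell indices (0-based, row-scan order).

    Assumed order: rows (C1..C4), then columns (C5..C8), then subgrids (C9..C12).
    """
    rows = [[r * n + c for c in range(n)] for r in range(n)]
    cols = [[r * n + c for r in range(n)] for c in range(n)]
    boxes = [[(br + i) * n + bc + j for i in range(b) for j in range(b)]
             for br in range(0, n, b) for bc in range(0, n, b)]
    return rows + cols + boxes

def satisfies_all(solution, constraints, n=N):
    """A constraint holds when its cells contain each value 1..n exactly once."""
    return all(sorted(solution[i] for i in group) == list(range(1, n + 1))
               for group in constraints)

constraints = build_constraints()
sample = (1, 2, 4, 3, 4, 3, 1, 2, 2, 1, 3, 4, 3, 4, 2, 1)
print(len(constraints))                    # 12
print(satisfies_all(sample, constraints))  # True
```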

Now, the relationship between the constraint nodes and the solution nodes (Cm and Sn) can be represented as a bipartite graph, as follows:

Fig 3. The graphical model

Again, you get the idea. The information held by the bipartite graph is: which constraint applies to which set of solution elements.
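The bipartite graph can be sketched as two adjacency maps: one from each constraint node to the solution nodes it covers, and one in the opposite direction. This minimal Python illustration assumes that S1 to S16 are the cells in row-scan order. It also assumes that C1 to C4 are the rows, C5 to C8 the columns and C9 to C12 the subgrids. The variable names are hypothetical.

```python
from collections import defaultdict

N, B = 4, 2  # 4x4 grid with 2x2 subgrids

# Assumed numbering: C1..C4 rows, C5..C8 columns, C9..C12 subgrids.
rows = [[r * N + c for c in range(N)] for r in range(N)]
cols = [[r * N + c for r in range(N)] for c in range(N)]
boxes = [[(br + i) * N + bc + j for i in range(B) for j in range(B)]
         for br in range(0, N, B) for bc in range(0, N, B)]
constraints = rows + cols + boxes

# Constraint node -> solution nodes it covers.
c_to_s = {f"C{m}": [f"S{i + 1}" for i in cells]
          for m, cells in enumerate(constraints, start=1)}

# Solution node -> constraint nodes it belongs to.
s_to_c = defaultdict(list)
for c_node, s_nodes in c_to_s.items():
    for s_node in s_nodes:
        s_to_c[s_node].append(c_node)

print(c_to_s["C9"])  # ['S1', 'S2', 'S5', 'S6']
print(s_to_c["S6"])  # ['C2', 'C6', 'C9']
```

Every solution node ends up with exactly three neighbours (its row, its column and its subgrid), and every constraint node with four, for 48 edges in total.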

That’s enough for now. Trust me! In the next article I will try to show you how probabilities can fit into this model.

If, by any chance, you want to skip ahead to the Infer.NET solution, make sure you read and understand the following paper. I used it to create the probabilistic model for the sudoku puzzle, and I snipped some pictures for this article from it.

Until the next article in the series, remember: the world is beautiful! Probably.

Four security dimensions of software development

Writing boilerplate articles about obvious challenges is not really my thing, but I had a fairly recent experience (December 2018). While doing some security work for an old client of mine, I found that they were facing exactly the same basic problems that I had tackled many times before. That reminded me that, more than 1.5 years ago, I had summed those problems up in the following material:

Originally published [here]

Having a job that requires deep technical involvement in a prolific forest of software projects certainly has its challenges. I don’t want to dwell on them, though; instead, I want to talk about one of its advantages: being exposed to secure software development issues in our current era.

Understanding these four basic dimensions of developing secure software is key to building security into the software development lifecycle.

Dimension Zero: Speaking the same language

The most recurrent problem I have found in my experience, regardless of the maturity of the software development team, is the heterogeneous understanding of security. This happens at all levels of a software development team: from stakeholders and project managers to developers, testers and, ultimately, users.

It’s not that there is a different understanding of security between those groups. That would be easy to fix. It’s that inside each group there are different understandings of the same key concepts about security.

As you can imagine, this is not good. You cannot even start talking about a secure product if everybody has a different idea of what that means.

So how can a team move within this uncertain Dimension Zero? As complicated as this might seem, the solution is straightforward: build expertise inside the team and train the team in security.

What should the outcome look like at the end of this dimension? You should have put in place a security framework that lives alongside your development lifecycle, such as Microsoft’s Security Development Lifecycle (SDL). Microsoft SDL is a pretty good resource to start with, while you keep the learning loop active during the development process.

Dimension One: Keeping everybody involved.

Let’s assume that a minor security issue appears during the implementation of some feature. One of the developers finds a possible flaw. She may go ahead and resolve it, consider it part of her job, and never tell anyone about it. After all, she has already been trained to do it.

Well… no!

Why, you ask? This looks counterintuitive, especially because “build expertise inside the team and train the team in security” was the go-to advice of Dimension Zero.

Primarily, because that is how you start losing the homogeneity you gained when tackling Dimension Zero. Furthermore, there will always be poles of security expertise, especially in large teams, and you want the best expertise available when solving a security issue.

Dimension Two: Technical

Here’s a funny fact: we can’t take the developers out of the equation. No matter how hard we try. Security training for developers must include a lot of technical details, and you must never forget about:

  • Basics of secure coding.
    (E.g. never introduce stack/buffer overflows; understand privilege separation, sandboxing, cryptography, and …unfortunately many more topics)
  • Know your platform. Always stay connected with the security aspects of the platform you are developing on and for.
    (E.g. if you are a .NET developer, always stay up to date with its known vulnerabilities)
  • Know the security aspects of your environment.
    (E.g. if you develop a web application, you should be no stranger to XSRF)

This list could go on forever, but the important thing is never to forget about the technical knowledge that developers need to be exposed to.

Dimension Three: Don’t freak out.

At some point, you may conclude that you cannot have a secure solution within the budget you have. This can happen multiple times during a project’s development. It is usually a sign that you got the threat model wrong: you probably assumed an omnipresent and omnipotent attacker. [We all know you can’t protect yourself from the “Chupacabra”, so you shouldn’t pay it a home visit.]

This kind of attacker doesn’t exist… yet. So, don’t worry too much about it; focus on the critical aspects that need to be secured, and you’ll restore the balance with the budget in no time.

Instead of summing up the four security dimensions of software development, I wish you happy secure coding and leave you with a short-but-important reading list:

Be safe!

Pragmatic steps for cybersecurity consolidation

At the end of last year, I had some time to review and catch up with some of the most important security incidents of 2018. Some of these incidents are widespread knowledge, while others are particular to my line of work. While doing this, I figured that I could draw some pragmatic conclusions about the basic protection against “a generic 2018 cybersecurity threat”. I have great friends and colleagues, so one thing led to another and we ended up publishing a small eBook on this topic.

This small eBook is designed for decision makers to gain a high-level overview of topics, as well as for IT professionals responsible for security steps to be implemented.

All things considered, we hope that everyone who reads the eBook and implements some of its recommendations in their current strategy / development / infrastructure / design / testing practices will improve the overall security of their products or services.

You can download it here. Of course, it is free. If you want to get it directly from me, please drop me an e-mail and I’ll make sure to reply with the proper attachment :).

I am the author, and the following colleagues are the ones who made this possible:

Tudor Damian – Technical curator

Diana Tataran – General curator

Noemi Bokor – Visual Identity

Avaelgo – Sponsored some time to make this possible

Cheers to you too.

Singular Execution of Mission-Critical Operations

Something happened this month with Romania’s ING Bank. I’m sure you’re probably aware of it. They managed to execute several (well, maybe more than just several) transactions more than once. Well, shit happens, I guess. They eventually fixed it; at least, that’s what they say, and I choose to believe them.

This unfortunate event triggered a memory of my first time working in a mission-critical environment, where certain operations were supposed to be executed exactly, absolutely, only once. It was for a German company, back in 2013. I am not allowed to mention or make any reference to them or the project, so let’s anonymously call them Weltschmerz Inc. It went something like this (oversimplified diagram):

I don’t claim that ING’s systems can be oversimplified to this level, but for the sake of the argument, and for the anonymity I owe the so-called Weltschmerz Inc., let’s go with the banking example.

The trusted actor is me, using a payment instrument that allows me to initiate a transaction (this can be me using my card, or me being authenticated in any of their systems).

The trusted application is the initial endpoint where I enter the input describing my transaction (it can be a POS, an e-banking application, anything).

The Mission-Critical Operation is the magic. Somehow, the application (be it a POS, e-banking, whatever) knows how to construct such a dangerous operation.

The trick is that whoever handles the execution of this operation must do it exactly, absolutely, only once. If the trusted application has a bug / attack / misfortune and generates two consecutive identical operations, one of them will never get executed. If I make a dubious mistake and am somehow allowed to quickly press a button twice, or if the e-banking application / POS undergoes an attack, the second operation will be invalid. If anyone tries to pull a replay attack, it will still not work.

How do you tackle this? Well, there are a lot of solutions to this problem. Most of them gravitate around cryptography and efficient searching. Here’s the approach we took back then:

Digitally signing the operation: this is necessary in order to obtain a trusted fingerprint of the operation, i.e. the perfect unique identifier of the operation.
I understand that it is not easy to accommodate a digital signature ecosystem inside your infrastructure: there’s a lot of trust, PKI + certificates, guns, doors, locks, bureaucracy and shit to handle. It is expensive, but that’s life; unfortunately, there is no way around it.
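Here is a minimal sketch of the fingerprinting step in Python. As an assumption, it swaps the PKI-based digital signature for HMAC-SHA256 with a secret key, because Python’s standard library has no public-key signature primitives. HMAC is a keyed MAC, not a public-key signature. The key and the operation fields are hypothetical. The important detail is canonical serialization: the same operation must always produce the same bytes, otherwise identical operations would get different fingerprints.

```python
import hashlib
import hmac
import json

# Assumption: HMAC-SHA256 stands in for the PKI-based signature.
# Hypothetical key; in production it would live in an HSM or a similar vault.
SIGNING_KEY = b"hypothetical-demo-key"

def canonical_bytes(operation: dict) -> bytes:
    """Serialize deterministically, so identical operations give identical bytes."""
    return json.dumps(operation, sort_keys=True, separators=(",", ":")).encode("utf-8")

def fingerprint(operation: dict, key: bytes = SIGNING_KEY) -> str:
    """Keyed fingerprint of the operation: only the key holder can produce it."""
    return hmac.new(key, canonical_bytes(operation), hashlib.sha256).hexdigest()

op = {"initiator": "ACC-1", "recipient": "ACC-2", "amount": "100.00",
      "initiated_at": "2019-03-14T10:15:00+00:00"}
same_op_reordered = dict(reversed(list(op.items())))
print(fingerprint(op) == fingerprint(same_op_reordered))  # True
```

If you use a randomized signature scheme (RSA-PSS or classic ECDSA, for example), two signatures over the same content differ. In that case, the deduplication key should be a hash of the canonical signed content, not the raw signature bytes.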

Storing and partitioning: this signed version is stored wherever. However, its signed hash must be partitioned based on variables derived from the business itself. If we consider banking and speculate a bit, we could come up with: time of the operation, identified recipient, initiator, requested value, actual value, and so many more possibilities… This partitioning is needed because, well, theory and practice tell us that “unicity has no value unless confined”. If you are a very young developer, keep that in mind; it will cut you some slack later in your life.

Storing this hash uniquely inside a partition is now easy: it ultimately comes down to a careful comparison between the hashes already inside the partition and the hash of the new operation that is a candidate for execution.
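The partition-then-compare idea can be sketched as follows. This is a minimal, in-memory Python illustration. The partition key (initiator, recipient and day of the operation) is a hypothetical choice taken from the business variables listed above. The fingerprint is passed in as an opaque string. A real system would need durable, concurrency-safe storage instead of a dictionary.

```python
from collections import defaultdict
from datetime import datetime

def partition_key(operation: dict) -> tuple:
    """Hypothetical partition: initiator, recipient and day of the operation.

    The day comes from the timestamp inside the signed operation, so a replay
    always lands in the same partition as the original.
    """
    day = datetime.fromisoformat(operation["initiated_at"]).date().isoformat()
    return (operation["initiator"], operation["recipient"], day)

class PartitionedRegistry:
    """In-memory sketch: fingerprints already accepted, grouped by partition."""

    def __init__(self) -> None:
        self._partitions = defaultdict(set)

    def try_register(self, operation: dict, fp: str) -> bool:
        """True if fp is new inside its partition, False if it is a duplicate."""
        bucket = self._partitions[partition_key(operation)]
        if fp in bucket:
            return False
        bucket.add(fp)
        return True

registry = PartitionedRegistry()
op = {"initiator": "ACC-1", "recipient": "ACC-2", "amount": "100.00",
      "initiated_at": "2019-03-14T10:15:00+00:00"}
fp = "placeholder-fingerprint"  # stands in for the output of the signing step
print(registry.try_register(op, fp))  # True: first time in this partition
print(registry.try_register(op, fp))  # False: duplicate, never executed twice
```

Because the comparison only happens inside one partition, the cost of the check grows with the size of that partition, not with the whole history of operations. That is one practical reading of “unicity has no value unless confined”.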

Hint: be careful when including time in your partition. Time should not only be part of the signed operation; it should also come from a separate, synchronised, independent clock. I’m sure you already know this.

If you do this partitioning and time handling by the book, no replay attack will ever work.
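The time hint can be sketched as a freshness check. The timestamp inside the signed operation is compared against a trusted, independent clock, and anything outside a tolerance window is rejected. In this minimal Python sketch, the five-minute tolerance is an assumed value. `trusted_now` is a hypothetical parameter standing for a reading from that independent, synchronised clock.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # assumed tolerance; tune it for your system

def is_fresh(operation: dict, trusted_now: datetime,
             max_skew: timedelta = MAX_SKEW) -> bool:
    """Accept the operation only if its signed timestamp is close to the trusted clock."""
    stamped = datetime.fromisoformat(operation["initiated_at"])
    return abs(trusted_now - stamped) <= max_skew

op = {"initiator": "ACC-1", "recipient": "ACC-2", "amount": "100.00",
      "initiated_at": "2019-03-14T10:15:00+00:00"}
print(is_fresh(op, datetime(2019, 3, 14, 10, 17, tzinfo=timezone.utc)))  # True
print(is_fresh(op, datetime(2019, 3, 14, 11, 0, tzinfo=timezone.utc)))   # False
```

A replay carries its original, signed timestamp. It either falls outside the window and fails this check, or falls inside it and lands in the same time-based partition as the original, where the duplicate check catches it. That only holds if the partition is derived from the signed timestamp, not from the arrival time.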

Execution: goes through all partitions that have something inside them, gets the operations, and does the magic. The magic does not include deleting the operation hash from the partition afterwards; it includes some other magic maker. I chose my words carefully here :). #ACID.

There’s a lot more to it:

  • signed hashes should be considered highly sensitive secrets, so an encryption mechanism must be employed. Key management is an issue in this case; that’s why you will probably need an HSM or some similar vault for the keys and key derivatives
  • choose your algorithms carefully. If you have no real expertise in cryptography, please call someone who does. Never assume anything here unless you really know how to validate your assumptions
  • maintaining such an infrastructure comes with a cost. It’s not a deal breaker, but it has to be considered.

Again, I am not claiming that ING Romania did anything less than its best to ensure singular execution; this article is not directly related to them. It is just a kind reminder that it is possible to design such a mission-critical environment for the singular execution of certain operations.

As for my experience, it was not in banking, but in a rather more open environment. #Marine, #Navigation.

Cheers to us all.

Avoidable privacy happenings

Last time, I briefly outlined some of the steps you need to cover before choosing tools that will help you achieve compliance. Let’s dig a little deeper into that, using some real-life negative examples that I ran into.

Case: The insufficiently authenticated channel.

Disclosure disclaimer: the following examples are real. I have chosen to anonymize the data about the bank in this article, although I have no obligation whatsoever to do so. I could disclose the full information to you upon request.

At one point, I received an e-mail from a bank in my inbox. I was not, am not, and hopefully will never be a client of that particular bank. The e-mail seemed (from the subject line) to inform me about some new prices for the services the bank provided. It was not marked as spam, and so it intrigued me. I ran some checks (traces, headers, signatures, specific backtracking magic) and concluded that it was not spam, so I opened it. Surprise: it was addressed directly to me, and my full name appeared somewhere inside. Oh, and of course it thanked ME for choosing to be their client. Well. Here’s a snippet (it is in Romanian, but you’ll get it):

Of course, I complained to the bank. I asked them to tell me how they had got my personal data, to delete it, and so on. Boring.

About four+ months later (not even close to a compliant response time), a response popped up:

Let me brief it for you. It said that I am a client of the bank, that I have a current account opened, and where the account was opened. Oh, but that is not all. They also gave me a copy of the original contract I supposedly signed, and a copy of the personal data processing document that I also supposedly signed and provided to them. With the full-blown personal data. I mean full-blown: name, national ID numbers, personal address etc. One problem, though: that data was not mine. It belonged to some other guy, who had one additional middle name. And thus, a miracle data leak was born. It is small, but it can grow if you nurture it right…

What went wrong?

Well, in short, the guy filled in my e-mail address and nobody checked it, not him, not the bank, nobody. You imagine the rest.

Here’s what I am wondering.

  1. Now, in the 21st century, is it so hard to authenticate a communication channel with a person? Is it difficult to implement an e-mail confirmation solution based on some contract id? Is it really? We could do it for you, bank. Really. We’ll integrate it with whatever systems you have. Just please, do it yourselves or ask for some help.
  2. Obviously, privacy was 100% absent from the process of answering my complaint, even though I made a privacy complaint 🙂 Is privacy totally absent from all your processes?
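Here is a minimal sketch of such an e-mail confirmation in Python. It assumes a server-side secret and a 48-hour validity window, both hypothetical. The bank would send a link containing the token to the address the client wrote down. It would treat that address as belonging to the contract only after `confirm` succeeds, and until then no personal data would go to that address. The function and constant names are illustrative, not any particular product’s API.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"hypothetical-server-secret"  # keep the real one in a vault / HSM
TOKEN_TTL_SECONDS = 48 * 3600           # assumed validity window

def _sign(contract_id: str, email: str, issued_at: int) -> str:
    """Bind the token to this contract, this address and this issue time."""
    msg = f"{contract_id}|{email.strip().lower()}|{issued_at}".encode("utf-8")
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def issue_token(contract_id: str, email: str, now: Optional[int] = None) -> str:
    """Token to embed in the confirmation link sent to `email`."""
    issued_at = int(time.time()) if now is None else now
    return f"{issued_at}.{_sign(contract_id, email, issued_at)}"

def confirm(contract_id: str, email: str, token: str,
            now: Optional[int] = None) -> bool:
    """True only if the token was issued for this contract and address and is still valid."""
    try:
        issued_str, mac = token.split(".", 1)
        issued_at = int(issued_str)
    except ValueError:
        return False
    now = int(time.time()) if now is None else now
    if not 0 <= now - issued_at <= TOKEN_TTL_SECONDS:
        return False
    return hmac.compare_digest(mac, _sign(contract_id, email, issued_at))

token = issue_token("CONTRACT-42", "client@example.com")
print(confirm("CONTRACT-42", "client@example.com", token))   # True
print(confirm("CONTRACT-42", "someone@example.com", token))  # False
```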

In the end, this is a great example of poor legislative compliance, with zero security involved. I mean ZERO security. They do have some poor legal compliance: there is a separate document asking for personal data and asking for permission to process it. The document was kept, and it was accessible (OK, it was too accessible). They did answer my complaint, even though not in a timely, compliant manner, and I never received any justification for the delay.

Conclusions?

  1. Have a good privacy program. A global one.
  2. Have exquisite security. OK, not exquisite, but have some information security in place.
  3. When you choose tools, make sure they can support your privacy program.
  4. Don’t be afraid to customize the process or the tools. I (and, to be honest, anybody in the business) could easily give you a quote for an authentication / authorization solution for your communication channels with any type of client.

I am sure you can already see for yourself how this is useful in the context of choosing tools that will help you organize your conference event, and still maintain its privacy compliance.

Is your conference event GDPR compliant? – Part 2

Last time, I briefly covered some of the main points that need review before you start making your event GDPR compliant. I also mentioned that, in doing so, you will obtain, as a happy byproduct, a nice fingerprint of your event.

Now, as a side note, and as you have probably already figured out, this series of articles is not really aimed at environments that already have a data governance framework in place. If this is your case, I am sure you already have the procedures and tools in place. This series may become interesting for you when we get to talk about some specific tools, information security topics and some disaster scenarios.

There is still some ground to cover on this topic, so let’s go!

Most probably, your main focus in the beginning is: let’s cover some of the costs using sponsors, and let’s fire up the registration & call-for-content procedures right away. Now, let’s not rush into that. In order to collect data from participants and speakers (in short), you must have a legal basis for doing so. The legal basis for the processing (in this case, just collecting the data) may not be much of a choice, even though it seems so. In our experience, given the specifics of our activity, the choice is between consent and the fulfillment of a contract. You will probably want a homogeneous legal basis for all of your participants. Let’s assume consent as the legal basis for processing.

Consent

In order to obtain consent, you are obliged to provide the person giving consent with several pieces of information:

[…]

Recipients of the personal data
Intention to transfer data to a third country or international organization
Storage period, or the criteria used to determine it
Whether, and how, automated decision-making is present in the processing

[…]

Just to name a few. I will not detail here all the challenges of what a consent should look like, because it may become boring for you. You may know all this already. After all, you are already in this business 🙂

Several of these topics are easy to pinpoint if you went through the process detailed in the first article of the series (e.g. identifying the recipients of the personal data). Others, however, do not follow from that first process.

Establishing Data-Flow and assessing the tools

In order for you to be able to answer some questions like:

Are these data going to travel outside the EU? Where exactly?

Are we going to profile anybody, or do some automated decision making?

you first need to define a data-flow associated with personal data, and even more, start thinking about the tools you are going to use.

Remember how, in my first article, I talked about the need to think about third-party software that may help you with some of your activities? Where does this software keep its data? Is it outside the EU? Can you control this?

You see where I am going with this: formalizing the data-flow and knowing which tools touch your data is of utmost importance before you even ask anybody for consent.
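Formalizing the data-flow does not require special software; even a small inventory answers the consent questions above. The Python sketch below is illustrative only. The tools are hypothetical, and `storage_region` and `profiling` would have to be filled in from each vendor’s documentation and from your own processing decisions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataFlow:
    data: str                # which personal data travels
    source: str              # where it is collected (e.g. registration website)
    tool: str                # which tool or processor touches it
    storage_region: str      # where the tool keeps the data, e.g. "EU", "US"
    profiling: bool = False  # used for profiling / automated decision-making?

def consent_questions(flows):
    """Tools that keep data outside the EU, and tools used for profiling."""
    outside_eu = sorted({f.tool for f in flows if f.storage_region != "EU"})
    profiling = sorted({f.tool for f in flows if f.profiling})
    return outside_eu, profiling

flows = [
    DataFlow("name, e-mail", "registration website", "Ticketing tool (hypothetical)", "US"),
    DataFlow("name, company", "registration website", "Badge vendor (hypothetical)", "EU"),
    DataFlow("e-mail", "registration website", "Newsletter tool (hypothetical)", "EU",
             profiling=True),
]
print(consent_questions(flows))
# (['Ticketing tool (hypothetical)'], ['Newsletter tool (hypothetical)'])
```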

Don’t panic! These are things you needed to do for your event anyway; now you just need to do them earlier. And if you ask me, that is exactly the right moment to benefit the most from them. You do not want to start thinking about what tools you need when you already have 300 attendees registered by phone. That would be a bummer.

Next time, I am going to take a deeper look into tools and some basic security requirements that we recommend! Be safe!

Is your conference event GDPR compliant? – Part 1

I’m starting a series of articles in which I will try to cover my experience in managing privacy and GDPR compliance for several IT-related conference events that are handled by “Avaelgo”. During this journey, I will also touch on some in-depth security aspects, so stay tuned for that.

As I am sure you know already, a conference is a place where people gather, get informed, do networking (business or personal), have fun, and who knows what other stuff they may be doing. The key aspect here is that for such a conference to be successful, you need to have a fair amount of people being part of it. And since people are persons, well, that also means a fair amount of personal data.

There’s a lot to cover, but we’ll start with the basics. If this is the first time you are organizing such a conference, then you already have a head start: you don’t have to change anything. If not, then you must start by reviewing the processes that you already have in place.

In this first article, I’m just going to cover the key points that you should review. Let’s go:

  1. How do people get to know about your event?

It is very important to know exactly how you are going to market your event. The marketing step itself must be compliant with the regulation. This is a slightly separate topic, but it cannot be overlooked.

It does not matter whether you market your event to participants, speakers, or companies: personal data is still going to be involved.

  2. How are people going to register for your event?

This means: how are you going to collect data about the participants? Is there going to be a website that allows registration? Do you allow registration by phone? There are more questions to answer, but you get an idea of the baseline. These decisions will later affect the security measures you need to take in order to secure those channels.

  3. How are speakers going to onboard your event?

Same situation as above, but there may be a different set of tools for a different workflow.

  4. How are you going to verify the identity of the participants?

Is someone going to verify attendance manually, comparing ID card names with a list? Is there going to be a tool? Is there a backup plan?

  5. Do you handle housing / travelling for speakers / participants?

If yes, you will probably need to transfer some data to hotels / airlines / taxi companies, etc.

  6. Do you have sponsors? Do they require some privilege regarding the data of the participants?

This is a big one. As I am sure you know, some or all of the entities that collaborate on your conference will require some perks back from it. They may be interested in recruitment activities, marketing activities, or some other kind of activity involving the personal data of your participants. Tread carefully: everything must be transparent.

  7. Will you get external help?

Companies / volunteers / software tools and services that will help you with different aspects of organizing the event? What are they going to do for you? If they touch personal data, it is rather important to know that before you hand it over to them.

  8. Are there going to be promotions / contests?

Usually, these will be treated separately, and onboarding for this kind of activity will be handled separately, but it is still a good idea to know beforehand if you intend to do this.

As you can already imagine, this is not all, but we will cover each of these topics in future articles, and then probably extend the list with some more.

This may look freaky and like a lot of work, but it really is not. Anyway, by tackling personal privacy up front, you also get, as a happy byproduct, a cool fingerprint of what you need to do in order to have a successful event. Cheers to that!

A future article will come soon, covering the next steps. I am sure you already have an intuition of what those are.

See you soon!