17.4 on the Richter scale

Well, iOS 17.4 happened.

There’s a lot of hype around the EU-specific changes to the iOS core supporting multiple stores and multiple payment solutions. As interesting as that is, I don’t care too much about it. It’s been old news for years now.

It’s a different point that interests me:

CarPlay

Let’s review a few features of the new CarPlay:

  • Climate controls piped through CarPlay
  • Tire pressure monitoring
  • Charge monitoring and management for EVs
  • Some vehicle settings
  • Trip management

How would you rate the driver-facing software in a current-era, digital-dash car (EV, partial EV, or conventional), on a scale of 1 to 10?

I’d give a maximum of 3. Let me put it this way: if it’s a 5 on usability, it’s a 0.5 in features, or the other way around. OK, maybe a 4 in a top-class 90k+ car. But not more. Be honest, as proud of your car as you are 🙂

Once a tech giant in software starts touching the software in a car, things can change very quickly.

The car industry’s supply chain does not have the know-how to build driver-facing software. This is not my opinion; unfortunately, it is a fact. And listen, it is not their fault. That’s how things are set up.

What software do they know how to build? Mission-critical. That’s all.

Who fills in the gap? Well, lately, the Apple CarPlay release that came with iOS 17.4. A huge foot in the door for this exact industry.

I can only imagine VAG cutting in half all the future feature plans for the digital dash and driver-facing software, and handing it over to Apple and Google. For free.

If things go on this way, and they seem to, the next software developer hiring crisis is going to be fueled, at least in part, by the drastic reduction coming from the automotive industry’s useless branch of driver-facing software. Yes, hate me for the speech.

I never saw the infamous “goodbye screen” of the new CarPlay. If this feature is still there, I see it as a nice subliminal-message Easter egg, addressed to the next dying industry.

There will be other battles for influence still to take place, but the ground is shifting: 17.4 on the Richter scale.

F**k!

Benchmarking the FFM

Sounds like a p0*n title. But I promise you it is not.

So I just ranted with amazement about some of the unexpected frontiers that are being broken, enterprise-wise, by FFMs together with RAG.

Embarking on a journey to adopt such a model and integrate it into a RAG pattern is ultimately trivial, strictly speaking from a software engineering perspective.

However, from a data science perspective, you must be able to evaluate the result. How capable is your model at performing in your scenario?

Evaluating LLM FFMs is a science in itself, but there are very relevant benchmarks that you can use to gauge any LLM. Let’s briefly explore a few, before focusing on how you could evaluate your RAG scenario (hint: keep an eye on DROP and HellaSwag).

  • MMLU (Massive Multitask Language Understanding)
    Generally used to identify a model’s blind spots. General cross-domain evaluation. Relevant evaluation in zero-shot, few-shot and “medprompt+” configs.
    Competitive threshold: medprompt+, > 90%
  • GSM8K
    Mathematical problem solving with a training dataset. Multi-step mathematical reasoning benchmarking.
    Competitive threshold: zero-shot >95%
  • MATH
    Mathematical problem solving without a training dataset. In exchange, the MATH dataset can be used for training instead of evaluation, or in a 1-shot configuration.
    Competitive threshold: zero-shot ~70%
  • HumanEval
    Used for LLMs trained on code. Kind of the standard here.
    Competitive threshold: zero-shot >95%
  • BIG-bench (Beyond the Imitation Game Benchmark)
    Mining for future capabilities.
    Competitive threshold: few-shot + CoT ~ 90%
  • DROP (Discrete Reasoning Over Paragraphs)
    Currently ~96k questions requiring reference resolution in questions to multiple input positions, with various operations over portions of those positions. This benchmark measures the level of comprehensive understanding. It is split into a training set and a development set, making it ideal for evaluating a RAG capability.
    Competitive threshold: few-shot + CoT ~ 84%
  • HellaSwag
    Evaluation of generative capabilities for NLI problems. The human threshold is accepted at 95% for this one. TL;DR: if a HellaSwag benchmark scores 95% or more, then the generative capabilities of the model are human-like. This is what you want and nothing less.

I took the liberty of adding some competitive thresholds, in case you need some orientation in this evolving landscape. Take these thresholds with a grain of salt; they are based on my experience and some research that has gone into this material. Nevertheless, it should be a red flag if you’re running an FFM benchmarked lower than these.

Back to the problem at hand: your RAG setup can easily be evaluated with a combination of the DROP benchmark and HellaSwag. HellaSwag should be as high as possible and tells you how human-like the generation is, while DROP measures how well your model comprehends and reasons over the paragraphs you feed it.

You can go the extra mile, take a look at the DROP dataset, replace those paragraphs with paragraphs from your RAG scenario, and then run a benchmarking experiment. A little birdie told me that this is relevant if done correctly.
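If you want to see what such an experiment could look like in practice, here is a minimal sketch, assuming the Hugging Face `datasets` package for loading DROP and a hypothetical `rag_answer` function wrapping your own RAG pipeline. The scoring is a simplified exact match, not the official DROP metric.

```python
# Minimal sketch: score a RAG pipeline on DROP-style (passage, question, answer) triples.
# Assumes the Hugging Face `datasets` package; `rag_answer` is a placeholder for your pipeline.
from datasets import load_dataset

def rag_answer(question: str, passage: str) -> str:
    """Placeholder: retrieve + prompt your FFM and return a short answer string."""
    raise NotImplementedError

def normalize(text: str) -> str:
    return " ".join(text.lower().strip().split())

def evaluate_drop_style(n_samples: int = 200) -> float:
    dataset = load_dataset("drop", split="validation").select(range(n_samples))
    hits = 0
    for example in dataset:
        prediction = rag_answer(example["question"], example["passage"])
        gold_spans = example["answers_spans"]["spans"]  # list of acceptable answer strings
        if any(normalize(prediction) == normalize(span) for span in gold_spans):
            hits += 1
    return hits / n_samples  # simplified exact match, not the official DROP F1

# To test your own scenario, swap example["passage"] for paragraphs from your RAG corpus.
```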

However, all the datasets and benchmarking algorithms (already implemented) are available under (various) open licenses. For example, you can find implementations and the datasets for ALL the benchmarks I have mentioned above at https://paperswithcode.com/

Happy new year!

RAG(e) Against the Machine

Formulating the latest LLM leaps as foundation models has opened a box of infinite possibilities. If you have been living on Earth for the past 24-36 months, this is not news.

The RAG Pattern now made its way into very niche areas.

But first, a little (his)story

Remember the era of the chatbots? Then the era of “synthetic chatbots”? You know, the ones that answered the phone when you wanted to solve a problem with your (bank / xSP)? Those are (or maybe were) just clever expert systems, fronted by capable voice synthesizers. Yes, an expert system is still AI, and the voice synthesizers are nowadays also built with a sort of generative AI model.

You know why they are still around?

Because they make a difference. Dollar-wise.

Context

Foundational LLMs used with RAG quickly found their way into the more technical aspects of human communications. Engineering, that is. IT&C engineering, to be more precise.

For example, operation centers, including SOCs, very quickly adapted to this new reality and implemented RAG out of the box for second- and third-level support. Basically, what this means, in a nutshell, is that when you build a support team (second and third level) for a product, team members DO NOT have to spend time reading any type of written manual. Zero.

Another really cool example is in cybersecurity. You can now have (and you do have) solutions in place that do “assume breach”-level monitoring, and you can query their status using natural language. This is achieved by indexing definitions of cybersecurity concepts together with the output of the smart monitoring tools. This is already pretty cool. But this is not the main subject of this incursion.
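Before moving on, here is roughly what that out-of-the-box pattern boils down to: a minimal sketch, assuming the `sentence-transformers` package for embeddings. `llm_complete` is a hypothetical placeholder for whatever FFM you call, and the sample documents are made up.

```python
# Minimal RAG sketch: index short documents (manuals, concept definitions, monitoring output),
# retrieve the closest ones for a question, and stuff them into the prompt of your FFM.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Runbook: to restart the ingestion pipeline, run the ingestd restart procedure ...",
    "Definition: 'assume breach' means operating as if the attacker is already inside ...",
    "Monitoring output: host web-03 reported 14 failed logins in the last hour ...",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = model.encode(documents, normalize_embeddings=True)

def llm_complete(prompt: str) -> str:
    # Placeholder: wire this to your model or your cloud provider's "RAG SDK" of choice.
    raise NotImplementedError

def retrieve(question: str, k: int = 2) -> list:
    query_vector = model.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector          # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in best]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    return llm_complete(prompt)
```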

The intrigue

I got into a discussion with one of my friends the other day. He is close to some content moderation ecosystems.

For a bit of context, content moderation is a business where workers are subject to an EXTREME churn rate. Look it up for yourself. The average will blow your mind.

Training employees (to use the tools) and offering first-level support for the software ecosystem serving the content moderators is the operation with the second-biggest cost impact for this business.

Well, they have eliminated the need to:

  • do technical training for any employee (new, old, whatever)
  • offer first-level support (it simply does not exist anymore)

Why? Because RAG can.

I’ve got some neat insights on how they’ve done it, but this is another subject.

This is big. This is crossing a barrier. Completely eliminating first-level support for an operation where it accounts for a lot of cost is big. Even if their audience is fairly technical, it is still a big achievement.

I can see a future where…

There will be an acceptance criterion for vendors inside an enterprise ecosystem: your documentation must be capable of integration with their RAG solution, because if not, the operational cost is 50% higher.

Hey, I’ve seen mission-critical workloads that have shit (the stinky kind) documentation.
Not once.

It’s not the fact that this happens that intrigues me, but the speed at which it is happening. Or maybe I am too old already. It may have something to do with the fact that cloud providers already have the “RAG SDK” out and ready. Well, this is good news after all.

Peace.

The sharp tool

Cloud is long gone, my friends, long gone. Cloud computing is now just a tool for a ubiquitous computing society.

Allow me to clarify: simply put, there’s absolutely no aspect of our contemporary society that is possible without constant, continuous, invasive and maybe pervasive involvement of computing. The so-called “pervasive computing” (Eva Nieuwdorp). I’m not going to discuss the so-called pervasive dimension of ubiquitous computing, but I am going to leave the concept printed here. It fits well with one of the points I am about to make.

Ubiquitous computing is the reality – so, not only the concept – where computing appears anytime, anywhere and everywhere. Let’s let this sink in for a moment.

There are many technical moving factors that make ubiquitous computing possible, for example:

  • Hardware miniaturization
  • Hardware affordability
  • Software affordability
  • UI and UX leaps
  • Communication infrastructure affordability
  • (so many more)

If we are to go into psychological moving factors, we would totally open-up a whole different world, so, we are not going to do that.

The reality is that we have found ourselves in a compute-omnipresent society for a long time now. There is nothing really new about any of this.

The TNT that actually placed the true ubiquitous in ubiquitous computing was the dawn of cloud computing. The ability to have computing as a service, any computing. The ability to “order random computing”. The ability to play LEGO with computing units on a global playground. Once this ability was available to the masses, everything changed. Actually, the plain analogy with LEGO is not 100% right. It is as if you always had the LEGO pieces, and now, by some magic, you can order a magic part, place it inside your LEGO structure, and it suddenly turns real. Magic and dangerous at the same time. Like black magic dust.

At the beginning…

There was, and maybe there still is, a lot of hype around cloud computing. You know, being able to rent and integrate the best of the best of a finished product in terms of computing is always going to be hyped-up.

At the beginning there were a lot of arguments about the “as-a-service” versus “ownership and independence” of computing “stuff”.

At the beginning there were a lot of arguments around the reliability of “the cloud”.

At the beginning there were a lot of arguments around the security of it all.

At the beginning there were a lot of arguments about the cloudonomics of it all. Economies of scale, they said.

And now…

Now, the ownership problem has been resolved. “People” asked, and the “people” have obtained the possibility of getting everything in and out of the “public cyberspace” as they please. We can run an app in the cloud today and in your datacenter tomorrow, together with data, traffic, whatever.

Now, reliability has gotten to, I think, almost the best you can have. Proven in war.

Now, the security can no longer take place outside the cloud.

The twist

The last post standing is cloudonomics. The twist here is that businesses moving to the cloud understood that the cost factor has to be balanced by the possibilities that you now have, post-adoption. Optimizing the cost has to be balanced with getting more value for your buck.


If your business doesn’t find a way to benefit from the fact that it can now be globally available, you have failed a bit.

If your business doesn’t find a way to benefit from the fact that it can now fairly easily make use of outstanding technology (AI/ML, BI, data archiving, data handling, etc.), you have failed a bit more.

If your business doesn’t find a way to benefit from the fact that it can now stay available / grow in an elastic manner, you have failed a bit more.

We could go on forever.

It is about how “your business in the cloud” should incorporate computing in general, and contribute some weight to developing ubiquitous computing. That’s the sweet spot. Your cloud adoption is 100% successful when you become more ubiquitous.

The sharp tool

Cloud computing is to ubiquitous computing what a hammer is to a nail. This is not a hardware vs. software analogy, this is not a hardware vs. hardware analogy, this is not an infrastructure vs. service analogy. It is just a tool analogy.

If you know you are going to be hit by a hammer, you’d better be sharp at the other end.

P.S. Building sharp tools is going to be the subject of the next topics.

Cheers.

Ranting about surveillance

In Romania, there’s a lot of legislation either in debate or already passed about giving the authorities direct ability to get all e-data: e-mails, IMs, mandating the vendors to hand unencrypted data out. Please read this again.

Now, I want to lightly share some of my experience in working for some top cybersecurity ‘consulting’ companies. (lightly, because of NDAs)

Now, the layout is as follows: justice, law enforcement, and ultimately governments mandate surveillance when justified. Alright. Good.
The surveillance is happening anyway (phone taps, physical tracking, e-mails, IMs, and whatnot) with help from various agencies that are specialized in doing that.
The level of expertise that various govt. agencies have in terms of electronic surveillance is not always up to date. This is to say that their capabilities are limited. This is normal. When they face a situation where they can’t pursue a surveillance task, they outsource. There are cybersecurity companies that offer such services. These companies have cybersecurity researchers that are on top of various 0-days and the corresponding exploits, and they master this.

How do I know this? I was one of those guys that offered cybersecurity research services for such companies. Repeatedly. Well, actually only two times, for two different companies. So not a lot of experience here. Just enough. I’m not going back there!

So, what’s my problem?

Let me guide you through an example:

Assume there’s a mandate that asks for IMs sent by the suspect. This is currently achieved, usually, by compromising the user’s device (phone, laptop, PC, Mac, whathef**kever) with some malware that is usually designed by one of these contracted companies. Surveillance happens ON THE COMPROMISED DEVICE ITSELF.

Surveillance does not happen on the ‘encrypted wire’, or on the IM vendor’s infrastructure, but on the TARGETED DEVICE.

Now, suppose this new law passes, mandating the IM service providers to hold unencrypted data (or hold encryption keys) FOR EVERYBODY, ‘just in case’ a mandate is thrown their way.

Do you see the problem yet?

Jesus Christ, we live in a f***ed up world !!!

Mid-workshop surprise!

Something weird and nice happened to me today.

Several times a year, I accept requests to guide cybersecurity workshops for various clients. Usually they fall into the category of web application security or software development security. Not more than several (max 5) times a year, because more would greatly impact my performance in other areas.

So one client requested a web application security workshop focused on OWASP guidelines. That is awesome for me. I have always liked OWASP’s content. Sometimes, I even have the privilege of contributing to it. When I provide this service, I never prepare exhaustive slides for presenting an already well-established material, such as the one from OWASP. I just go on the website and work with that as a prequel to my deeper examples.

So what happened? Mid-workshop, the OWASP Top 10 W.A.S.R. changed. Bam! “Surprise M**********R!” Deal with that!

Now, during these events, I usually bring a lot of my own experience in addition to whatever support material we are using. Actually, this is why someone would require guidance in going through a well-established and very well-built security material, such as the one from OWASP.
When I talk, debate, and learn together with an audience about cybersecurity topics, I always emphasize things that I consider to be insufficiently emphasized by the supporting material. I say emphasized, not detailed, and please be careful to consider this difference.

Insufficiently emphasized topics

Traditionally, OWASP’s guidelines and material did not emphasize enough, in my humble opinion:

  1. The importance of using correct cryptographic controls in the areas of: authentication and session management, sensitive data exposure, insufficient authorization
  2. Insecure design in the areas of: bad security configuration, injection problems, insecure deserialization
  3. Data integrity problems. Loop to #1.

I usually spend around 10-11 hours of a 16-hour (or more) workshop on the three topics above. Very important stuff, and traditionally overlooked in most teams that I interact with.
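To make pillar #1 a bit more concrete, here is the kind of example I like to walk through: a tiny sketch of authenticated encryption with AES-GCM via the Python `cryptography` package, instead of unauthenticated or home-rolled constructs. The key handling and record layout here are illustrative assumptions, not a recipe.

```python
# Tiny sketch of pillar #1: authenticated encryption (AES-GCM via the `cryptography` package)
# for sensitive data at rest; the associated data binds the ciphertext to its context.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_record(key: bytes, plaintext: bytes, context: bytes) -> bytes:
    nonce = os.urandom(12)                                        # unique nonce per message
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, context)   # `context` is authenticated too
    return nonce + ciphertext

def decrypt_record(key: bytes, blob: bytes, context: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, context)        # raises if anything was tampered with

key = AESGCM.generate_key(bit_length=256)
blob = encrypt_record(key, b"session token", b"user:42")
assert decrypt_record(key, blob, b"user:42") == b"session token"
```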

What changed?

It was a nice surprise to see that in the new Top 10 W.A.S.R. OWASP included my three pillars and emphasized concepts the same way I like to do it. They even renamed sections according to my preference. For example, the second position (A2) is now called Cryptographic Failures. AWESOME!
They explain stuff in a more holistic manner, as opposed to just enumerating isolated vulnerabilities. AWESOME!
Finally, it was an extremely good argument for the team that I was guiding about the way I spent my time on the three topics. I felt good about them 🙂
Alraaaaight, I felt good about myself too!

Oh, and P.S.: For the first time in… what is it now, more than a decade (?!), OWASP’s Top 10 W.A.S.R. does not have the top position occupied by injection problems. Either the web has grown exponentially again, or we have escaped a boundary. The boundary of absolute stupidity 🙂


Cheers!

Breaking through passports

A fairly long time ago, at Black Hat (as far as I remember), somebody came on stage with an implementation of a proposed design of an electronic passport and proved its extremely worrying security flaws. I’ll leave the pleasure of researching the event to you.
Back then (maybe around 2001-2003?), the spirit of “Black Hat” (or whatever event it was) was free, and if you were following that kind of security event you were consistently exposed to serious security researchers challenging the disastrous security implementations of either mainstream technology providers (such as Microsoft 😉 ) or of government digitalization efforts.

What does this have to do with passports today? Well, they are a bit more secure thanks to that kind of involvement. Secure for both their users and their governments.
Well, even if they were not so secure, the biggest loser of a flawed normal passport is its government. The projected user impact is fairly low. Identity theft by means of passport forgery or “torgery” is not a huge phenomenon.

Legitimate concerns for any passport users

From the user security perspective, let’s explore:

  • Who does the passport say that I am?
  • How do I know I carry a valid (authentic and integral) passport at any time?
  • What does the document scanning say about me?
    • e.g. it was scanned while passing through customs, ergo anybody able to see a record of that scan will know that I was there at a specific point in time

Extrapolating to COVID passes

Well. As far as the ethical concerns, I’ll leave those up to each and every one of you.

But as far as the security goes, I have a few very worrying concerns:

  1. The implementation details are not widely available.
  2. There are a lot of security incidents with these documents. And they are not and cannot be hidden, even in the mainstream media. (https://www.dw.com/en/security-flaws-uncovered-in-eu-vaccination-passport/a-58129016)
  3. How do I, as an owner and user of the pass, know WHO views or verifies my scans, and when?

    I am not going to continue the list, it is already embarrassing

This last one is a bit concerning.
Here’s why: with a normal passport, once scanned, the person that gains or has access to a list of these scans knows that you went over some border sometime.
However, with COVID passes, the same actor that has or gains access to your pass scans knows where you are, kind of, right now. Remember, you’ll scan your pass even if you just go to a restaurant.

What do I want?

As a COVID pass user I want to have the certainty that no passport scans are stored anywhere. And I do not want anybody except the scanner to be able to see my identity. Because the moment they do, they’ll know where I am and what I’m doing. And this is unacceptable. Mainly because the scanning frequency of such a pass can be daily for some of us.

So, can I get what I want ?

I have tried to get implementation details, both directly (by legally asking for them) and through the security community (both academic and professional). The result? No result. The public debate(s) prior to implementing and adopting these passes were a complete joke.
How long until XXXX gets hold of a list of my scans and then follows me around, or worse? Seems far-fetched? Take a look at the latest data leaks 😉

But as the philosophers Jagger and Richards once said: “You can’t always get what you want!”

( https://www.youtube.com/watch?v=Ef9QnZVpVd8&ab_channel=ABKCOVEVO )

Some concerning technical and ethical reviews

A lot has been going on lately. So much so that I do not even know how to start reviewing it.

I’ll just go ahead and speak about some technical projects and topics that I’ve been briefly involved in and that are giving me a fair amount of concern.

Issue number x: Citizen-facing services of nation states

A while back, I made a “prediction”: the digitalization of citizen-facing services would become more and more present, especially as the pandemic situation panned out. (here) and (here). I was right.
Well, to be completely honest, it was not really a prediction, as I had two side projects (as a freelancer) that involved exactly this. So I kind of had a small and limited view from the inside.

Those projects ended, successfully delivered, and then came the opportunity for more. I kindly declined. Partly because I’m trying to raise a child with my wife, and there’s only so much time in the universe, and partly because I have deep ethical issues with what is happening.

I am not allowed to even mention anything remotely linked with the projects I’ve been involved in, but I will give you a parallel and thus unrelated example, hoping you connect the dots. Unrelated in terms of: I was not even remotely involved in the implementation of the example I’m bringing forward.

The example is: the Romanian STS (Service for Special Telecommunications) introduced blockchain technology in the process of centralizing and counting citizen votes in the national and regional elections happening in Romania. You can read more about it here, and connect the dots for yourselves. You’ll also need to know a fair amount about Romanian election law, but you’re smart people.

The Issue?

Flinging the blockchain concept at the people so that the people misunderstand it. Creating a more secure image than necessary. Creating a security illusion. Creating the illusion of decentralized control while implementing the EXACT opposite. I’m not saying this is intentional, oh no, it is just opportunistic: it happened because of the fast adoption.
Why? Blockchain is supposed to bring decentralization, and what it does in the STS implementation is the EXACT opposite: it consolidates centralization.

While I have no link with what happened in Romania, I know for a fact that similar things have happened elsewhere. This is bad.

I do not think that this is happening with any intention. I simply think there is A HUGE AMOUNT of opportunistic implementations going on SIMPLY because of the political pressure to satisfy the PR needs, and maybe, just maybe, give people the opportunity to simplify their lives. But the implementations are opportunistic, and from a security perspective, this is unacceptable!

Ethically

I think that while we, as a society, tend to focus on the ethics of using AI and whatnot, we are completely forgetting about ethics in terms of our increased dependency on IT&C in general. I strongly believe that we have missed a link here. In the security landscape, this is going to cost us. Big time.

Blockchaining the energy

Several weeks ago I finished migrating one of the past projects I have been involved in to blockchain technology. Back when I was leading that project, blockchain was not the default choice. So, “the guys” called and wanted me to come back and offer some insights while migrating everything to blockchain. Piece of cake. Plus, blockchain was a perfect fit for the job. A real pleasure, and not too much work to do. Easy money, as they call it.

However, this got me thinking. I have had a weird passion lately in the area of decentralized energy grids, smart energy grids and grids in general. It was sparked by one of my friends. I haven’t had the chance to work on a serious energy project until now, but this didn’t stop me from fantasizing.

What do I fantasize about?

The potential use cases of blockchain within the energy sector. So, here they are:

  1. Auditing and regulatory needs in terms of transparency. Obviously, here the native immutable records of a DL with the proper consensus in the network are the key.
  2. Data transfer problems within a smart grid. A smart grid is a very big deal: sensors, metering equipment, EMSs, building monitoring, etc. There’s a lot of storage (DLs) and transfer in this environment, and it can all benefit from decentralized integrity (see the toy sketch after this list). Let’s not even talk about introducing a new energy source into a smart grid, or extending a microgrid by commercial means.
  3. Commercial aspects in localized P2P energy trading. Local energy marketplaces I think is the official buzzword. This is too obvious.
  4. Billing. Well, I don’t need to explain this one, do I?
  5. More dynamic markets in general (not the P2P ones). A smart contract can help in switching providers at the speed of light. Now this can be good news even for centralized grids such as ours.
  6. Resource sharing in residential areas. With or without a microgrid infrastructure, residential resources (alternative energy-producing infrastructure such as solar panels, EV charging stations, etc.) can be shared in a more trustworthy environment if the equipment can make use of proper DLs.
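For point #2, here is a toy sketch of the integrity idea only: a hash-chained log of meter readings, in Python. A real DL adds replication and consensus on top of this, but the tamper-evidence mechanism is the same; the meter IDs and readings are, of course, made up.

```python
# Toy sketch: a hash-chained log of meter readings. Any tampering with history breaks the chain.
import hashlib, json, time

def record(chain: list, meter_id: str, kwh: float) -> None:
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    entry = {"meter": meter_id, "kwh": kwh, "ts": time.time(), "prev": prev_hash}
    entry["hash"] = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
    chain.append(entry)

def verify(chain: list) -> bool:
    prev_hash = "0" * 64
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev_hash:
            return False
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

chain = []
record(chain, "meter-007", 3.2)
record(chain, "meter-007", 2.9)
assert verify(chain)
chain[0]["kwh"] = 0.1          # tamper with history...
assert not verify(chain)       # ...and the chain no longer verifies
```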

Now, in terms of the energy market & trading, OMG. I don’t have enough knowledge to even start to scratch that subject, but hey, that’s just another probable area of blockchain in energy.

P.S. The featured image is a representation of the “methyl (1R,2R,3S,5S)-3-(benzoyloxy)-8-methyl-8-azabicyclo[3.2.1]octane-2-carboxylate” aka “cocaine” molecule. It’s supposed to give you stimuli that you interpret as energy… or so I’ve heard.

Designing for critical privacy and data protection systems (I.5) – homomorphic secret sharing

Last time I talked about some of the factors that influenced the evolution of privacy-preserving technologies.

I wanted to touch base with some of the technologies emerging from the impact of these factors and talk about some of the challenges that they come with.

After a discussion about ε-differential privacy, I promised you a little discussion about homomorphic encryption.

There is a small detour that I find myself obligated to take, due to the latest circumstances of the SARS-CoV-2 outbreak: I want to split this discussion into two parts, and start with a little discussion about homomorphic secret sharing before I go into sharing my experience with adopting homomorphic encryption.

What?! Why?

In the last article, I argued that one of the drivers for adopting new privacy mechanisms is: “The digitalization of the citizen facing services of nation-states. (stuff like e-voting, that I really advocate against)”

Well, sometime after SARS-CoV-2 is gone (a long time from today) I foresee a future where this kind of service will be more and more widely adopted. One of the areas where citizen-facing services of nation-states will be digitalized is e-voting – e-voting within the parliament, for democratic elections, etc. I briefly mentioned last time that I am really against this. At least for now, given the status quo of the research in this area.

Let me explain, a little bit, the trouble with e-voting

Starting with a question: Why do you trust the people counting your vote?

[…annoying pause…]

A good answer here, could be:

  • Because all the parties having a stake in the elections have people counting. The counting is not done by a single ‘neutral’ authority.
  • Because, given the above statement, I can see my vote from the moment that I printed it, to the moment I cast it
  • Because your vote must be a secret, so that you cannot be blackmailed or paid to vote in a certain way – and there are some mechanisms for that
  • Because there is little or no corruption in your country, and you don’t have a Dragnea (convicted for election fraud) pushing any buttons

You can see that in an electronic environment, this is hardly the case. Here, in an electronic environment, if you have a Dragnea, you are shot and buried. Here, in an electronic environment you:

  • Cannot see your vote from the moment you printed it (or pushed the button) to the moment of casting – anyone could see it
  • Cannot easily make sure that your vote is a secret. Once you act upon your vote and it is encrypted somehow, you have no way of knowing what you voted – it became a secret. So there is the trouble with that.
    Furthermore, assuming conventional encryption, there are master keys that can be easily compromised by an evil Dragnea.
  • Auditing such a system involves an extremely high and particular level of expertise, and any of the parties having a stake in the election would really have trouble finding people willing to take the risk of doing that for them. This is an extremely sensitive matter.

There is a research area concerned with tackling these issues. It is called “End-To-End Verifiable Voting Systems”.

End-To-End Verifiable Voting Systems

Basically, tackling these problems for e-voting systems means transforming an electronic environment for voting in such a manner that it can at least meet the standards of non-e-voting systems, then adding some specific electronic mumbo-jumbo to it, and making it available in a ‘pandemic environment’. [Oh my God, I’ve just said that, pandemic environment…]

The main transformation is: I, as a voter, must be able to keep my vote secret up to the moment of casting it, and make sure my vote is accounted for, properly.

Homomorphic secret sharing

It would be wonderful if, while addressing the trust in the counting of the votes, we had a way of casting an encrypted vote but still being able to count it even while it is encrypted. Well, this can be done.

To my knowledge, today the most effective and advanced technology that can be used here is homomorphic encryption and, more precisely, a small subset of HE called homomorphic secret sharing.

Homomorphic secret sharing is a secret sharing algorithm where the secret is encrypted using homomorphic encryption. In a nutshell, homomorphic encryption is a type of encryption where you can do computations on the ciphertext – that is, compute stuff directly on encrypted data, with no prior decryption. For example: in some HE schemes an encryption of a 5 plus an encryption of a 2 is an encryption of a 7. Hooray.

Bear in mind, the mathematics behind all this is pretty complex. I would not call it scary, but close enough. However, there are smart people working on, and providing, some out-of-the-box libraries that software developers can use to embed HE in their products. I would like to mention just two here: Microsoft SEAL and PALISADE (backed by DARPA). Don’t get me wrong, today you still have to know some mathematical tricks if you want to embed HE in your software, but the really heavy part is done by these heroes who are providing these libraries.
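To make the “E(5) + E(2) = E(7)” idea tangible before we get to SEAL, here is a tiny sketch using the `phe` (python-paillier) package, which implements an additively homomorphic scheme. It is not SEAL or PALISADE, just the smallest thing that shows the property.

```python
# Tiny additive-HE sketch using the `phe` (python-paillier) package: we add ciphertexts
# without ever touching the plaintexts, and only the private key holder sees the result.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

enc_5 = public_key.encrypt(5)
enc_2 = public_key.encrypt(2)

enc_sum = enc_5 + enc_2                 # computed entirely on ciphertexts
print(private_key.decrypt(enc_sum))     # 7
```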

Decentralized voting protocols using homomorphic secret sharing

In the next article I will talk about the challenges that you will face if you are trying to embed HE in your product, but until then, if you want to get a glimpse about the complexity, I will just go ahead and detail a decentralized voting protocol that uses homomorphic secret sharing.

  • Assume you have a simple vote (yes/no) – no overkill for now
  • Assume you have some authorities that will ‘count’ the votes – the number of authorities is noted as A
  • Assume you have N voters
  1. Each authority will generate a public key: a number, Xa.
  2. Each voter encodes his vote in a polynomial Pn of degree A-1 (number of authorities - 1), with the constant term being an encoding of the vote (for this case, +1 for yes and -1 for no); all other coefficients are random.
  3. Each voter computes the value of his polynomial (Pn) – and thus his vote – at each authority’s public key: Pn(Xa).
    1. A points are produced; they are pieces of the vote.
    2. Only if you know all the points can you figure out Pn, and thus the vote. This is the decentralization part.
  4. Each voter sends each authority only the value computed using that authority’s key.
  5. Thus, each authority finds it impossible to figure out how any voter voted, as it does not have enough computed values – it only has one.
  6. After all votes have been cast, each authority computes and publishes the sum (Sa) of the values it received.
  7. Thus, a new polynomial is defined by interpolating through the points (Xa, Sa), and its constant term is the sum of all votes. If it is positive, the result is yes; if negative, no.

If you had trouble following the secret sharing algorithm, don’t worry, you’re not alone. Here’s a helper illustration:

source: wiki on HSS
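And here is a toy numeric sketch of the protocol above, in Python: integer votes, random polynomials, Lagrange interpolation at zero. There is no real cryptography in it (shares travel in the clear, there is no voter authentication, no defense against a malicious constant term); it only illustrates the secret sharing math.

```python
# Toy sketch of the decentralized voting protocol: each vote (+1/-1) is the constant term of a
# random polynomial of degree A-1, each authority only ever sees one evaluation of it, and the
# tally is recovered by interpolating the summed shares at X=0.
import random
from fractions import Fraction

AUTHORITY_KEYS = [1, 2, 3]          # Xa: one public point per authority (A = 3)

def make_shares(vote: int, keys: list) -> list:
    # Polynomial of degree A-1 with constant term = vote and random other coefficients.
    coefs = [vote] + [random.randint(-10**6, 10**6) for _ in range(len(keys) - 1)]
    return [sum(c * x**i for i, c in enumerate(coefs)) for x in keys]   # Pn(Xa) per authority

def interpolate_at_zero(points: list) -> Fraction:
    # Lagrange interpolation at X=0 recovers the constant term of the summed polynomial.
    total = Fraction(0)
    for xj, yj in points:
        term = Fraction(yj)
        for xm, _ in points:
            if xm != xj:
                term *= Fraction(-xm, xj - xm)
        total += term
    return total

votes = [+1, +1, -1, +1, -1]                                # yes, yes, no, yes, no
ballots = [make_shares(v, AUTHORITY_KEYS) for v in votes]   # one share per authority per voter
sums = [sum(b[a] for b in ballots) for a in range(len(AUTHORITY_KEYS))]   # Sa, published
tally = interpolate_at_zero(list(zip(AUTHORITY_KEYS, sums)))
print("yes" if tally > 0 else "no", tally)                  # positive tally means 'yes' wins
```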

However, there are still problems:

  • Still, the voter cannot be sure that his/her vote is properly cast.
  • The authorities cannot be sure that a malicious voter did not compute his polynomial with a -100 constant term, such that a single cast would count for 100 negative votes.
  • Homomorphic secret sharing does not even touch the other problems of voting systems; only secrecy and trust are tackled.

The challenges

See, you still have to know a little bit about polynomials and interpolation to be able to use this in your software.

The crazy part is that, in homomorphic encryption terms, homomorphic secret sharing is one of the simplest challenges.

Don’t worry though, in my next article I will show you a neat library (Microsoft SEAL), share my experience with you, and give you some tips and tricks for the moment when you will try to adopt this.

Until next time, remember: don’t take anything for granted.