Nxgxl blog

Sunday, December 5, 2010

data protection in the cloud

We are told that the future of internet computing is "the cloud".  What this means is that organisations and individuals will just share applications and data without the need to run their own web servers to run the applications or store the data.

We are also told that it would be a good idea to centralise all data about each individual inside the cloud that is the internet so that individuals could conveniently alter things like address or mobile number without having to log on to dozens of web sites.  From the public and private sector point of view it is also convenient as it provides a one stop shop for data on an individual they are doing business with.

The thing is that I might be happy to share some information with one organisation but not with others.  I might also like my data excluded from searches that I have not authorised.  How can I get this assurance?  I know that I have to log on to change my data, but how do I know what controls operate over search engine access, or over access to my data by organisations I have not entered into an agreement with?

The first problem is the cloud itself.  It is virtual and hence in no one place or country, particularly when you take into account the backup and mirroring arrangements of the organisation managing the pool of personal data.  It is common practice to back up data remotely, and in the case of the cloud it might make sense for this to be in the time zone opposite to mine.  It is most unlikely that the legal constraints on access will be enforceable worldwide.

Now what exactly happens when I log on?  

The system looks up my logon identifier in a database and then checks that the password I gave matches the one on record.  If the data store is accessed via plain http, the password passes from the individual to the store as clear text, and anyone who can capture the traffic can read it.  With https the whole interchange is encrypted, so even a determined attacker who captured the entire interaction would still have to break the encryption to find out the password.  You may have noticed the increased use of captcha fields, where you are shown letters and numbers jumbled up with other shapes.  These are designed to be readable by a human but not by a computer program, thereby reducing the chance of someone setting up a robot to try a whole list of passwords for a given logon identifier.
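To make the password check concrete, here is a minimal sketch in Python.  It assumes the store keeps a salted, slow hash of each password rather than the password itself, which is common practice but is my assumption about how such a store might work.  The `accounts` table, the function names and the iteration count are all hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor, chosen only for illustration


def make_record(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store in place of the password itself."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def check_password(attempt: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key from the attempt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", attempt.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)


# Hypothetical account table keyed by logon identifier.
accounts = {"alice": make_record("correct horse")}

salt, stored_key = accounts["alice"]
accepted = check_password("correct horse", salt, stored_key)
rejected = check_password("wrong guess", salt, stored_key)
```

The deliberately slow hash also makes each guess more expensive for a robot working through a password list, which complements the captcha.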

Let's assume the password is secure.  The next step is getting data to and from me, which raises the question of how the data is stored.  The whole point is that the data is shared with those whom I authorise, so the data cannot simply be encrypted with a key derived from my password, because then only I could read it.  If the data storage operator encrypts all data using a key only they know, then that would enable the operator (assuming only they know the key) to send, receive and store the personal information. But...
Why should I trust the data storage operator?  

This is a very good question which raises a second use of encryption.  There is a sort of encryption whereby I keep one key private and share the other (usually called the public key).  When I encrypt some data with the private key it can only be decrypted with the shared key.  The data operator would associate my shared key with my identifier and then share the data I authorise with the organisations I authorise, by decrypting the data with my shared key and then encrypting it with the shared key of the recipient, so that the recipient can only get back the data with their private key.
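The flow just described can be sketched with textbook RSA in Python.  This is a toy under stated assumptions: the primes are tiny, there is no padding, and the function names are mine, so it shows the arithmetic of a key pair and nothing a real system should use.

```python
# Toy numbers only: real keys are thousands of bits long and use padding.


def make_keypair(p: int, q: int, e: int = 17):
    """Build a textbook RSA key pair from two small primes."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, needs Python 3.8 or later
    return (e, n), (d, n)  # (shared key, private key)


def apply_key(key, value: int) -> int:
    """Encrypting and decrypting are the same operation with different keys."""
    exponent, n = key
    return pow(value, exponent, n)


my_shared, my_private = make_keypair(61, 53)
their_shared, their_private = make_keypair(89, 97)

message = 65  # a small number standing in for a piece of personal data

from_me = apply_key(my_private, message)              # I encrypt with my private key
at_operator = apply_key(my_shared, from_me)           # operator decrypts with my shared key
for_recipient = apply_key(their_shared, at_operator)  # operator re-encrypts for the recipient
received = apply_key(their_private, for_recipient)    # recipient decrypts with their private key
```

Note that at the middle step the operator holds the data in readable form, so the question of trusting the operator does not go away.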

This raises the question of where the keys come from.  I should not trust anyone else to generate the pair of keys, because then they will know both and the whole system of protection is compromised.  Pretty Good Privacy had a way round this involving running a program to generate the keys on a standalone PC disconnected from the internet.  The mechanism involved typing garbage into a window on screen until there was enough random data to use as the basis for generating a key pair.  Given that the process involves recording the timing between key presses as well as the garbage typed, it would be hard for the same individual to generate the same key pair twice, let alone someone pretending to be that person.  Once the key pair is generated you then secure the private key as you wish and send the shared key to the data store operator.
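The idea of mixing typed garbage with keystroke timing can be illustrated as follows.  This is not PGP's actual code; the function name and the use of SHA-256 as the mixing pool are my assumptions, and the timestamps are made up.  Today a program would more likely ask the operating system for randomness, for example through Python's `secrets` module.

```python
import hashlib


def seed_from_keystrokes(events):
    """Mix (character, timestamp_in_seconds) pairs into a 32-byte seed."""
    pool = hashlib.sha256()
    previous = None
    for char, timestamp in events:
        # The gap between key presses contributes as much as the key itself.
        gap = 0.0 if previous is None else timestamp - previous
        previous = timestamp
        pool.update(char.encode("utf-8"))
        pool.update(repr(gap).encode("ascii"))
    return pool.digest()


# Same keys typed, but the second press lands four milliseconds later.
first = seed_from_keystrokes([("q", 0.000), ("x", 0.137), ("j", 0.402)])
second = seed_from_keystrokes([("q", 0.000), ("x", 0.141), ("j", 0.402)])
```

Even though the same three keys were pressed, the two seeds differ because the timing differs, which is why typing the same garbage again would not reproduce the key pair.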

Encryption is one tool to help secure personal data in the cloud, but it is only part of the security policy the data operator should be expected to follow.  The operator should implement physical access controls over their system, vet their employees, and use the controls in the operating system and applications to secure the data store.  The whole system should then be audited by independent experts every 6 months and certified secure.

What has been described is a solution to central storage as envisaged in Mydex.

I might prefer a very different model, which is to store my personal data on my own USB stick and share it by allowing an application on the USB stick to fill in forms for me.  This is the solution used by Roboform.  It is up to me to back up and to secure my USB stick, but if I keep it unplugged when I do not want to share anything the data is simply not there to be hijacked.

Of course I have to make sure all the systems I fill in forms for are secured by https before I submit the data and I would need to establish a trust in the recipient organisation to secure the data I submit.
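A form filling application could make the https check itself before handing anything over.  The sketch below is only an illustration of that idea and is not how Roboform works; the function names, the sample profile and the example.org addresses are all hypothetical.

```python
from urllib.parse import urlsplit


def safe_to_submit(form_url: str) -> bool:
    """Accept only addresses that use https and name a host."""
    parts = urlsplit(form_url)
    return parts.scheme == "https" and bool(parts.hostname)


def fields_to_send(profile: dict, requested: list, allowed: set) -> dict:
    """Release only the requested fields that I have agreed to share."""
    return {
        name: profile[name]
        for name in requested
        if name in allowed and name in profile
    }


profile = {"name": "A. Person", "mobile": "07000 000000", "address": "1 Any Street"}
allowed = {"name", "address"}  # what I am willing to give this organisation

secure = safe_to_submit("https://shop.example.org/checkout")
insecure = safe_to_submit("http://shop.example.org/checkout")
released = fields_to_send(profile, ["name", "mobile", "address"], allowed)
```

The check only confirms that the connection is encrypted.  It says nothing about how the recipient looks after the data once it arrives.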

In both the central storage and USB stick examples there is a final and difficult problem.  When I give an organisation access to personal data I am trusting them to secure it whether it came from me directly or via a central store.  That trust ought to be based on the results of regular independent security audits like those required for the central store.

Search engines would see only what is stored unencrypted, so I need to be able to trust authorised users of my data to encrypt it; hence the security audits.

The organisations that provide your internet connection (ISPs) can gather all the traffic on any account they choose, but if the communications involving transfer of secure data are protected by SSL they will not be able to look inside the private packets of data unless assisted by organisations with enough computing power to break the encryption in a useful time.

So... there are lots of issues to resolve before entering into data sharing in "the cloud".
