Hello! My name is Joe, and I’m a developer. Here I will talk about a recent case in which we implemented an atypical solution to keep data processing secure. I will also explain how long it took us to choose the optimal architecture, and why we settled on an encryption method that is difficult to implement.
The leak of data on 139 million Canva users, the exposure of personal information of 143 million people at Equifax, one of the largest credit bureaus in the US, the loss of data on more than 8 million users of food delivery services, and a 1.7-fold average increase in the number of leaks in the financial sector in 2022 – should I continue? All of these are good enough reasons to think about security. For financial accounting applications, the issue of data safety is especially acute.
A little about the application itself – this is a financial accounting system for small and medium businesses. It allows you to manage sales, expenses, plan future payments and analyze the financial condition of the company using dashboards.
My team was faced with the task of making data access secure, protecting it as much as possible and minimizing losses in case of information leakage. When designing the application, great emphasis was placed on data security.
External influence is a security issue that has to be handled not only by the backend but also by the frontend. A leak can occur at different levels: data transfer, backend, frontend, database. The basic solution on the frontend is authentication, two-factor protection and additional hardening. As you understood from the beginning of the article, we did not stop there.
I must say right away that this solution is suitable not only for the financial sector. It is up to you where to apply it.
How can user data be influenced from the outside?
Our task is to make sure that a hypothetical data leak does not affect users in any way – does not cause them losses and does not endanger them. To do this, we need to understand how the system can be attacked. Here I list only the main vulnerabilities:
- Cross-site scripting, XSS;
- Cross-site request forgery, CSRF;
- Bruteforce attack.
Vulnerabilities evolve, and there are too many things we want to protect against. OWASP periodically publishes its Top 10 list of vulnerabilities, so I advise you to keep up with it and know what can affect the system and how.
You can’t hide from everything, but we can try to nullify the consequences.
It is believed that data is safe when the costs for an attacker to obtain it do not justify the potential earnings. Therefore, our goal is not to protect data from leakage by 100%, but to make it as difficult as possible to access it.
How to ensure the security of user data?
And here we come to encryption, so that even with a possible leak of the database, the user’s data remains safe, since it is encrypted.
There are two types of encryption: symmetric and asymmetric. In symmetric encryption, a single secret key is used for both encryption and decryption. In asymmetric encryption, the keys are different: a public key encrypts and a private key decrypts.
Option I. Store data in the database in encrypted form
We close the data at the database level. If the database is stolen, data without encryption keys cannot be accessed.
If the user needs the data, we prepare (decrypt) it on the server and transfer it in finished form for display to the user. All calculations and analyses necessary for financial accounting are performed on the server.
Thus, we use server-side encryption, that is, we encrypt and decrypt data each time using a key that is stored on the server.
Not suitable.
Doesn’t solve the leak problem. The data is exposed in the clear at individual points along its path through our system. In addition, if a user’s session is stolen, the encryption keys can be stolen too. Theft of encryption keys = data theft. It turns out that we protected ourselves from a leak of the entire database, but not from an attack on a particular user.
Option II. Store data in the database in encrypted form and close access at the server level
Close access to data even more. Now we do not aggregate them on the backend, instead do everything on the front. We want to close the circuit at the front, meaning server leaks are not a big deal.
Let’s hide behind the backs of cool security guards. We close the system as much as we can, bind in a bunch of analyzers and try to nip any problem in the bud. We initialize the encryption keys on the front side, and close the data. We also encrypt the key.
It turns out that we use client-side encryption, that is, we prevent data leakage on a closed circuit.
The good news is that the hacker will now suffocate in our system. The bad news is that so will the user.
Not suitable.
Solves the problem of leakage. But we’re talking about a financial accounting application: big data, computing inside. Giving all the calculations to the front side is fundamentally wrong from the point of view of the client-server architecture. In addition, we will increase the time for processing and downloading data to the user.
You can try to choose the most efficient calculation algorithm, but it still won’t be enough for calculations. In addition, devices (phones, tablets, etc.) have different capacities.
Option III. Storage, transfer and calculations in encrypted form
We completely close the data from the system. Only the user has access to open data. All calculations take place on the server. We can use mathematical methods on top of encrypted data – homomorphic encryption.
Homomorphic encryption is a form of encryption that allows mathematical operations to be performed on data while it stays encrypted. The implementation we chose is the Paillier cryptosystem, an asymmetric scheme that supports addition of encrypted values.
Suitable.
Solves the problem of leakage. Losses in the speed of calculations are minimized due to the fact that the calculations are left on the server side.
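To make the idea of calculating on encrypted data more tangible, here is a minimal sketch of Paillier-style addition. It is a toy and not our production code: the primes 47 and 59, the plain number arithmetic and all function names are assumptions chosen so that the example runs without dependencies. A real system uses BigInt and keys of 2048 bits or more.

```typescript
// Toy Paillier cryptosystem with tiny primes, for illustration only.
// Plain numbers are used because every intermediate product stays below 2^53.

const gcd = (a: number, b: number): number => (b === 0 ? a : gcd(b, a % b));

// Modular exponentiation by repeated squaring.
const modPow = (base: number, exp: number, mod: number): number => {
  let result = 1;
  let b = base % mod;
  let e = exp;
  while (e > 0) {
    if (e % 2 === 1) result = (result * b) % mod;
    b = (b * b) % mod;
    e = Math.floor(e / 2);
  }
  return result;
};

// Modular inverse by brute force (acceptable only for toy sizes).
const modInverse = (a: number, mod: number): number => {
  for (let x = 1; x < mod; x++) {
    if ((a * x) % mod === 1) return x;
  }
  throw new Error('no modular inverse');
};

const p = 47;
const q = 59;
const n = p * q; // public modulus
const nSquared = n * n;
const lambda = ((p - 1) * (q - 1)) / gcd(p - 1, q - 1); // private: lcm(p - 1, q - 1)
const mu = modInverse(lambda, n); // private: valid for the generator g = n + 1

// Encrypt with the public key only; r is fresh randomness coprime to n.
// With g = n + 1, g^m mod n^2 simplifies to 1 + m * n.
const encrypt = (m: number, r: number): number =>
  (((1 + m * n) % nSquared) * modPow(r, n, nSquared)) % nSquared;

// Decrypt with the private key (lambda, mu).
const decrypt = (c: number): number => {
  const l = (modPow(c, lambda, nSquared) - 1) / n; // L(x) = (x - 1) / n
  return (l * mu) % n;
};

// Homomorphic addition: multiplying ciphertexts adds the plaintexts.
const addEncrypted = (c1: number, c2: number): number => (c1 * c2) % nSquared;

const randomCoprime = (): number => {
  let r = 0;
  do {
    r = 1 + Math.floor(Math.random() * (n - 1));
  } while (gcd(r, n) !== 1);
  return r;
};

const c1 = encrypt(1250, randomCoprime());
const c2 = encrypt(999, randomCoprime());
const encryptedSum = addEncrypted(c1, c2); // the "server" never sees 1250 or 999
const sum = decrypt(encryptedSum); // only the key owner can read the total
```

The party that calls addEncrypted needs nothing but the ciphertexts and the public modulus, which is why the aggregation can stay on the server.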
Let’s move on to a specific implementation
Application architecture:
- React/Redux, TypeScript;
- Interaction with encryption through RxJS;
- Encryption – asymmetric for strings and homomorphic for numbers.
The process of processing and transferring data in the application:
- Let’s create a private area. A private area will be considered a personal account of an organization. Access to it requires authorization and encryption keys
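import { generateRandomKeysSync } from 'paillier-bigint';
import { pkcs5, pki, md, util, cipher } from 'node-forge';

// Paillier key pair for homomorphic encryption of numbers.
// Note: despite the constant's name, paillier-bigint expects the key length in bits.
const {
  publicKey: publicKeyPailler,
  privateKey: privateKeyPailler,
} = generateRandomKeysSync(PAILLER_SIZE_BYTES);

// RSA key pair for asymmetric encryption of strings.
// Note: node-forge also expects the key size in bits here.
const {
  privateKey: privateKeyStr,
  publicKey: publicKeyStr,
} = pki.rsa.generateKeyPair(RSA_SIZE_BYTES);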
Two key pairs are needed: one for homomorphic (Paillier) encryption and one for asymmetric RSA encryption. When initializing this area, the user enters a password, which serves as the superuser password.
For storage on the server, the RSA private key and the Paillier private key are combined into a single key, which is then encrypted with the user/organization password.
/**
 * combine private key parts
 */
const privateKeyParts = [privateKeyStrPem, privateKeyInt].join(
  CRYPTO_KEY_DELIMITER,
);
/**
 * encrypt the combined private keys with a password
 */
const cryptoKey = this.encryptCryptoKey(privateKeyParts, password);
/**
 * data to save to the server:
 * encrypted private parts of encryption keys and open public ones
 */
const cipherData: CipherKeys = {
  cryptoKey,
  publicKeyInt,
  publicKeyStr: publicKeyStrPem,
};
import { generateRandomKeysSync } from 'paillier-bigint';
import { pkcs5, pki, md, util, cipher } from 'node-forge';
/**
 * @description encrypts cryptoKey using AES
 * @param cryptoKey delimited private keys (RSA and Paillier)
 * @param password – password used to encrypt cryptoKey
 */
private encryptCryptoKey(cryptoKey: string, password: string): string {
  const salt = util.decode64(AES_SALT);
  const iv = util.decode64(AES_INITIALIZATION_VECTOR);
  // derive a 16-byte (AES-128) key from the password
  const aesCipherKey = pkcs5.pbkdf2(password, salt, AES_ITERATIONS, 16);
  const aesCipher = cipher.createCipher('AES-CBC', aesCipherKey);
  aesCipher.start({ iv });
  aesCipher.update(util.createBuffer(cryptoKey, 'utf8'));
  aesCipher.finish();
  const encryptedCryptoKey = util.encode64(aesCipher.output.getBytes());
  return encryptedCryptoKey;
}
The steps run in sequence. First we take the combined crypto key and encrypt it with a symmetric algorithm. For that we need an encryption key derived from the password the user has already entered. We derive it with a Password-Based Key Derivation Function (PBKDF2): it is deliberately slow, applying a hash over many iterations together with a salt, so that guessing the password by brute force becomes impractically expensive.
/**
* AES salt: n bytes - base64 encoded
*/
const salt = util.decode64(AES_SALT);
/**
* AES initialization vector: m bytes - base64 encoded
*/
const iv = util.decode64(AES_INITIALIZATION_VECTOR);
/**
* generate a hash for encryption based on the password
*/
const aesCipherKey = pkcs5.pbkdf2(password, salt, AES_ITERATIONS, 16);
Encrypt private keys. After deriving the key, we encrypt the private parts of the encryption keys with symmetric encryption so that they can be stored on the server. We use Cipher Block Chaining (CBC): the private key is split into blocks of the same size, and each block is encrypted with the key derived from the password. The first block is combined with the initialization vector before encryption. Every following block is combined (XORed) with the previous ciphertext block, so each block depends on all the blocks before it.
const aesCipher = cipher.createCipher('AES-CBC', aesCipherKey);
aesCipher.start({ iv });
aesCipher.update(util.createBuffer(cryptoKey, 'utf8'));
aesCipher.finish();
const encryptedCryptoKey = util.encode64(aesCipher.output.getBytes());
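The chaining itself can be sketched without any library. In the sketch below the block "cipher" is a plain XOR with the key, which is an assumption made purely to keep the example short and is not secure; in our code this role is played by AES. The 4-byte block size and all names are hypothetical as well, and padding is left out, so the input length must be a multiple of the block size.

```typescript
// CBC chaining over a toy 4-byte block "cipher" (XOR with the key).
// The XOR step stands in for AES and is NOT secure; only the chaining is the point.

const BLOCK_SIZE = 4;

const xorBytes = (a: number[], b: number[]): number[] =>
  a.map((byte, i) => byte ^ b[i]);

// Stand-in for the block cipher; XOR is its own inverse.
const toyEncryptBlock = (block: number[], key: number[]): number[] => xorBytes(block, key);
const toyDecryptBlock = toyEncryptBlock;

const cbcEncrypt = (plain: number[], key: number[], iv: number[]): number[] => {
  const out: number[] = [];
  let previous = iv; // the first block is chained to the initialization vector
  for (let i = 0; i < plain.length; i += BLOCK_SIZE) {
    const block = plain.slice(i, i + BLOCK_SIZE);
    const encrypted = toyEncryptBlock(xorBytes(block, previous), key);
    out.push(...encrypted);
    previous = encrypted; // the next block is chained to this ciphertext block
  }
  return out;
};

const cbcDecrypt = (encryptedBytes: number[], key: number[], iv: number[]): number[] => {
  const out: number[] = [];
  let previous = iv;
  for (let i = 0; i < encryptedBytes.length; i += BLOCK_SIZE) {
    const block = encryptedBytes.slice(i, i + BLOCK_SIZE);
    out.push(...xorBytes(toyDecryptBlock(block, key), previous));
    previous = block;
  }
  return out;
};

// Two identical plaintext blocks...
const demoPlain = [1, 2, 3, 4, 1, 2, 3, 4];
const demoKey = [9, 9, 9, 9];
const demoIv = [7, 7, 7, 7];

const demoEncrypted = cbcEncrypt(demoPlain, demoKey, demoIv);
// ...produce two different ciphertext blocks because of the chaining.
const firstBlock = demoEncrypted.slice(0, BLOCK_SIZE);
const secondBlock = demoEncrypted.slice(BLOCK_SIZE);
const demoDecrypted = cbcDecrypt(demoEncrypted, demoKey, demoIv);
```

Without chaining, identical plaintext blocks would encrypt to identical ciphertext blocks and reveal the structure of the key material.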
- Sending encrypted data. The user sends the private parts of the keys in encrypted form and the public parts in the open. Since the public parts can only be used to encrypt, losing them is not a big deal for security. What happens next? The user has a private key, which lives only in the JS execution context. When sending any data, we encrypt it on the client side if the data is small. Large data can be sent to the server as a stream and encrypted there, so the main thread and overall performance are barely affected.
const encryptSalesHandler = (saleRequest: ICreateSaleData): ICreateSaleData => {
  const total = cipherClient.encryptNumber(saleRequest.total) || 0;
  const comment = cipherClient.encryptString(saleRequest.comment || '');
  return {
    ...saleRequest,
    total,
    comment,
  };
};
public encryptNumber(num: number | string): string | undefined {
  const encryptedNumber = this.publicKeyPaillier?.encrypt(
    BigInt(Number(num)),
  );
  if (!encryptedNumber) {
    throw Error(NumberEncryptionError);
  }
  return encryptedNumber.toString();
}
We tried to isolate the service and the client from data loss as much as possible, and if a loss does happen, the data still remains encrypted. The system is complex enough that cracking all of its security layers is hard.
What problems did we face during the implementation?
- Checking for unique values. The names of accounts and dictionaries are stored encrypted. We cannot compare two encrypted values directly, because encrypting the same string twice produces different ciphertexts.
Solution: store hash for unique values and compare by hashes.
public encryptString(str: string, params?: { withHash?: boolean }): string {
  if (!this.publicKeyRSA) {
    throw Error(EmptyPublicRSAError);
  }
  const encryptedStr = this.publicKeyRSA.encrypt(
    util.encodeUtf8(str),
    'RSA-OAEP',
    {
      md: md.sha1.create(),
      mgf1: {
        md: md.sha1.create(),
      },
    },
  );
  const encodedStr = util.encode64(encryptedStr);
  if (params && params.withHash) {
    const hashString = this.hashString(str);
    return `${encodedStr}${CRYPTO_HASH_STRING_DELIMITER}${hashString}`;
  }
  return encodedStr;
}
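The comparison by hashes can be sketched as follows. This is an illustration under assumptions, not our actual check: the delimiter value, the isUniqueName helper and the stored sample values are hypothetical, and toyHash is a non-cryptographic stand-in used only so that the example runs without dependencies (a real system would use a cryptographic hash such as SHA-256).

```typescript
// Hypothetical delimiter; the real CRYPTO_HASH_STRING_DELIMITER value is not shown here.
// '.' is safe for the sketch because it never occurs in base64 or hex output.
const HASH_DELIMITER = '.';

// Extract the hash suffix from a stored "ciphertext<delimiter>hash" value.
const extractHash = (storedValue: string): string =>
  storedValue.slice(storedValue.lastIndexOf(HASH_DELIMITER) + 1);

// A new name is unique if no stored value carries the same hash.
const isUniqueName = (
  candidateName: string,
  storedValues: string[],
  hashString: (value: string) => string,
): boolean => {
  const candidateHash = hashString(candidateName);
  return !storedValues.some((value) => extractHash(value) === candidateHash);
};

// Stand-in hash (djb2 reduced to 32 bits). NOT cryptographic, for the demo only.
const toyHash = (value: string): string => {
  let hash = 5381;
  for (let i = 0; i < value.length; i++) {
    hash = (hash * 33 + value.charCodeAt(i)) % 4294967296;
  }
  return hash.toString(16);
};

// Stored values as the server sees them: an opaque ciphertext plus a hash.
const storedNames = [
  `Q2lwaGVyMQ==${HASH_DELIMITER}${toyHash('Cash')}`,
  `Q2lwaGVyMg==${HASH_DELIMITER}${toyHash('Bank')}`,
];

const cashIsUnique = isUniqueName('Cash', storedNames, toyHash); // false: already taken
const cardIsUnique = isUniqueName('Card', storedNames, toyHash); // true: no matching hash
```

The hash is deterministic, so equal names always map to equal hashes, which is exactly what the randomized ciphertexts cannot give us.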
- Encryption applies only to integers. Our implementation of homomorphic encryption uses BigInt, which accepts only integer values.
Solution: fractional values are converted to integers with a fixed number of decimal places. For example, accounting is kept to the second digit after the decimal point. To keep the fractional part, we multiply every number by 10 to the power of n before encryption, where n is the maximum number of digits in the fractional part. After decryption, we divide by the same factor.
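// Scale a decimal value to an integer before encryption (power = number of decimal places).
// Math.round guards against floating-point artifacts in the product.
export const prepareNumberToCipher = (value: number, power: number) => {
  return Math.round(value * 10 ** power);
};
// Scale a decrypted integer back to a decimal value.
export const prepareNumberToDecipher = (value: number, power: number) => {
  return value / 10 ** power;
};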
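A short worked example shows why the scaling pays off: once amounts are integers, they can be summed exactly, and the total is converted back only once at the end. The function names and sample amounts below are hypothetical, and the rounding step is my assumption about how to absorb floating-point noise such as a product like 1.15 * 100 not being an exact integer.

```typescript
// Number of decimal places kept in accounting (two, as in the example above).
const DECIMAL_PLACES = 2;

// Decimal amount -> integer number of minimal units (for example, cents).
const toScaledInteger = (value: number, power: number): number =>
  Math.round(value * 10 ** power);

// Integer number of minimal units -> decimal amount.
const fromScaledInteger = (value: number, power: number): number =>
  value / 10 ** power;

const amounts = [10.25, 0.1, 1.15];

// These integers are what would be passed to BigInt and encrypted.
const scaledAmounts = amounts.map((amount) => toScaledInteger(amount, DECIMAL_PLACES));

// Integer addition mirrors what the server does on the encrypted values.
const scaledTotal = scaledAmounts.reduce((acc, value) => acc + value, 0);

// Convert back once, after decryption.
const total = fromScaledInteger(scaledTotal, DECIMAL_PLACES);
```

Here the scaled amounts are 1025, 10 and 115, their sum is 1150, and the decoded total is 11.5.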
What was taken into account when working with encryption?
1. How to aggregate and receive large amounts of data? Homomorphic encryption lets the server aggregate the data: it sums an array of encrypted values, and the result decrypts to exactly the total we need. However, if the client has to receive a large amount of data, decryption becomes slow enough for the user to notice.
To avoid blocking the main thread with decryption, we used a Web Worker – multithreading at the JS level. Decryption runs in a separate thread and does not interfere with other data or with the user’s interaction with the page.
2.1. How to give the user convenient access to encrypted data? Without extra measures, the user would have to enter the password every time the page is refreshed.
- It is necessary to store keys between sessions. We cannot store them in clear text, so we will create a temporary password on the server that we give to the client and use as a password to encrypt the keys. The client will cache already encrypted keys. When updating the session, we regularly update the temporary password and re-encrypt the key.
- Caching security issues.
- The standard for storing sensitive data is cookies. However, in our concept they violate the security policy: cookies are sent to the server, so the server would hold both the temporary password and the private key encrypted with it, which looks like an obvious security vulnerability;
- Local Storage is vulnerable to XSS.
We store everything in Local Storage because, conceptually, we cannot fully protect ourselves anyway. For each session, we bind together the accessToken, the refreshToken and the temporary password. When the user refreshes the session, we can determine the previous temporary password, decrypt the private keys from the cache, and encrypt them with the new temporary password.
So that the user does not have to enter the organization’s password at the start of every new session, the private encryption keys are kept in the cache in encrypted form. When the session is updated, we get access to the private keys from the cache without asking for the organization’s password again.
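The re-encryption step on session refresh can be sketched like this. Everything here is an assumption made for illustration: the PasswordCipher interface, the rotateCachedKeys helper and the sample passwords are hypothetical, and the XOR cipher is an insecure stand-in for the AES encryption we actually use, included only so that the sketch runs on its own.

```typescript
// Hypothetical shape of a password-based cipher; in our application this role is played by AES.
interface PasswordCipher {
  encrypt(plain: string, password: string): string;
  decrypt(encrypted: string, password: string): string;
}

// Re-wrap the cached private keys when the server issues a new temporary password.
const rotateCachedKeys = (
  cachedKeys: string,
  previousTempPassword: string,
  nextTempPassword: string,
  passwordCipher: PasswordCipher,
): string => {
  const privateKeys = passwordCipher.decrypt(cachedKeys, previousTempPassword);
  return passwordCipher.encrypt(privateKeys, nextTempPassword);
};

// Toy stand-in cipher (XOR with the password characters). NOT secure, demo only.
const xorWithPassword = (text: string, password: string): string =>
  text
    .split('')
    .map((ch, i) =>
      String.fromCharCode(ch.charCodeAt(0) ^ password.charCodeAt(i % password.length)),
    )
    .join('');

const toyCipher: PasswordCipher = {
  encrypt: xorWithPassword,
  decrypt: xorWithPassword, // XOR is its own inverse
};

// Session 1: the keys are cached under the first temporary password.
const cachedUnderFirst = toyCipher.encrypt('rsa-private|paillier-private', 'temp-1');

// Session refresh: the server sends a new temporary password, the client re-encrypts the cache.
const cachedUnderSecond = rotateCachedKeys(cachedUnderFirst, 'temp-1', 'temp-2', toyCipher);

// Later, the keys are restored with the current temporary password only.
const restoredKeys = toyCipher.decrypt(cachedUnderSecond, 'temp-2');
```

In the browser the value returned by rotateCachedKeys would be written back to Local Storage, so a stolen cache entry becomes useless once the temporary password it was encrypted with has been rotated.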
2.2. What to do when the key is lost? When creating an organization, we assign a superuser. The superuser can manage users – add, delete, issue passwords, etc. If necessary, the superuser can re-encrypt the data and update the key for the user.
3. How to upload a large amount of data to the server? For example, a user wants to upload a large Excel file with a big “database” to our system, where this information will be safe. The server has the public keys, so it can encrypt a large amount of data on its own side.
CONCLUSION
Let’s return to the original goal: to provide the most secure data storage in a financial accounting application for small and medium businesses.
Conceptually, it is impossible to completely protect data from leakage, because there are too many vulnerabilities and they are constantly evolving. So our task changed from protecting data from leakage by 100% to making a hypothetical leak harmless to the user: if the data is stolen, it is still securely encrypted.
We tried to isolate the server and client layers from data loss as much as possible. If the data is lost anyway, it is not directly readable. The system is complex enough that getting through all of its security layers is difficult and costly for a hacker.
If you decide to use homomorphic encryption, be prepared that a convenient and well-supported implementation is quite difficult to find. You will have to look for a solution on your own, study the algorithm itself and keep track of the library’s maintenance yourself. When decrypting data, we noticed a slight decrease in performance. We moved decryption into a Web Worker, but this did not significantly speed up the decryption of large data sets.
To create such an encryption scheme, we used the following algorithms:
- RSA is a public-key cryptographic algorithm based on the computational complexity of the large integer factorization problem;
- AES is a symmetric block cipher that operates on blocks of 128 bits;
- The Paillier cryptosystem is an additive homomorphic cryptosystem: knowing only the public key and the ciphertexts of the plaintexts m1 and m2, we can calculate the ciphertext of m1+m2.
For organizing architecture – RxJS, Web Worker.
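The additive property from the list of algorithms can be written out as follows. The notation is the standard one for Paillier and is my assumption rather than something taken from our code: n is the public modulus, g the public generator, r a random value coprime to n, E encryption and D decryption.

```latex
E(m, r) = g^{m} \cdot r^{n} \bmod n^{2}

D\bigl(E(m_1, r_1) \cdot E(m_2, r_2) \bmod n^{2}\bigr) = (m_1 + m_2) \bmod n

D\bigl(E(m, r)^{k} \bmod n^{2}\bigr) = (k \cdot m) \bmod n
```

In other words, the server multiplies ciphertexts to add the underlying amounts, and raises a ciphertext to a plain power k to multiply the amount by k.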