Published: 2014-05-16 12:50:10 by Daniele Ricci

Many users have been wondering about security in Kontalk. In this post I'll go over the security concerns I have faced, and am still facing, during the development of Kontalk.

Phone numbers

The biggest security implication is phone numbers. Hiding phone numbers from server administrators and from other users is very difficult, since they are used as the means of identification. Kontalk uses hashes of the phone numbers in the user ID (something like a02ee628305f0ab754510b2a6c283f63db1cb965@kontalk.net, which is the SHA-1 hash of +15555245554).
Using decent hardware, and knowing just the country code, you can hash every possible number in that country and find the original value in a relatively short time. It's an easy task to carry out, but it still takes time, so an attacker would only bother if they had a specific target.
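To make the size of the problem concrete, here is a minimal sketch of that enumeration. It assumes the user ID is the plain, unsalted SHA-1 of the number written in international format, as the example above suggests. The function names and the `start`/`stop` parameters are made up for illustration and are not part of Kontalk.

```python
import hashlib


def user_id_hash(phone_number: str) -> str:
    """Hex SHA-1 digest of a phone number string (assumed: unsalted, UTF-8)."""
    return hashlib.sha1(phone_number.encode("utf-8")).hexdigest()


def recover_number(target_hash, country_code, digits, start=0, stop=None):
    """Hash every candidate number under a country code until one matches.

    `digits` is the length of the national number. `start` and `stop`
    only narrow the search so the sketch can be tried quickly.
    """
    if stop is None:
        stop = 10 ** digits
    for n in range(start, stop):
        # Zero-pad so that every candidate has exactly `digits` digits.
        candidate = f"{country_code}{n:0{digits}d}"
        if user_id_hash(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    target = user_id_hash("+15555245554")
    # Search a small window only; the full space is 10**10 candidates.
    print(recover_number(target, "+1", 10, start=5555245000, stop=5555246000))
```

A 10-digit national number gives at most 10^10 candidates, which is a small search space for a fast hash like SHA-1. That is why the hash hides the number only from casual observers.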
The hard part comes when the Kontalk devteam is not involved in server administration: although we have very strong values and we do not give our data to others (of course we are subject to the law of our jurisdiction), other server administrators might. That's why we decided to create some sort of democratic network with an internal voting system, used to admit new servers and ban rogue ones. This system is not yet in place (we are still in alpha), but it will be ready at the right time. Further details will follow soon.

SMS provider

Another concern for phone numbers is using a third-party SMS provider to verify the numbers. In our case, that is Nexmo.
Of course it's a company and they have their own privacy policy, but the problem is that a Kontalk server must know the real phone number to send the verification SMS in the first place, meaning our Nexmo account has a record of all phone numbers that have registered, or have tried to register, with Kontalk. We really can't do anything about it. Although we access Nexmo logs only when a user has a problem during registration, in fact we do have the phone numbers of all of our users.

Encryption: end-to-end without OTR

Encryption is another very important security concern. The older 2.2 versions had a weak encryption method which is being replaced by OpenPGP encryption in version 3.0. This new method will do a simple public key encryption using OpenPGP standards. More features such as perfect forward secrecy and deniable encryption will be addressed with a future version having an OTR-like approach.

Trusting server administrators

Last but not least, there is a concern about server administrators. Kontalk is designed to be a community network, meaning that volunteers can rent servers and make them available as Kontalk nodes. Those nodes will have access to all presence data and (unencrypted) messages passing through the server.
A server can, however, refuse login attempts from users registered on another server: it checks whether that server's public key is signed by the server that is authenticating the user at that moment. This is how one server shows "trust" in another. If anything happens, the signature is revoked and that server is no longer trusted.
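The trust check described above can be modelled in a few lines. This is a toy sketch with no real cryptography: a "signature" is just an entry in a set, and the class and method names are invented for illustration.

```python
class Server:
    """Toy model of a node that trusts the servers whose key it has signed."""

    def __init__(self, name):
        self.name = name
        # Names of servers whose public key this server has signed.
        self.signed = set()

    def sign(self, other):
        """Show trust in another server by signing its key."""
        self.signed.add(other.name)

    def revoke(self, other):
        """Withdraw trust by revoking the signature."""
        self.signed.discard(other.name)

    def accepts_login(self, home_server):
        """Accept users registered here or on a server this one has signed."""
        return home_server.name == self.name or home_server.name in self.signed


if __name__ == "__main__":
    a, b = Server("A"), Server("B")
    b.sign(a)
    print(b.accepts_login(a))  # a user registered on A logs in on B
    b.revoke(a)
    print(b.accepts_login(a))  # after revocation the login is refused
```

In a real deployment the set lookup would be replaced by verifying an OpenPGP signature on the other server's key.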
This doesn't prevent a rogue server from creating forged identities or fake accounts, though. There are mechanisms to protect users against that, but they only help once such abuses are discovered, and sporadic abuse is hard to uncover.

Still, there's no formal agreement between servers. They all count on mutual trust and spoken deals. This matter will be addressed when Kontalk grows enough to justify the creation of a nonprofit organization, with an organized team and a more defined path to follow.

Published: 2012-11-27 18:50:13 by Daniele Ricci

Kontalk is a collaborative network: it relies on multiple servers working together as a cluster. From a storage point of view, it's not a simple mirroring cluster; a Kontalk network can handle failover and load balancing.

Failover

This is very simple: if a client can't connect to server A, it will simply connect to server B. It will be able to use its credentials with server B because B recognizes the signature made by server A. But what about user-related data? What if a user's vCard is stored on server A and B wants to retrieve it, but A is not available?
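The client side of this failover could look roughly like the sketch below. The `connect` callable is a stand-in for whatever the client uses to open a session. It is assumed to raise `ConnectionError` when a server is unreachable, and none of these names come from the Kontalk code base.

```python
def connect_with_failover(servers, connect):
    """Try each server in order and return (server, session) for the first one that answers."""
    last_error = None
    for server in servers:
        try:
            return server, connect(server)
        except ConnectionError as exc:
            # Remember the failure and fall through to the next server.
            last_error = exc
    raise ConnectionError("no server reachable") from last_error


if __name__ == "__main__":
    def fake_connect(server):
        # Pretend server A is down.
        if server == "A":
            raise ConnectionError("A is down")
        return f"session@{server}"

    print(connect_with_failover(["A", "B"], fake_connect))
```

This only covers the connection itself. The open question about user data stored on the unavailable server remains.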

Load balancing

This is also simple for client connections: if server A has too many connections, the client will go to server B. Storage needs the same treatment: if everyone uploads files to server A, users will connect only to server A to download them, and server B will host no files.
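One simple policy for both cases is "pick the least loaded server". A minimal sketch, assuming each server's current load is known; the function name and the `limit` parameter are hypothetical:

```python
def pick_server(load, limit):
    """Return the least loaded server still under `limit`, or None if all are full.

    `load` maps a server name to its current load, for example open
    connections or stored bytes.
    """
    available = {server: n for server, n in load.items() if n < limit}
    if not available:
        return None
    return min(available, key=available.get)


if __name__ == "__main__":
    # Server A is at its connection limit, so the client goes to B.
    print(pick_server({"A": 100, "B": 3}, limit=100))
```

Using stored bytes as the load would spread uploads across servers instead of piling them on server A.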

Solution 1: replication by caching

All servers are pretty much always connected with each other in order to accomplish several tasks, such as forwarding messages and probing presence. Even when broadcasting presence, the net component will connect to every server to send the presence stanza, because the router broadcasts to every route.
This solution is quite simple: whenever a change occurs, the server will broadcast it to the network. Of course there could be many optimizations to that, including:

  • cache expiration
  • broadcast only to some servers, based on user presence and subscriptions

Whenever a server needs information it doesn't have, it can always request it explicitly. The main drawback here is server unavailability: while server B is down, server A will not be able to obtain user info that is stored only on server B.
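The broadcast-and-cache idea, including cache expiration and its weak spot when a server is down, can be sketched as follows. This is an in-memory model only: the class, the `online` flag and the injectable `clock` are assumptions made for the example, not Kontalk's design.

```python
import time


class CachingNode:
    """Toy server that broadcasts its changes and caches what peers send."""

    def __init__(self, name, ttl=300.0, clock=time.monotonic):
        self.name = name
        self.ttl = ttl
        self.clock = clock
        self.online = True
        self.peers = []
        self.local = {}   # data this node is authoritative for
        self.cache = {}   # key -> (value, expiry time) received from peers

    def update(self, key, value):
        """Store a change locally and broadcast it to every reachable peer."""
        self.local[key] = value
        for peer in self.peers:
            if peer.online:
                peer.receive(key, value)

    def receive(self, key, value):
        """Cache a value sent by a peer until its TTL runs out."""
        self.cache[key] = (value, self.clock() + self.ttl)

    def lookup(self, key):
        """Answer from local data, then fresh cache, then by asking peers."""
        if key in self.local:
            return self.local[key]
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]
        self.cache.pop(key, None)  # drop the expired entry, if any
        for peer in self.peers:
            if peer.online and key in peer.local:
                self.receive(key, peer.local[key])
                return peer.local[key]
        return None  # the only server holding the data is unavailable
```

As long as the cached copy is fresh, a lookup still succeeds while the owning server is down. Once it expires, the data is unreachable until that server comes back.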

Solution 2: IRC-like full replication

When a server joins, a full presence snapshot and all user data are sent to it by another server. This is an immense waste of resources, but I need to investigate whether it can bring some benefits.

Solution 3: distributed hash table

This is actually my dream storage: real partitioned distribution. On paper at least, a DHT is the perfect storage because of the distributed nature of the Kontalk network. If DHT communication is secured with strong encryption and protected by access privileges, it could bring huge benefits, not to mention the reliability it could have.
But, like all things, it has its drawbacks, the main one being synchronization. A presence system needs to be synchronized to the second. Even though solution 1 might have synchronization issues too, a DHT is a far more complex beast to handle.
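The partitioning part of a DHT, deciding which servers hold a given key, is often done with consistent hashing. The sketch below shows only that placement step, not routing, replication traffic or synchronization. The `HashRing` class and the `replicas` parameter are illustrative assumptions.

```python
import bisect
import hashlib


def _point(value: str) -> int:
    """Map a string to a position on the ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Consistent-hashing ring that assigns each key to `replicas` servers."""

    def __init__(self, servers, replicas=2):
        self.replicas = replicas
        # Servers sorted by their position on the ring.
        self.ring = sorted((_point(s), s) for s in servers)

    def owners(self, key):
        """Return the servers responsible for `key`, walking clockwise."""
        if not self.ring:
            return []
        points = [p for p, _ in self.ring]
        # First server past the key's position; wrap around at the end.
        start = bisect.bisect_right(points, _point(key)) % len(self.ring)
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


if __name__ == "__main__":
    ring = HashRing(["a.example", "b.example", "c.example"])
    print(ring.owners("vcard:alice"))
```

Because each key lives on more than one server, losing a single node does not make the data unavailable. The price is keeping those copies in sync.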

Conclusion

Kontalk needs to store various kinds of data:

  • presence information
  • vCard data
  • privacy lists
  • media file metadata

I guess the solution I'll use will be #1, because it's relatively easy to implement and debug. I will try to optimize it to the extreme but, of course, I will end up with a DHT eventually.