Kontalk is a collaborative network: it relies on multiple servers working together as a cluster. From a storage point of view, it's not a simple mirroring cluster; a Kontalk network can handle failover and load balancing.
Failover is very simple for client connections: if a client can't connect to server A, it will simply connect to server B. It will be able to use its credentials with server B because B recognizes server A's signature. But what about user-related data? What if a user's vCard is stored on server A, and B wants to retrieve it while A is unavailable?
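The client-side part of failover can be sketched in a few lines. This is only an illustration: `connect_with_failover`, `fake_connect` and the host names are hypothetical and not part of any Kontalk client, and the sketch assumes the transport raises `ConnectionError` when a server is unreachable.

```python
from typing import Callable, Iterable, Optional


def connect_with_failover(
    servers: Iterable[str],
    connect: Callable[[str], None],
) -> Optional[str]:
    """Return the first server that accepts the connection, or None."""
    for server in servers:
        try:
            connect(server)  # hypothetical transport: raises ConnectionError on failure
            return server
        except ConnectionError:
            continue  # this server is unavailable, try the next one
    return None


def fake_connect(server: str) -> None:
    """Stand-in transport for the example: server A is down."""
    if server == "a.example.net":
        raise ConnectionError(f"{server} is unreachable")
```

Calling `connect_with_failover(["a.example.net", "b.example.net"], fake_connect)` skips the unreachable first entry and returns the second. Nothing comparable exists yet for user data, which is the open question above.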
Load balancing is also simple for client connections: if server A has too many connections, the client will go to server B. Storage needs the same treatment: if everyone uploads files to server A, users will connect only to server A to download them, and server B will host no files.
Solution 1: replication by caching
All servers are pretty much always connected with each other in order to accomplish several tasks, such as forwarding messages and probing presence. Even when broadcasting presence, the net component will connect to every server to send the presence stanza, because the router broadcasts to every route.
This solution is quite simple: whenever a change occurs, the server will broadcast it to the network. Of course there could be many optimizations to that, including:
- cache expiration
- broadcast only to some servers, based on user presence and subscriptions
Whenever a server needs information it doesn't have, it can always request it manually. The main con here is server unavailability: server A will not be able to obtain user info that is stored only on server B while B is down.
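To make the idea concrete, here is a minimal in-memory sketch of replication by caching with cache expiration and a manual request on a miss. It is not Kontalk code: the `CachingServer` class, its method names, the `online` flag and the injectable `clock` are all assumptions made for illustration, and real servers would talk over the network rather than call each other directly.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class CachingServer:
    """Hypothetical node that replicates user data by broadcasting changes."""

    def __init__(self, name: str, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name = name
        self.ttl = ttl          # seconds a cached copy stays valid
        self.clock = clock      # injectable so expiration can be tested
        self.online = True      # simulated availability
        self.peers: List["CachingServer"] = []
        self.local: Dict[str, str] = {}                # data this server owns
        self.cache: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry)

    def update(self, key: str, value: str) -> None:
        """Store a change locally and broadcast it to every reachable peer."""
        self.local[key] = value
        for peer in self.peers:
            if peer.online:
                peer.receive(key, value)

    def receive(self, key: str, value: str) -> None:
        """Cache a value announced by (or fetched from) another server."""
        self.cache[key] = (value, self.clock() + self.ttl)

    def lookup(self, key: str) -> Optional[str]:
        """Answer from local data, then from a fresh cache entry, then ask peers."""
        if key in self.local:
            return self.local[key]
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]
        # Cache miss or expired entry: request the data manually.
        for peer in self.peers:
            if peer.online and key in peer.local:
                self.receive(key, peer.local[key])
                return peer.local[key]
        return None  # the only owner is unavailable
```

With two servers wired as peers, a value updated on A stays readable from B while A is down, but only until the cached copy expires. After that `lookup` returns `None` until A is reachable again, which is exactly the unavailability problem described above.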
Solution 2: IRC-like full replication
When a server joins, another server sends it a full presence snapshot and all user data. This is an immense waste of resources, but I need to investigate whether it brings any benefits.
Solution 3: distributed hash table
This is actually my dream storage: real partitioned distribution. On paper at least, a DHT is the ideal storage because of the distributed nature of the Kontalk network. If DHT communication is secured with strong encryption and protected by access privileges, it could bring huge benefits, not to mention the reliability it could have.
But, like all things, it has its cons, chiefly synchronization. A presence system needs to be synchronized to the second. Even though solution 1 might have synchronization issues too, a DHT is a far more complex beast to handle.
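The partitioning part of a DHT can be illustrated with consistent hashing. This is only a sketch of key placement, not a full DHT: there is no routing, no node join/leave protocol and none of the synchronization discussed above. `HashRing`, `owners`, the `replicas` and `vnodes` parameters and the use of SHA-1 for placement are assumptions of this example, not anything Kontalk defines.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(value: str) -> int:
    """Map a string to a point on the ring (SHA-1 used only for placement)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical consistent-hash ring assigning each key to a few servers."""

    def __init__(self, servers: Iterable[str], replicas: int = 2,
                 vnodes: int = 64) -> None:
        names = list(servers)
        self.replicas = replicas
        self._server_count = len(set(names))
        # Each server gets several virtual points to spread keys more evenly.
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{name}#{i}"), name) for name in names for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owners(self, key: str) -> List[str]:
        """Return the distinct servers responsible for storing `key`."""
        if not self._ring:
            return []
        wanted = min(self.replicas, self._server_count)
        start = bisect.bisect_right(self._points, _hash(key))
        found: List[str] = []
        # Walk clockwise from the key's position, collecting distinct servers.
        for offset in range(len(self._ring)):
            server = self._ring[(start + offset) % len(self._ring)][1]
            if server not in found:
                found.append(server)
                if len(found) == wanted:
                    break
        return found
```

With three servers and `replicas=2`, `ring.owners("alice/vcard")` always returns the same two distinct servers for that key, so any node can compute where a vCard lives without asking around, and one owner going down still leaves a copy reachable.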
Kontalk needs to store various kinds of data:
- presence information
- vCard data
- privacy lists
- media file metadata
I guess the solution I'll use will be #1, because it's relatively easy to implement and debug. I will try to optimize it as far as I can, but, of course, I will end up moving to a DHT eventually.