Scaling with Redis

As you already know, Redis is a lightning-fast, in-memory key-value database. When we say lightning fast, we mean it: even though Redis is single-threaded, it can handle a few hundred thousand requests per second, with response times under 1 ms. Redis has other uses as well. For example, it can act as a message broker, distributing data to multiple servers.

There are three ways you can utilize Redis in your projects. The first is as a cache, which can save quite a lot of time spent on database queries. The second is as a data store; in this case, the data is saved in Redis as if it were in in-process memory – for example, a static list. The last is as a message broker, distributing (small) pieces of data using publish/subscribe. The use cases, and how we apply them, are described below in more detail.

Redis as a cache

First, what is a cache? Simply put, a cache is the set of data we use most often, stored in a way that makes it easily accessible with low response times. For example, the next time you open the StackOverflow website, most of the files your browser needs are already cached, so the page loads blazing fast. Even if they are not, the files will be cached after the first visit.

We can perform CRUD operations on the data. However, the most common implementation only reads and stores data, without updating or deleting it. This is because the data usually expires after some time if it is not accessed – which is great, as only frequently accessed data should actually live in the cache.

The most common way of caching, with Redis or any other store, is to create a cache service. This service is used directly in the repository, which should be the only place in your code where you access your database. When reading data, a request is first sent to the cache. If the data exists there, we just saved ourselves a query plus the round trip time – and, more importantly, we saved the customer some frustration. Did you know that customers expect your website to load in under 3 seconds? Now you do.

If the data does not exist in the cache, it must be fetched by running a query against the database, and the result should then be stored in the cache for further use. It is also good practice to set an expiration time, so that when the server runs low on memory, some of the data is evicted. The process is similar for non-query commands – instead of just writing the data to the persistent data source, it is also written to the cache.
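As a rough sketch of the read path just described – the key names are hypothetical, and a dict-backed stand-in plays the role of the Redis client here:

```python
import time

class FakeRedis:
    """In-memory stand-in for a Redis client, just for this sketch."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        value, expires_at = self._data.get(key, (None, None))
        if expires_at is not None and time.time() > expires_at:
            del self._data[key]            # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value, ttl_seconds=None):
        expires_at = time.time() + ttl_seconds if ttl_seconds else None
        self._data[key] = (value, expires_at)

def get_post(cache, db_query, post_id):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"post:{post_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                      # cache hit: no query, no DB round trip
    row = db_query(post_id)                # cache miss: run the query
    cache.set(key, row, ttl_seconds=300)   # store with an expiration time
    return row
```

With a real client such as StackExchange.Redis, the shape is the same: `StringGet`, then `StringSet` with an expiry on a miss.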

You can do more with a cache. It is sometimes overlooked, but some of the queries that burn the most server CPU cycles are the aggregate ones. For example, displaying the number of comments on a blog post requires a scan through the database, and can be almost as expensive as actually fetching all those comments. With a simple counter, incremented each time a new comment is added, this number can be retrieved within a millisecond. Now, imagine we need to sort the blog posts by number of comments – a frequent feature on blogs. For cases like this, Redis has the SORT command, saving even more time. Caching is a common and well-understood pattern; we are far more excited to share our use of Redis as a data store.
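To make the counter idea concrete, the commands could look like this (the key names are made up for illustration):

```
INCR post:42:comments                            # O(1) bump when a comment is added
GET  post:42:comments                            # instant read instead of a COUNT(*) scan
SORT posts BY post:*:comments DESC LIMIT 0 10    # posts sorted by their comment counters
```

The SORT ... BY pattern tells Redis to order the members of the `posts` list by the value stored under each matching counter key.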


Redis as a data store

When using Redis as a data store, we need to code as if we are accessing a local in-process memory variable. The reason to actually use Redis, instead of in-process memory is, we can access Redis with multiple servers. This way, there is only one copy of the data, and it should always be up to date.

Here are a few problems we have had, and how we solved them using Redis. Before diving in, it is worth mentioning which libraries we use. For this specific case, the server is written in C#, and the library used for access is StackExchange.Redis, available on NuGet. Redis is hosted on a Linux server, running version 4.0 or greater (more on why this specific version below).

Problem 1

We needed to store a dictionary where the values are of arbitrary types. Think of a generic ConcurrentDictionary&lt;string, T&gt;. A few methods were needed and implemented: Count, IsEmpty, GetOrAdd, AddOrUpdate, etc.

Solution 1

To make this work, multiple data structures are used. The first is a list containing all the keys currently in the dictionary. By calling the LLEN command on the key where the list is saved, we can implement the Count and IsEmpty methods. Let's say that the list is saved under the key 'MyDictionary:keys'.

The second structure is a hash. Each entry is serialized and saved as a hash under a key of the form 'MyDictionary:key:&lt;specific key&gt;'; the &lt;specific key&gt; itself is then added to the list of all keys, so we can explicitly keep track of the items saved.

Create and Delete operations need to add/delete the hash and update the list; Update and Read operations only need to work with the hashes.
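Put together, the operations on such a dictionary might translate to Redis commands like these (the entry key and the hash fields are hypothetical):

```
RPUSH MyDictionary:keys user42                     # Create: register the key in the key list
HSET  MyDictionary:key:user42 Name "Ana" Age 30    # Create/Update: the serialized entry itself
HGETALL MyDictionary:key:user42                    # Read: fetch the entry
LLEN  MyDictionary:keys                            # Count; 0 means IsEmpty
LREM  MyDictionary:keys 1 user42                   # Delete: drop the key from the key list...
DEL   MyDictionary:key:user42                      # ...and remove the hash
```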

However, this way we are updating multiple Redis keys, and if something bad happens – e.g. a web server crash – we could be left with inconsistent data. We also need protection from concurrency issues. There is a simple solution to both problems.

Did you know you can use Lua scripts?

For use cases like this one, we can send a Lua script to Redis, where it is executed as a single unit. This avoids locking (which is very bad for concurrency) and makes sure all the commands reach the server at once. Since Redis is single-threaded, no other commands can be executed while the Lua script is in progress. This solves the concurrency issues that would exist if the commands were sent one by one from the web server, because another server might modify the same data between the commands. And if the web server fails mid-operation, the script has already been sent to Redis and will still be executed.

This comes in handy for methods like GetOrAdd, which need to be atomic. As mentioned before, Redis is single-threaded, so executing a Lua script lets us check, and then set or get, the values in Redis with a single request – no concurrency issues here.
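A GetOrAdd implemented this way might look roughly like the following Lua script. This is a sketch only: it stores each entry as a single hash field named 'value', whereas the real implementation serializes the whole entry.

```lua
-- KEYS[1] = the list of all dictionary keys
-- KEYS[2] = the hash for this specific entry
-- ARGV[1] = the dictionary key, ARGV[2] = the serialized value
-- Returns the existing value if present; otherwise stores and returns the new one.
local existing = redis.call('HGET', KEYS[2], 'value')
if existing then
  return existing
end
redis.call('HSET', KEYS[2], 'value', ARGV[2])
redis.call('RPUSH', KEYS[1], ARGV[1])
return ARGV[2]
```

Because the whole script runs as one unit, no other client can slip a write in between the HGET and the HSET.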

However, there is one trade-off for this specific method and a few others. When the data already exists in Redis, we only want to fetch it – but the value that might need to be saved has to be serialized up front anyway, because if the data turns out not to exist, saving it without prior serialization would require another request. This also pushes us to keep the stored data to a minimum and to serialize only the values that are really needed, so the time spent on serialization does not have a significant impact.

One more thing – for simplicity, we said the keys look like 'MyDictionary:keys' and 'MyDictionary:key:&lt;specific key&gt;', which is not quite true. The actual keys are more like 'Dictionary:{&lt;Dictionary key&gt;}:keys' and 'Dictionary:{&lt;Dictionary key&gt;}:key:&lt;specific key&gt;'. This way, multiple dictionaries can be instantiated, as long as each has a unique key.


Problem 2

There are a few simple variables (think integers) that need to be shared across multiple servers, but are updated many times per second. These variables are sent to the users of the application, who should see the most recent data possible.

Solution 2

To understand this problem better, imagine a few servers, each with many users connected via web sockets. The users fetch some of this data constantly – say, 5000 times per second in total. The time needed to read a variable from Redis is under 1 ms, including the round trip time (RTT) on a LAN, but for simplicity, let's assume it is exactly 1 ms. At 5000 requests of 1 ms each per second, the equivalent of 5 servers would do nothing but query Redis – which is, as you might have noticed, not optimized at all. On top of that, these variables are updated several times per second.

Here we have a really simple and elegant solution. As mentioned before, Lua scripting is your friend, and publish/subscribe is a very handy tool for this specific problem. We use Lua to update the variable and immediately publish the new value, in a single request to Redis. All the web servers are subscribed, so whenever the data changes, every server is notified. Each web server keeps a read-only local copy of the data and sends it to its users.
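The update-and-publish step can be sketched as a short Lua script (the key and channel names are assumptions):

```lua
-- KEYS[1] = the key holding the shared variable
-- ARGV[1] = the new value, ARGV[2] = the pub/sub channel name
-- Updates the value and notifies all subscribed web servers in one round trip.
redis.call('SET', KEYS[1], ARGV[1])
redis.call('PUBLISH', ARGV[2], ARGV[1])
return ARGV[1]
```

Each web server subscribes to the channel once, and refreshes its local copy whenever a message arrives.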

This way, thousands of queries per second are avoided, and internal traffic is much lighter, since data is sent from Redis to the web servers only when an update actually happens.

Keep in mind that, when starting a new server, you still need to fetch the data from Redis to initialize the local copy.

Another important thing: to update the values, or to run calculations based on them, you either have to move part of your logic into a Lua script on the server, or lock the data.


Problem 3

This is the last type of problem we will describe in this post. Some of the data stored in Redis needs to be processed frequently with a highly optimized algorithm. Lua scripting is great, but it doesn't cut it in this specific case – it is quite limited, and a finer level of control was needed.

Solution 3

It is time to introduce Redis modules. Modules are written in C, so the level of control is incredible. Redis exposes a set of functions you can call to read parameters, create responses, save data, or execute any logic you want. We are not going to describe how to set up modules or work with them – you can read about that here: https://redis.io/topics/modules-intro.

There is not much more to say about modules without going into a concrete implementation – which we won't do here, as it reflects some of our business logic.

Note: this is why Redis v4.0 or greater is required – earlier versions have no module support.

Persistence

A Redis node can save its data to the file system using two methods:

  • Snapshotting – the whole data set is dumped in a single file,

  • AOF – only the changes are written to the end of an Append Only File.

Each has its pros and cons – here is the summary. Snapshotting creates smaller files and loads faster, but producing a snapshot takes more time. AOF creates substantially larger files, but writing to it is much faster.

The usual setup is a hybrid: a snapshot is created from time to time, and the AOF is used in between.
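A hybrid setup along these lines could be configured with redis.conf directives such as the following (the thresholds are illustrative, not a recommendation):

```
# Snapshot if >= 1 key changed in 900 s, >= 10 in 300 s, or >= 10000 in 60 s
save 900 1
save 300 10
save 60 10000

appendonly yes            # enable the Append Only File
appendfsync everysec      # fsync once per second: a durability/speed balance
aof-use-rdb-preamble yes  # Redis >= 4.0: hybrid RDB snapshot + AOF tail
```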

Replication

It is always a good idea to use replication and keep multiple copies of the data. Redis supports a master-slave architecture: the master node accepts writes and forwards every command that modifies the data to all the slave nodes. Slave nodes do not accept writes by default, but this setting can be changed (and in some cases it makes sense to do so).

When one of the slave nodes fails, nothing changes. The data still exists on the master and the other slave nodes. When the failed node comes back online, it syncs with the master automatically.

If, however, a master node fails and does not come back within a configurable timeout, a slave node is promoted to the new master.

All of this is handled by Redis Sentinel, which is installed alongside each Redis node, both master and slave.
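A minimal Sentinel configuration for such a setup might look like this (the master name, address, and timeouts are placeholders):

```
# Monitor a master called "mymaster"; 2 sentinels must agree it is down
sentinel monitor mymaster 192.168.1.10 6379 2
sentinel down-after-milliseconds mymaster 5000   # the configurable timeout
sentinel failover-timeout mymaster 60000         # how long a failover may take
```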

One common way to combine replication with persistence is to disable persistence on the master node and on all slave nodes except one. That node writes data to disk frequently, and can also serve read requests if needed.

When setting up replication, you should be careful: it is possible to delete the whole data set by accident if persistence is disabled on the master node. If the master fails and then restarts, it comes up with a clean state, and this empty state is propagated to all the slave nodes – effectively deleting all your data.

Also, make sure the nodes can fail independently (e.g. by keeping them on separate machines), so the cluster can always recover.

Data partitioning

Data partitioning means saving parts of a large data set on multiple nodes, each holding a unique subset. If replication is paired with data partitioning, each node should be the master of its own replicated set. The benefits of data partitioning are improved performance and the ability to utilize many computers at once – and therefore more memory than a single computer can handle.

The core issue is determining the node where the data should be saved to or read from. There are a few ways to do this, but two are most common:

  • Client side partitioning – the node is determined by the application,

  • Proxy partitioning – a proxy is used which determines the node.

In both cases, the node is determined based on the key. In the first case, the application needs to know the addresses of all available instances, which is not something we can recommend, as it is tightly coupled. For example, let's say there are 4 nodes. The application calculates the node number from the key using a pure function, so the same key always goes to the same node.
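Such a pure function can be as simple as hashing the key and taking the result modulo the node count – a sketch (the node addresses are made up):

```python
import zlib

# Hypothetical node addresses; the application would hold real connections.
NODES = ["redis-0:6379", "redis-1:6379", "redis-2:6379", "redis-3:6379"]

def node_for(key: str, nodes=NODES) -> str:
    """Pure function: the same key always maps to the same node."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]
```

Note that a plain modulo scheme reshuffles almost every key when a node is added or removed; consistent hashing avoids that, but the idea of a deterministic key-to-node function is the same.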

In the second case, the commands are sent to a proxy, which determines the node and passes the commands on. For example, Twemproxy allows the use of curly braces in keys; the content between them is taken into account when determining the node. This is the solution we recommend, and it plays nicely with our implementations: the previously described dictionary has keys 'Dictionary:{&lt;Dictionary key&gt;}:keys' and 'Dictionary:{&lt;Dictionary key&gt;}:key:&lt;specific key&gt;', so only &lt;Dictionary key&gt; determines the node, and all the data for a single dictionary lands on the same node. This makes it much easier to debug and reason about.
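The hash-tag extraction a proxy performs can be sketched like this (a simplified imitation of Twemproxy-style behavior, not its actual code):

```python
def hash_tag(key: str) -> str:
    """Return the part between the first '{' and the following '}',
    or the whole key if no such pair exists. Only this part is hashed
    when choosing a node, so related keys land together."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1:
            return key[start + 1:end]
    return key
```

Both dictionary keys above yield the same tag, so the whole dictionary is guaranteed to live on one node.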

When using data partitioning along with replication, a single server can host multiple Redis nodes, but each node should belong to a different replication cluster. If the server fails for whatever reason, the clusters can still recover.


Tips & tricks

There are a few things to keep in mind when using Redis as a cache and data store.

  • The server should have enough memory for the full data set, for best performance. It is a good idea to set an expiration time for each key, and to refresh it on access.

  • With the default configuration, Redis saves its state to disk. Changing this configuration, or disabling persistence completely, can have a significant positive impact on performance.

  • It is better to use multiple Redis instances than to save all your data in a single one. This is especially true on a powerful server with a multi-core CPU. Run at most as many instances as CPU cores minus one, leaving one core free for the OS.

  • Replication is your friend. Keep the replicated instances on separate servers.

  • Use batching to send multiple commands at once.

  • Avoid locking – it kills scalability.

  • Handle cases like bad connection in your application.

  • Avoid using time as a variable – clocks can drift and jump when they sync.

  • Use DI and IoC to inject the cache service into the repository, and always check whether the cache service is null before using it. This way, you can easily swap one provider for another, or skip caching altogether.

That’s it! We have described a few problems, one from each category we have encountered. There are a few more data structures implemented, but they fall into the categories described – for example, a generic Queue. If you are interested in knowing more, or simply want to discuss ideas with us, get in touch! We love tech, and sharing and gaining knowledge.