What is Redis and how is it used in application architecture
This June was an eventful month for me. I spoke at two online events, DSC Online Talks and Techband Meetup, where I talked about Redis. You can also look forward to the second part, where you will learn more about security and how we use Redis in our projects.
What is Redis?
Redis is a NoSQL in-memory data structure store: it keeps data in operational memory (RAM), so operations on the store are performed in RAM, which makes them very fast.
NoSQL means that it is not a relational database; the data stored in Redis does not have to follow a prescribed or predefined structure.
Redis is a key-value store, which means that we write data to it and read it back using the key to which the value is assigned. The value can be any of the data structures Redis supports, with a maximum size of 512 MB per record. Other important facts about Redis: it executes commands on a single thread and can store up to 2^32 keys.
What can you do with Redis?
Redis offers several hundred commands. Many of them are basic operations, e.g. for storing records in a list there is the PUSH family with variants such as LPUSH (prepend) and RPUSH (append). The commands are divided into several categories, for example:
Key operations: DEL - delete a record by key, EXISTS - check whether a key exists, SCAN - cursor-based iteration over keys, TTL - get the remaining time to live of a record and EXPIRE - set the time to live of a record.
Geo operations - these are very interesting. If you encounter a more complex problem where you need to search by location or display a large amount of data on a map, I definitely recommend looking at geohash. Redis implements commands for this directly (e.g. GEOADD, GEOHASH, GEODIST), covering geohash generation, distance calculation, etc.
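To make the expiry-related key commands more concrete, here is a minimal Python sketch of how EXPIRE, TTL, EXISTS and DEL behave on a key. `ToyKeyspace` is a hypothetical in-memory stand-in written only for illustration (it is not a Redis client), so it runs without a server. It follows Redis's documented return values: TTL answers -1 for a key without an expiry and -2 for a key that does not exist. SCAN is not modelled here.

```python
import time
from typing import Callable, Dict


class ToyKeyspace:
    """Hypothetical in-memory stand-in for a few Redis key commands (not a real client)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._values: Dict[str, str] = {}
        self._expires_at: Dict[str, float] = {}

    def _drop_if_expired(self, key: str) -> None:
        # An expired key behaves exactly like a key that does not exist.
        deadline = self._expires_at.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._values.pop(key, None)
            self._expires_at.pop(key, None)

    def set(self, key: str, value: str) -> None:
        self._values[key] = value
        self._expires_at.pop(key, None)  # a plain SET removes any previous TTL

    def exists(self, *keys: str) -> int:
        # EXISTS returns how many of the given keys exist.
        count = 0
        for key in keys:
            self._drop_if_expired(key)
            count += key in self._values
        return count

    def delete(self, *keys: str) -> int:
        # DEL returns how many keys were actually removed.
        removed = 0
        for key in keys:
            self._drop_if_expired(key)
            if key in self._values:
                del self._values[key]
                self._expires_at.pop(key, None)
                removed += 1
        return removed

    def expire(self, key: str, seconds: int) -> int:
        # EXPIRE returns 1 if the timeout was set, 0 if the key does not exist.
        self._drop_if_expired(key)
        if key not in self._values:
            return 0
        self._expires_at[key] = self._clock() + seconds
        return 1

    def ttl(self, key: str) -> int:
        self._drop_if_expired(key)
        if key not in self._values:
            return -2  # key does not exist
        deadline = self._expires_at.get(key)
        if deadline is None:
            return -1  # key exists but has no expiry
        return int(deadline - self._clock() + 0.5)  # remaining seconds, rounded
```

With a real server and the redis-py client, the corresponding calls are `r.expire("session:42", 10)` and `r.ttl("session:42")`; the key name is only an example.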
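For the geo commands it helps to see what a geohash actually is. The longitude and latitude ranges are repeatedly halved, the resulting bits are interleaved (longitude first), and the bit string is written in a base32 alphabet, so nearby points tend to share a common prefix. Below is a minimal Python sketch of the standard geohash encoding, for illustration only. Redis computes this internally (it stores positions in a sorted set with a 52-bit geohash as the score, and GEOHASH returns an 11-character string), so you would not normally implement it yourself.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 11) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per char."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0  # bits collected for the current character
    bit_count = 0
    use_lon = True  # even bit positions encode longitude, odd ones latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

For example, `geohash_encode(42.6, -5.6, precision=5)` gives `ezs42`. Two points whose hashes share a long prefix lie in the same small cell, which is what makes prefix-based proximity lookups cheap (the converse does not always hold near cell borders).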
Place in application architecture
Consider an example application that consists of a frontend, which can be e.g. a React or Angular app or a mobile application, i.e. any API consumer; a backend API implemented e.g. in Node.js or Java; and an SQL database. During operation, we expect frequent calls to the endpoint that returns the list of game days, which we select from the database and return to the client. With each call, we query the database, select all the necessary data, format it and return it. During a hockey match, however, several thousand people watch the match through our application, the client application regularly refreshes the data, and the list endpoint receives 500 requests per second. This starts to affect the performance of the application: the time the server needs to process a request increases and, in the worst case, the service becomes unavailable and requests start to time out. How can Redis help us in this case? One of the most common uses of Redis is caching. We run a Redis server next to the application backend (or, more appropriately, on a separate server).
We then modify the application so that when a request for the list arrives, it first checks Redis to see whether the list is already there. If it is, we return it in the response. If not, we select the data from the database, format it into the form required by the response, save it to Redis and return the response to the client application.
It is important to set an expiration on the record stored in Redis, e.g. 5 seconds. What have we gained from this effort? If the endpoint receives an average of 500 requests per second, then in one minute we query the database only 12 times (60 s / 5 s) instead of 30,000 times (500 × 60). The remaining requests are served from the cached record in Redis, and reading from Redis is a significantly cheaper and faster operation than reading the records from a database on disk.
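The cache-aside flow described above can be sketched in Python as follows. `TtlCache` is a hypothetical in-memory stand-in so the example runs without a Redis server. The function `get_game_days` only calls `get(key)` and `set(key, value, ex=seconds)`, which match the redis-py client's method signatures, so passing a `redis.Redis` instance instead should work the same way. The key name `game_days` and the loader function are illustrative assumptions.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TtlCache:
    """Hypothetical stand-in for Redis string keys with SET ... EX semantics."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def set(self, key: str, value: str, ex: float) -> None:
        self._data[key] = (value, self._clock() + ex)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired records behave as missing
            return None
        return value


def get_game_days(cache: Any, load_from_db: Callable[[], list], ttl_seconds: int = 5) -> list:
    """Cache-aside: try the cache first, fall back to the database and refill the cache."""
    cached = cache.get("game_days")
    if cached is not None:
        return json.loads(cached)  # cache hit: no database query
    rows = load_from_db()  # cache miss: select and format the data
    cache.set("game_days", json.dumps(rows), ex=ttl_seconds)
    return rows
```

Simulating one minute of 500 requests per second against this sketch with a 5-second TTL reproduces the arithmetic above: the loader runs 12 times instead of 30,000.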
Recording from DSC Online Talks (Slovak):
When is it good to use Redis?
We have now described one possible use of Redis, as a cache. Other very useful ways Redis can help us are:
Queues. You have surely met queues already, or you will meet them in most projects. They are used not only for managing data processing, and this is where Redis can make our work easier. Redis directly implements PUSH and POP commands and their various variants, including very useful blocking variants (e.g. BLPOP, BRPOP) for when multiple clients consume the same queue.
Leaderboard. This case may seem very specific, but Redis is indeed often used to maintain a ranking, whether of competitors or of another set of records that need to be ordered by score and worked with. Redis contains a set of commands for working with rankings (sorted sets), e.g.:
ZADD - saves a competitor's score; options (NX, XX, GT, LT) determine whether an existing record should be updated or left unchanged, optionally depending on the score,
ZREVRANGE - returns the part of the ranking defined by a range of indexes, highest score first,
ZRANK - returns the competitor's position in the ranking (ZREVRANK counts from the highest score; the score itself is returned by ZSCORE).
We also used these commands in our Fantasy League project, where Redis, besides serving as a cache, is also used to store and work with the list of competitors.
Counting anything you can think of. Redis is very useful for collecting statistics, and it also contains a set of atomic increment/decrement commands (INCR, DECR and their variants), the kind of counters that are otherwise often implemented in relational databases.
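The queue pattern from the list above can be sketched as follows: producers LPUSH jobs onto the head of a list and workers take them from the tail with the blocking BRPOP, which gives first-in, first-out order. `ToyList` is a hypothetical, thread-based stand-in for a single Redis list so the sketch runs without a server. Note that the real BRPOP (and redis-py's `brpop`) returns a (key, value) pair and can wait on several lists, which the toy omits.

```python
import threading
from collections import deque
from typing import Deque, Optional


class ToyList:
    """Hypothetical stand-in for one Redis list used as a job queue."""

    def __init__(self) -> None:
        self._items: Deque[str] = deque()
        self._not_empty = threading.Condition()

    def lpush(self, *values: str) -> int:
        # Like LPUSH, each value is inserted at the head in turn,
        # so LPUSH a b c leaves the list as c, b, a.
        with self._not_empty:
            for value in values:
                self._items.appendleft(value)
            self._not_empty.notify(len(values))
            return len(self._items)

    def brpop(self, timeout: float) -> Optional[str]:
        # Block until an item is available at the tail, or return None after
        # `timeout` seconds. Unlike Redis, timeout=0 here does not wait forever.
        with self._not_empty:
            if not self._not_empty.wait_for(lambda: len(self._items) > 0, timeout=timeout):
                return None
            return self._items.pop()
```

Pushing `"a", "b", "c"` with one `lpush` call and then calling `brpop` three times returns `"a"`, `"b"`, `"c"`, i.e. the jobs come out in the order they were submitted.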
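For the leaderboard, a minimal Python sketch of the sorted-set behaviour behind ZADD (with the NX/XX/GT/LT options), ZREVRANGE, ZREVRANK and ZSCORE might look like this. `ToyLeaderboard` is a hypothetical in-memory model for illustration, not the implementation used in our project. With redis-py the corresponding calls would be along the lines of `r.zadd("league", {"bob": 130}, gt=True)`, `r.zrevrange("league", 0, 9, withscores=True)` and `r.zrevrank("league", "alice")`, where the key `league` is an example and the GT/LT flags need Redis 6.2+ and a recent client.

```python
from typing import Dict, List, Optional


class ToyLeaderboard:
    """Hypothetical stand-in for a Redis sorted set used as a ranking."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    def zadd(self, member: str, score: float, *, nx: bool = False, xx: bool = False,
             gt: bool = False, lt: bool = False) -> int:
        """Returns 1 if a new member was added, 0 otherwise (like ZADD without CH)."""
        current = self._scores.get(member)
        if current is None:
            if xx:  # XX: only update existing members, never add
                return 0
            self._scores[member] = score
            return 1
        if nx:  # NX: only add new members, never update
            return 0
        if (gt and score <= current) or (lt and score >= current):
            return 0  # GT/LT: update only if the new score is higher/lower
        self._scores[member] = score
        return 0

    def _descending(self) -> List[str]:
        # Highest score first; equal scores in reverse lexicographic order, as ZREVRANGE does.
        return sorted(self._scores, key=lambda m: (self._scores[m], m), reverse=True)

    def zrevrange(self, start: int, stop: int) -> List[str]:
        # Inclusive indexes; negative values count from the end, as in Redis.
        members = self._descending()
        n = len(members)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        return members[start:stop + 1] if start <= stop else []

    def zrevrank(self, member: str) -> Optional[int]:
        if member not in self._scores:
            return None
        return self._descending().index(member)  # 0 = first place

    def zscore(self, member: str) -> Optional[float]:
        return self._scores.get(member)
```

The GT option is handy for "best score" rankings: a worse result submitted later leaves the stored score untouched.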
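The point of INCR-style commands is that the read-modify-write happens atomically on the server, so many clients can update the same counter without losing increments. The sketch below models that in Python with a lock. `ToyCounters` and the `views:<page>:<minute>` key scheme are hypothetical illustrations; with redis-py you would call `r.incr(key)` or `r.incr(key, amount)`.

```python
import threading
from typing import Dict


class ToyCounters:
    """Hypothetical stand-in for Redis INCR/INCRBY/DECR on string keys."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._lock = threading.Lock()  # models that Redis applies each command atomically

    def incrby(self, key: str, amount: int) -> int:
        with self._lock:
            # A missing key counts as 0; the result is stored as a string.
            # int() raises ValueError for non-integer values (Redis returns an error).
            current = int(self._data.get(key, "0")) + amount
            self._data[key] = str(current)
            return current

    def incr(self, key: str) -> int:
        return self.incrby(key, 1)

    def decr(self, key: str) -> int:
        return self.incrby(key, -1)


def views_key(page: str, minute: str) -> str:
    # Hypothetical key scheme: one counter per page and per minute.
    return f"views:{page}:{minute}"
```

Because every increment is applied as one atomic step, eight threads each calling `incr` 1,000 times end up with a counter of exactly 8,000.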
When is Redis not appropriate, or what to consider before you decide to use it in your project
Asynchronous disk writing
Redis is first and foremost an in-memory store. Persisting data to disk is possible, but you need to configure it to match your needs (the append-only file, for example, is disabled by default). You also need to know that even with persistence turned on, writing to disk is usually asynchronous, so if the server fails you can lose data that has not yet been written to disk.
All data is in RAM
If you want to use Redis as your primary data store, you need to consider the expected amount of data. Having all of the data in RAM is a big advantage, but from a capacity and cost point of view you need to be careful.
Service availability
When the server fails, the service is unavailable. The same problem can occur with other databases or non-HA systems, so it is not a shortcoming specific to Redis. I wanted to point out that in the case of Redis this problem is partially solvable through Redis Cluster, where another node can take over for a failed node; there will be a separate part about the Cluster.
Because working with Redis is specific in many respects, if Redis is to be used as the primary store, you need to take this into account when designing the application's functions and features (I don't mean UX/UI design). With SQL databases, by contrast, it is often easier to replace one database system with another relational one.
What mistakes should be avoided when using Redis?
Insufficient security
One of the biggest problems I would mention is insufficient security: not using a password for access and exposing Redis to the whole world. We will talk more about security in a separate part, but I wanted to mention it here as well, because it is very important.
KEYS command
Most tutorials show the KEYS command first. However, KEYS is a blocking command: depending on the number of records, it can keep the server busy for a long time, and the server cannot handle other requests while executing it. It is therefore meant mostly for debugging during development; in production, iterate over keys with SCAN instead.
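The usual replacement for KEYS is SCAN, which returns keys in small batches together with a cursor for the next call, so no single call has to walk the whole keyspace. The Python sketch below shows the client-side loop. `ToyScanStore` is a hypothetical in-memory stand-in with a simplified cursor (a plain offset into a sorted snapshot, whereas Redis walks its hash table incrementally and may return a key more than once). With redis-py the same loop is available as `r.scan_iter(match="user:*", count=100)`.

```python
import fnmatch
from typing import Dict, Iterator, List, Optional, Tuple


class ToyScanStore:
    """Hypothetical key store exposing a SCAN-like cursor API (not Redis's real algorithm)."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data

    def scan(self, cursor: int, match: Optional[str] = None,
             count: int = 10) -> Tuple[int, List[str]]:
        # Each call examines only `count` keys. (The toy sorts a snapshot for simplicity.)
        keys = sorted(self._data)
        batch = keys[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(keys):
            next_cursor = 0  # cursor 0 signals that the iteration is complete
        if match is not None:
            # As in Redis, MATCH filters the batch after it is taken, so a call may return nothing.
            batch = [k for k in batch if fnmatch.fnmatchcase(k, match)]
        return next_cursor, batch


def scan_iter(store: ToyScanStore, match: Optional[str] = None,
              count: int = 10) -> Iterator[str]:
    # Same loop a client runs against Redis: start at 0, stop when the cursor returns to 0.
    cursor = 0
    while True:
        cursor, keys = store.scan(cursor, match=match, count=count)
        yield from keys
        if cursor == 0:
            break
```

Each iteration is one short request, so other clients' commands can be served in between, unlike a single KEYS call over the whole keyspace.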
Numbered databases (SELECT)
In Redis, you can work with and switch between multiple databases, but these databases are not very well designed: they are insufficiently isolated from each other. If you run the KEYS command on one database and it contains too many records, you block all of the databases, because they share the workload of a single instance. The authors of Redis themselves have identified numbered databases as one of the biggest flaws in Redis's design.
New connection for each request
The connection should be kept open and reused for multiple commands, instead of opening and closing a new connection for each command. This matters especially with PHP, where keeping one connection open is (reportedly) a problem.
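Client libraries usually solve this with a connection pool: connections are opened once and handed out again for later commands. The Python sketch below shows the idea. `ConnectionPool` here is a hypothetical, generic toy, and the `connect` factory stands in for opening a TCP connection to Redis. redis-py ships its own pool, used e.g. as `redis.Redis(connection_pool=redis.ConnectionPool(host="localhost", port=6379))`, so in practice you would create one client at application startup and reuse it.

```python
import threading
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ConnectionPool(Generic[T]):
    """Minimal pool: hand out idle connections and open new ones only when none are free."""

    def __init__(self, connect: Callable[[], T], max_size: int = 10) -> None:
        self._connect = connect
        self._max_size = max_size
        self._idle: List[T] = []
        self._lock = threading.Lock()
        self.opened = 0  # how many real connections were created

    def acquire(self) -> T:
        with self._lock:
            if self._idle:
                return self._idle.pop()  # reuse an already open connection
            if self.opened >= self._max_size:
                raise RuntimeError("connection pool exhausted")
            self.opened += 1
        try:
            return self._connect()  # open outside the lock, connecting may be slow
        except Exception:
            with self._lock:
                self.opened -= 1  # the slot was not actually used
            raise

    def release(self, conn: T) -> None:
        with self._lock:
            self._idle.append(conn)  # keep it open for the next request
```

Serving 1,000 sequential requests through this pool opens a single connection, whereas connecting per command would open 1,000.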
Hot Key problem
In a Redis Cluster, if we have one key that we access very often, we are effectively accessing one node very often, because the record lives on a single node. This problem has to be solved already during application design, so that such a situation does not occur.
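Redis Cluster assigns every key to one of 16384 hash slots (CRC16 of the key modulo 16384), and each slot is served by one master node, which is why a single hot key cannot be spread out by the cluster itself. The Python sketch below computes the slot, including the hash-tag rule where only the part inside the first `{...}` is hashed. It also shows one common mitigation, assumed here for illustration: keeping several copies of a hot, read-mostly value under suffixed keys and reading a random copy. The `:copy:N` naming is hypothetical. Different slots do not guarantee different nodes, and writes must update every copy.

```python
import random


def crc16_xmodem(data: bytes) -> int:
    """CRC16 (XMODEM variant: poly 0x1021, init 0), the checksum Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    data = key.encode()
    # Hash tag: if the key has a non-empty {...} section, only that part is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:
            data = data[start + 1:end]
    return crc16_xmodem(data) % 16384


def copy_keys(hot_key: str, copies: int) -> list:
    # Hypothetical naming scheme for replicated copies of one hot key.
    return [f"{hot_key}:copy:{i}" for i in range(copies)]


def pick_read_key(hot_key: str, copies: int) -> str:
    # Spread reads: each request reads one randomly chosen copy.
    return f"{hot_key}:copy:{random.randrange(copies)}"
```

Do not put a hash tag such as `{match:42}` into the copy names: that would force all copies into the same slot and defeat the purpose.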