RedisGears and Write-behind caching strategy

What is RedisGears?

RedisGears is a Redis module. But what is a Redis module? Redis modules are dynamic libraries that can add new commands to Redis, so at its simplest RedisGears is a module that adds new commands to Redis. That is a very limited definition, though; RedisGears is much more than that.

The documentation defines RedisGears as follows:

RedisGears is a dynamic framework for data processing in Redis. RedisGears supports transaction, batch and event-driven processing of Redis data. To use RedisGears, you write functions that describe how your data should be processed. You then submit this code to your Redis deployment for remote execution.

What do the RedisGears components look like?

    +---------------------------------------------------------------------+
    | Redis Server               +--------------------------------------+ |
    |                            | RedisGears Module                    | |
    | +----------------+         |                                      | |
    | | Data           | Input   | +------------+ +-------------------+ | |
    | |                +-------->+ | Function   | | APIs              | | |
    | | Key1 : Value1  |         | | +--------+ | | C, Python, ...    | | |
    | | Key2 : Value2  | Output  | | | Reader | | +-------------------+ | |
    | | Key3 : Value3  <---------+ | +---+----+ | +-------------------+ | |
    | |      ...       |         | |     v      | | Redis commands    | | |
    | +----------------+         | | +---+----+ | | Gears admin & ops | | |
    |                            | | | Step 1 | | +-------------------+ | |
    |                            | | +---+----+ | +-------------------+ | |
    | +----------------+         | |     v      | | Coordinator       | | |
    | | Events         |         | | +---+----+ | | Cluster MapReduce | | |
    | |                | Trigger | | | Step 2 | | +-------------------+ | |
    | | Data update    +-------->+ | +---+----+ | +-------------------+ | |
    | | Stream message |         | |     v      | | Engine            | | |
    | | Time interval  |         | |    ...     | | Runtime execution | | |
    | |      ...       |         | +------------+ +-------------------+ | |
    | +----------------+         +--------------------------------------+ |
    +---------------------------------------------------------------------+

How does it work alongside the Redis server?

RedisGears runs within the same process as Redis itself, but its functions do not execute on the thread that runs the main Redis event loop.

RedisGears has its own execution engine. When you execute a RedisGears function, it runs on one or more dedicated RedisGears threads, depending on the configuration and workload.

How to install RedisGears?

Run a Docker container that has the RedisGears module installed:

docker run -p 6379:6379 redislabs/redisgears:latest

Then, to access the Redis server, run this command:

redis-cli

This article focuses on batch processing and event-driven processing of Redis data.

Batch processing of Redis data

Batch processing is the processing of data in batches. A batch is a group of data items that are processed together. Batch processing is useful when you want to process a large amount of data in a single operation. In RedisGears, the processing logic is written in Python.

Ex 1

127.0.0.1:6379> RG.PYEXECUTE "GearsBuilder().run()"

We use the RG.PYEXECUTE command to execute Python code on the Redis server. GearsBuilder() builds the gears object (the function), and run() runs it.

The GearsBuilder object (GB() is a shorthand alias for it) takes the name of a reader as its first argument. The reader determines where the function's input records come from. The available readers are:

  • "KeysReader":

    Reads keys and their values from the Redis database. It can be used for batch processing and for event-driven processing.

  • "KeysOnlyReader":

    Reads key names only, without their values.

  • "StreamReader":

    Reads streams (Redis data structure) and processes their messages.

  • "PythonReader":

    Uses a Python generator that you supply as the source of records.

  • "ShardsIDReader":

    Produces one record per shard containing that shard's ID, which is useful for running an operation once on every shard in a clustered environment.

  • "CommandReader":

    Runs the function when it is triggered with the RG.TRIGGER command.

If you don't pass a reader to GearsBuilder, it defaults to KeysReader.

run(): runs the function immediately, as a batch, on the data currently in the Redis database.

Ex 2

127.0.0.1:6379> RG.PYEXECUTE "GearsBuilder().filter(lambda x: int(x['value']['age']) > 35).foreach(lambda x: execute('del', x['key'])).run('user:*')"

We use filter() to select records, foreach() to execute a command for each remaining record, and run() with a key pattern to run the function on matching keys only.

  • It is run on all keys that match the user:* pattern.

  • The script then filters out all keys that have the age hash field lower than (or equal to) 35.

  • It then runs all remaining keys through a function that calls DEL on them (i.e., the keys are deleted).

  • Finally, it returns both key names and key values to the client.
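To make the data flow concrete, the same pipeline can be imitated in plain Python outside Redis. This is only a sketch: the db dictionary stands in for the Redis keyspace, run_pipeline is a hypothetical helper rather than a RedisGears API, the sample users are made up, and records are assumed to have the {'key': ..., 'value': ...} shape that the lambdas above rely on.

```python
from fnmatch import fnmatch

# Hypothetical stand-in for the Redis keyspace: key name -> hash fields.
db = {
    "user:1": {"name": "Ann", "age": "29"},
    "user:2": {"name": "Bob", "age": "41"},
    "user:3": {"name": "Cid", "age": "35"},
    "order:1": {"total": "10"},
}

def run_pipeline(db, pattern):
    """Imitate GearsBuilder().filter(...).foreach(...).run(pattern)."""
    # Reader step: one record per key that matches the pattern.
    records = [{"key": k, "value": dict(v)} for k, v in db.items() if fnmatch(k, pattern)]
    # filter(): keep only records whose age is greater than 35.
    kept = [r for r in records if int(r["value"]["age"]) > 35]
    # foreach(): side effect on each remaining record (here: delete the key).
    for r in kept:
        del db[r["key"]]
    # run(): the remaining records are returned to the caller.
    return kept

deleted = run_pipeline(db, "user:*")
print([r["key"] for r in deleted])  # ['user:2']
print(sorted(db))                   # ['order:1', 'user:1', 'user:3']
```

Note that user:3, whose age is exactly 35, survives because the filter uses a strict comparison.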

You can also write the Python code in a file and execute it on the Redis server.

# script.py
import numpy as np

def hash2list(redis_key):
  h = redis_key['value'] # redis_key contains 'key' and 'value'
  return [float(h['x']), float(h['y']), float(h['z'])]

def do_mean(acc, x):
  if acc is None:
    return x
  return np.mean([acc, x], axis=0)

GearsBuilder()\
.map(hash2list)\
.accumulate(do_mean)\
.flatmap(lambda x: x.tolist())\
.run("vec:*")

This script calculates the element-wise mean of the vectors stored in the vec:* hashes.

127.0.0.1:6379> HSET vec:1 x 1 y 2 z 3
127.0.0.1:6379> HSET vec:2 x 2 y 3 z 4
redis-cli RG.PYEXECUTE "`cat script.py`" REQUIREMENTS numpy
1) 1) "1.5"
   2) "2.5"
   3) "3.5"
2) (empty array)
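One caveat about do_mean: averaging the accumulator with each new vector gives the true mean only when there are exactly two vectors. With more, later vectors carry more weight. A minimal sketch of an alternative, in plain Python without NumPy, keeps a running sum and a count instead. The names accumulate_sum and finish_mean and the third sample vector are illustrative assumptions, not part of RedisGears or the original script.

```python
def accumulate_sum(acc, vec):
    """Fold one vector into a (sum_vector, count) accumulator."""
    if acc is None:
        return (list(vec), 1)
    total, count = acc
    return ([t + v for t, v in zip(total, vec)], count + 1)

def finish_mean(acc):
    """Turn the (sum_vector, count) accumulator into the mean vector."""
    total, count = acc
    return [t / count for t in total]

vectors = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]

# Sum-and-count approach: exact mean regardless of how many vectors there are.
acc = None
for vec in vectors:
    acc = accumulate_sum(acc, vec)
true_mean = finish_mean(acc)

# Pairwise approach, as in do_mean: mean of (running mean, next vector).
pairwise = None
for vec in vectors:
    pairwise = vec if pairwise is None else [(a + b) / 2 for a, b in zip(pairwise, vec)]

print(true_mean)  # [3.0, 4.0, 5.0]
print(pairwise)   # [3.75, 4.75, 5.75]
```

In a RedisGears script on a single shard, the same idea would presumably be expressed as an accumulate step followed by a map step; that variant is a suggestion and has not been run here.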

Event Processing of Redis data

Event processing is the processing of data in response to a trigger. A trigger is an event that causes the execution of a function. Event processing is useful when you want to process data in response to an event.

To tell the Redis server to execute a function in response to an event, we use the register() function instead of run().
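The idea behind registration can be imitated in plain Python to see the moving parts. The sketch below is not RedisGears code: registrations, register, hset, and count_write are hypothetical helpers, a dictionary plays the role of the database, and the sample keys are made up.

```python
from collections import Counter
from fnmatch import fnmatch

db = {}                   # stand-in for the Redis keyspace
registrations = []        # (key pattern, handler) pairs
write_counts = Counter()  # filled in by the handler below

def register(pattern, handler):
    """Remember a handler to call whenever a matching key is written."""
    registrations.append((pattern, handler))

def hset(key, **fields):
    """Write a hash, then fire every handler whose pattern matches the key."""
    db.setdefault(key, {}).update(fields)
    record = {"key": key, "value": dict(db[key])}
    for pattern, handler in registrations:
        if fnmatch(key, pattern):
            handler(record)

def count_write(record):
    """Example handler: count how many times each key has been written."""
    write_counts[record["key"]] += 1

register("person:*", count_write)
hset("person:1", name="Ann", age="30")
hset("person:1", age="31")
hset("pet:1", name="Rex")  # does not match 'person:*', so no handler runs

print(dict(write_counts))  # {'person:1': 2}
```

The handler is not run when register is called; it runs later, once per matching write.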

Ex 3

# maxage.py
def age(x):
  ''' Extracts the age from a person's record '''
  return int(x['value']['age'])

def compare_and_swap(x):
  ''' Checks and sets the current maximum '''
  k = 'age:maximum'
  v = execute('GET', k)   # read key's current value
  v = int(v) if v else 0  # initialize to 0 if None
  if x > v:               # if a new maximum found
    execute('SET', k, x)  # set key to new value

# Event handling function registration
gb = GearsBuilder()
gb.map(age) # Extract the 'age' field from each hash
gb.foreach(compare_and_swap) # Compare the max age to the value stored at age:maximum
gb.register('person:*') # Only process keys matching the pattern 'person:*'

This script updates the maximum age stored in the Redis server after each update to a person's data.

redis-cli RG.PYEXECUTE "`cat maxage.py`"
127.0.0.1:6379> HSET person:5 name "Marek Michalski" age 17
(integer) 2
127.0.0.1:6379> HSET person:6 name "Noya Beit" age 21
(integer) 2

After each update to a person's data, the maximum age is updated.

127.0.0.1:6379> GET age:maximum
"21"

Write-behind caching using RedisGears

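Write-behind caching is a caching pattern in which the application writes data to the cache only, and the change is then propagated to the database asynchronously, shortly afterwards. (Writing to the cache and the database at the same time, before the write is acknowledged, is the write-through pattern.) With RedisGears this is done through events: every time you write data to the Redis server, an event triggers a registered function that writes the same data to the database in the background.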
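The pattern is easiest to see in a toy model. The following sketch is plain Python with assumed names (WriteBehindCache, flush); dictionaries stand in for both Redis and the database, and it is not how RedisGears or rgsync are implemented.

```python
from collections import deque

class WriteBehindCache:
    """Toy write-behind cache: writes hit the cache now, the database later."""

    def __init__(self, database):
        self.cache = {}
        self.database = database
        self.pending = deque()  # queue of (key, value) writes not yet persisted

    def set(self, key, value):
        # The caller is acknowledged as soon as the cache is updated.
        self.cache[key] = value
        self.pending.append((key, value))

    def get(self, key):
        return self.cache.get(key)

    def flush(self):
        """Persist queued writes in order; returns how many were written."""
        count = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.database[key] = value
            count += 1
        return count

database = {}
cache = WriteBehindCache(database)
cache.set("person:1", {"first_name": "John", "age": "42"})

print(cache.get("person:1") is not None)  # True: readable from the cache at once
print("person:1" in database)             # False: not persisted yet
print(cache.flush())                      # 1
print("person:1" in database)             # True
```

In a real deployment the flush would run in the background rather than being called by hand, and failed database writes would need to be retried.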

We can do this now that we know how to run a Python script on the Redis server and how to register it to execute in response to an event.

First, install gears-cli.

gears-cli is a command line interface for RedisGears. It allows you to execute RedisGears functions and operations from the command line.

pip install gears-cli

Second, write a Python script that writes the data to the database.

# example.py
from rgsync import RGWriteBehind, RGWriteThrough
from rgsync.Connectors import MySqlConnector, MySqlConnection

# rgsync is a Python library of RedisGears recipes for write-behind and
# write-through. It simplifies the job, but the same behaviour can also be
# built with RedisGears alone.

# Create the MySQL connection object (user, password, host:port/database).
# Here it connects to the "test" database.
connection = MySqlConnection('root', 'root', '127.0.0.1:3306/test')

# Create the MySQL connector for the "persons" table,
# with "person_id" as the primary key.
personsConnector = MySqlConnector(connection, 'persons', 'person_id')

# Map Redis hash fields (left) to MySQL columns (right).
personsMappings = {
    'first_name':'first',
    'last_name':'last',
    'age':'age'
}

# Register the write-behind function for keys with the "person" prefix.
RGWriteBehind(GB,  keysPrefix='person', mappings=personsMappings, connector=personsConnector, name='PersonsWriteBehind',  version='99.99.99')

To register the script so that it executes in response to an event, run one of the following (the second form uses the default connection settings):

gears-cli --host <host> --port <port> --password <password> run example.py REQUIREMENTS rgsync PyMySQL cryptography
gears-cli run example.py REQUIREMENTS rgsync PyMySQL cryptography

REQUIREMENTS is a special argument that tells RedisGears to install the required Python packages before executing the script.

To test the script

127.0.0.1:6379> HSET person:1 first_name "John" last_name "Doe" age 42
(integer) 3

To check the database

mysql> SELECT * FROM persons;
+-----------+------------+-----------+------+
| person_id | first      | last      | age  |
+-----------+------------+-----------+------+
|         1 | John       | Doe       |   42 |
+-----------+------------+-----------+------+
1 row in set (0.00 sec)

If you update or delete the data in the Redis server, the script applies the same change to the database.
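To illustrate what the mapping does, the sketch below translates a Redis hash into a table row using the same field-to-column mapping. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and hash_to_row and upsert_person are hypothetical helpers; rgsync's actual implementation may differ.

```python
import sqlite3

# Redis hash field -> table column, same content as personsMappings above.
persons_mappings = {"first_name": "first", "last_name": "last", "age": "age"}

def hash_to_row(key, fields, mappings, pk_column="person_id"):
    """Map a Redis key and its hash fields to a {column: value} row."""
    # The part after the first ':' is taken as the primary key (person:1 -> 1).
    row = {pk_column: int(key.split(":", 1)[1])}
    for field, column in mappings.items():
        if field in fields:
            row[column] = fields[field]
    return row

def upsert_person(conn, row):
    """Insert the row, or replace the existing row with the same primary key."""
    # Column names come from the trusted mapping; values are bound as parameters.
    columns = ", ".join(f'"{c}"' for c in row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT OR REPLACE INTO persons ({columns}) VALUES ({placeholders})",
        list(row.values()),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE persons (person_id INTEGER PRIMARY KEY, "first" TEXT, "last" TEXT, age INTEGER)'
)

redis_hash = {"first_name": "John", "last_name": "Doe", "age": "42"}
row = hash_to_row("person:1", redis_hash, persons_mappings)
upsert_person(conn, row)

print(conn.execute("SELECT * FROM persons").fetchall())  # [(1, 'John', 'Doe', 42)]
```

Because the statement replaces any existing row with the same primary key, repeating the call with changed fields updates the row instead of adding a second one.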

References

1 - RedisGears Documentation

2 - RedisGears Python quick start

3 - RedisGears GitHub

4 - Introduction to RedisGears

5 - Write Behind and Write Through

6 - "Magic lantern" -> ChatGPT