Redis-cluster@Debian.md

Redis Cluster installation: a local educational setup for removing nodes from and adding nodes back to a Redis Cluster.

installation

apt install redis

access CLI

redis-cli

cluster info

cluster nodes
ERR This instance has cluster support disabled

enable cluster mode by editing the config file; search for cluster-enabled

vi /etc/redis/redis.conf

uncomment

cluster-enabled yes
cluster-config-file nodes-6379.conf
cluster-node-timeout 15000
appendonly yes

restart in cluster mode

systemctl restart redis
systemctl status redis

should be

● redis-server.service - Advanced key-value store
[...]
             └─5017 /usr/bin/redis-server 127.0.0.1:6379 [cluster]

run cluster nodes again

127.0.0.1:6379> cluster nodes
7021db16f7ce7a865912dabcbfd4573e4c501a3b :6379@16379 myself,master - 0 0 0 connected
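the `:6379@16379` pair in the output is the data port and the cluster bus port; by default the bus port is the data port plus 10000, which can be sketched as:

```shell
# the cluster bus port defaults to the data port + 10000
# (here 6379 -> 16379, matching the @16379 in the cluster nodes output)
port=6379
bus=$((port + 10000))
echo "data port $port, cluster bus port $bus"
```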

stop the service to proceed with temporary testing

systemctl stop redis
cd
mkdir cluster-test
cd cluster-test
mkdir 7000 7001 7002 7003 7004 7005

create a node-specific config for each node, changing the directory name and port number

vi 7000/redis.conf
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
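since the six config files differ only in the port, a small loop can generate them all; a sketch, assuming it is run from inside ~/cluster-test:

```shell
# generate a minimal cluster config for each node directory;
# nodes.conf is relative to each node's working directory
for port in 7000 7001 7002 7003 7004 7005; do
  mkdir -p "$port"
  cat > "$port/redis.conf" <<EOF
port $port
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
EOF
done
```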

once all configs are in place, start the nodes

cd 7000
redis-server redis.conf &
cd ../7001
redis-server redis.conf &
cd ../7002
redis-server redis.conf &
cd ../7003
redis-server redis.conf &
cd ../7004
redis-server redis.conf &
cd ../7005
redis-server redis.conf &
cd ..

check everything is running

ps aux  | grep redis
ss -ntap | grep redis
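before killing a node later on, it helps to know which PID serves which port; a sketch that extracts pid/port pairs from ps output (shown here against captured sample lines; live, pipe `ps aux | grep '[r]edis-server'` into the same awk):

```shell
# print "<pid> <port>" for each redis-server line in ps output:
# $2 is the pid, and the field starting with "*:" carries the port
pids=$(awk '/redis-server \*:/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^\*:/) print $2, substr($i, 3)
}' <<'EOF'
root 5123 0.4 2.0 67644 20848 pts/0 Sl 11:04 0:05 redis-server *:7000 [cluster]
root 5162 0.4 2.9 88132 29424 pts/0 Sl 11:06 0:05 redis-server *:7005 [cluster]
EOF
)
echo "$pids"
```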

build a cluster

redis-cli --cluster create \
  127.0.0.1:7000 \
  127.0.0.1:7001 \
  127.0.0.1:7002 \
  127.0.0.1:7003 \
  127.0.0.1:7004 \
  127.0.0.1:7005 \
  --cluster-replicas 1

redis-cli prints the proposed configuration (it may warn that some replicas end up on the same host as their masters, which is expected on a single machine) and asks for confirmation

Can I set the above configuration? (type 'yes' to accept): yes

connect to the first node and check cluster info

redis-cli -h 127.0.0.1 -p 7000

cluster info

cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:137
cluster_stats_messages_pong_sent:136
cluster_stats_messages_sent:273
cluster_stats_messages_ping_received:131
cluster_stats_messages_pong_received:137
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:273

and the cluster's node info

127.0.0.1:7000> CLUSTER NODES
7daacb5d3a77b49cad691257884450ff659fa269 127.0.0.1:7000@17000 myself,master - 0 1721895157000 1 connected 0-5460
0e8be45b617d91dd70c8f8b1ce7f1d88bfd70ca5 127.0.0.1:7003@17003 slave 1c11114d18a76ce7382ed31833121d9f68b6c849 0 1721895157502 2 connected
3efbc8c6870ac9eb6ec0050e1c87e650419f82b1 127.0.0.1:7005@17005 slave 7daacb5d3a77b49cad691257884450ff659fa269 0 1721895157916 1 connected
68994639150b6e85ad801b02bf19fc5b7b6975d5 127.0.0.1:7002@17002 master - 0 1721895156000 3 connected 10923-16383
1c11114d18a76ce7382ed31833121d9f68b6c849 127.0.0.1:7001@17001 master - 0 1721895157000 2 connected 5461-10922
cdb2b726973093613a64aa9297de55b10678e7bd 127.0.0.1:7004@17004 slave 68994639150b6e85ad801b02bf19fc5b7b6975d5 0 1721895157000 3 connected
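a quick sanity check on such a listing is to sum the slot ranges owned by the masters, which should cover all 16384 slots; a sketch run against the captured output above (it assumes each master owns one contiguous range, as here):

```shell
# sum the slot ranges owned by masters in a `cluster nodes` listing;
# live: redis-cli -p 7000 cluster nodes | awk ...
total=$(awk '$3 ~ /master/ && $NF ~ /^[0-9]+-[0-9]+$/ {
    split($NF, r, "-"); sum += r[2] - r[1] + 1
} END { print sum }' <<'EOF'
7daacb5d3a 127.0.0.1:7000@17000 myself,master - 0 1721895157000 1 connected 0-5460
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721895157502 2 connected
3efbc8c687 127.0.0.1:7005@17005 slave 7daacb5d3a 0 1721895157916 1 connected
6899463915 127.0.0.1:7002@17002 master - 0 1721895156000 3 connected 10923-16383
1c11114d18 127.0.0.1:7001@17001 master - 0 1721895157000 2 connected 5461-10922
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721895157000 3 connected
EOF
)
echo "slots assigned: $total"
```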

= general info =

CLUSTER HELP
CLUSTER INFO
CLUSTER MYID
CLUSTER NODES
CLUSTER SLOTS
CLUSTER REPLICAS <node-id>

= daily operations =

remove a healthy node from the cluster (the cluster will heal itself)

CLUSTER FORGET <node-id>
127.0.0.1:7005> CLUSTER FORGET 1c11114d18a76ce7382ed31833121d9f68b6c849
5162:S 25 Jul 2024 11:23:57.223 # Cluster state changed: fail
5162:S 25 Jul 2024 11:24:58.126 # Cluster state changed: ok
OK

127.0.0.1:7005> cluster nodes
cdb2b726973093613a64aa9297de55b10678e7bd 127.0.0.1:7004@17004 slave 68994639150b6e85ad801b02bf19fc5b7b6975d5 0 1721895905262 3 connected
0e8be45b617d91dd70c8f8b1ce7f1d88bfd70ca5 127.0.0.1:7003@17003 slave 1c11114d18a76ce7382ed31833121d9f68b6c849 0 1721895905777 2 connected
7daacb5d3a77b49cad691257884450ff659fa269 127.0.0.1:7000@17000 master - 0 1721895904000 1 connected 0-5460
1c11114d18a76ce7382ed31833121d9f68b6c849 127.0.0.1:7001@17001 master - 0 1721895904000 2 connected 5461-10922
68994639150b6e85ad801b02bf19fc5b7b6975d5 127.0.0.1:7002@17002 master - 0 1721895904234 3 connected 10923-16383
3efbc8c6870ac9eb6ec0050e1c87e650419f82b1 127.0.0.1:7005@17005 myself,slave 7daacb5d3a77b49cad691257884450ff659fa269 0 1721895902000 1 connected

remove an unhealthy node from the cluster: first kill a node to simulate the failure

127.0.0.1:7005> exit
ps aux | grep redis

root 5123 0.4 2.0 67644 20848 pts/0 Sl 11:04 0:05 redis-server *:7000 [cluster]
root 5131 0.3 2.1 67644 21116 pts/0 Sl 11:05 0:05 redis-server *:7001 [cluster]
root 5137 0.3 2.1 67644 21176 pts/0 Sl 11:05 0:05 redis-server *:7002 [cluster]
root 5138 0.4 2.9 88132 29380 pts/0 Sl 11:05 0:05 redis-server *:7003 [cluster]
root 5139 0.4 2.7 88132 27284 pts/0 Sl 11:05 0:05 redis-server *:7004 [cluster]
root 5162 0.4 2.9 88132 29424 pts/0 Sl 11:06 0:05 redis-server *:7005 [cluster]
root 5217 0.0 0.0 5876 632 pts/0 S+ 11:27 0:00 grep redis

kill 5123

5123:signal-handler (1721896064) Received SIGTERM scheduling shutdown...
5123:M 25 Jul 2024 11:27:44.292 # User requested shutdown...
5123:M 25 Jul 2024 11:27:44.292 * Calling fsync() on the AOF file.
5123:M 25 Jul 2024 11:27:44.294 # Redis is now ready to exit, bye bye...
5162:S 25 Jul 2024 11:27:44.304 # Connection with master lost.
5162:S 25 Jul 2024 11:27:44.304 * Caching the disconnected master state.
5162:S 25 Jul 2024 11:27:44.601 * Connecting to MASTER 127.0.0.1:7000
5162:S 25 Jul 2024 11:27:44.601 * MASTER <-> REPLICA sync started
5162:S 25 Jul 2024 11:27:44.601 # Error condition on socket for SYNC: Connection refused
[...]
5162:S 25 Jul 2024 11:27:50.852 * Connecting to MASTER 127.0.0.1:7000
5162:S 25 Jul 2024 11:27:50.852 * MASTER <-> REPLICA sync started
5162:S 25 Jul 2024 11:27:50.858 # Error condition on socket for SYNC: Connection refused
5139:S 25 Jul 2024 11:27:51.062 * Marking node 7daacb5d3a as failing (quorum reached).
5139:S 25 Jul 2024 11:27:51.062 # Cluster state changed: fail
5138:S 25 Jul 2024 11:27:51.063 * FAIL message received from cdb2b72697 about 7daacb5d3a
5138:S 25 Jul 2024 11:27:51.063 # Cluster state changed: fail
5162:S 25 Jul 2024 11:27:51.064 * FAIL message received from cdb2b72697 about 7daacb5d3a
5162:S 25 Jul 2024 11:27:51.064 # Cluster state changed: fail
5137:M 25 Jul 2024 11:27:51.065 * FAIL message received from cdb2b72697 about 7daacb5d3a
5131:M 25 Jul 2024 11:27:51.066 * FAIL message received from cdb2b72697 about 7daacb5d3a
5131:M 25 Jul 2024 11:27:51.067 # Cluster state changed: fail
5137:M 25 Jul 2024 11:27:51.067 # Cluster state changed: fail
5162:S 25 Jul 2024 11:27:51.168 # Start of election delayed for 896 milliseconds (rank #0, offset 1358).
5162:S 25 Jul 2024 11:27:51.899 * Connecting to MASTER 127.0.0.1:7000
5162:S 25 Jul 2024 11:27:51.900 * MASTER <-> REPLICA sync started
5162:S 25 Jul 2024 11:27:51.900 # Error condition on socket for SYNC: Connection refused
5162:S 25 Jul 2024 11:27:52.108 # Starting a failover election for epoch 7.
5131:M 25 Jul 2024 11:27:52.117 # Failover auth granted to 3efbc8c687 for epoch 7
5137:M 25 Jul 2024 11:27:52.118 # Failover auth granted to 3efbc8c687 for epoch 7
5162:S 25 Jul 2024 11:27:52.120 # Failover election won: I'm the new master.
5162:S 25 Jul 2024 11:27:52.120 # configEpoch set to 7 after successful failover
5162:M 25 Jul 2024 11:27:52.120 * Discarding previously cached master state.
5162:M 25 Jul 2024 11:27:52.120 # Setting secondary replication ID to 26cb6145c8, valid up to offset: 1359. New replication ID is 74b74b9ba7
5162:M 25 Jul 2024 11:27:52.120 # Cluster state changed: ok
5138:S 25 Jul 2024 11:27:52.161 # Cluster state changed: ok
5131:M 25 Jul 2024 11:27:52.161 # Cluster state changed: ok
5139:S 25 Jul 2024 11:27:52.162 # Cluster state changed: ok
5137:M 25 Jul 2024 11:27:52.163 # Cluster state changed: ok

connect to a working node and check (the cluster is still working, good)

redis-cli -h 127.0.0.1 -p 7005
127.0.0.1:7005> cluster info
cluster_state:ok
[...]
127.0.0.1:7005> cluster nodes
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721896133083 3 connected
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721896134123 2 connected
killed>> 7daacb5d3a 127.0.0.1:7000@17000 master,fail - 1721896064497 1721896061912 1 disconnected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721896133000 2 connected 5461-10922
6899463915 127.0.0.1:7002@17002 master - 0 1721896134528 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 myself,master - 0 1721896133000 7 connected 0-5460

now let's remove the failed node

127.0.0.1:7005> CLUSTER FORGET 7daacb5d3a
127.0.0.1:7005> cluster nodes
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721896264005 3 connected
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721896265047 2 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721896264000 2 connected 5461-10922
6899463915 127.0.0.1:7002@17002 master - 0 1721896264527 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 myself,master - 0 1721896263000 7 connected 0-5460
127.0.0.1:7005> exit

let's start it again and add it back to the cluster

at this point, we have forgotten which one was killed; let's check

ps aux | grep redis
root 5131 0.4 2.1 67644 21116 pts/0 Sl 11:05 0:06 redis-server *:7001 [cluster]
root 5137 0.4 2.1 67644 21184 pts/0 Sl 11:05 0:06 redis-server *:7002 [cluster]
root 5138 0.4 2.9 88132 29316 pts/0 Sl 11:05 0:07 redis-server *:7003 [cluster]
root 5139 0.4 2.7 88132 27220 pts/0 Sl 11:05 0:07 redis-server *:7004 [cluster]
root 5162 0.4 2.9 88132 29420 pts/0 Sl 11:06 0:07 redis-server *:7005 [cluster]
root 5234 0.0 0.0 5876 660 pts/0 S+ 11:32 0:00 grep redis

and start one on port 7000

cluster-test# cd 7000
cluster-test/7000# redis-server redis.conf &
cluster-test/7000# cd ..

connect to a working cluster node (not the node just started)

redis-cli -h 127.0.0.1 -p 7005
127.0.0.1:7005> cluster nodes
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721896487852 3 connected
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721896487000 2 connected
7daacb5d3a 127.0.0.1:7000@17000 slave 3efbc8c687 0 1721896487541 7 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721896487540 2 connected 5461-10922
6899463915 127.0.0.1:7002@17002 master - 0 1721896487000 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 myself,master - 0 1721896487000 7 connected 0-5460

we can see that the node automatically rejoined the cluster (why? because the node still had its nodes.conf file, so it knew the previous state of the cluster)

to introduce it as a brand-new node instead, remove its state files, then start it again

cluster-test/7000# rm appendonly.aof
cluster-test/7000# rm dump.rdb
cluster-test/7000# rm nodes.conf
cluster-test/7000# redis-server redis.conf &

redis-cli -h 127.0.0.1 -p 7005
127.0.0.1:7005> cluster nodes
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721896868502 3 connected
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721896867000 2 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721896867475 2 connected 5461-10922
6899463915 127.0.0.1:7002@17002 master - 0 1721896868000 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 myself,master - 0 1721896866000 7 connected 0-5460
127.0.0.1:7005> cluster meet 127.0.0.1 7000
OK
127.0.0.1:7005> cluster nodes
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721896921113 3 connected
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721896920188 2 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721896921218 2 connected 5461-10922
new>> e8fce8d1e5 127.0.0.1:7000@17000 master - 0 1721896921114 0 connected
6899463915 127.0.0.1:7002@17002 master - 0 1721896920000 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 myself,master - 0 1721896920000 7 connected 0-5460


now the cluster contains:
- three masters with assigned slots (one of them is a slave promoted to master)
- two slaves attached to two of those masters (one former slave became a master)
- a new empty master without a slave, and the promoted master has no slave either

we need to make the new node a slave of the master that has no slave; let's find out which one that is

127.0.0.1:7005> cluster slots
1) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "127.0.0.1"
      2) (integer) 7001
      3) "1c11114d18a76ce7382ed31833121d9f68b6c849"
   4) 1) "127.0.0.1"
      2) (integer) 7003
      3) "0e8be45b617d91dd70c8f8b1ce7f1d88bfd70ca5"
2) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 7002
      3) "68994639150b6e85ad801b02bf19fc5b7b6975d5"
   4) 1) "127.0.0.1"
      2) (integer) 7004
      3) "cdb2b726973093613a64aa9297de55b10678e7bd"
3) 1) (integer) 0
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 7005
      3) "3efbc8c6870ac9eb6ec0050e1c87e650419f82b1"
slots 0-5460 are served by only one node (the last entry), 3efbc8c6870ac9eb6ec0050e1c87e650419f82b1

to make the new, unassigned node a slave of that master, connect to the new node itself
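instead of reading cluster slots by eye, replica-less masters can also be derived from cluster nodes: collect the master IDs, then drop those that appear in column 4 of a slave line. A sketch against captured session output; note that both the promoted master (3efbc8c687) and the new empty master (e8fce8d1e5) show up:

```shell
# find masters that have no replica in a `cluster nodes` listing;
# live: redis-cli -p 7005 cluster nodes | awk ...
lonely=$(awk '$3 ~ /master/ { m[$1] = 1 }
              $3 ~ /slave/  { has[$4] = 1 }
              END { for (id in m) if (!(id in has)) print id }' <<'EOF'
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721897698471 2 connected
e8fce8d1e5 127.0.0.1:7000@17000 myself,master - 0 1721897697000 0 connected
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721897698781 3 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721897696408 2 connected 5461-10922
6899463915 127.0.0.1:7002@17002 master - 0 1721897698000 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 master - 0 1721897698000 7 connected 0-5460
EOF
)
echo "$lonely" | sort
```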

127.0.0.1:7000> cluster nodes
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721897698471 2 connected
e8fce8d1e5 127.0.0.1:7000@17000 myself,master - 0 1721897697000 0 connected
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721897698781 3 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721897696408 2 connected 5461-10922
6899463915 127.0.0.1:7002@17002 master - 0 1721897698000 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 master - 0 1721897698000 7 connected 0-5460

configure the current node as a replica of the master (the one without a slave, in our case)

CLUSTER REPLICATE 3efbc8c687
OK
5162:M 25 Jul 2024 11:56:37.000 * Replica 127.0.0.1:7000 asks for synchronization
5162:M 25 Jul 2024 11:56:37.000 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '10169585a4', my replication IDs are '74b74b9ba7' and '26cb6145c8')
5162:M 25 Jul 2024 11:56:37.000 * Starting BGSAVE for SYNC with target: disk
5162:M 25 Jul 2024 11:56:37.001 * Background saving started by pid 5317
5317:C 25 Jul 2024 11:56:37.005 * DB saved on disk
5317:C 25 Jul 2024 11:56:37.006 * RDB: 0 MB of memory used by copy-on-write
5162:M 25 Jul 2024 11:56:37.099 * Background saving terminated with success
5162:M 25 Jul 2024 11:56:37.100 * Synchronization with replica 127.0.0.1:7000 succeeded

checking cluster

127.0.0.1:7000> cluster info
cluster_state:ok
[...]
127.0.0.1:7000> cluster nodes
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721897867645 2 connected
e8fce8d1e5 127.0.0.1:7000@17000 myself,slave 3efbc8c687 0 1721897866000 7 connected
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721897866616 3 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721897867000 2 connected 5461-10922
6899463915 127.0.0.1:7002@17002 master - 0 1721897868157 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 master - 0 1721897867127 7 connected 0-5460

flight normal.



Monitoring:

https://redis.io/docs/latest/operate/rs/clusters/monitoring/
https://medium.com/@MetricFire/how-to-monitor-redis-performance-819125702401
https://docs.digitalocean.com/products/databases/redis/how-to/monitor-clusters/
https://signoz.io/blog/redis-monitoring/