Redis Cluster install: a local educational setup for removing nodes from and adding nodes to a Redis Cluster.
installation
apt install redis
access CLI
redis-cli
cluster info
cluster nodes
ERR This instance has cluster support disabled
enable cluster mode by editing the conf file; search for cluster-enabled
vi /etc/redis/redis.conf
uncomment
cluster-enabled yes
cluster-config-file nodes-6379.conf
cluster-node-timeout 15000
appendonly yes
restart in cluster mode
systemctl restart redis
systemctl status redis
should be
● redis-server.service - Advanced key-value store
[...]
└─5017 /usr/bin/redis-server 127.0.0.1:6379 [cluster]
repeat the cluster commands
127.0.0.1:6379> cluster nodes
7021db16f7ce7a865912dabcbfd4573e4c501a3b :6379@16379 myself,master - 0 0 0 connected
stop the service to proceed with temporary testing
systemctl stop redis
cd
mkdir cluster-test
cd cluster-test
mkdir 7000 7001 7002 7003 7004 7005
create a config for each node, changing the port number (each node is started from its own directory, so the file names stay the same)
vi 7000/redis.conf
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
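Writing six nearly identical files by hand is error-prone, so the configs can also be generated in a loop. This is a minimal sketch, assuming the port range 7000-7005 used here and that each node is later started from inside its own directory (so `cluster-config-file` is a bare file name). `make_node_configs` is a hypothetical helper name, not a Redis tool.

```bash
# Hypothetical helper: write <base>/<port>/redis.conf for a range of ports.
# Assumes every node is started from inside its own directory.
make_node_configs() {
    base=$1; first=$2; last=$3
    port=$first
    while [ "$port" -le "$last" ]; do
        mkdir -p "$base/$port"
        cat > "$base/$port/redis.conf" <<EOF
port $port
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
EOF
        port=$((port + 1))
    done
}
```

For the layout in these notes the call would be `make_node_configs ~/cluster-test 7000 7005`.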
once all configs are in place, start the nodes
cd 7000
redis-server redis.conf &
cd ../7001
redis-server redis.conf &
cd ../7002
redis-server redis.conf &
cd ../7003
redis-server redis.conf &
cd ../7004
redis-server redis.conf &
cd ../7005
redis-server redis.conf &
cd ..
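The repeated cd / redis-server pairs above can be folded into one loop. A minimal sketch, assuming the directory layout created earlier; `start_nodes` and the `REDIS_SERVER` override are hypothetical names introduced only for this example.

```bash
# Hypothetical helper: start one redis-server per port directory under <base>.
# REDIS_SERVER can point at a different binary; it defaults to redis-server.
start_nodes() {
    base=$1; shift
    for port in "$@"; do
        # Run in a subshell so the caller's working directory is unchanged.
        ( cd "$base/$port" && "${REDIS_SERVER:-redis-server}" redis.conf ) &
    done
}
```

Usage would be `start_nodes ~/cluster-test 7000 7001 7002 7003 7004 7005`; as with the manual commands, every node keeps logging to the current terminal.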
check everything is running
ps aux | grep redis
ss -ntap | grep redis
build a cluster
redis-cli --cluster create \
127.0.0.1:7000 \
127.0.0.1:7001 \
127.0.0.1:7002 \
127.0.0.1:7003 \
127.0.0.1:7004 \
127.0.0.1:7005 \
--cluster-replicas 1
redis-cli will warn that some replicas are on the same host as their master (expected on a single machine), then ask for confirmation
Can I set the above configuration? (type 'yes' to accept): yes
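With six nodes and `--cluster-replicas 1`, three nodes become masters and the 16384 hash slots are divided among them. The arithmetic can be checked with a small sketch; the rounding rule here is an assumption chosen to spread the slots evenly, and `slot_ranges` is a hypothetical helper, not part of redis-cli.

```bash
# Hypothetical helper: split the 16384 hash slots evenly among N masters.
# The rounding rule is an assumption, not taken from redis-cli itself.
slot_ranges() {
    awk -v masters="$1" 'BEGIN {
        total = 16384
        per = total / masters
        first = 0
        cursor = 0
        for (i = 0; i < masters; i++) {
            last = int(cursor + per - 1 + 0.5)   # round to the nearest slot
            if (i == masters - 1 || last > total - 1) last = total - 1
            printf "%d-%d\n", first, last
            first = last + 1
            cursor += per
        }
    }'
}

# Six nodes with one replica per master leave 6 / (1 + 1) = 3 masters.
slot_ranges $((6 / (1 + 1)))
```

For three masters this prints 0-5460, 5461-10922 and 10923-16383, the same ranges that CLUSTER NODES reports for this cluster.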
connect to the first node and check cluster info
redis-cli -h 127.0.0.1 -p 7000
cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:137
cluster_stats_messages_pong_sent:136
cluster_stats_messages_sent:273
cluster_stats_messages_ping_received:131
cluster_stats_messages_pong_received:137
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:273
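The fields that matter most in this output are cluster_state, cluster_slots_assigned and cluster_slots_ok. A minimal sketch for checking them from a script; `cluster_state` and `cluster_healthy` are hypothetical helpers that read the CLUSTER INFO text on stdin, so they work on saved output as well as on a live node.

```bash
# Hypothetical helper: read CLUSTER INFO text on stdin, print cluster_state.
cluster_state() {
    # redis-cli may emit CRLF line endings, so strip carriage returns first.
    tr -d '\r' | awk -F: '$1 == "cluster_state" { print $2 }'
}

# Hypothetical helper: succeed only when the state is ok and all 16384 slots
# are both assigned and reported ok.
cluster_healthy() {
    tr -d '\r' | awk -F: '
        $1 == "cluster_state"          { state = $2 }
        $1 == "cluster_slots_assigned" { assigned = $2 }
        $1 == "cluster_slots_ok"       { ok = $2 }
        END { exit !(state == "ok" && assigned == 16384 && ok == 16384) }'
}
```

Against a running node the call would look like `redis-cli -h 127.0.0.1 -p 7000 cluster info | cluster_healthy && echo healthy`.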
and cluster's node info
127.0.0.1:7000> CLUSTER NODES
7daacb5d3a77b49cad691257884450ff659fa269 127.0.0.1:7000@17000 myself,master - 0 1721895157000 1 connected 0-5460
0e8be45b617d91dd70c8f8b1ce7f1d88bfd70ca5 127.0.0.1:7003@17003 slave 1c11114d18a76ce7382ed31833121d9f68b6c849 0 1721895157502 2 connected
3efbc8c6870ac9eb6ec0050e1c87e650419f82b1 127.0.0.1:7005@17005 slave 7daacb5d3a77b49cad691257884450ff659fa269 0 1721895157916 1 connected
68994639150b6e85ad801b02bf19fc5b7b6975d5 127.0.0.1:7002@17002 master - 0 1721895156000 3 connected 10923-16383
1c11114d18a76ce7382ed31833121d9f68b6c849 127.0.0.1:7001@17001 master - 0 1721895157000 2 connected 5461-10922
cdb2b726973093613a64aa9297de55b10678e7bd 127.0.0.1:7004@17004 slave 68994639150b6e85ad801b02bf19fc5b7b6975d5 0 1721895157000 3 connected
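Each line of this output carries the node ID, address, flags and, for a slave, the ID of its master, so the master/replica pairing can be summarised mechanically. A minimal sketch, assuming the column order shown above; `replicas_per_master` is a hypothetical helper name.

```bash
# Hypothetical helper: read CLUSTER NODES text on stdin and print
# "<master-id> <replica-count>" for every master, sorted by ID.
replicas_per_master() {
    tr -d '\r' | awk '
        $3 ~ /master/ { count[$1] += 0 }                    # register the master
        $3 ~ /slave|replica/ && $4 != "-" { count[$4]++ }   # count its replicas
        END { for (id in count) print id, count[id] }' | sort
}
```

Usage would be `redis-cli -p 7000 cluster nodes | replicas_per_master`; a master reported with a count of 0 has no replica and is the natural target when a new replica is added later.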
= general info =
CLUSTER HELP
CLUSTER INFO
CLUSTER MYID
CLUSTER NODES
CLUSTER SLOTS
CLUSTER REPLICAS <node-id>
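= daily operations =
remove healthy node from cluster (forgetting it on one node is temporary: the other nodes still know it and gossip re-adds it after the 60-second ban expires, so the cluster heals itself)
CLUSTER FORGET <node-id>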
127.0.0.1:7005> CLUSTER FORGET 1c11114d18a76ce7382ed31833121d9f68b6c849
5162:S 25 Jul 2024 11:23:57.223 # Cluster state changed: fail
5162:S 25 Jul 2024 11:24:58.126 # Cluster state changed: ok
OK
127.0.0.1:7005> cluster nodes
cdb2b726973093613a64aa9297de55b10678e7bd 127.0.0.1:7004@17004 slave 68994639150b6e85ad801b02bf19fc5b7b6975d5 0 1721895905262 3 connected
0e8be45b617d91dd70c8f8b1ce7f1d88bfd70ca5 127.0.0.1:7003@17003 slave 1c11114d18a76ce7382ed31833121d9f68b6c849 0 1721895905777 2 connected
7daacb5d3a77b49cad691257884450ff659fa269 127.0.0.1:7000@17000 master - 0 1721895904000 1 connected 0-5460
1c11114d18a76ce7382ed31833121d9f68b6c849 127.0.0.1:7001@17001 master - 0 1721895904000 2 connected 5461-10922
68994639150b6e85ad801b02bf19fc5b7b6975d5 127.0.0.1:7002@17002 master - 0 1721895904234 3 connected 10923-16383
3efbc8c6870ac9eb6ec0050e1c87e650419f82b1 127.0.0.1:7005@17005 myself,slave 7daacb5d3a77b49cad691257884450ff659fa269 0 1721895902000 1 connected
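The forgotten node is still listed above because CLUSTER FORGET only affects the node that receives it. To remove a node for good, the command has to reach every remaining node before the 60-second ban expires. A minimal sketch, assuming all nodes listen on 127.0.0.1; `forget_everywhere` and the `REDIS_CLI` override are hypothetical names, and the port list must not include the node being removed.

```bash
# Hypothetical helper: send CLUSTER FORGET <node-id> to each listed port.
# Assumes every node listens on 127.0.0.1; list only the remaining nodes.
# REDIS_CLI can point at a different client; it defaults to redis-cli.
forget_everywhere() {
    node_id=$1; shift
    for port in "$@"; do
        "${REDIS_CLI:-redis-cli}" -h 127.0.0.1 -p "$port" cluster forget "$node_id"
    done
}
```

Usage would be `forget_everywhere <node-id> 7000 7002 7003 7004 7005`. redis-cli also has a `--cluster del-node` subcommand for the same purpose, which may be the simpler choice on a real cluster.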
remove unhealthy node from cluster; first simulate a failure by killing one node
127.0.0.1:7005> exit
ps aux | grep redis
root 5123 0.4 2.0 67644 20848 pts/0 Sl 11:04 0:05 redis-server *:7000 [cluster]
root 5131 0.3 2.1 67644 21116 pts/0 Sl 11:05 0:05 redis-server *:7001 [cluster]
root 5137 0.3 2.1 67644 21176 pts/0 Sl 11:05 0:05 redis-server *:7002 [cluster]
root 5138 0.4 2.9 88132 29380 pts/0 Sl 11:05 0:05 redis-server *:7003 [cluster]
root 5139 0.4 2.7 88132 27284 pts/0 Sl 11:05 0:05 redis-server *:7004 [cluster]
root 5162 0.4 2.9 88132 29424 pts/0 Sl 11:06 0:05 redis-server *:7005 [cluster]
root 5217 0.0 0.0 5876 632 pts/0 S+ 11:27 0:00 grep redis
kill 5123
5123:signal-handler (1721896064) Received SIGTERM scheduling shutdown...
5123:M 25 Jul 2024 11:27:44.292 # User requested shutdown...
5123:M 25 Jul 2024 11:27:44.292 * Calling fsync() on the AOF file.
5123:M 25 Jul 2024 11:27:44.294 # Redis is now ready to exit, bye bye...
5162:S 25 Jul 2024 11:27:44.304 # Connection with master lost.
5162:S 25 Jul 2024 11:27:44.304 * Caching the disconnected master state.
5162:S 25 Jul 2024 11:27:44.601 * Connecting to MASTER 127.0.0.1:7000
5162:S 25 Jul 2024 11:27:44.601 * MASTER <-> REPLICA sync started
5162:S 25 Jul 2024 11:27:44.601 # Error condition on socket for SYNC: Connection refused
5162:S 25 Jul 2024 11:27:45.639 * Connecting to MASTER 127.0.0.1:7000
5162:S 25 Jul 2024 11:27:45.639 * MASTER <-> REPLICA sync started
5162:S 25 Jul 2024 11:27:45.639 # Error condition on socket for SYNC: Connection refused
5162:S 25 Jul 2024 11:27:46.673 * Connecting to MASTER 127.0.0.1:7000
5162:S 25 Jul 2024 11:27:46.673 * MASTER <-> REPLICA sync started
5162:S 25 Jul 2024 11:27:46.674 # Error condition on socket for SYNC: Connection refused
5162:S 25 Jul 2024 11:27:47.710 * Connecting to MASTER 127.0.0.1:7000
5162:S 25 Jul 2024 11:27:47.711 * MASTER <-> REPLICA sync started
5162:S 25 Jul 2024 11:27:47.712 # Error condition on socket for SYNC: Connection refused
5162:S 25 Jul 2024 11:27:48.756 * Connecting to MASTER 127.0.0.1:7000
5162:S 25 Jul 2024 11:27:48.756 * MASTER <-> REPLICA sync started
5162:S 25 Jul 2024 11:27:48.756 # Error condition on socket for SYNC: Connection refused
5162:S 25 Jul 2024 11:27:49.807 * Connecting to MASTER 127.0.0.1:7000
5162:S 25 Jul 2024 11:27:49.807 * MASTER <-> REPLICA sync started
5162:S 25 Jul 2024 11:27:49.807 # Error condition on socket for SYNC: Connection refused
5162:S 25 Jul 2024 11:27:50.852 * Connecting to MASTER 127.0.0.1:7000
5162:S 25 Jul 2024 11:27:50.852 * MASTER <-> REPLICA sync started
5162:S 25 Jul 2024 11:27:50.858 # Error condition on socket for SYNC: Connection refused
5139:S 25 Jul 2024 11:27:51.062 * Marking node 7daacb5d3a as failing (quorum reached).
5139:S 25 Jul 2024 11:27:51.062 # Cluster state changed: fail
5138:S 25 Jul 2024 11:27:51.063 * FAIL message received from cdb2b72697 about 7daacb5d3a
5138:S 25 Jul 2024 11:27:51.063 # Cluster state changed: fail
5162:S 25 Jul 2024 11:27:51.064 * FAIL message received from cdb2b72697 about 7daacb5d3a
5162:S 25 Jul 2024 11:27:51.064 # Cluster state changed: fail
5137:M 25 Jul 2024 11:27:51.065 * FAIL message received from cdb2b72697 about 7daacb5d3a
5131:M 25 Jul 2024 11:27:51.066 * FAIL message received from cdb2b72697 about 7daacb5d3a
5131:M 25 Jul 2024 11:27:51.067 # Cluster state changed: fail
5137:M 25 Jul 2024 11:27:51.067 # Cluster state changed: fail
5162:S 25 Jul 2024 11:27:51.168 # Start of election delayed for 896 milliseconds (rank #0, offset 1358).
5162:S 25 Jul 2024 11:27:51.899 * Connecting to MASTER 127.0.0.1:7000
5162:S 25 Jul 2024 11:27:51.900 * MASTER <-> REPLICA sync started
5162:S 25 Jul 2024 11:27:51.900 # Error condition on socket for SYNC: Connection refused
5162:S 25 Jul 2024 11:27:52.108 # Starting a failover election for epoch 7.
5131:M 25 Jul 2024 11:27:52.117 # Failover auth granted to 3efbc8c687 for epoch 7
5137:M 25 Jul 2024 11:27:52.118 # Failover auth granted to 3efbc8c687 for epoch 7
5162:S 25 Jul 2024 11:27:52.120 # Failover election won: I'm the new master.
5162:S 25 Jul 2024 11:27:52.120 # configEpoch set to 7 after successful failover
5162:M 25 Jul 2024 11:27:52.120 * Discarding previously cached master state.
5162:M 25 Jul 2024 11:27:52.120 # Setting secondary replication ID to 26cb6145c8, valid up to offset: 1359. New replication ID is 74b74b9ba7
5162:M 25 Jul 2024 11:27:52.120 # Cluster state changed: ok
5138:S 25 Jul 2024 11:27:52.161 # Cluster state changed: ok
5131:M 25 Jul 2024 11:27:52.161 # Cluster state changed: ok
5139:S 25 Jul 2024 11:27:52.162 # Cluster state changed: ok
5137:M 25 Jul 2024 11:27:52.163 # Cluster state changed: ok
connect to working node and check (it is still working, good)
redis-cli -h 127.0.0.1 -p 7005
127.0.0.1:7005> cluster info
cluster_state:ok
[...]
127.0.0.1:7005> cluster nodes
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721896133083 3 connected
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721896134123 2 connected
killed>>
7daacb5d3a 127.0.0.1:7000@17000 master,fail - 1721896064497 1721896061912 1 disconnected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721896133000 2 connected 5461-10922
6899463915 127.0.0.1:7002@17002 master - 0 1721896134528 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 myself,master - 0 1721896133000 7 connected 0-5460
good, now let's remove the failed one
127.0.0.1:7005> CLUSTER FORGET 7daacb5d3a
127.0.0.1:7005> cluster nodes
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721896264005 3 connected
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721896265047 2 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721896264000 2 connected 5461-10922
6899463915 127.0.0.1:7002@17002 master - 0 1721896264527 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 myself,master - 0 1721896263000 7 connected 0-5460
exit
let's start it again and add it back to the cluster
at this point we have forgotten which one was killed, so let's check
ps aux | grep redis
root 5131 0.4 2.1 67644 21116 pts/0 Sl 11:05 0:06 redis-server *:7001 [cluster]
root 5137 0.4 2.1 67644 21184 pts/0 Sl 11:05 0:06 redis-server *:7002 [cluster]
root 5138 0.4 2.9 88132 29316 pts/0 Sl 11:05 0:07 redis-server *:7003 [cluster]
root 5139 0.4 2.7 88132 27220 pts/0 Sl 11:05 0:07 redis-server *:7004 [cluster]
root 5162 0.4 2.9 88132 29420 pts/0 Sl 11:06 0:07 redis-server *:7005 [cluster]
root 5234 0.0 0.0 5876 660 pts/0 S+ 11:32 0:00 grep redis
and start one on port 7000
cluster-test# cd 7000
cluster-test/7000# redis-server redis.conf &
cd ..
connect to working cluster (not the node just started)
redis-cli -h 127.0.0.1 -p 7005
127.0.0.1:7005> cluster nodes
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721896487852 3 connected
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721896487000 2 connected
7daacb5d3a 127.0.0.1:7000@17000 slave 3efbc8c687 0 1721896487541 7 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721896487540 2 connected 5461-10922
6899463915 127.0.0.1:7002@17002 master - 0 1721896487000 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 myself,master - 0 1721896487000 7 connected 0-5460
we can see that the restarted node automatically rejoined the cluster, as a slave of the promoted master (why? because the node still had its nodes.conf file and knew the previous state of the cluster)
to add it as a genuinely new node instead: stop it, remove its files, start it again and introduce it to the cluster
cluster-test/7000# rm appendonly.aof
cluster-test/7000# rm dump.rdb
cluster-test/7000# rm nodes.conf
cluster-test/7000# redis-server redis.conf &
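The node ID and the last known cluster layout are stored in nodes.conf, which is why deleting the file makes the restarted node a stranger to the cluster. A small sketch for checking what a node directory remembers, assuming nodes.conf uses the same line format as CLUSTER NODES output; `saved_node_id` is a hypothetical helper name.

```bash
# Hypothetical helper: print the node ID stored in <dir>/nodes.conf.
# Prints nothing if the file is missing, i.e. the node would start fresh.
saved_node_id() {
    [ -f "$1/nodes.conf" ] || return 0
    awk '$3 ~ /myself/ { print $1 }' "$1/nodes.conf"
}
```

Run as `saved_node_id ~/cluster-test/7000`: before the rm it would print the old ID, and after the restart it would print the freshly generated one.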
redis-cli -h 127.0.0.1 -p 7005
127.0.0.1:7005> cluster nodes
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721896868502 3 connected
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721896867000 2 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721896867475 2 connected 5461-10922
6899463915 127.0.0.1:7002@17002 master - 0 1721896868000 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 myself,master - 0 1721896866000 7 connected 0-5460
127.0.0.1:7005> cluster meet 127.0.0.1 7000
OK
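CLUSTER MEET only performs the handshake; the node still has to be given a role afterwards. redis-cli's cluster manager has an `add-node` subcommand that can join a node and make it a replica of a chosen master in one call. A minimal sketch of wrapping it; `add_replica` and the `REDIS_CLI` override are hypothetical names introduced only for this example.

```bash
# Hypothetical helper: join a fresh node to the cluster as a replica in one step.
# Arguments: <new host:port> <any existing host:port> <master node ID>.
# REDIS_CLI can point at a different client; it defaults to redis-cli.
add_replica() {
    "${REDIS_CLI:-redis-cli}" --cluster add-node "$1" "$2" \
        --cluster-slave --cluster-master-id "$3"
}
```

For this cluster the call would be `add_replica 127.0.0.1:7000 127.0.0.1:7005 3efbc8c6870ac9eb6ec0050e1c87e650419f82b1`. The manual route used in these notes (CLUSTER MEET, then CLUSTER REPLICATE) reaches the same end state and shows each step separately.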
127.0.0.1:7005> cluster nodes
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721896921113 3 connected
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721896920188 2 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721896921218 2 connected 5461-10922
new>>
e8fce8d1e5 127.0.0.1:7000@17000 master - 0 1721896921114 0 connected
6899463915 127.0.0.1:7002@17002 master - 0 1721896920000 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 myself,master - 0 1721896920000 7 connected 0-5460
127.0.0.1:7005> cluster nodes
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721896940878 3 connected
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721896940000 2 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721896940000 2 connected 5461-10922
new>>
e8fce8d1e5 127.0.0.1:7000@17000 master - 0 1721896941395 0 connected
6899463915 127.0.0.1:7002@17002 master - 0 1721896939845 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 myself,master - 0 1721896940000 7 connected 0-5460
127.0.0.1:7005> cluster nodes
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721896975934 3 connected
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721896974587 2 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721896974898 2 connected 5461-10922
new>>
e8fce8d1e5 127.0.0.1:7000@17000 master - 0 1721896973864 0 connected
6899463915 127.0.0.1:7002@17002 master - 0 1721896975104 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 myself,master - 0 1721896974000 7 connected 0-5460
now the cluster contains
- three masters with assigned slots (one of them is a slave promoted to master)
- two slaves for them (one slave became a master)
- a new master with no slots and no slave, and the promoted slave, which now has no slave of its own
we need to make the new node a slave of the master that has no slave; let's find out which one that is
127.0.0.1:7005> cluster slots
1) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "127.0.0.1"
      2) (integer) 7001
      3) "1c11114d18a76ce7382ed31833121d9f68b6c849"
   4) 1) "127.0.0.1"
      2) (integer) 7003
      3) "0e8be45b617d91dd70c8f8b1ce7f1d88bfd70ca5"
2) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 7002
      3) "68994639150b6e85ad801b02bf19fc5b7b6975d5"
   4) 1) "127.0.0.1"
      2) (integer) 7004
      3) "cdb2b726973093613a64aa9297de55b10678e7bd"
3) 1) (integer) 0
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 7005
      3) "3efbc8c6870ac9eb6ec0050e1c87e650419f82b1"
slots 0-5460 are served by only one node (the last entry), which is 3efbc8c6870ac9eb6ec0050e1c87e650419f82b1
make the unassigned (new) node a slave of that master; the command has to be run while connected to the node that will become the slave
127.0.0.1:7000> cluster nodes
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721897698471 2 connected
e8fce8d1e5 127.0.0.1:7000@17000 myself,master - 0 1721897697000 0 connected
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721897698781 3 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721897696408 2 connected 5461-10922
6899463915 127.0.0.1:7002@17002 master - 0 1721897698000 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 master - 0 1721897698000 7 connected 0-5460
configure the current node as a replica of the master (the one without a slave in our case)
CLUSTER REPLICATE 3efbc8c687
OK
5162:M 25 Jul 2024 11:56:37.000 * Replica 127.0.0.1:7000 asks for synchronization
5162:M 25 Jul 2024 11:56:37.000 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '10169585a4', my replication IDs are '74b74b9ba7' and '26cb6145c8')
5162:M 25 Jul 2024 11:56:37.000 * Starting BGSAVE for SYNC with target: disk
5162:M 25 Jul 2024 11:56:37.001 * Background saving started by pid 5317
5317:C 25 Jul 2024 11:56:37.005 * DB saved on disk
5317:C 25 Jul 2024 11:56:37.006 * RDB: 0 MB of memory used by copy-on-write
5162:M 25 Jul 2024 11:56:37.099 * Background saving terminated with success
5162:M 25 Jul 2024 11:56:37.100 * Synchronization with replica 127.0.0.1:7000 succeeded
checking the cluster
127.0.0.1:7000> cluster info
cluster_state:ok
[...]
127.0.0.1:7000> cluster nodes
0e8be45b61 127.0.0.1:7003@17003 slave 1c11114d18 0 1721897867645 2 connected
e8fce8d1e5 127.0.0.1:7000@17000 myself,slave 3efbc8c687 0 1721897866000 7 connected
cdb2b72697 127.0.0.1:7004@17004 slave 6899463915 0 1721897866616 3 connected
1c11114d18 127.0.0.1:7001@17001 master - 0 1721897867000 2 connected 5461-10922
6899463915 127.0.0.1:7002@17002 master - 0 1721897868157 3 connected 10923-16383
3efbc8c687 127.0.0.1:7005@17005 master - 0 1721897867127 7 connected 0-5460
flight normal.
Monitoring:
https://redis.io/docs/latest/operate/rs/clusters/monitoring/
https://medium.com/@MetricFire/how-to-monitor-redis-performance-819125702401
https://docs.digitalocean.com/products/databases/redis/how-to/monitor-clusters/
https://signoz.io/blog/redis-monitoring/