Cassandra replicas down while repairing a node?

I am developing an automated script that will run nodetool repair every weekend on all Cassandra nodes. We have 3 nodes in DC1 and 3 in DC2. I just want to understand the worst-case scenario: what happens if the connection between DC1 and DC2 is lost, or a couple of replicas go down, before or during nodetool repair? This could be caused by a network problem, network maintenance (which usually happens on weekends), or something else. I understand that nodetool repair computes a Merkle tree for each data range on this node and compares it with the versions on the other replicas. So if there is no connection between the replicas, how will nodetool repair behave? Will it actually repair the nodes? Do I have to rerun nodetool repair after all nodes have recovered and reconnected? Will there be side effects from this event? I searched, but could not find many details. Any insight would be helpful.

Thanks.
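The repair mechanism described in the question (build a Merkle tree per range, then compare it with the trees from the other replicas) can be illustrated with a toy Python sketch. It is not Cassandra's implementation: the small integer token space, the fixed leaf count, SHA-256 hashing and the helper names `build_tree` and `diff_leaves` are all assumptions made for illustration. What it shows is that only leaves whose hashes differ need to be streamed, and that the comparison cannot happen at all unless every replica's tree is available.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(rows: dict[int, str], token_space: int, num_leaves: int) -> list[list[bytes]]:
    """Toy Merkle tree: levels[0] are the leaves, levels[-1] is [root].

    Each leaf covers an equal slice of [0, token_space); num_leaves must be a power of two.
    """
    leaves = [hashlib.sha256() for _ in range(num_leaves)]
    width = token_space // num_leaves
    for token in sorted(rows):
        # Hash every row into the leaf whose token slice contains it.
        leaves[token // width].update(f"{token}:{rows[token]};".encode())
    levels = [[leaf.digest() for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def diff_leaves(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Walk both trees from the root and return the indices of leaves that differ."""
    mismatched: list[int] = []

    def walk(level: int, idx: int) -> None:
        if a[level][idx] == b[level][idx]:
            return  # identical subtree: nothing to stream for this slice
        if level == 0:
            mismatched.append(idx)
            return
        walk(level - 1, 2 * idx)
        walk(level - 1, 2 * idx + 1)

    walk(len(a) - 1, 0)
    return mismatched


# Two hypothetical replicas of the same range; one holds a stale value for token 300.
replica_a = {10: "x", 300: "y", 900: "z"}
replica_b = {10: "x", 300: "y-stale", 900: "z"}
tree_a = build_tree(replica_a, token_space=1024, num_leaves=8)
tree_b = build_tree(replica_b, token_space=1024, num_leaves=8)
print(diff_leaves(tree_a, tree_b))  # [2]: only tokens 256..383 need to be streamed
```

If `replica_b` were missing the row for token 900 instead, the mismatch would show up in leaf 7.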

2 answers

Let's say you use vnodes, which means each node owns many token ranges (256 with the default num_tokens in Cassandra versions before 4.0); without vnodes the idea is the same.
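To make "each node has 256 ranges" concrete, here is a simplified Python sketch of vnode token ownership for the 6-node cluster in the question. Assumptions: tokens are picked uniformly at random from the Murmur3Partitioner range (real clusters may use a token allocation algorithm instead), and the node names and the helpers `assign_tokens` and `primary_ranges` are hypothetical. Each token owns the range from the previous token on the ring up to itself. A repair on one node covers every range that node replicates, not only its primary ranges, unless `-pr` is used.

```python
import random

# Token range of Murmur3Partitioner.
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1


def assign_tokens(nodes, num_tokens, seed=0):
    """Give each node num_tokens distinct random tokens (simplified vnode allocation)."""
    rng = random.Random(seed)
    owner = {}
    for node in nodes:
        placed = 0
        while placed < num_tokens:
            token = rng.randint(MIN_TOKEN, MAX_TOKEN)
            if token not in owner:  # tokens must be unique across the ring
                owner[token] = node
                placed += 1
    return owner


def primary_ranges(owner):
    """Map each node to its (previous_token, token] ranges; the first range wraps around."""
    ring = sorted(owner)
    ranges = {}
    for i, token in enumerate(ring):
        prev = ring[i - 1]  # for i == 0 this is the last token on the ring
        ranges.setdefault(owner[token], []).append((prev, token))
    return ranges


nodes = [f"dc{d}-node{n}" for d in (1, 2) for n in (1, 2, 3)]
ranges = primary_ranges(assign_tokens(nodes, 256))
print(len(ranges["dc1-node1"]))              # 256
print(sum(len(r) for r in ranges.values()))  # 1536 ranges on the whole ring
```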

If a network problem occurs while nodetool repair is already running, the logs will show that some ranges were repaired successfully and others were not. The error will say that repair of a range failed because a node is dead, something like "192.168.1.1 is dead".

If a network error occurs before nodetool repair starts, all ranges will fail with the same error.
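The two cases above can be modeled with a small Python simulation. It sketches the behavior this answer describes, not Cassandra's repair code: ranges are processed one at a time here (real repair groups ranges into sessions and can run them in parallel), and the range names, replica names and the `replicas_up` callback are hypothetical. A range counts as repaired only if all of its replicas are reachable when its turn comes.

```python
from dataclasses import dataclass, field


@dataclass
class RepairResult:
    succeeded: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)


def simulate_repair(ranges, replicas_up):
    """Repair ranges in order; a range fails if any replica is unreachable at that step.

    ranges: list of (range_name, [replica names]).
    replicas_up(step): set of replicas reachable at that step (hypothetical model).
    """
    result = RepairResult()
    for step, (name, replicas) in enumerate(ranges):
        if set(replicas) <= replicas_up(step):
            result.succeeded.append(name)
        else:
            result.failed.append(name)
    return result


ALL = {"dc1-a", "dc1-b", "dc1-c", "dc2-a", "dc2-b", "dc2-c"}
DC1_ONLY = {"dc1-a", "dc1-b", "dc1-c"}
# Every hypothetical range has replicas in both data centers.
ranges = [(f"range-{i}", ["dc1-a", "dc1-b", "dc2-a"]) for i in range(4)]

# The DC1 to DC2 link drops after two ranges: partial success.
mid_run = simulate_repair(ranges, lambda step: ALL if step < 2 else DC1_ONLY)
print(mid_run.succeeded, mid_run.failed)  # ['range-0', 'range-1'] ['range-2', 'range-3']

# The link is already down before repair starts: every range fails.
before = simulate_repair(ranges, lambda step: DC1_ONLY)
print(len(before.failed))  # 4
```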

In both cases, you will need to run nodetool repair again once the network problem is resolved.
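For the weekend script, one way to act on this is to check the exit status of each `nodetool repair` run and retry after a pause. This Python sketch assumes `nodetool` is on the `PATH` and that it exits with a non-zero status when the repair fails (verify this on your Cassandra version and still check the logs); the host, keyspace and the helper names `run_nodetool` and `repair_with_retry` are hypothetical. The `runner` and `sleep` parameters exist only so the logic can be exercised without a cluster.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_nodetool(cmd: Sequence[str]) -> int:
    """Run the command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def repair_with_retry(host: str, keyspace: Optional[str] = None, attempts: int = 3,
                      wait_seconds: float = 600,
                      runner: Callable[[Sequence[str]], int] = run_nodetool,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run `nodetool -h HOST repair [KEYSPACE]` until it exits 0 or attempts run out."""
    cmd = ["nodetool", "-h", host, "repair"]
    if keyspace:
        cmd.append(keyspace)
    for attempt in range(1, attempts + 1):
        if runner(cmd) == 0:
            return True
        if attempt < attempts:
            sleep(wait_seconds)  # give the network or the downed replica time to come back
    return False


# Usage with a fake runner that fails twice, then succeeds.
calls = []


def fake_runner(cmd):
    calls.append(list(cmd))
    return 0 if len(calls) == 3 else 2


ok = repair_with_retry("10.0.0.1", "my_keyspace", runner=fake_runner, sleep=lambda s: None)
print(ok, len(calls))  # True 3
```

Re-running a repair that failed is generally safe: it redoes the comparison and streams whatever still differs, at the cost of extra I/O.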

, 6 , , , , nodetool . , node 1 , node 2 . / , . , nodetool, , , , .

, , 1, , "" node, "" . , , 2, , 2 " ", , . , "" - = , , , .

, !


, . DSE Cassandra, OpsCenter, , , gc_grace_seconds.
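The answer refers to `gc_grace_seconds`. Each table should be repaired on every node at least once per `gc_grace_seconds` (864000 seconds, i.e. 10 days, by default); otherwise a tombstone can be purged before it reaches every replica and deleted data can come back. The Python sketch below does the arithmetic for a weekly schedule; the helper names are hypothetical, it assumes the default value, and it ignores how long each repair takes to run.

```python
DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's default per table: 10 days
WEEK_SECONDS = 7 * 24 * 3600        # 604800


def worst_case_gap(interval_seconds: int, consecutive_failures: int) -> int:
    """Time between two successful repairs if this many scheduled runs in a row fail."""
    return interval_seconds * (consecutive_failures + 1)


def within_gc_grace(interval_seconds: int, consecutive_failures: int,
                    gc_grace: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    return worst_case_gap(interval_seconds, consecutive_failures) <= gc_grace


print(within_gc_grace(WEEK_SECONDS, 0))  # True:  604800 <= 864000
print(within_gc_grace(WEEK_SECONDS, 1))  # False: 1209600 > 864000
```

With a weekly schedule and the default setting, a single failed weekend already exceeds the window if you simply wait for the next weekend, which is why a failed repair should be rerun as soon as the network is back.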

:

  • (): 3 : 1 2 , node, . 5 . 2 1 , 2 2 , 1 node 3 .
  • -par: .
  • -pr: node, . EACH_QUORUM, -local, .
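The bullets above refer to `nodetool repair` options. As background: `-pr` repairs only the node's primary ranges, so it has to be run on every node in both data centers to cover all data; `-par` repairs all replicas at the same time instead of one after another; and `-local` restricts the repair to replicas in the local data center, so it does not depend on the DC1 to DC2 link but also does not synchronize data across data centers. The Python sketch below only builds the command lines for the 6 nodes in the question; the host names, keyspace and the `repair_commands` helper are hypothetical, and which flag combinations are accepted depends on the Cassandra version (see `nodetool help repair`).

```python
from typing import Optional


def repair_commands(hosts: list[str], keyspace: Optional[str] = None,
                    primary_range: bool = True, parallel: bool = False,
                    local_dc_only: bool = False) -> list[list[str]]:
    """Build one `nodetool repair` command per host (no commands are executed)."""
    flags = []
    if primary_range:
        flags.append("-pr")     # primary ranges only: must run on every node
    if parallel:
        flags.append("-par")    # repair all replicas at the same time
    if local_dc_only:
        flags.append("-local")  # only replicas in the local data center
    tail = [keyspace] if keyspace else []
    return [["nodetool", "-h", host, "repair", *flags, *tail] for host in hosts]


hosts = [f"dc{d}-node{n}" for d in (1, 2) for n in (1, 2, 3)]
for cmd in repair_commands(hosts, "my_keyspace"):
    print(" ".join(cmd))  # e.g. nodetool -h dc1-node1 repair -pr my_keyspace
```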

3, , - - .

,

