Understanding Nodetool Repair
Nodetool repair is a crucial maintenance operation in Apache Cassandra that synchronizes replicas so that data stays consistent across the nodes of a cluster. Cassandra does not run it automatically; database operators must start it manually or schedule it themselves.
Basic Repair Commands
To run a basic incremental repair, which is the default mode since Cassandra 2.2, use:
nodetool repair
For a full repair, which is more thorough but resource-intensive, use:
nodetool repair --full
Targeted Repairs
You can target specific keyspaces or tables for repair:
nodetool repair [options] <keyspace>
nodetool repair [options] <keyspace> <table>
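As a minimal sketch of the two targeted forms, the helper below builds the command line for either a whole keyspace or a single table. The names my_keyspace and my_table, and the repair_cmd function itself, are hypothetical placeholders. The script only prints the commands, so it is safe to try on a machine without Cassandra.

```shell
#!/usr/bin/env bash
# Hypothetical names; substitute your own keyspace and table.
KEYSPACE="my_keyspace"
TABLE="my_table"

# Build the repair command for a keyspace, optionally narrowed to one table.
repair_cmd() {
  local keyspace="$1" table="${2:-}"
  if [ -n "$table" ]; then
    echo "nodetool repair $keyspace $table"
  else
    echo "nodetool repair $keyspace"
  fi
}

repair_cmd "$KEYSPACE"            # prints: nodetool repair my_keyspace
repair_cmd "$KEYSPACE" "$TABLE"   # prints: nodetool repair my_keyspace my_table
```

Repairing one table at a time keeps each repair session shorter, which can make failures easier to isolate.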
Best Practices for Running Repairs
Schedule regular repairs: Run incremental repairs every 1-3 days and full repairs every 1-3 weeks.
Avoid cluster overload: Stagger repairs across nodes to prevent overwhelming the cluster.
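Monitor repair progress: nodetool status reports only node state, so use nodetool netstats to watch repair streaming and nodetool compactionstats to watch validation compactions, and check the system log for session completion messages.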
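Consider repair frequency: Complete a repair of every node within each gc_grace_seconds window (864000 seconds, or 10 days, by default) so that tombstones reach all replicas before they are purged and deleted data does not reappear.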
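The staggering and frequency advice above could be sketched roughly as follows. The node addresses, the keyspace name, the DRY_RUN switch and both function names are assumptions made for illustration. The script also assumes that nodetool can reach each node remotely through its -h option. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: repair nodes one at a time so only one node carries repair load.
# NODES, KEYSPACE and DRY_RUN are hypothetical; adjust for your cluster.
NODES="${NODES:-10.0.0.1 10.0.0.2 10.0.0.3}"
KEYSPACE="${KEYSPACE:-my_keyspace}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

# Succeeds when the repair interval (in days) is shorter than gc_grace_seconds.
interval_within_gc_grace() {
  local interval_days="$1" gc_grace_seconds="$2"
  [ $(( interval_days * 86400 )) -lt "$gc_grace_seconds" ]
}

# Run (or print) a repair on each node in turn, never in parallel.
staggered_repair() {
  local node
  for node in $NODES; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "nodetool -h $node repair $KEYSPACE"
    else
      # Stop at the first failure so a failing node is noticed.
      nodetool -h "$node" repair "$KEYSPACE" || return 1
    fi
  done
}

# 864000 seconds (10 days) is the default gc_grace_seconds.
if interval_within_gc_grace 7 864000; then
  staggered_repair
else
  echo "repair interval is not shorter than gc_grace_seconds" >&2
fi
```

Running the loop sequentially is the simplest way to stagger. A scheduler such as cron could then call the script at the chosen interval.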
Advanced Repair Options
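Primary Range Repair: Use the -pr flag to repair only the primary token ranges of a node. Run it on every node in turn so that the whole ring is covered.
Datacenter-Specific Repair: Use the -dc option with a datacenter name to restrict the repair to replicas in that datacenter.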
Token Range Repair: Specify start and end tokens with -st and -et options for targeted repairs.
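The three options above might be used as in the sketch below. The datacenter name DC1, the token values, the keyspace name and the run helper are all hypothetical. The --full flag on the last two commands is an assumption made for safety, since some Cassandra versions may not accept an incremental repair that covers only a subset of replicas or token ranges. With DRY_RUN left at its default, the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the advanced flags; all names and tokens below are placeholders.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands instead of running them
KEYSPACE="my_keyspace"
DATACENTER="DC1"
START_TOKEN="0"
END_TOKEN="3074457345618258602"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Primary ranges only; repeat on every node to cover the whole ring.
run nodetool repair -pr "$KEYSPACE"

# Restrict the repair to replicas in one datacenter.
run nodetool repair --full -dc "$DATACENTER" "$KEYSPACE"

# Repair a single token range between a start and an end token.
run nodetool repair --full -st "$START_TOKEN" -et "$END_TOKEN" "$KEYSPACE"
```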
Troubleshooting Common Repair Issues
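Repair Timeouts: If repairs consistently time out, increase the relevant timeout in cassandra.yaml (repair_timeout_in_ms, where your version provides it). Timeout setting names differ between Cassandra releases, so check the cassandra.yaml shipped with your version before changing anything.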
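High Resource Usage: The -j (--job-threads) option sets how many threads run repair jobs (default 1, maximum 4). Keep it at a low value, and stagger repairs across nodes, to limit load.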
Incomplete Repairs: Check system logs for errors and consider running a full repair if incremental repairs fail.
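One rough way to check the logs for repair errors is sketched below. The log path is an assumption (it is typical for package installs, but yours may differ), and repair_errors is a hypothetical helper. The exact wording of repair messages varies between versions, so the match is kept deliberately broad.

```shell
#!/usr/bin/env bash
# Sketch: list ERROR lines that mention repair in a Cassandra log file.
# The default path is an assumption; override LOG_FILE for your install.
LOG_FILE="${LOG_FILE:-/var/log/cassandra/system.log}"

repair_errors() {
  local log_file="$1"
  # Keep ERROR lines, then match "repair" case-insensitively within them.
  grep 'ERROR' "$log_file" | grep -i 'repair'
}

if [ -r "$LOG_FILE" ]; then
  # Show the 20 most recent matches; "|| true" tolerates an empty result.
  repair_errors "$LOG_FILE" | tail -n 20 || true
fi

# If incremental repairs keep failing, a full repair with a single job
# thread is a conservative fallback (my_keyspace is a placeholder):
#   nodetool repair --full -j 1 my_keyspace
```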