Overview
A write in Cassandra goes first to the commit log and to an in-memory memtable; once the memtable is full, it is flushed to disk as an SSTable, so each memtable flush creates a new SSTable. The process of merging these SSTables to improve the performance of partition scans and to reclaim space from deleted data (tombstones) is called compaction.
Full compaction: A full compaction applies only to SizeTieredCompactionStrategy; it merges all SSTables into one large SSTable.
What happens during Compaction?
Compaction runs as a background process, triggered as new SSTables accumulate. During compaction, Cassandra merges SSTables by combining row fragments, evicting expired tombstones, and rebuilding indexes. Because SSTables are sorted, the merge is efficient and requires no random disk I/O. Once a newly merged SSTable is complete, the input SSTables are reference-counted and removed as soon as possible. During compaction there is a temporary spike in disk space usage and disk I/O.
Impact of compaction
Compaction affects both read and write performance: while it runs, the extra disk I/O and utilization cause some temporary performance degradation.
Impact on reads: Because of its high disk I/O, a running compaction can slow reads that are not served from the cache. Once the compaction completes, however, off-cache read performance improves, since fewer SSTable files on disk need to be checked.
Impact on disk: A full compaction is I/O- and CPU-intensive and can temporarily require up to double the disk space when no old versions or tombstones are evicted; it is therefore advisable to provision disks at twice the size of your data. Conversely, compaction reclaims disk space when much of the data in the input SSTables consists of tombstones.
Types of Compaction Strategies
Cassandra provides two compaction strategies:
SizeTieredCompactionStrategy: This is the default compaction strategy. It gathers SSTables of similar size (four by default) and compacts them together into a larger SSTable.
Figure 2 shows what we expect much later, when the second-tier SSTables have been combined to create a third tier, the third tier combined to create a fourth, and so on.
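For illustration, here is how the size-tiered options might be set on a table, sketched in CQL 3 syntax (the table name is hypothetical, and option spellings can vary between Cassandra versions). The min_threshold option controls how many similar-sized SSTables must accumulate before they are compacted together:

    ALTER TABLE status_updates
    WITH compaction = {
      -- gather at least 4 similar-sized SSTables before compacting
      'class': 'SizeTieredCompactionStrategy',
      'min_threshold': 4
    };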
LeveledCompactionStrategy: Introduced in Cassandra 1.0, this
strategy creates SSTables of a fixed, relatively small size (5 MB by default)
that are grouped into levels. Within each level, SSTables are guaranteed to be
non-overlapping. Each level (L0, L1, L2 and so on) is 10 times as large as the
previous.
In Figure 3, new SSTables are added to the first level, L0, and immediately compacted with the SSTables in L1 (blue). When L1 fills up, extra SSTables are promoted to L2 (violet). Subsequent SSTables generated in L1 will be compacted with the SSTables in L2 with which they overlap. As more data is added, leveled compaction results in a situation like the one shown in Figure 4.
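Leveled compaction is configured the same way; the sstable_size_in_mb option sets the fixed SSTable size (5 MB in the versions described here; later releases use a larger default). The table name is again hypothetical:

    ALTER TABLE status_updates
    WITH compaction = {
      'class': 'LeveledCompactionStrategy',
      'sstable_size_in_mb': 5
    };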
What problems does Leveled Compaction solve?
There are three problems with size-tiered compaction in update-heavy workloads:
- Read performance can be inconsistent because there are no guarantees as to how many SSTables a row may be spread across: in the worst case, we could have columns from a given row in every SSTable.
- A substantial amount of space can be wasted, since there is no guarantee as to how quickly obsolete columns (tombstones) will be merged out of existence; this is particularly noticeable when there is a high ratio of deletes.
- Space can also be a problem as SSTables grow larger from repeated compactions, since an obsolete SSTable cannot be removed until the merged SSTable is completely written. In the worst case (a single set of large SSTables with no obsolete rows to remove), Cassandra would need 100% as much free space as is used by the SSTables being compacted, into which to write the merged one.
Leveled compaction solves these problems with tiered compaction as follows:
- Leveled compaction guarantees that 90% of all reads will be satisfied from a single SSTable (assuming nearly-uniform row size). The worst case is bounded by the total number of levels, e.g., seven for 10 TB of data (see the arithmetic after this list).
- At most 10% of space will be wasted by obsolete rows.
- Only enough space for 10x the SSTable size needs to be reserved for temporary use by compaction.
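The seven-level figure follows from the level sizing: L1 holds roughly ten 5 MB SSTables, and each further level is ten times larger, so

    \[
      \text{size}(L_n) \approx 50\,\text{MB} \times 10^{\,n-1},
      \qquad
      \text{size}(L_6) \approx 5\,\text{TB},
      \qquad
      \text{size}(L_7) \approx 50\,\text{TB},
    \]

which means 10 TB of data fits within seven levels.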
When to use which compaction
Each compaction strategy serves different use cases well, so it is important to understand when to use which.
When to use Leveled Compaction
The following are scenarios in which leveled compaction is recommended:
1. High Sensitivity to Read Latency:
In addition to lowering read latencies in general, leveled compaction lowers
the amount of variability in read latencies. If your application has strict
latency requirements for the 99th percentile, leveled compaction may be the
only reliable way to meet those requirements, because it gives you a known
upper bound on the number of SSTables that must be read.
2. High Read/Write Ratio:
If you perform at least twice as many reads as you do writes, leveled
compaction may actually save you disk I/O, despite consuming more I/O for
compaction. This is especially true if your reads are fairly random and don’t
focus on a single, hot dataset.
3. Rows Are Frequently Updated: If you have wide rows to which new columns are constantly added, those rows will be spread across multiple SSTables with size-tiered compaction. Leveled compaction, on the other hand, keeps the number of SSTables that a row is spread across very low, even with frequent row updates. So, if you're using wide rows, leveled compaction can be useful.
4. Deletions or TTLed Columns in Wide Rows: If you maintain event timelines in wide rows and set TTLs on the columns in order to limit the timeline to a window of recent time, those columns will be replaced by tombstones when they expire. Similarly, if you use a wide row as a work queue, deleting columns after they have been processed, you will wind up with a lot of tombstones in each row. So, even if there aren't many live columns still in a row, there may be tombstones for that row spread across extra SSTables; these too need to be read by Cassandra, potentially requiring additional disk seeks. In this scenario, leveled compaction is a better choice, as it limits how far a row spreads and purges tombstones more promptly while merging.
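A minimal sketch of this timeline pattern, in CQL 3 syntax (table and column names are hypothetical): every column carries a TTL, so expired events become tombstones that leveled compaction can purge relatively quickly.

    -- Hypothetical event timeline: one wide row per user, one column per event.
    CREATE TABLE user_timeline (
      user_id  text,
      event_at timeuuid,
      payload  text,
      PRIMARY KEY (user_id, event_at)
    ) WITH compaction = {'class': 'LeveledCompactionStrategy'};

    -- Each event expires after 24 hours, leaving a tombstone behind.
    INSERT INTO user_timeline (user_id, event_at, payload)
    VALUES ('user42', now(), 'logged in')
    USING TTL 86400;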
When to use Size Tiered Compaction
The following are scenarios in which size-tiered compaction is recommended:
1. Write-heavy Workloads: If you have more writes than reads, size-tiered compaction is recommended: leveled compaction may struggle to keep up with write-heavy workloads, and because reads are infrequent, there is little benefit from the extra compaction I/O.
2. Rows Are Write-Once:
If your rows are always written entirely at once and are never updated, they
will naturally always be contained by a single SSTable when using size-tiered
compaction. Thus, there’s really nothing to gain from leveled compaction.
3. Low I/O Performance: If your cluster is already pressed for I/O, switching to leveled compaction will almost certainly only worsen the problem, so it's better to stay with size-tiered compaction when disk IOPS are limited.
Verification Test
The following test can be performed to verify the impact of the different compaction strategies on read performance and disk usage:
1. Create a column family in Cassandra for storing users' status updates: the user ID is the row key, the current timestamp is the column name, and the status text is the column value (see the CQL sketch after this list).
2. Set a TTL on the status columns so that statuses older than 5 minutes are removed.
3. Use the default compaction strategy, size-tiered compaction.
4. Keep inserting lots of messages for the same user for about an hour.
5. Meanwhile, from another program, read the status updates for the same user.
6. Watch the read latency and disk usage over time.
7. Perform the same test with the leveled compaction strategy and note the difference.
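Steps 1-3 and 7 might look like this in CQL 3 syntax (table and column names are illustrative, not prescribed by the test):

    -- Step 1: one wide row per user; the timestamp is the clustering column.
    -- Step 3: size-tiered is the default, shown here explicitly.
    CREATE TABLE status_updates (
      user_id   text,
      posted_at timeuuid,
      status    text,
      PRIMARY KEY (user_id, posted_at)
    ) WITH compaction = {'class': 'SizeTieredCompactionStrategy'};

    -- Steps 2 and 4: each status expires 5 minutes (300 s) after insertion.
    INSERT INTO status_updates (user_id, posted_at, status)
    VALUES ('user42', now(), 'feeling lucky')
    USING TTL 300;

    -- Step 7: switch strategies, then repeat the insert/read workload.
    ALTER TABLE status_updates
    WITH compaction = {'class': 'LeveledCompactionStrategy'};

For step 6, nodetool cfstats reports the disk space used per column family, and nodetool cfhistograms shows how many SSTables each read touches.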