pycassa.batch – Batch Operations

The batch interface allows insert, update, and remove operations to be performed in batches, providing a convenient mechanism for streaming updates or performing a large number of operations while reducing the number of RPC roundtrips.

Batch mutator objects are synchronized and can be safely passed around threads.

>>> b = cf.batch(queue_size=10)
>>> b.insert('key1', {'col1':'value11', 'col2':'value21'})
>>> b.insert('key2', {'col1':'value12', 'col2':'value22'}, ttl=15)
>>> b.remove('key1', ['col2'])
>>> b.remove('key2')
>>> b.send()

Use the queue_size argument to control how many mutations are queued before an automatic send() is performed. This allows simple streaming of updates. If set to None, automatic checkpoints are disabled. The default is 100.
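The checkpoint behavior can be illustrated with a small stand-in class (a hypothetical sketch of the queuing pattern, not pycassa's actual implementation, and no live Cassandra connection is involved):

```python
class SketchMutator:
    """Toy stand-in for a batch mutator: queues operations and
    flushes automatically once queue_size is reached."""

    def __init__(self, queue_size=100):
        self.queue_size = queue_size
        self.buffer = []
        self.sent = []          # records each flushed batch, for inspection

    def insert(self, key, columns):
        self.buffer.append(('insert', key, columns))
        self._checkpoint()

    def remove(self, key, columns=None):
        self.buffer.append(('remove', key, columns))
        self._checkpoint()

    def _checkpoint(self):
        # Auto-send when the queue reaches queue_size; None disables this.
        if self.queue_size is not None and len(self.buffer) >= self.queue_size:
            self.send()

    def send(self):
        if self.buffer:
            self.sent.append(list(self.buffer))
            del self.buffer[:]

b = SketchMutator(queue_size=2)
b.insert('key1', {'col1': 'v1'})
b.insert('key2', {'col1': 'v2'})   # second operation triggers an automatic send
assert len(b.sent) == 1 and b.buffer == []
```

With queue_size=2, the second queued mutation crosses the threshold and the batch is flushed without an explicit send() call.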

Supercolumns are supported:

>>> b = scf.batch()
>>> b.insert('key1', {'supercol1': {'colA':'value1a', 'colB':'value1b'},
...                   'supercol2': {'colA':'value2a', 'colB':'value2b'}})
>>> b.remove('key1', ['colA'], 'supercol1')
>>> b.send()

You may also create a Mutator directly, allowing operations on multiple column families:

>>> b = Mutator(pool)
>>> b.insert(cf, 'key1', {'col1':'value1', 'col2':'value2'})
>>> b.insert(supercf, 'key1', {'subkey1': {'col1':'value1', 'col2':'value2'}})
>>> b.send()

Note

This interface does not implement atomic operations across column families. All the limitations of the batch_mutate Thrift API call apply. Remember, a mutation in Cassandra is atomic per key per column family only.

Note

If a single operation in a batch fails, the whole batch fails.

In addition, mutators can be used as context managers; an implicit send() is called upon exit.

>>> with cf.batch() as b:
...     b.insert('key1', {'col1':'value11', 'col2':'value21'})
...     b.insert('key2', {'col1':'value12', 'col2':'value22'})
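The context-manager behavior amounts to calling send() when the with-block exits. A minimal sketch (hypothetical names, not pycassa internals):

```python
class SketchBatch:
    """Toy batch that flushes its queued operations on context exit."""

    def __init__(self):
        self.buffer = []
        self.sent = []

    def insert(self, key, columns):
        self.buffer.append((key, columns))

    def send(self):
        self.sent.extend(self.buffer)
        del self.buffer[:]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Implicit send() when the with-block exits.
        self.send()

with SketchBatch() as b:
    b.insert('key1', {'col1': 'value11'})
    b.insert('key2', {'col1': 'value12'})

assert b.sent and b.buffer == []   # everything was flushed on exit
```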

Calls to insert() and remove() can also be chained:

>>> cf.batch().remove('foo').remove('bar').send()
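Chaining works because each mutating method returns the mutator itself. The pattern can be sketched like so (a toy class, not pycassa's implementation):

```python
class ChainBatch:
    """Toy batch whose mutating methods return self so calls can chain."""

    def __init__(self):
        self.ops = []

    def insert(self, key, columns):
        self.ops.append(('insert', key, columns))
        return self            # returning self is what enables chaining

    def remove(self, key):
        self.ops.append(('remove', key))
        return self

    def send(self):
        sent, self.ops = list(self.ops), []
        return sent

result = ChainBatch().remove('foo').remove('bar').send()
assert result == [('remove', 'foo'), ('remove', 'bar')]
```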

To use atomic batches (supported in Cassandra 1.2 and later), pass atomic=True when creating the batch:

>>> cf.batch(atomic=True)

or when sending it:

>>> b = cf.batch()
>>> b.insert('key1', {'col1':'val2'})
>>> b.insert('key2', {'col1':'val2'})
>>> b.send(atomic=True)
class pycassa.batch.Mutator(pool, queue_size=100, write_consistency_level=None, allow_retries=True, atomic=False)

Batch update convenience mechanism.

Queues insert/update/remove operations and executes them when the queue is full or send is called explicitly.

pool is the ConnectionPool that will be used for operations.

After queue_size operations, send() will be executed automatically. Use 0 to disable automatic sends.

insert(column_family, key, columns[, timestamp][, ttl])

Adds a single row insert to the batch.

column_family is the ColumnFamily that the insert will be executed on.

If this is used on a counter column family, integers may be used for column values, and they will be taken as counter adjustments.

remove(column_family, key[, columns][, super_column][, timestamp])

Adds a single row remove to the batch.

column_family is the ColumnFamily that the remove will be executed on.

send([write_consistency_level])

Sends all operations currently in the batch and clears the batch.

class pycassa.batch.CfMutator(column_family, queue_size=100, write_consistency_level=None, allow_retries=True, atomic=False)

A Mutator that deals only with one column family.

column_family is the ColumnFamily that all operations will be executed on.

insert(key, cols[, timestamp][, ttl])

Adds a single row insert to the batch.

remove(key[, columns][, super_column][, timestamp])

Adds a single row remove to the batch.
