Changelog

Changes in Version 1.11.0

Features

  • Upgrade Thrift interface to 19.36.1, which adds support for the LOCAL_ONE consistency level and the populate_io_cache_on_flush column family attribute.

Bug Fixes

  • Return timestamp from remove() in stub ColumnFamily

Miscellaneous

  • Upgrade bundled ez_setup.py

Changes in Version 1.10.0

This release only adds one feature: support for Cassandra 1.2’s atomic batches.

Features

Changes in Version 1.9.1

This release fixes a few edge cases around connection pooling that can affect long-running applications. It also adds token range support to ColumnFamily.get_range(), which can be useful for parallelizing full-ring scans.
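The token range support mentioned above makes it possible to carve the ring into disjoint slices and scan each slice from a separate worker via get_range(start_token=..., finish_token=...). The splitting arithmetic can be sketched without pycassa at all; this assumes the RandomPartitioner's 0 to 2**127 token space, and the helper name and worker wiring are illustrative, not part of pycassa:

```python
# Illustrative sketch (not part of pycassa): split the RandomPartitioner's
# token space into n contiguous ranges so each worker can run its own
# get_range(start_token=..., finish_token=...) scan over one slice.

RING_SIZE = 2 ** 127  # RandomPartitioner token space


def split_token_ring(n):
    """Return n (start_token, finish_token) string pairs covering the ring."""
    step = RING_SIZE // n
    bounds = [i * step for i in range(n)] + [RING_SIZE]
    # Thrift passes tokens as strings, so stringify the boundaries.
    return [(str(bounds[i]), str(bounds[i + 1])) for i in range(n)]


ranges = split_token_ring(4)
```

Each pair's finish token equals the next pair's start token, so the ranges tile the full ring with no gaps or overlap.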

Features

Bug Fixes

  • Prevent possible double connection disposal when recycling connections
  • Handle empty strings for IntegerType values
  • Prevent closed connections from being returned to the pool
  • Ensure connection count is decremented when pool is disposed

Changes in Version 1.9.0

This release adds a couple of minor new features and improves multithreaded locking behavior in ConnectionPool. There should be no backwards-compatibility concerns.

Features

  • Full support for column_start, column_finish, column_count, and column_reversed parameters in stubs
  • Addition of an include_ttl parameter to ColumnFamily fetching methods which works like the existing include_timestamp parameter.

Bug Fixes

  • Reduce the locked critical section in ConnectionPool, primarily to make sure lock acquisition time is not ignored outside of the pool’s timeout setting.

Changes in Version 1.8.0

This release requires either Python 2.6 or 2.7. Python 2.4 and 2.5 are no longer supported. There are no concrete plans for Python 3 compatibility yet.

Features

Bug Fixes

  • Don’t return closed connections to the pool. This was primarily a problem when operations failed after retrying up to the limit, resulting in a MaximumRetryException or AllServersUnavailable.
  • Set keyspace for connection after logging in instead of before. This fixes authentication against Cassandra 1.2, which requires logging in prior to setting a keyspace.
  • Specify correct UUID variant when creating v1 uuid.UUID objects from datetimes or timestamps
  • Add 900ns to v1 uuid.UUID timestamps when the “max” TimeUUID for a specific datetime or timestamp is requested, such as a column slice end
  • Also look at attributes of parent classes when creating columns from attributes in ColumnFamilyMap
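The 900 ns figure above follows from v1 UUID resolution: these timestamps count 100 ns intervals since 1582-10-15, so a single microsecond spans ten intervals, and the largest ("max") TimeUUID within a given microsecond sits nine intervals (900 ns) past the base. A small stdlib sketch of the arithmetic (the helper is illustrative, not pycassa's implementation):

```python
# A v1 UUID timestamp counts 100 ns intervals since 1582-10-15.
# One microsecond spans 10 such intervals, so the "max" TimeUUID for a
# given microsecond is the base timestamp plus 9 intervals, i.e. +900 ns.

GREGORIAN_OFFSET = 0x01B21DD213814000  # 100 ns ticks from 1582-10-15 to 1970-01-01


def v1_timestamp(unix_micros, max_uuid=False):
    """Unix microseconds -> v1 UUID timestamp (100 ns ticks)."""
    ticks = unix_micros * 10 + GREGORIAN_OFFSET
    if max_uuid:
        ticks += 9  # 900 ns: the last 100 ns tick inside this microsecond
    return ticks


lo = v1_timestamp(1000000)        # slice start for this microsecond
hi = v1_timestamp(1000000, True)  # slice end ("max") for this microsecond
```

This is why a slice end built from a datetime must be nudged by 900 ns: otherwise columns written later within the same microsecond would fall outside the slice.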

Other

  • Upgrade bundled Thrift-generated python to 19.35.0, generated with Thrift 0.9.0.

Changes in Version 1.7.2

This release fixes a minor bug and upgrades the bundled Cassandra Thrift client interface to 19.34.0, matching Cassandra 1.2.0-beta1. This doesn’t affect any existing Thrift methods, only adds new ones (that aren’t yet utilized by pycassa), so there should not be any breakage.

Bug Fixes

  • Fix single-component composite packing
  • Avoid cyclic imports during installation in setup.py

Other

  • Travis CI integration

Changes in Version 1.7.1

This release has few changes, and should make for a smooth upgrade from 1.7.0.

Features

Bug Fixes

  • Fix bad slice ends when using xget() with composite columns and a column_finish parameter
  • Fix bad documentation paths in debian packaging scripts

Other

  • Add __version__ and __version_info__ attributes to the pycassa module

Changes in Version 1.7.0

This release has a few relatively large changes in it: a new connection pool stats collector, compatibility with Cassandra 0.7 through 1.1, and a change in timezone behavior for datetimes.

Before upgrading, take special care to make sure datetimes that you pass to pycassa (for TimeUUIDType or DateType data) are in UTC, and make sure your code expects to get UTC datetimes back in return.

Likewise, the SystemManager changes should be backwards compatible, but there may be minor differences, mostly in create_column_family() and alter_column_family(). Be sure to test any code that works programmatically with these.

Features

  • Added StatsLogger for tracking ConnectionPool metrics
  • Full Cassandra 1.1 compatibility in SystemManager. To support this, all column family or keyspace attributes that have existed since Cassandra 0.7 may be used as keyword arguments for create_column_family() and alter_column_family(). It is up to the user to know which attributes are available and valid for their version of Cassandra. As part of this change, the version-specific thrift-generated cassandra modules (pycassa.cassandra.c07, pycassa.cassandra.c08, and pycassa.cassandra.c10) have been replaced by pycassa.cassandra. A minor related change is that individual connections no longer ask for the node’s API version, and that information is no longer stored as an attribute of the ConnectionWrapper.

Bug Fixes

  • Fix xget() paging for non-string comparators
  • Add batch_insert() to ColumnFamilyMap
  • Use setattr instead of directly updating the object’s __dict__ in ColumnFamilyMap to avoid breaking descriptors
  • Fix single-column counter increments with ColumnFamily.insert()
  • Include AuthenticationException and AuthorizationException in the pycassa module
  • Support counters in xget()
  • Sort column families in pycassaShell for display
  • Raise TypeError when bad keyword arguments are used when creating a ColumnFamily object

Other

All datetime objects created by pycassa now use UTC as their timezone rather than the local timezone. Likewise, naive datetime objects that are passed to pycassa are now assumed to be in UTC time, but tzinfo is respected if set.

Specifically, the types of data that you may need to make adjustments for when upgrading are TimeUUIDType and DateType (including OldPycassaDateType and IntermediateDateType).
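One defensive pattern is to normalize every datetime to UTC yourself before handing it to pycassa, so the assumption about naive datetimes never comes into play. This is sketched with Python 3's datetime.timezone for brevity (pycassa itself predates that class, so treat it as illustrative):

```python
from datetime import datetime, timedelta, timezone

# Since 1.7.0, pycassa assumes naive datetimes are UTC and respects
# tzinfo when it is set. Normalizing to UTC up front makes the
# behavior explicit regardless of where the datetime came from.


def to_utc(dt):
    """Treat naive datetimes as UTC; convert aware ones to UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


aware = datetime(2012, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=5)))
print(to_utc(aware))  # 2012-01-01 07:00:00+00:00
```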

Changes in Version 1.6.0

This release adds a few minor features and several important bug fixes.

The most important change to take note of if you are using composite comparators is the change to the default inclusive/exclusive behavior for slice ends.

Other than that, this should be a smooth upgrade from 1.5.x.

Features

  • New script for easily building RPM packages
  • Add request and parameter information to PoolListener callback
  • Add ColumnFamily.xget(), a generator version of get() that automatically pages over columns in reasonably sized chunks
  • Add support for Int32Type, a 4-byte signed integer format
  • Add constants for the highest and lowest possible TimeUUID values to pycassa.util
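The paging behavior described for xget() can be sketched without pycassa: repeatedly fetch a bounded slice starting at the last column seen, dropping the overlapping first column of each page after the first, so a huge row never has to fit in memory at once. The fetch function and chunking details below are illustrative, not pycassa's actual internals:

```python
# Stdlib-only sketch of xget()-style paging over a sorted dict that
# stands in for a wide row.


def fetch(columns, start, count):
    """Stand-in for ColumnFamily.get(): up to `count` columns >= start."""
    keys = sorted(k for k in columns if k >= start)[:count]
    return [(k, columns[k]) for k in keys]


def xget_sketch(columns, buffer_size=3):
    """Yield (name, value) pairs one at a time, paging in chunks."""
    start = None
    while True:
        # After the first page, ask for one extra column because the
        # slice start repeats the previous page's last column.
        count = buffer_size if start is None else buffer_size + 1
        chunk = fetch(columns, "" if start is None else start, count)
        if start is not None:
            chunk = chunk[1:]  # drop the overlapping first column
        for name, value in chunk:
            yield name, value
        if len(chunk) < buffer_size:  # short page: row exhausted
            return
        start = chunk[-1][0]
```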

Bug Fixes

  • Fix various Python 2.4 syntax errors
  • Raise AllServersUnavailable if server_list is empty
  • Handle custom types inside of composites
  • Don’t erase comment when updating column families
  • Match Cassandra’s sorting of TimeUUIDType values when the timestamps tie. This could result in some columns being erroneously left off of the end of column slices when datetime objects or timestamps were used for column_start or column_finish
  • Use gevent’s queue in place of the stdlib version when gevent monkeypatching has been applied
  • Avoid sub-microsecond loss of precision with TimeUUID timestamps when using pycassa.util.convert_time_to_uuid()
  • Make default slice ends inclusive when using a CompositeType comparator. Previously, the end of the slice was exclusive by default (as was the start of the slice when column_reversed was True)

Changes in Version 1.5.1

This release only affects those of you using DateType data, which has been supported since pycassa 1.2.0. If you are using DateType, it is very important that you read this closely.

DateType data is internally stored as an 8 byte integer timestamp. Since version 1.2.0 of pycassa, the timestamp stored has counted the number of microseconds since the unix epoch. The actual format that Cassandra standardizes on is milliseconds since the epoch.

If you are only using pycassa, you probably won’t have noticed any problems with this. However, if you try to use cassandra-cli, sstable2json, Hector, or any other client that supports DateType, DateType data written by pycassa will appear to be far in the future. Similarly, DateType data written by other clients will appear to be in the past when loaded by pycassa.

This release changes the default DateType behavior to comply with the standard, millisecond-based format. If you use DateType, and you upgrade to this release without making any modifications, you will have problems. Unfortunately, this is a bit of a tricky situation to resolve, but the appropriate actions to take are detailed below.
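The two formats differ by exactly a factor of 1000, which is what any migration has to correct for. A stdlib sketch of the arithmetic (the cutoff heuristic is hypothetical, roughly in the spirit of what IntermediateDateType does when distinguishing the two formats):

```python
# pycassa 1.2.0-1.5.0 wrote microseconds since the Unix epoch; the
# DateType standard is milliseconds. Old values are therefore ~1000x
# too large when read as milliseconds, which is why they appear to be
# far in the future in other clients.


def old_to_new(us_timestamp):
    """Convert an old (microsecond) value to the standard milliseconds."""
    return us_timestamp // 1000


def looks_like_old_format(value, cutoff_ms=4102444800000):
    # Hypothetical heuristic: anything past 2100-01-01 (in ms) is
    # assumed to be an old microsecond-format value.
    return value > cutoff_ms


old_us = 1325376000 * 10 ** 6   # 2012-01-01 00:00:00 UTC in microseconds
new_ms = old_to_new(old_us)     # the same instant in milliseconds
```

A real migration must rewrite the stored 8-byte big-endian values themselves, not just reinterpret them on read.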

To temporarily continue using the old behavior, a new class has been created: pycassa.types.OldPycassaDateType. This will read and write DateType data exactly the same as pycassa 1.2.0 to 1.5.0 did.

If you want to convert your data to the new format, the other new class, pycassa.types.IntermediateDateType, may be useful. It can read either the new or old format correctly (unless you have used dates close to 1970 with the new format) and will write only the new format. The best case for using this is if you have DateType validated columns that don’t have a secondary index on them.

To tell pycassa to use OldPycassaDateType or IntermediateDateType, use the ColumnFamily attributes that control types: column_name_class, key_validation_class, column_validators, and so on. Here’s an example:

from pycassa.types import OldPycassaDateType, IntermediateDateType
from pycassa.column_family import ColumnFamily
from pycassa.pool import ConnectionPool

pool = ConnectionPool('MyKeyspace', ['192.168.1.1'])

# Our tweet timeline has a comparator_type of DateType
tweet_timeline_cf = ColumnFamily(pool, 'tweets')
tweet_timeline_cf.column_name_class = OldPycassaDateType()

# The 'join_date' column in our users column family is validated by DateType
users_cf = ColumnFamily(pool, 'users')
users_cf.column_validators['join_date'] = IntermediateDateType()

If you’re using DateType for the key_validation_class, column names, column values with a secondary index on them, or are using the DateType validated column as a non-indexed part of an index clause with get_indexed_slices() (e.g. “where state = ‘TX’ and join_date > 2012”), you need to be more careful about the conversion process, and IntermediateDateType probably isn’t a good choice.

In most cases, if you want to switch to the new date format, a manual migration script to convert all existing DateType data to the new format will be needed. In particular, if you convert keys, column names, or indexed columns on a live data set, be very careful how you go about it. If you need any assistance or suggestions at all with migrating your data, please feel free to send an email to tyler@datastax.com; I would be glad to help.

Changes in Version 1.5.0

The main change to be aware of for this release is the new no-retry behavior for counter operations. If you have been maintaining a separate connection pool with retries disabled for usage with counters, you may discontinue that practice after upgrading.

Features

  • By default, counter operations will not be retried automatically. This makes it easier to use a single connection pool without worrying about overcounting.

Bug Fixes

  • Don’t remove entire row when an empty list is supplied for the columns parameter of remove() or the batch remove methods.
  • Add python-setuptools to debian build dependencies
  • Batch remove() was not removing subcolumns when the specified supercolumn was 0 or other “falsey” values
  • Don’t request an extra row when reading fewer than buffer_size rows with get_range() or get_indexed_slices().
  • Remove pool_type from logs, which showed up as None in recent versions
  • Logs were erroneously showing the same server for retries of failed operations even when the actual server being queried had changed

Changes in Version 1.4.0

This release is primarily a bugfix release with a couple of minor features and removed deprecated items.

Features

  • Accept column_validation_classes when creating or altering column families with SystemManager
  • Ignore UNREACHABLE nodes when waiting for schema version agreement

Bug Fixes

  • Remove accidental print statement in SystemManager
  • Raise TypeError when unexpected types are used for comparator or validator types when creating or altering a Column Family
  • Fix packing of column values using column-specific validators during batch inserts when the column name is changed by packing
  • Always return timestamps from inserts
  • Fix NameError when timestamps are used where a DateType is expected
  • Fix NameError in python 2.4 when unpacking DateType objects
  • Handle reading composites with trailing components missing
  • Upgrade ez_setup.py to fix broken setuptools link

Removed Deprecated Items

  • pycassa.connect()
  • pycassa.connect_thread_local()
  • ConnectionPool.status()
  • ConnectionPool.recreate()

Changes in Version 1.3.0

This release adds full compatibility with Cassandra 1.0 and removes support for schema manipulation in Cassandra 0.7.

In this release, schema manipulation should work with Cassandra 0.8 and 1.0, but not 0.7. The data API should continue to work with all three versions.

Bug Fixes

  • Don’t ignore columns parameter in ColumnFamilyMap.insert()
  • Handle empty instance fields in ColumnFamilyMap.insert()
  • Use the same default for timeout in pycassa.connect() as ConnectionPool uses
  • Fix typo which caused a different exception to be thrown when an AllServersUnavailable exception was raised
  • IPython 0.11 compatibility in pycassaShell
  • Correct dependency declaration in setup.py
  • Add UUIDType to supported types

Features

  • The filter_empty parameter was added to get_range() with a default of True; this allows empty rows to be kept if desired

Deprecated

  • pycassa.connect()
  • pycassa.connect_thread_local()

Changes in Version 1.2.1

This is strictly a bug-fix release addressing a few issues created in 1.2.0.

Bug Fixes

Changes in Version 1.2.0

This should be a fairly smooth upgrade from pycassa 1.1. The primary changes that may introduce minor incompatibilities are the changes to ColumnFamilyMap and the automatic skipping of “ghost ranges” in ColumnFamily.get_range().

Features

Bug Fixes

  • Add connections to ConnectionPool more readily when prefill is False. Before this change, if the ConnectionPool was created with prefill=False, connections would only be added to the pool when there was concurrent demand for connections. After this change, if prefill=False and pool_size=N, the first N operations will each result in a new connection being added to the pool.
  • Close connection and adjust the ConnectionPool‘s connection count after a TApplicationException. This exception generally indicates programmer error, so it’s not extremely common.
  • Handle typed keys that evaluate to False

Deprecated

  • ConnectionPool.recreate()
  • ConnectionPool.status()

Miscellaneous

Changes in Version 1.1.1

Features

Bug Fixes

  • Don’t retry operations after a TApplicationException. This exception is reserved for programmatic errors (such as bad API parameters), so retries are not needed.
  • If the read_consistency_level kwarg was used in a ColumnFamily constructor, it would be ignored, resulting in a default read consistency level of ONE. This did not affect the read consistency level if it was specified in any other way, including per-method or by setting the read_consistency_level attribute.

Changes in Version 1.1.0

This release adds compatibility with Cassandra 0.8, including support for counters and key_validation_class. This release is backwards-compatible with Cassandra 0.7, and can support running against a mixed cluster of both Cassandra 0.7 and 0.8.

Other Features

Bug Fixes

There were several related issues with overflow in ConnectionPool:

  • Connection failures when a ConnectionPool was in a state of overflow would not result in adjustment of the overflow counter, eventually leading the ConnectionPool to refuse to create new connections.
  • A setting of -1 for the ConnectionPool’s overflow limit erroneously caused overflow to be disabled.
  • If overflow was enabled in conjunction with prefill being disabled, the effective overflow limit was raised to max_overflow + pool_size.

Other

Removed Deprecated Items

The following deprecated items have been removed:

Deprecated

Although not technically deprecated, most ColumnFamily constructor arguments should instead be set by setting the corresponding attribute on the ColumnFamily after construction. However, all previous constructor arguments will continue to be supported if passed as keyword arguments.

Changes in Version 1.0.8

  • Pack IndexExpression values in get_indexed_slices() that are supplied through the IndexClause instead of just the instance parameter.
  • Column names and values which use Cassandra’s IntegerType are unpacked as though they are in a BigInteger-like format. This is (backwards) compatible with the format that pycassa uses to pack IntegerType data. This fixes an incompatibility with the format that cassandra-cli and other clients use to pack IntegerType data.
  • Restore Python 2.5 compatibility that was broken by out-of-order keyword arguments in ConnectionWrapper.
  • Pack column_start and column_finish arguments in ColumnFamily *get*() methods when the super_column parameter is used.
  • Issue a DeprecationWarning when a method, parameter, or class that has been deprecated is used. Most of these have been deprecated for several releases, but no warnings were issued until now.
  • Deprecations are now split into separate sections for each release in the changelog.
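The IntegerType format referenced above is a variable-length, big-endian, two's-complement integer (a Java BigInteger on the server side). A stdlib sketch of that wire format, not pycassa's actual packer (boundary negatives may get one extra byte here, which still decodes to the same value):

```python
# Cassandra's IntegerType: variable-length, big-endian, two's-complement.
# Small values pack into one byte, larger values grow as needed.


def pack_integer_type(n):
    """Pack an int into the BigInteger-like IntegerType format."""
    length = max(1, (n.bit_length() + 8) // 8)  # +8 leaves room for a sign bit
    return n.to_bytes(length, "big", signed=True)


def unpack_integer_type(data):
    """Inverse of pack_integer_type()."""
    return int.from_bytes(data, "big", signed=True)
```

A fixed little-endian layout would not sort or interoperate correctly, which is the incompatibility with cassandra-cli that the change above fixed.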

Deprecated

  • The instance parameter of ColumnFamilyMap.get_indexed_slices()

Changes in Version 1.0.7

  • Catch KeyError in pycassa.columnfamily.ColumnFamily.multiget() empty row removal. If the same non-existent key was passed multiple times, a KeyError was raised when trying to remove it from the OrderedDictionary after the first removal. The KeyError is caught and ignored now.
  • Handle connection failures during retries. When a connection fails, it tries to create a new connection to replace itself. Exceptions during this process were not properly handled; they are now handled and count towards the retry count for the current operation.
  • Close connection when a MaximumRetryException is raised. Normally a connection is closed when an operation it is performing fails, but this was not happening for the final failure that triggers the MaximumRetryException.

Changes in Version 1.0.6

  • Add EOFError to the list of exceptions that cause a connection swap and retry
  • Improved autopacking efficiency for AsciiType, UTF8Type, and BytesType
  • Preserve sub-second timestamp precision in datetime arguments for insertion or slice bounds where a TimeUUID is expected. Previously, precision below a second was lost.
  • In a MaximumRetryException‘s message, include details about the last Exception that caused the MaximumRetryException to be raised
  • pycassa.pool.ConnectionPool.status() now always reports a non-negative overflow; 0 is now used when there is not currently any overflow
  • Created pycassa.types.Long as a replacement for pycassa.types.Int64. Long uses big-endian encoding, which is compatible with Cassandra’s LongType, while Int64 used little-endian encoding.
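The endianness difference behind the new Long type is easy to see with the stdlib struct module: Cassandra's LongType is an 8-byte big-endian signed integer, matching the '>q' format, while the deprecated Int64 produced the '<q' (little-endian) layout:

```python
import struct

# LongType on the wire: 8 bytes, big-endian, signed ('>q').
# The deprecated Int64 wrote the same value little-endian ('<q'),
# which Cassandra's LongType comparator misorders.

big = struct.pack(">q", 1)     # b'\x00\x00\x00\x00\x00\x00\x00\x01'
little = struct.pack("<q", 1)  # b'\x01\x00\x00\x00\x00\x00\x00\x00'
```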

Deprecated

  • pycassa.types.Int64 has been deprecated in favor of pycassa.types.Long

Changes in Version 1.0.5

  • Assume port 9160 if only a hostname is given
  • Remove super_column param from pycassa.columnfamily.ColumnFamily.get_indexed_slices()
  • Enable failover on functions that previously lacked it
  • Increase base backoff time to 0.01 seconds
  • Add a timeout parameter to pycassa.system_manager.SystemManager
  • Return timestamp on single-column inserts

Changes in Version 1.0.4

Deprecated

Changes in Version 1.0.3

  • Fixed supercolumn slice bug in get()
  • pycassaShell now runs scripts with execfile to allow for multiline statements
  • 2.4 compatibility fixes

Changes in Version 1.0.2

Changes in Version 1.0.1

Changes in Version 1.0.0

  • Created the SystemManager class to allow for keyspace, column family, and index creation, modification, and deletion. These operations are no longer provided by a Connection class.
  • Updated pycassaShell to use the SystemManager class
  • Improved retry behavior, including exponential backoff and proper resetting of the retry attempt counter
  • Condensed connection pooling classes into only pycassa.pool.ConnectionPool to provide a simpler API
  • Changed pycassa.connection.connect() to return a connection pool
  • Use more performant Thrift API methods for insert() and get() where possible
  • Bundled OrderedDict and set it as the default dictionary class for column families
  • Provide better TypeError feedback when columns are the wrong type
  • Use Thrift API 19.4.0

Deprecated

  • ColumnFamilyMap.get_count() has been deprecated. Use ColumnFamily.get_count() instead.

Changes in Version 0.5.4

  • Allow for more backward and forward compatibility
  • Mark a server as being down more quickly in Connection

Changes in Version 0.5.3

  • Added PooledColumnFamily, which makes it easy to use connection pooling automatically with a ColumnFamily.

Changes in Version 0.5.2

  • Support for adding/updating/dropping Keyspaces and CFs in pycassa.connection.Connection
  • get_range() optimization and more configurable batch size
  • batch get_indexed_slices() similar to ColumnFamily.get_range()
  • Reorganized pycassa logging
  • More efficient packing of data types
  • Fix error condition that results in infinite recursion
  • Limit pooling retries to only appropriate exceptions
  • Use Thrift API 19.3.0

Changes in Version 0.5.1

  • Automatically detect if a column family is a standard column family or a super column family
  • multiget_count() support
  • Allow preservation of key order in multiget() if an ordered dictionary is used
  • Convert timestamps to v1 UUIDs where appropriate
  • pycassaShell documentation
  • Use Thrift API 17.1.0

Changes in Version 0.5.0

  • Connection Pooling support: pycassa.pool
  • Started moving logging to pycassa.logger
  • Use Thrift API 14.0.0

Changes in Version 0.4.3

  • Autopack on CF’s default_validation_class
  • Use Thrift API 13.0.0

Changes in Version 0.4.2

  • Added batch mutations interface: pycassa.batch
  • Made bundled thrift-gen code a subpackage of pycassa
  • Don’t attempt to reencode already encoded UTF8 strings

Changes in Version 0.4.1

  • Added batch_insert()
  • Redefined insert() in terms of batch_insert()
  • Fixed UTF8 autopacking
  • Convert datetime slice args to uuids when appropriate
  • Changed how thrift-gen code is bundled
  • Assert that the major version of the thrift API is the same on the client and on the server
  • Use Thrift API 12.0.0

Changes in Version 0.4.0

  • Added pycassaShell, a simple interactive shell
  • Converted the test config from xml to yaml
  • Fixed overflow error on get_count()
  • Only insert columns which exist in the model object
  • Make ColumnFamilyMap not ignore the ColumnFamily’s dict_class
  • Specify keyspace as argument to connect()
  • Add support for framed transport and default to using it
  • Added autopacking for column names and values
  • Added support for secondary indexes with get_indexed_slices() and pycassa.index
  • Added truncate()
  • Use Thrift API 11.0.0
