Changelog
Changes in Version 1.11.0
Features
- Upgrade Thrift interface to 19.36.1, which adds support for the
LOCAL_ONE consistency level and the populate_io_cache_on_flush
column family attribute.
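For example, reads can opt into the new level per ColumnFamily. A minimal sketch; it assumes the level is exposed as ConsistencyLevel.LOCAL_ONE on pycassa’s bundled ConsistencyLevel and that the connected cluster supports it:
from pycassa import ConnectionPool, ColumnFamily, ConsistencyLevel

pool = ConnectionPool('MyKeyspace', ['192.168.1.1'])
cf = ColumnFamily(pool, 'users')

# Read at LOCAL_ONE; assumes the connected cluster supports this level
cf.read_consistency_level = ConsistencyLevel.LOCAL_ONE
user = cf.get('jsmith')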
Bug Fixes
- Return timestamp from remove() in stub ColumnFamily
Miscellaneous
- Upgrade bundled ez_setup.py
Changes in Version 1.10.0
This release only adds one feature: support for Cassandra
1.2’s atomic batches.
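A minimal sketch of how this can be used, assuming the feature is exposed through an atomic keyword argument on ColumnFamily.batch():
from pycassa import ConnectionPool, ColumnFamily

pool = ConnectionPool('MyKeyspace', ['192.168.1.1'])
cf = ColumnFamily(pool, 'users')

# Queue several mutations and send them as a single atomic batch;
# either all of them or none of them will be applied
b = cf.batch(atomic=True)
b.insert('jsmith', {'email': 'jsmith@example.com'})
b.insert('jdoe', {'email': 'jdoe@example.com'})
b.send()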
Changes in Version 1.9.1
This release fixes a few edge cases around connection pooling that
can affect long-running applications. It also adds token range
support to ColumnFamily.get_range(), which can be useful
for parallelizing full-ring scans.
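For example, each worker can scan one slice of the ring. A sketch; it assumes the token range is exposed through start_token and finish_token keyword arguments:
from pycassa import ConnectionPool, ColumnFamily

pool = ConnectionPool('MyKeyspace', ['192.168.1.1'])
cf = ColumnFamily(pool, 'users')

# Token values are partitioner-specific; these are illustrative.
# Several workers scanning adjacent ranges can cover the full ring.
for key, columns in cf.get_range(start_token='0', finish_token='12345'):
    pass  # process one row within this slice of the ring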
Bug Fixes
- Prevent possible double connection disposal when recycling connections
- Handle empty strings for IntegerType values
- Prevent closed connections from being returned to the pool.
- Ensure connection count is decremented when pool is disposed
Changes in Version 1.9.0
This release adds a couple of minor new features and improves multithreaded
locking behavior in ConnectionPool. There should be no
backwards-compatibility concerns.
Features
- Full support for column_start, column_finish, column_count, and
column_reversed parameters in stubs
- Addition of an include_ttl parameter to ColumnFamily fetching
methods which works like the existing include_timestamp parameter.
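A minimal sketch of the new parameter, assuming it extends the returned values the same way include_timestamp does:
from pycassa import ConnectionPool, ColumnFamily

pool = ConnectionPool('MyKeyspace', ['192.168.1.1'])
cf = ColumnFamily(pool, 'users')

# Each column now maps to a (value, ttl) tuple; ttl is None when unset
columns = cf.get('jsmith', include_ttl=True)

# Combined with include_timestamp, values become (value, timestamp, ttl)
columns = cf.get('jsmith', include_timestamp=True, include_ttl=True)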
Bug Fixes
- Reduce the locked critical section in ConnectionPool, primarily
to ensure that time spent acquiring the lock counts toward the pool’s
timeout setting.
Changes in Version 1.8.0
This release requires either Python 2.6 or 2.7. Python 2.4 and 2.5
are no longer supported. There are no concrete plans for Python 3
compatibility yet.
Bug Fixes
- Don’t return closed connections to the pool. This was primarily a
problem when operations failed after retrying up to the limit,
resulting in a MaximumRetryException or
AllServersUnavailable.
- Set keyspace for connection after logging in instead of before.
This fixes authentication against Cassandra 1.2, which requires
logging in prior to setting a keyspace.
- Specify correct UUID variant when creating v1 uuid.UUID objects
from datetimes or timestamps
- Add 900ns to v1 uuid.UUID timestamps when the “max” TimeUUID for
a specific datetime or timestamp is requested, such as a
column slice end
- Also look at attributes of parent classes when creating
columns from attributes in ColumnFamilyMap
Other
- Upgrade bundled Thrift-generated Python code to 19.35.0, generated
with Thrift 0.9.0.
Changes in Version 1.7.2
This release fixes a minor bug and upgrades the bundled Cassandra
Thrift client interface to 19.34.0, matching Cassandra 1.2.0-beta1.
This doesn’t affect any existing Thrift methods, only adds new ones
(that aren’t yet utilized by pycassa), so there should not be any
breakage.
Bug Fixes
- Fix single-component composite packing
- Avoid cyclic imports during installation in setup.py
Changes in Version 1.7.1
This release has few changes, and should make for a smooth upgrade
from 1.7.0.
Bug Fixes
- Fix bad slice ends when using xget() with
composite columns and a column_finish parameter
- Fix bad documentation paths in debian packaging scripts
Other
- Add __version__ and __version_info__ attributes to the
pycassa module
Changes in Version 1.7.0
This release has a few relatively large changes in it: a new
connection pool stats collector, compatibility with Cassandra 0.7
through 1.1, and a change in timezone behavior for datetimes.
Before upgrading, take special care to make sure datetimes that you
pass to pycassa (for TimeUUIDType or DateType data) are in UTC, and
make sure your code expects to get UTC datetimes back in return.
Likewise, the SystemManager changes should be backwards compatible,
but there may be minor differences, mostly in
create_column_family() and
alter_column_family(). Be sure to test any code
that works programmatically with these.
Features
- Added StatsLogger for tracking ConnectionPool
metrics
- Full Cassandra 1.1 compatibility in SystemManager. To support
this, all column family or keyspace attributes that have existed since
Cassandra 0.7 may be used as keyword arguments for
create_column_family() and
alter_column_family(). It is up to the user to
know which attributes are available and valid for their version of
Cassandra.
As part of this change, the version-specific thrift-generated cassandra
modules (pycassa.cassandra.c07, pycassa.cassandra.c08, and
pycassa.cassandra.c10) have been replaced by pycassa.cassandra.
A minor related change is that individual connections no
longer ask for the node’s API version, and that information is
no longer stored as an attribute of the ConnectionWrapper.
Bug Fixes
- Fix xget() paging for non-string comparators
- Add batch_insert() to ColumnFamilyMap
- Use setattr instead of directly updating the object’s __dict__ in
ColumnFamilyMap to avoid breaking descriptors
- Fix single-column counter increments with ColumnFamily.insert()
- Include AuthenticationException and AuthorizationException in
the pycassa module
- Support counters in xget()
- Sort column families in pycassaShell for display
- Raise TypeError when bad keyword arguments are used when creating
a ColumnFamily object
Other
All datetime objects created by pycassa now use UTC as their timezone
rather than the local timezone. Likewise, naive datetime objects that are
passed to pycassa are now assumed to be in UTC time, but tz_info is respected
if set.
Specifically, the types of data that you may need to make adjustments for
when upgrading are TimeUUIDType and DateType (including OldPycassaDateType
and IntermediateDateType).
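For example, if your application works in local time, attach a timezone (or convert to UTC) before writing. The sketch below uses pytz, an illustrative third-party dependency, and placeholder keyspace and column names:
from datetime import datetime

import pytz  # illustrative third-party timezone library
from pycassa import ConnectionPool, ColumnFamily

pool = ConnectionPool('MyKeyspace', ['192.168.1.1'])
cf = ColumnFamily(pool, 'users')  # last_login is DateType-validated

# Naive datetimes are now treated as UTC, so convert local times first
local = pytz.timezone('US/Pacific').localize(datetime(2012, 6, 1, 12, 30))
cf.insert('jsmith', {'last_login': local.astimezone(pytz.utc)})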
Changes in Version 1.6.0
This release adds a few minor features and several important bug fixes.
The most important change to take note of if you are using composite
comparators is the change to the default inclusive/exclusive behavior
for slice ends.
Other than that, this should be a smooth upgrade from 1.5.x.
Features
- New script for easily building RPM packages
- Add request and parameter information to PoolListener callback
- Add ColumnFamily.xget(), a generator version of
get() that automatically pages over columns
in reasonably sized chunks (see the sketch after this list)
- Add support for Int32Type, a 4-byte signed integer format
- Add constants for the highest and lowest possible TimeUUID values
to pycassa.util
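The sketch below shows xget() walking a wide row without holding all of its columns in memory at once:
from pycassa import ConnectionPool, ColumnFamily

pool = ConnectionPool('MyKeyspace', ['192.168.1.1'])
cf = ColumnFamily(pool, 'timeline')

# Columns are fetched lazily in chunks of buffer_size behind the scenes
for name, value in cf.xget('jsmith', buffer_size=1024):
    pass  # process one (column name, column value) pair at a time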
Bug Fixes
- Fix various Python 2.4 syntax errors
- Raise AllServersUnavailable if server_list is empty
- Handle custom types inside of composites
- Don’t erase comment when updating column families
- Match Cassandra’s sorting of TimeUUIDType values when the timestamps
tie. This could result in some columns being erroneously left off of
the end of column slices when datetime objects or timestamps were used
for column_start or column_finish
- Use gevent’s queue in place of the stdlib version when gevent monkeypatching
has been applied
- Avoid sub-microsecond loss of precision with TimeUUID timestamps when using
pycassa.util.convert_time_to_uuid()
- Make default slice ends inclusive when using a CompositeType comparator.
Previously, the end of the slice was exclusive by default (as was the start
of the slice when column_reversed was True)
Changes in Version 1.5.1
This release only affects those of you using DateType data,
which has been supported since pycassa 1.2.0. If you are
using DateType, it is very important that you read this
closely.
DateType data is internally stored as an 8-byte integer timestamp.
Since version 1.2.0 of pycassa, the timestamp stored has counted
the number of microseconds since the unix epoch. The actual
format that Cassandra standardizes on is milliseconds since the
epoch.
If you are only using pycassa, you probably won’t have noticed any
problems with this. However, if you try to use cassandra-cli,
sstable2json, Hector, or any other client that supports DateType,
DateType data written by pycassa will appear to be far in the future.
Similarly, DateType data written by other clients will appear to
be in the past when loaded by pycassa.
This release changes the default DateType behavior to comply with
the standard, millisecond-based format. If you use DateType,
and you upgrade to this release without making any modifications,
you will have problems. Unfortunately, this is a bit of a tricky
situation to resolve, but the appropriate actions to take are detailed
below.
To temporarily continue using the old behavior, a new class
has been created: pycassa.types.OldPycassaDateType.
This will read and write DateType data exactly the same as
pycassa 1.2.0 to 1.5.0 did.
If you want to convert your data to the new format, the other
new class, pycassa.types.IntermediateDateType, may be useful.
It can read either the new or old format correctly (unless you have
used dates close to 1970 with the new format) and will write only
the new format. The best case for using this is if you have DateType
validated columns that don’t have a secondary index on them.
To tell pycassa to use OldPycassaDateType or
IntermediateDateType, use the ColumnFamily
attributes that control types: column_name_class,
key_validation_class,
column_validators, and so on. Here’s an example:
from pycassa.types import OldPycassaDateType, IntermediateDateType
from pycassa.column_family import ColumnFamily
from pycassa.pool import ConnectionPool
pool = ConnectionPool('MyKeyspace', ['192.168.1.1'])
# Our tweet timeline has a comparator_type of DateType
tweet_timeline_cf = ColumnFamily(pool, 'tweets')
tweet_timeline_cf.column_name_class = OldPycassaDateType()
# Our users' join_date column has a validation_class of DateType
users_cf = ColumnFamily(pool, 'users')
users_cf.column_validators['join_date'] = IntermediateDateType()
If you’re using DateType for the key_validation_class, column names,
column values with a secondary index on them, or are using the DateType
validated column as a non-indexed part of an index clause with
get_indexed_slices() (e.g. “where state = ‘TX’ and join_date > 2012”),
you need to be more careful about the conversion process, and
IntermediateDateType probably isn’t a good choice.
In most cases, if you want to switch to the new date format,
a manual migration script to convert all existing DateType
data to the new format will be needed. In particular, if you
convert keys, column names, or indexed columns on a live data set,
be very careful how you go about it. If you need any assistance or
suggestions at all with migrating your data, please feel free to
send an email to tyler@datastax.com; I would be glad to help.
Changes in Version 1.5.0
The main change to be aware of for this release is the
new no-retry behavior for counter operations. If you have been
maintaining a separate connection pool with retries disabled
for usage with counters, you may discontinue that practice
after upgrading.
Features
- By default, counter operations will not be retried
automatically. This makes it easier to use a single
connection pool without worrying about overcounting.
Bug Fixes
- Don’t remove entire row when an empty list is supplied for the
columns parameter of remove() or the
batch remove methods.
- Add python-setuptools to debian build dependencies
- Batch remove() was not removing subcolumns
when the specified supercolumn was 0 or another “falsey” value
- Don’t request an extra row when reading fewer than buffer_size
rows with get_range() or
get_indexed_slices().
- Remove pool_type from logs, which showed up as None in
recent versions
- Logs were erroneously showing the same server for retries
of failed operations even when the actual server being
queried had changed
Changes in Version 1.4.0
This release is primarily a bugfix release with a couple
of minor features and removed deprecated items.
Features
- Accept column_validation_classes when creating or altering
column families with SystemManager
- Ignore UNREACHABLE nodes when waiting for schema version
agreement
Bug Fixes
- Remove accidental print statement in SystemManager
- Raise TypeError when unexpected types are used for
comparator or validator types when creating or altering
a Column Family
- Fix packing of column values using column-specific validators
during batch inserts when the column name is changed by packing
- Always return timestamps from inserts
- Fix NameError when timestamps are used where a DateType is
expected
- Fix NameError in Python 2.4 when unpacking DateType objects
- Handle reading composites with trailing components missing
- Upgrade ez_setup.py to fix broken setuptools link
Removed Deprecated Items
- pycassa.connect()
- pycassa.connect_thread_local()
- ConnectionPool.status()
- ConnectionPool.recreate()
Changes in Version 1.3.0
This release adds full compatibility with Cassandra 1.0 and
removes support for schema manipulation in Cassandra 0.7.
In this release, schema manipulation should work with Cassandra 0.8
and 1.0, but not 0.7. The data API should continue to work with all
three versions.
Bug Fixes
- Don’t ignore columns parameter in ColumnFamilyMap.insert()
- Handle empty instance fields in ColumnFamilyMap.insert()
- Use the same default for timeout in pycassa.connect() as
ConnectionPool uses
- Fix typo which caused a different exception to be thrown when an
AllServersUnavailable exception was raised
- IPython 0.11 compatibility in pycassaShell
- Correct dependency declaration in setup.py
- Add UUIDType to supported types
Features
- The filter_empty parameter was added to
get_range() with a default of True; this
allows empty rows to be kept if desired
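A minimal sketch of the new parameter:
from pycassa import ConnectionPool, ColumnFamily

pool = ConnectionPool('MyKeyspace', ['192.168.1.1'])
cf = ColumnFamily(pool, 'users')

# filter_empty defaults to True; pass False to keep empty "ghost" rows
for key, columns in cf.get_range(filter_empty=False):
    pass  # columns is an empty dict for ghost rows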
Deprecated
- pycassa.connect()
- pycassa.connect_thread_local()
Changes in Version 1.2.1
This is strictly a bug-fix release addressing a few
issues created in 1.2.0.
Changes in Version 1.2.0
This should be a fairly smooth upgrade from pycassa 1.1. The
primary changes that may introduce minor incompatibilities are
the changes to ColumnFamilyMap and the automatic
skipping of “ghost ranges” in ColumnFamily.get_range().
Features
- Add ConnectionPool.fill()
- Add FloatType, DoubleType,
DateType, and BooleanType support.
- Add CompositeType support for static composites.
See Composite Types for more details, and the sketch after this list.
- Add timestamp and ttl parameters to
ColumnFamilyMap.insert()
- Support variable-length integers with IntegerType.
This allows more space-efficient small integers as well as
integers that exceed the size of a long.
- Make ColumnFamilyMap a subclass of
ColumnFamily instead of using one as a component.
This allows all of the adjustments normally made to a
ColumnFamily to be made directly to a ColumnFamilyMap
instead. See Class Mapping with Column Family Map for examples of
using the new version.
- Expose the following ConnectionPool attributes,
allowing them to be altered after creation:
max_overflow, pool_timeout,
recycle, max_retries,
and logging_name.
Previously, these were all supplied as constructor arguments.
Now, the preferred way to set them is to alter the attributes
after creation. (However, they may still be set in the
constructor by using keyword arguments.)
- Automatically skip “ghost ranges” in ColumnFamily.get_range().
Rows without any columns will not be returned by the generator,
and these rows will not count towards the supplied row_count.
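The sketch below shows the static composite support mentioned above; the keyspace and column family names are placeholders:
from pycassa.system_manager import SystemManager
from pycassa.types import CompositeType, LongType, AsciiType

sys_mgr = SystemManager('192.168.1.1:9160')

# Column names in this column family are (long, ascii) pairs packed
# into a single composite name; pycassa accepts them as tuples
comparator = CompositeType(LongType(), AsciiType())
sys_mgr.create_column_family('MyKeyspace', 'events',
                             comparator_type=comparator)
sys_mgr.close()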
Bug Fixes
- Add connections to ConnectionPool more readily
when prefill is False.
Before this change, if the ConnectionPool was created with
prefill=False, connections would only be added to the pool
when there was concurrent demand for connections.
After this change, if prefill=False and pool_size=N, the
first N operations will each result in a new connection
being added to the pool.
- Close connection and adjust the ConnectionPool’s
connection count after a TApplicationException. This
exception generally indicates programmer error, so it’s not
extremely common.
- Handle typed keys that evaluate to False
Deprecated
- ConnectionPool.recreate()
- ConnectionPool.status()
Changes in Version 1.1.1
Bug Fixes
- Don’t retry operations after a TApplicationException. This exception
is reserved for programmatic errors (such as bad API parameters), so
retries are not needed.
- If the read_consistency_level kwarg was used in a ColumnFamily
constructor, it would be ignored, resulting in a default read consistency
level of ONE. This did not affect the read consistency level if it was
specified in any other way, including per-method or by setting the
read_consistency_level attribute.
Changes in Version 1.1.0
This release adds compatibility with Cassandra 0.8, including support
for counters and key_validation_class. This release is
backwards-compatible with Cassandra 0.7, and can support running against
a mixed cluster of both Cassandra 0.7 and 0.8.
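For example, counter columns can be incremented with ColumnFamily.add(). A sketch; it assumes a column family whose default_validation_class is CounterColumnType:
from pycassa import ConnectionPool, ColumnFamily

pool = ConnectionPool('MyKeyspace', ['192.168.1.1'])
page_views = ColumnFamily(pool, 'page_views')  # a counter column family

# Increment the 'hits' counter for this row by 1 (the default), then 10
page_views.add('index.html', 'hits')
page_views.add('index.html', 'hits', value=10)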
Other Features
- ColumnFamily.multiget() now has a buffer_size parameter
- ColumnFamily.multiget_count() now returns rows
in the order that the keys were passed in, similar to how
multiget() behaves. It also uses
the dict_class attribute for the containing
class instead of always using a dict.
- Autopacking behavior is now more transparent and configurable,
allowing the user to get functionality similar to the CLI’s
assume command, whereby items are packed and unpacked as
though they were a certain data type, even if Cassandra does
not use a matching comparator type or validation class. This
behavior can be controlled through attributes such as
column_name_class, key_validation_class, and column_validators.
- A ColumnFamily may reload its schema to handle
changes in validation classes with ColumnFamily.load_schema().
Bug Fixes
There were several related issues with overflow in ConnectionPool:
- Connection failures when a ConnectionPool was in a state
of overflow would not result in adjustment of the overflow counter,
eventually leading the ConnectionPool to refuse to create
new connections.
- A setting of -1 for ConnectionPool.max_overflow erroneously caused
overflow to be disabled.
- If overflow was enabled in conjunction with prefill being disabled,
the effective overflow limit was raised to max_overflow + pool_size.
Removed Deprecated Items
Several items that were deprecated in previous releases have been removed.
Deprecated
Although not technically deprecated, most ColumnFamily
constructor arguments should instead be set by setting the
corresponding attribute on the ColumnFamily after
construction. However, all previous constructor arguments
will continue to be supported if passed as keyword arguments.
Changes in Version 1.0.8
- Pack IndexExpression values in get_indexed_slices()
that are supplied through the IndexClause instead of just the instance
parameter (see the sketch after this list).
- Column names and values which use Cassandra’s IntegerType are unpacked as though they
are in a BigInteger-like format. This is (backwards) compatible with the format
that pycassa uses to pack IntegerType data. This fixes an incompatibility with
the format that cassandra-cli and other clients use to pack IntegerType data.
- Restore Python 2.5 compatibility that was broken by out-of-order keyword
arguments in ConnectionWrapper.
- Pack column_start and column_finish arguments in ColumnFamily
*get*() methods when the super_column parameter is used.
- Issue a DeprecationWarning when a method, parameter, or class that
has been deprecated is used. Most of these have been deprecated for several
releases, but no warnings were issued until now.
- Deprecations are now split into separate sections for each release in the
changelog.
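For reference, the sketch below builds the IndexClause mentioned above with the helpers in pycassa.index; the column names and values are placeholders:
from pycassa import ConnectionPool, ColumnFamily
from pycassa.index import create_index_expression, create_index_clause

pool = ConnectionPool('MyKeyspace', ['192.168.1.1'])
users = ColumnFamily(pool, 'users')

# Expression values are now packed just like the instance parameter
state_expr = create_index_expression('state', 'TX')
clause = create_index_clause([state_expr], count=100)
for key, columns in users.get_indexed_slices(clause):
    pass  # each matching row has state == 'TX'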
Deprecated
- The instance parameter of ColumnFamilyMap.get_indexed_slices()
Changes in Version 1.0.7
- Catch KeyError in pycassa.columnfamily.ColumnFamily.multiget() empty
row removal. If the same non-existent key was passed multiple times, a
KeyError was raised when trying to remove it from the OrderedDictionary
after the first removal. The KeyError is caught and ignored now.
- Handle connection failures during retries. When a connection fails, it tries to
create a new connection to replace itself. Exceptions during this process were
not properly handled; they are now handled and count towards the retry count for
the current operation.
- Close connection when a MaximumRetryException is raised. Normally a connection
is closed when an operation it is performing fails, but this was not happening
for the final failure that triggers the MaximumRetryException.
Changes in Version 1.0.6
- Add EOFError to the list of exceptions that cause a connection swap and retry
- Improved autopacking efficiency for AsciiType, UTF8Type, and BytesType
- Preserve sub-second timestamp precision in datetime arguments for insertion
or slice bounds where a TimeUUID is expected. Previously, precision below a
second was lost.
- In a MaximumRetryException’s message, include details about the last
Exception that caused the MaximumRetryException to be raised
- pycassa.pool.ConnectionPool.status() now always reports a non-negative
overflow; 0 is now used when there is not currently any overflow
- Created pycassa.types.Long as a replacement for pycassa.types.Int64.
Long uses big-endian encoding, which is compatible with Cassandra’s LongType,
while Int64 used little-endian encoding.
Deprecated
- pycassa.types.Int64 has been deprecated in favor of pycassa.types.Long
Changes in Version 1.0.5
- Assume port 9160 if only a hostname is given
- Remove super_column param from pycassa.columnfamily.ColumnFamily.get_indexed_slices()
- Enable failover on functions that previously lacked it
- Increase base backoff time to 0.01 seconds
- Add a timeout parameter to pycassa.system_manager.SystemManager
- Return timestamp on single-column inserts
Changes in Version 1.0.3
- Fixed supercolumn slice bug in get()
- pycassaShell now runs scripts with execfile to allow for multiline statements
- Python 2.4 compatibility fixes
Changes in Version 1.0.0
- Created the SystemManager class to
allow for keyspace, column family, and index creation, modification,
and deletion. These operations are no longer provided by a Connection
class.
- Updated pycassaShell to use the SystemManager class
- Improved retry behavior, including exponential backoff and proper
resetting of the retry attempt counter
- Condensed connection pooling classes into only
pycassa.pool.ConnectionPool to provide a simpler API
- Changed pycassa.connection.connect() to return a
connection pool
- Use more performant Thrift API methods for insert()
and get() where possible
- Bundled OrderedDict and set it as the
default dictionary class for column families
- Provide better TypeError feedback when columns are the wrong
type
- Use Thrift API 19.4.0
Deprecated
- ColumnFamilyMap.get_count() has been deprecated. Use
ColumnFamily.get_count() instead.
Changes in Version 0.5.4
- Allow for more backward and forward compatibility
- Mark a server as being down more quickly in
Connection
Changes in Version 0.5.3
- Added PooledColumnFamily, which makes
it easy to use connection pooling automatically with a ColumnFamily.
Changes in Version 0.5.2
- Support for adding/updating/dropping Keyspaces and CFs
in pycassa.connection.Connection
- get_range() optimization
and more configurable batch size
- batch get_indexed_slices()
similar to ColumnFamily.get_range()
- Reorganized pycassa logging
- More efficient packing of data types
- Fix error condition that results in infinite recursion
- Limit pooling retries to only appropriate exceptions
- Use Thrift API 19.3.0
Changes in Version 0.5.1
- Automatically detect if a column family is a standard column family
or a super column family
- multiget_count() support
- Allow preservation of key order in
multiget() if an ordered
dictionary is used
- Convert timestamps to v1 UUIDs where appropriate
- pycassaShell documentation
- Use Thrift API 17.1.0
Changes in Version 0.5.0
- Connection Pooling support: pycassa.pool
- Started moving logging to pycassa.logger
- Use Thrift API 14.0.0
Changes in Version 0.4.3
- Autopack on CF’s default_validation_class
- Use Thrift API 13.0.0
Changes in Version 0.4.2
- Added batch mutations interface: pycassa.batch (see the sketch below)
- Made bundled thrift-gen code a subpackage of pycassa
- Don’t attempt to re-encode already-encoded UTF8 strings
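A sketch of the batch interface as it looks in later pycassa releases (this entry predates connection pooling, so the pool setup here is anachronistic but illustrative):
from pycassa import ConnectionPool, ColumnFamily
from pycassa.batch import Mutator

pool = ConnectionPool('MyKeyspace', ['192.168.1.1'])
users = ColumnFamily(pool, 'users')

# Queue mutations against one or more column families, then send together
b = Mutator(pool)
b.insert(users, 'jsmith', {'email': 'jsmith@example.com'})
b.remove(users, 'jdoe', columns=['email'])
b.send()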
Changes in Version 0.4.1
- Added batch_insert()
- Redefined insert()
in terms of batch_insert()
- Fixed UTF8 autopacking
- Convert datetime slice args to uuids when appropriate
- Changed how thrift-gen code is bundled
- Assert that the major version of the thrift API is the same on the
client and on the server
- Use Thrift API 12.0.0
Changes in Version 0.4.0
- Added pycassaShell, a simple interactive shell
- Converted the test config from xml to yaml
- Fixed overflow error on
get_count()
- Only insert columns which exist in the model object
- Make ColumnFamilyMap not ignore the ColumnFamily’s dict_class
- Specify keyspace as argument to connect()
- Add support for framed transport and default to using it
- Added autopacking for column names and values
- Added support for secondary indexes with
get_indexed_slices()
and pycassa.index
- Added truncate()
- Use Thrift API 11.0.0