pattern.db

The pattern.db module contains wrappers for databases (SQLite, MySQL), Unicode CSV files and Python's datetime. It offers a convenient way to work with tabular data, for example retrieved with the pattern.web module.

Documentation

Database (sqlite + mysql)
Table
Query
Datasheet (csv)
Date

Database

A database is a collection of tables. A table has rows of data with a specific data type (e.g., string, float) for each field or column. A database engine provides an interface to the database, using SQL statements (Structured Query Language). Python 2.5+ comes bundled with the SQLite engine. The MySQL engine requires the MySQL-Python bindings. Note that a 32-bit Python requires a 32-bit MySQL.

The Database() constructor creates (if necessary) and returns an SQLITE or MYSQL database. With SQLITE, it will create a file with the given name in the current folder.

db = Database(
      name, 
      host = 'localhost', 
      port = 3306, 
  username = 'root', 
  password = '', 
      type = SQLITE
)

db.type                     # SQLITE | MYSQL
db.name                     # Database name.
db.host                     # Database host (MySQL).
db.port                     # Database port (MySQL).
db.username                 # Database username (MySQL).
db.password                 # Database password (MySQL).
db.tables                   # Dictionary of (name, Table)-items.
db.relations                # List of relations, see Database.link().
db.query                    # Last executed SQL query.
db.connected                # True after Database.connect().

db.connect()                # Happens automatically.
db.disconnect()

db.create(table, fields=[])
db.remove(table)
db.link(table1, field1, table2, field2, join=LEFT)

db.execute(SQL, commit=False)
db.commit()
db.escape(value)            # "a cat's tail" => "'a cat\'s tail'"

Database.execute() returns an iterator of rows for the given SQL query.
Database.commit() commits the changes of pending INSERT, UPDATE, DELETE queries.
Database.escape() safely quotes and escapes field values.

Create table

Database.create() creates a new table in the database, It takes a table name and a list of row fields, where each field is defined with the field() function. Each field has a name (a-z + underscores) and a type, with an optional default value for new rows. The pk() function can be used for primary keys.

field(name, type=STRING, default=None, index=False, optional=True)

pk(name='id') # field('id', INTEGER, index=PRIMARY, optional=False)

Type	Value	Example
STRING	str, unicode (1-255 characters)	u'Schrödinger'
INTEGER	int	42
FLOAT	float	3.14159
TEXT	str, unicode	open('file.txt').read()
BLOB	str (binary, e.g., PDF, PNG)	db.binary(open('img.jpg', 'rb').read())
BOOLEAN	bool	True, False
DATE	Date	date('1999-12-31 23:59:59')

A STRING field can contain up to a 100 characters. The length (1-255) can be changed by calling STRING as a function, e.g., type=STRING(255). For longer strings, use TEXT. The default value for a DATE field is NOW.

With index=True, the field is indexed for faster search. The index can also be set to UNIQUE (no duplicates) or PRIMARY. A table must have a primary key field that uniquely identifies each row (i.e., an id). Integer primary keys are auto-numbered, there is no need to set the value manually in new rows.

With optional=True, the field is allowed to contain None.

>>> from pattern.db import Database, field, pk, STRING, BOOLEAN, DATE, NOW
>>>  
>>> db = Database('my_stuff')
>>> db.create('pets', fields=(
>>>     pk(),
>>>     field('name', STRING(80), index=True),
>>>     field('type', STRING(20)),
>>>     field('tail', BOOLEAN),
>>>     field('date_birth', DATE, default=None),
>>>     field('date_created', DATE, default=NOW)
>>> ))

>>> db.pets.append(name=u'Schrödinger', type='cat', tail=True)
>>> print db.pets.rows()[0]

(1, u'Schrödinger', u'cat', True, None, Date('2013-12-11 10:09:08'))

Create table from XML

Database.create() can also take a Table.xml or Query.xml. It creates a new table and copies the row data in the given XML string. An optional name parameter can be used to rename the new table. In Query.xml, a field name may contain a period. It will be replaced with an underscore (e.g., pets.name → pets_name). Alternatively, an alias can be defined in the Query.aliases dictionary.

Table

A Table is a list of rows, with one or more fields (i.e., table columns) of a certain type (i.e., string or number). A new table can be created with Database.create(). A TableError is raised if a table with the given name exists. An existing table can be retrieved with Database.tables[name], Database[name] or Database.<name>.

table = Database.tables[name]

table.db                    # Parent Database.
table.name                  # Table name (a-z + underscores).
table.fields                # List of field names (i.e., columns).
table.schema                # Dictionary of (field, Schema)-items.
table.default               # Dictionary of (field, value)-items for new rows.
table.pk                    # Primary key field name.

table.count()               # Total number of rows (len(table) also works).
table.rows()                # List of rows, each a tuple of fields.

table.record(row)           # Dictionary of (field, value)-items for given row.

table.append(fields={}, commit=True)
table.update(id, fields={}, commit=True)
table.remove(id, commit=True)

table.filter(*args, **kwargs)
table.search(*args, **kwargs)

table.xml                   # XML string with the table schema and rows. 
table.datasheet             # Datasheet object (see below).

Table.rows() returns a list of all rows. To iterate rows memory-efficiently, use iter(Table).
Table.append(), update() and remove() modify the table contents.
With commit=False, changes are only committed after Database.commit() (= faster in batch).
Table.filter() returns a subset of rows with a subset of fields.
For example: table.filter('name', type='cat').

Table schema

The Table.schema dictionary contains field name → Schema items.

schema = Table.schema[fieldname]

schema.name                 # Field name.
schema.type                 # STRING, INTEGER, FLOAT, TEXT, BLOB, BOOLEAN, DATE 
schema.length               # STRING field length.
schema.default              # Default value.
schema.index                # PRIMARY | UNIQUE | True | False
schema.optional             # True or False.

>>> from pattern.db import Database
>>>
>>> db = Database('my_stuff')
>>>  
>>> print db.pets.fields
>>> print db.pets.schema['name'].type
>>> print db.pets.schema['name'].length

['id', 'name', 'tail', 'date_birth', 'date_created']
STRING
80

Append row

Table.append() adds a new row with the given field values. It returns the row id, if the table has a primary key generated with pk(). Field values can be given as optional parameters, a dictionary or a tuple. Field values for a BLOB field must be wrapped in Database.binary().

>>> db.pets.append(name=u'Schrödinger', date_birth=date('2009-08-12'))

>>> db.pets.append({'name': u'Schrödinger', 'date_birth': date('2009-08-12')})

>>> db.pets.append((u'Schrödinger', 'cat', True, date('2009-08-12')) # in-order

Update row

Table.update() updates values in the row with the given primary key. A batch of rows can be updated using a filter, or a chain of filters with any() or all(). In the last example, all rows with type='cat' will have their tail field set to True.

>>> db.pets.update(1, type='cat') # set type='cat' in row with id=1.

>>> db.pets.update(1, {'type': 'cat'})

>>> db.pets.update(eq('type', 'cat'), tail=True)

Remove row

Table.remove() removes the row with the given primary key:

>>> db.pets.remove(1)

>>> db.pets.remove(ALL)

>>> db.pets.remove(all(eq('type', 'cat'), lt(year('date_birth'), 1990, '<')))

The last example removes all rows that have type='cat' AND year of birth before 1990.

Filter rows

Table.filter() returns a list of rows filtered by field value(s), where each row is a tuple of fields. The first parameter defines which fields to return. It can be a single field name, a list of field names or ALL. The following parameters are optional and define field constraints. They can also be given as a dictionary:

>>> db.pets.filter('name')                     # all rows, name

>>> db.pets.filter(('id', 'name'))             # all rows, name + id

>>> db.pets.filter(ALL, type='cat')            # type='cat', all fields

>>> db.pets.filter(ALL, type=('cat', 'dog'))   # type='cat' OR type='dog'

>>> db.pets.filter(ALL, type='*at')            # type='cat' OR 'hat' OR 'brat', ...

>>> db.pets.filter(ALL, type='cat', tail=True) # type='cat' AND tail=True

>>> db.pets.filter('id', {'type': 'cat', 'tail': True})

More complex queries can be constructed with a Query.

Query

Table.search() returns a new Query with options for filtering, sorting and ordering rows by field value(s). It can include fields from other, related tables.

query = Table.search( 
     fields = ALL,
    filters = [],
  relations = [],
       sort = None,
      order = ASCENDING,
      group = None,
   function = FIRST,
      range = None
)

query.table                 # Parent Table.
query.fields                # Field name, list of field names, or ALL.
query.aliases               # Dictionary of (field name, alias)-items.
query.filters               # List of filter() objects.
query.relations             # List of rel() objects.
query.sort                  # Field name or list of field names.
query.order                 # ASCENDING | DESCENDING
query.group                 # Field name or list of field names.
query.function              # FIRST, LAST, COUNT, MIN, MAX, SUM, AVG, CONCATENATE
query.range                 # (start, stop)-tuple, e.g. rows 11-20.

query.sql()                 # SQL string, can be used with Database.execute().

query.rows()                # List of rows, each a tuple of fields.

query.record(row)           # Dictionary of (field, value)-items for given row.

query.xml                   # XML string with the query schema and rows.

To iterate rows memory-efficiently, use iter(Query) instead of Query.rows().

Query filter

The filter() function creates a field-value constraint that matches certain rows in a table. A list of filters can be passed to the filters parameter of a Query.

filter(field, value, comparison='=')

Comparison	Description	Example	Alias
=	equal to	filter('type', ('cat', 'dog'), '=')	eq()
i=	equal to (case-insensitive)	filter('name', 'tig*', 'i=')	eqi()
!=	not equal to	filter('name', '*y', '!=')	ne()
>	greater than	filter('weight', 10, '>')	gt()
<	less than	filter('weight', 10, '<')	lt()
>=	greater than or equal to	filter(year('date'), 1999, '>=')	gte()
<=	less than or equal to	filter(year('date'), 2002, '<=')	lte()
:	between (inclusive)	filter(year('date'), (1999, 2002), ':')	rng()

The field name of a DATE field can be passed to the year(), month(), day(), hour(), minute() or second() function.The short aliases of filter() have a preset comparison operator.

Query filter chain

Filters can be chained together. The all() function returns a list with AND logic. The any() function returns a list with OR logic. In the example below, the first query matches all cats named Taxi. The second and third query match any pet that is cat OR that is named Taxi.

all(filter1, filter2, ...) # Rows must match ALL of the filters.

any(filter1, filter2, ...) # Rows must match ANY of the filters.

>>> from pattern.db import Database, eq, all, any
>>>  
>>> db = Database('my_stuff')
>>>  
>>> db.pets.search(filters=all(eq('name', 'Taxi'), eq('type', 'cat')))
>>> db.pets.search(filters=any(eq('name', 'Taxi'), eq('type', 'cat')))
>>> db.pets.search(filters=any(name='Taxi', type='cat'))

Lists created with all() and any() can be nested to define complex search criteria. The example below matches all pets that are cats, and whose name starts with Fluff- OR ends with a -y:

>>> f = any(eq('name', 'Fluff*'), eq('name', '*y')) # OR
>>> f = all(eq('type', 'cat'), f)                   # AND
>>>
>>> for row in db.pets.search(filters=f):  
>>>     print row

The syntax can even be more concise:

>>> for row in db.pets.search(filters=all(name=('Fluff*', '*y'), type='cat')):
>>>     print row

Query relation

The rel() function defines a relation between two fields in different tables (usually id's).

rel(field1, field2, table, join=LEFT) # LEFT | INNER

The optional join parameter defines how rows are matched. LEFT takes all rows from the base table, with additional fields from the related table. For a row with no match between field1 and field2, these fields have value None. INNER takes the subset of rows that have a match between field1 and field2.

A well-known example is a database app that processes invoices. Say we have a products table and an orders table. Each order has a product id – instead of all product details. Each product id can occur in multiple orders. This approach is called database normalization. It avoids duplicate data. To generate an invoice, we can combine product details and order details using a query relation.

The following example demonstrates a simple products + customers + orders database app:

products
id	name	price
1	pizza	15
2	garlic bread	3

customers
id	name
1	Schrödinger
2	Hofstadter

orders
id	product	customer
1	1	2

>>> from pattern.db import Database, field, pk, INTEGER as I
>>>  
>>> db = Database('pizza_delivery')
>>> 
>>> db.create( 'products', (pk(), field('name'), field('price', I)))
>>> db.create('customers', (pk(), field('name')))
>>> db.create(   'orders', (pk(), field('product', I), field('customer', I)))

Add products and customers. Pizza delivery is open for business!

>>> db.products.append(name='pizza', price=15)
>>> db.products.append(name='garlic bread', price=3)

>>> db.customers.append(name=u'Schrödinger')
>>> db.customers.append(name=u'Hofstadter')

Hofstadter orders a pizza.

>>> db.orders.append(product=1, customer=2)

An orders query with relations to products and customers generates a human-readable invoice:

>>> from pattern.db import Database, rel
>>>  
>>> db = Database('pizza_delivery')
>>>
>>> f = ('orders.id', 'customers.name', 'products.name', 'products.price')
>>> q = db.orders.search(f, relations=(
>>>      rel('orders.customer', 'customers.id', 'customers'),
>>>      rel('orders.product', 'products.id', 'products'))
>>> )
>>> for row in q:
>>>     print q.record(row)

{ 'orders.id'      : 1, 
  'customers.name' : u'Hofstadter',
  'products.name'  : u'pizza', 
  'products.price' : 15 }

If a relation is used repeatedly, define it once with Database.link(). It will be available in every Query.

Grouping rows

A Query has an optional parameter group that can be used to merge rows on duplicate field values. The given function is applied to the other fields. It can also be a list with a function for each field.

Function	Field type	Description
FIRST	any	The first row field in the group.
LAST	any	The last row field in the group.
COUNT	any	The number of rows in the group.
MIN	INTEGER + FLOAT	The lowest field value in the group.
MAX	INTEGER + FLOAT	The highest field value in the group.
SUM	INTEGER + FLOAT	The sum of all field values in the group.
AVG	INTEGER + FLOAT	The average of all field values in the group.
STDEV	INTEGER + FLOAT	The standard deviation (= variation from average).
CONCATENATE	STRING	Joins all field values with a comma.

For example, to get the total revenue per ordered product:

>>> print db.orders.search(
>>>        fields = ('products.name', 'products.price'),
>>>     relations = rel('product', 'products.id', 'products'),
>>>         group = 'products.name', # Merge orders with same product name.
>>>      function = SUM              # Sum of product prices.
>>> ).rows()

Datasheet

A Datasheet is a matrix of rows and columns, where each row and column can be retrieved as a list. The data can be imported or exported as a CSV-file. Optionally, the given fields is a list of (name, type) headers, where type can be STRING, TEXT, INTEGER, FLOAT, BOOLEAN, BLOB or DATE.

datasheet = Datasheet(rows=[], fields=None)

datasheet = Datasheet.load(path, separator=',', decoder=lambda v: v, headers=False)

datasheet.rows              # List of rows (each row = list of values).
datasheet.columns           # List of columns (each column = list of values).
datasheet.fields            # List of (name, type) column headers.
datasheet.<field>           # List of column values.

datasheet[i]                # Row at index i.
datasheet[i, j]             # Value in row i at column j.
datasheet[i1:i2, j]         # Slice of column j from rows i1-i2.
datasheet[i, j1:j2]         # Slice of columns j1-j2 from row i.
datasheet[i1:i2, j1:j2]     # Datasheet with columns j1-j2 from rows i1-i2.
datasheet[:]                # Datasheet copy.

datasheet.insert(i, row, default=None)
datasheet.append(row, default=None)
datasheet.extend(rows, default=None)
datasheet.copy(rows=ALL, columns=ALL)

datasheet.group(j, function=FIRST, key=lambda v: v)

datasheet.save(path, separator=',', encoder=lambda v: v, headers=False)

datasheet.json              # JSON-formatted string.

Datasheet.insert() and append() fill missing columns with the default value.
Datasheet.columns.insert() and append() fill missing rows with the default value.
An optional field parameter can be used to supply a (name, type) column header.
Datasheet.copy() returns a new Datasheet from a selective list of row and/or column indices.
To rotate a datasheet 90 degrees, use datasheet = flip(datasheet).

For example:

>>> from pattern.db import Datasheet
>>> 
>>> ds = Datasheet()
>>> ds.append((u'Schrödinger', 'cat'))
>>> ds.append((u'Hofstadter', 'cat'))
>>> ds.save('pets.csv')
>>> 
>>> ds = Datasheet.load('pets.csv')
>>> print ds

[[u'Schrödinger', 'cat'],
 [ u'Hofstadter', 'cat']]

Grouping rows

Datasheet.group(j) returns a new Datasheet with unique values in column j. It merges rows using a given function that takes a list of column values and returns a single value. Predefined functions are FIRST, LAST, COUNT, MIN, MAX, SUM, AVG, STDEV and CONCATENATE. It can also be a list of functions.

The optional key can be used to compare the values in column j. For example, lambda date: date.year groups a column of Date objects by year.

>>> from pattern.db import Datasheet, pprint
>>> 
>>> ds = Datasheet(rows=[
>>>     (1, u'Schrödinger', 'cat'),
>>>     (2, u'Hofstadter',  'cat'),
>>>     (3, u'Taxi',        'dog')
>>> ])
>>> 
>>> g = ds.copy(columns=[2, 0]) # A copy with type & id.
>>> g = g.group(0, COUNT)       # Group type, count rows per type.
>>> pprint(g, fill='')

cat  2 
dog  1

Sorting rows & columns

Datasheet.columns[j].sort() sorts the rows according to the values in column j.
Datasheet.columns.sort() can be used to change the column order:

>>> ds.columns.sort(order=[0, 2, 1])
>>> pprint(ds, fill='')

1  cat  Schrödinger
2  cat  Hofstadter
3  dog  Taxi

Datasheet.columns.swap(j1,j2) swaps two individual columns with given indices.

CSV import & export

Datasheet.save() exports the matrix as a CSV file. Datasheet.load() returns a Datasheet from a given CSV file. CSV (comma-separated values) is a simple text format for tabular data, where each line is a row and each value is separated by a comma.

datasheet = Datasheet.load(path, separator=',', decoder=lambda v: v, headers=False)

datasheet.save(path, separator=',', encoder=lambda v: v, headers=False)

On export, all str, int, float, bool and Date values are converted to Unicode. An encoder can be given for other data types. On import, all values in the datasheet will be Unicode unless a decoder is given.

With headers=True, the Datasheet.fields headers are exported and imported (first line in CSV). In this case, the data type for each column (STRING, INTEGER, FLOAT, BOOLEAN or DATE) is explicitly known and no encoder or decoder is needed.

>>> from pattern.db import Datasheet, STRING, DATE, date
>>> 
>>> ds = Datasheet(fields=(('name', STRING), ('date', DATE)))
>>> ds.append((u'Schrödinger', date('1887-08-12')))
>>> ds.append((u'Hofstadter', date('1945-02-15')))
>>>  
>>> ds.save('pets.csv', headers=True)
>>> 
>>> ds = Datasheet.load('pets.csv', headers=True)
>>> print ds[0]

[u'Schrödinger', Date('1887-08-12 00:00:00')]

The csv() function can also be used instead of Datasheet.load():

>>> from pattern.db import csv
>>> 
>>> for name, date in csv('pets.csv', separator=',', headers=True):
>>>     print name, date

Date

The date() function returns a new Date, a convenient subclass of Python's datetime.datetime. It takes an integer (Unix timestamp), a string or NOW. An optional string input format and output format can be given (e.g., "%d/%m/%y"). The default output format is "YYYY-MM-DD hh:mm:ss".

d = date(int)

d = date(NOW, format=DEFAULT)

d = date(string)

d = date(string, format=DEFAULT)

d = date(string, inputformat, format=DEFAULT)

d = date(year, month, day, format=DEFAULT)

d = date(year, month, day, hours, minutes, seconds, format=DEFAULT)

d.year
d.month                     # 1-12
d.week                      # 1-52
d.weekday                   # 1-7
d.day                       # 1-31
d.minute                    # 1-60
d.second                    # 1-60
d.timestamp                 # Seconds elapsed since 1/1/1970.

If no string input format is given, a number of common formats will be tried:

Format	Example
%Y-%m-%d %H:%M:%S	2010-09-21 09:27:01
%a, %d %b %Y %H:%M:%S %z	Tue, 9 Sep 2010 17:58:28 +0000
%Y-%m-%dT%H:%M:%SZ	2010-09-20T09:27:01Z
%Y-%m-%dT%H:%M:%S+0000	2010-09-20T09:27:01+0000
%Y-%m-%d %H:%M	2010-09-20 09:27
%Y-%m-%d	2010-09-20
%d/%m/%Y	20/09/2010
%d %B %Y	9 september 2010
%B %d %Y	September 9 2010
%B %d, %Y	September 09, 2010

All date formats used in pattern.web (e.g., Twitter search result) are automatically detected.
For an overview of date format syntax, see: http://docs.python.org/library/time.html#time.strftime.

Date calculations

The time() function can be used to add or subtract time to a Date:

time(days=0, seconds=0, minutes=0, hours=0)

>>> from pattern.db import date, time
>>> 
>>> d  = date('23 august 2011')
>>> d += time(days=2, hours=5)
>>> print type(d)
>>> print d
>>> print d.year, d.month, d.day

<class 'pattern.db.Date'>
2011-08-25 05:00:00
2011, 8, 25

pattern.db

Documentation

Database

Create table

Create table from XML

Table

Table schema

Append row

Update row

Remove row

Filter rows

Query

Query filter

Query filter chain

Query relation

Grouping rows

Datasheet

Grouping rows

Sorting rows & columns

CSV import & export

Date

See also