DB-API 2.0 Interface¶

chDB provides a Python DB-API 2.0 compatible interface for database connectivity, allowing you to use chDB with tools and frameworks that expect standard database interfaces.

Overview¶

The chDB DB-API 2.0 interface includes:

Connections: Database connection management with connection strings
Cursors: Query execution and result retrieval
Type System: DB-API 2.0 compliant type constants and converters
Error Handling: Standard database exception hierarchy
Thread Safety: Level 1 thread safety (threads may share modules but not connections)

API Reference¶

Core Functions

chdb.dbapi.connect(*args, **kwargs)¶

Initialize a new database connection.

Parameters:: path (str, optional) – Database file path. None for in-memory database.
Raises:: err.Error – If connection cannot be established

chdb.dbapi.get_client_info()[source]

Get client version information.

Returns the chDB client version as a string for MySQLdb compatibility.

Returns:: str – Version string in format ‘major.minor.patch’

Type Constructors

chdb.dbapi.Binary(x)[source]

Return x as a binary type.

This function converts the input to bytes type for use with binary database fields, following the DB-API 2.0 specification.

Parameters:: x – Input data to convert to binary
Returns:: bytes – The input converted to bytes

Connection Class

class chdb.dbapi.connections.Connection(path=None)[source]¶

Bases: object

DB-API 2.0 compliant connection to chDB database.

This class provides a standard DB-API interface for connecting to and interacting with chDB databases. It supports both in-memory and file-based databases.

The connection manages the underlying chDB engine and provides methods for executing queries, managing transactions (no-op for ClickHouse), and creating cursors.

Parameters:

path (str, optional) – Database file path. If None, uses in-memory database. Can be a file path like ‘database.db’ or None for ‘:memory:’

Variables:

encoding (str) – Character encoding for queries, defaults to ‘utf8’
open (bool) – True if connection is open, False if closed

Examples

>>> # In-memory database
>>> conn = Connection()
>>> cursor = conn.cursor()
>>> cursor.execute("SELECT 1")
>>> result = cursor.fetchall()
>>> conn.close()

>>> # File-based database
>>> conn = Connection('mydata.db')
>>> with conn.cursor() as cur:
...     cur.execute("CREATE TABLE users (id INT, name STRING)")
...     cur.execute("INSERT INTO users VALUES (1, 'Alice')")
>>> conn.close()

>>> # Context manager usage
>>> with Connection() as cur:
...     cur.execute("SELECT version()")
...     version = cur.fetchone()

Note

ClickHouse does not support traditional transactions, so commit() and rollback() operations are no-ops but provided for DB-API compliance.

close()[source]¶

Close the database connection.

Closes the underlying chDB connection and marks this connection as closed. Subsequent operations on this connection will raise an Error.

Raises:: err.Error – If connection is already closed

commit()[source]¶: Commit the current transaction.

Note

This is a no-op for chDB/ClickHouse as it doesn’t support traditional transactions. Provided for DB-API 2.0 compliance.

cursor(cursor=None)[source]¶

Create a new cursor for executing queries.

Parameters:: cursor – Ignored, provided for compatibility
Returns:: Cursor – New cursor object for this connection
Raises:: err.Error – If connection is closed

Example

>>> conn = Connection()
>>> cur = conn.cursor()
>>> cur.execute("SELECT 1")
>>> result = cur.fetchone()

escape(obj, mapping=None)[source]¶

Escape a value for safe inclusion in SQL queries.

Parameters:

obj – Value to escape (string, bytes, number, etc.)
mapping – Optional character mapping for escaping

Returns:

Escaped version of the input suitable for SQL queries

Example

>>> conn = Connection()
>>> safe_value = conn.escape("O'Reilly")
>>> query = f"SELECT * FROM users WHERE name = {safe_value}"

escape_string(s)[source]¶

Escape a string value for SQL queries.

Parameters:: s (str) – String to escape
Returns:: str – Escaped string safe for SQL inclusion

property open¶

Check if the connection is open.

Returns:: bool – True if connection is open, False if closed

query(sql, fmt='CSV')[source]¶

Execute a SQL query directly and return raw results.

This method bypasses the cursor interface and executes queries directly. For standard DB-API usage, prefer using cursor() method.

Parameters:

sql (str or bytes) – SQL query to execute
fmt (str, optional) – Output format. Defaults to “CSV”. Supported formats include “CSV”, “JSON”, “Arrow”, “Parquet”, etc.

Returns:

Query result in the specified format

Raises:

err.InterfaceError – If connection is closed or query fails

Example

>>> conn = Connection()
>>> result = conn.query("SELECT 1, 'hello'", "CSV")
>>> print(result)
"1,hello\n"

property resp¶

Get the last query response.

Returns:: The raw response from the last query() call

Note

This property is updated each time query() is called directly. It does not reflect queries executed through cursors.

rollback()[source]¶: Roll back the current transaction.

Note

This is a no-op for chDB/ClickHouse as it doesn’t support traditional transactions. Provided for DB-API 2.0 compliance.

Cursor Class

class chdb.dbapi.cursors.Cursor(connection)[source]¶

Bases: object

DB-API 2.0 cursor for executing queries and fetching results.

The cursor provides methods for executing SQL statements, managing query results, and navigating through result sets. It supports parameter binding, bulk operations, and follows DB-API 2.0 specifications.

Do not create Cursor instances directly. Use Connection.cursor() instead.

Variables:

description (tuple) – Column metadata for the last query result
rowcount (int) – Number of rows affected by the last query (-1 if unknown)
arraysize (int) – Default number of rows to fetch at once (default: 1)
lastrowid – ID of the last inserted row (if applicable)
max_stmt_length (int) – Maximum statement size for executemany() (default: 1024000)

Examples

>>> conn = Connection()
>>> cur = conn.cursor()
>>> cur.execute("SELECT 1 as id, 'test' as name")
>>> result = cur.fetchone()
>>> print(result)  # (1, 'test')
>>> cur.close()

Note

See DB-API 2.0 Cursor Objects for complete specification details.

callproc(procname, args=())[source]¶

Execute a stored procedure (placeholder implementation).

Parameters:

procname (str) – Name of stored procedure to execute
args (sequence) – Parameters to pass to the procedure

Returns:

sequence – The original args parameter (unmodified)

Note

chDB/ClickHouse does not support stored procedures in the traditional sense. This method is provided for DB-API 2.0 compliance but does not perform any actual operation. Use execute() for all SQL operations.

Compatibility Warning:: This is a placeholder implementation. Traditional stored procedure features like OUT/INOUT parameters, multiple result sets, and server variables are not supported by the underlying ClickHouse engine.

close()[source]¶

Close the cursor and free associated resources.

After closing, the cursor becomes unusable and any operation will raise an exception. Closing a cursor exhausts all remaining data and releases the underlying cursor.

execute(query, args=None)[source]¶

Execute a SQL query with optional parameter binding.

This method executes a single SQL statement with optional parameter substitution. It supports multiple parameter placeholder styles for flexibility.

Parameters:

query (str) – SQL query to execute
args (tuple/list/dict, optional) – Parameters to bind to placeholders

Returns:

int – Number of affected rows (-1 if unknown)

Parameter Styles:

Question mark style: “SELECT * FROM users WHERE id = ?”
Named style: “SELECT * FROM users WHERE name = %(name)s”
Format style: “SELECT * FROM users WHERE age = %s” (legacy)

Examples

>>> # Question mark parameters
>>> cur.execute("SELECT * FROM users WHERE id = ? AND age > ?", (123, 18))
>>>
>>> # Named parameters
>>> cur.execute("SELECT * FROM users WHERE name = %(name)s", {'name': 'Alice'})
>>>
>>> # No parameters
>>> cur.execute("SELECT COUNT(*) FROM users")

Raises:

ProgrammingError – If cursor is closed or query is malformed
InterfaceError – If database error occurs during execution

executemany(query, args)[source]¶

Execute a query multiple times with different parameter sets.

This method efficiently executes the same SQL query multiple times with different parameter values. It’s particularly useful for bulk INSERT operations.

Parameters:

query (str) – SQL query to execute multiple times
args (sequence) – Sequence of parameter tuples/dicts/lists for each execution

Returns:

int – Total number of affected rows across all executions

Examples

>>> # Bulk insert with question mark parameters
>>> users_data = [(1, 'Alice'), (2, 'Bob'), (3, 'Charlie')]
>>> cur.executemany("INSERT INTO users VALUES (?, ?)", users_data)
>>>
>>> # Bulk insert with named parameters
>>> users_data = [
...     {'id': 1, 'name': 'Alice'},
...     {'id': 2, 'name': 'Bob'}
... ]
>>> cur.executemany(
...     "INSERT INTO users VALUES (%(id)s, %(name)s)",
...     users_data
... )

Note

This method improves performance for multiple-row INSERT and UPDATE operations by optimizing the query execution process.

fetchall()[source]¶

Fetch all remaining rows from the query result.

Returns:: list – List of tuples representing all remaining rows
Raises:: ProgrammingError – If execute() has not been called first

Warning

This method can consume large amounts of memory for big result sets. Consider using fetchmany() for large datasets.

Example

>>> cursor.execute("SELECT id, name FROM users")
>>> all_rows = cursor.fetchall()
>>> print(len(all_rows))  # Number of total rows

fetchmany(size=1)[source]¶

Fetch multiple rows from the query result.

Parameters:: size (int, optional) – Number of rows to fetch. Defaults to 1. If not specified, uses cursor.arraysize.
Returns:: list – List of tuples representing the fetched rows
Raises:: ProgrammingError – If execute() has not been called first

Example

>>> cursor.execute("SELECT id, name FROM users")
>>> rows = cursor.fetchmany(3)
>>> print(rows)  # [(1, 'Alice'), (2, 'Bob'), (3, 'Charlie')]

fetchone()[source]¶

Fetch the next row from the query result.

Returns:: tuple or None – Next row as a tuple, or None if no more rows available
Raises:: ProgrammingError – If execute() has not been called first

Example

>>> cursor.execute("SELECT id, name FROM users LIMIT 3")
>>> row = cursor.fetchone()
>>> print(row)  # (1, 'Alice')
>>> row = cursor.fetchone()
>>> print(row)  # (2, 'Bob')

max_stmt_length = 1024000¶

Max statement size which executemany() generates.

Default value is 1024000.

mogrify(query, args=None)[source]¶

Return the exact query string that would be sent to the database.

This method shows the final SQL query after parameter substitution, which is useful for debugging and logging purposes.

Parameters:

query (str) – SQL query with parameter placeholders
args (tuple/list/dict, optional) – Parameters to substitute

Returns:

str – The final SQL query string with parameters substituted

Example

>>> cur.mogrify("SELECT * FROM users WHERE id = ?", (123,))
"SELECT * FROM users WHERE id = 123"

Note

This method follows the extension to DB-API 2.0 used by Psycopg.

nextset()[source]¶

Move to the next result set (not supported).

Returns:: None – Always returns None as multiple result sets are not supported

Note

chDB/ClickHouse does not support multiple result sets from a single query. This method is provided for DB-API 2.0 compliance but always returns None.

setinputsizes(*args)[source]¶

Set input sizes for parameters (no-op implementation).

Parameters:: *args – Parameter size specifications (ignored)

Note

This method does nothing but is required by DB-API 2.0 specification. chDB automatically handles parameter sizing internally.

setoutputsizes(*args)[source]¶

Set output column sizes (no-op implementation).

Parameters:: *args – Column size specifications (ignored)

Note

This method does nothing but is required by DB-API 2.0 specification. chDB automatically handles output sizing internally.

Error Classes

Exception classes for chdb database operations.

This module provides a complete hierarchy of exception classes for handling database-related errors in chdb, following the Python Database API Specification v2.0.

The exception hierarchy is structured as follows:

StandardError
├── Warning
└── Error
    ├── InterfaceError
    └── DatabaseError
        ├── DataError
        ├── OperationalError
        ├── IntegrityError
        ├── InternalError
        ├── ProgrammingError
        └── NotSupportedError

Each exception class represents a specific category of database errors:

Warning: Non-fatal warnings during database operations
InterfaceError: Problems with the database interface itself
DatabaseError: Base class for all database-related errors
DataError: Problems with data processing (invalid values, type errors)
OperationalError: Database operational issues (connectivity, resources)
IntegrityError: Constraint violations (foreign keys, uniqueness)
InternalError: Database internal errors and corruption
ProgrammingError: SQL syntax errors and API misuse
NotSupportedError: Unsupported features or operations

Note

These exception classes are compliant with Python DB API 2.0 specification and provide consistent error handling across different database operations.

Basic Usage¶

Simple Query Example

import chdb.dbapi as dbapi

print("chdb driver version: {0}".format(dbapi.get_client_info()))

# Create connection and cursor
conn = dbapi.connect()
cur = conn.cursor()

# Execute query
cur.execute('SELECT version()')
print("description:", cur.description)
print("data:", cur.fetchone())

# Clean up
cur.close()
conn.close()

Working with Data

import chdb.dbapi as dbapi

conn = dbapi.connect()
cur = conn.cursor()

# Create table
cur.execute("""
    CREATE TABLE employees (
        id UInt32,
        name String,
        department String,
        salary Decimal(10,2)
    ) ENGINE = Memory
""")

# Insert data
cur.execute("""
    INSERT INTO employees VALUES
    (1, 'Alice', 'Engineering', 75000.00),
    (2, 'Bob', 'Marketing', 65000.00),
    (3, 'Charlie', 'Engineering', 80000.00)
""")

# Query data
cur.execute("SELECT * FROM employees WHERE department = 'Engineering'")

# Fetch results
print("Column names:", [desc[0] for desc in cur.description])
for row in cur.fetchall():
    print(row)

conn.close()

Connection Management¶

Connection Strings

import chdb.dbapi as dbapi

# In-memory database (default)
conn1 = dbapi.connect()

# Persistent database file
conn2 = dbapi.connect("./my_database.chdb")

# Connection with parameters
conn3 = dbapi.connect("./my_database.chdb?log-level=debug&verbose")

# Read-only connection
conn4 = dbapi.connect("./my_database.chdb?mode=ro")

Connection Context Manager

# Automatic connection cleanup
with dbapi.connect("test.chdb") as conn:
    cur = conn.cursor()
    cur.execute("SELECT count() FROM numbers(1000)")
    result = cur.fetchone()
    print(f"Count: {result[0]}")
    cur.close()
# Connection automatically closed

Cursor Operations¶

Fetching Results

conn = dbapi.connect()
cur = conn.cursor()

cur.execute("SELECT number, number * 2 as doubled FROM numbers(5)")

# Fetch one row at a time
print("First row:", cur.fetchone())    # (0, 0)

# Fetch multiple rows
print("Next 2 rows:", cur.fetchmany(2))  # ((1, 2), (2, 4))

# Fetch all remaining rows
print("Remaining rows:", cur.fetchall())  # ((3, 6), (4, 8))

conn.close()

Cursor as Iterator

conn = dbapi.connect()
cur = conn.cursor()

cur.execute("SELECT number, toString(number) FROM numbers(3)")

# Iterate over results
for row in cur:
    print(f"Number: {row[0]}, String: {row[1]}")

conn.close()

Column Metadata

conn = dbapi.connect()
cur = conn.cursor()

cur.execute("SELECT 1 as id, 'Alice' as name, 25.5 as score")

# Get column information
print("Column descriptions:")
for desc in cur.description:
    name, type_code, display_size, internal_size, precision, scale, null_ok = desc
    print(f"  {name}: type={type_code}, nullable={null_ok}")

# Get column names and types
print("Column names:", cur.column_names())
print("Column types:", cur.column_types())

conn.close()

Advanced Usage¶

Parameterized Queries

conn = dbapi.connect()
cur = conn.cursor()

# Note: chDB uses format-style parameters
query = "SELECT number FROM numbers(%s) WHERE number > %s"
cur.execute(query, (10, 5))

results = cur.fetchall()
print(f"Found {len(results)} rows")

conn.close()

Working with Large Datasets

conn = dbapi.connect()
cur = conn.cursor()

# Execute query on large dataset
cur.execute("SELECT number, number * number FROM numbers(100000)")

# Process in batches to avoid memory issues
batch_size = 1000
processed = 0

while True:
    rows = cur.fetchmany(batch_size)
    if not rows:
        break

    # Process batch
    for row in rows:
        # Your processing logic here
        processed += 1

    print(f"Processed {processed} rows...")

print(f"Total processed: {processed}")
conn.close()

File Data Processing

conn = dbapi.connect()
cur = conn.cursor()

# Query CSV file directly
cur.execute("""
    SELECT
        column1,
        column2,
        count(*) as count
    FROM file('data.csv', 'CSV')
    GROUP BY column1, column2
    ORDER BY count DESC
""")

print("Top combinations:")
for row in cur.fetchmany(10):
    print(f"  {row[0]}, {row[1]}: {row[2]}")

conn.close()

Type System¶

DB-API 2.0 Type Constants

import chdb.dbapi as dbapi

conn = dbapi.connect()
cur = conn.cursor()

cur.execute("SELECT 'hello' as text, 42 as number, now() as timestamp")

# Check column types using DB-API 2.0 constants
for desc in cur.description:
    col_name, type_code = desc[0], desc[1]

    if type_code == dbapi.STRING:
        print(f"{col_name} is a string type")
    elif type_code == dbapi.NUMBER:
        print(f"{col_name} is a number type")
    elif type_code == dbapi.TIMESTAMP:
        print(f"{col_name} is a timestamp type")

conn.close()

Binary Data Handling

conn = dbapi.connect()
cur = conn.cursor()

# Handle binary data
binary_data = dbapi.Binary(b"Hello, World!")

# In a real scenario, you might store binary data in a blob field
print(f"Binary data: {binary_data}")
print(f"Type: {type(binary_data)}")

conn.close()

Error Handling¶

Database Exceptions

import chdb.dbapi as dbapi

conn = dbapi.connect()
cur = conn.cursor()

try:
    # This will cause an error
    cur.execute("SELECT * FROM non_existent_table")
except dbapi.Error as e:
    print(f"Database error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
finally:
    cur.close()
    conn.close()

Robust Query Execution

def safe_execute(cursor, query, params=None):
    """Execute query with comprehensive error handling"""
    try:
        if params:
            cursor.execute(query, params)
        else:
            cursor.execute(query)
        return True, None
    except dbapi.Error as e:
        return False, f"Database error: {e}"
    except Exception as e:
        return False, f"Unexpected error: {e}"

# Example usage
conn = dbapi.connect()
cur = conn.cursor()

success, error = safe_execute(cur, "SELECT count() FROM numbers(100)")
if success:
    result = cur.fetchone()
    print(f"Count: {result[0]}")
else:
    print(f"Query failed: {error}")

conn.close()

Integration Examples¶

With Pandas

import pandas as pd
import chdb.dbapi as dbapi

conn = dbapi.connect()

# Execute query and convert to DataFrame
query = """
    SELECT
        number as id,
        number * 2 as value,
        number % 3 as category
    FROM numbers(10)
"""

# Note: Direct pandas.read_sql might not work, use manual conversion
cur = conn.cursor()
cur.execute(query)

# Get column names
columns = [desc[0] for desc in cur.description]

# Fetch all data
data = cur.fetchall()

# Create DataFrame
df = pd.DataFrame(data, columns=columns)
print(df)

cur.close()
conn.close()

Custom Data Processing

class DataProcessor:
    def __init__(self, connection_string=None):
        self.conn = dbapi.connect(connection_string)
        self.cur = self.conn.cursor()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

    def execute_analysis(self, query):
        """Execute analytical query and return structured results"""
        self.cur.execute(query)

        # Get metadata
        columns = [desc[0] for desc in self.cur.description]

        # Get data
        rows = self.cur.fetchall()

        return {
            'columns': columns,
            'data': rows,
            'row_count': len(rows)
        }

    def close(self):
        if hasattr(self, 'cur') and self.cur:
            self.cur.close()
        if hasattr(self, 'conn') and self.conn:
            self.conn.close()

# Usage
with DataProcessor() as processor:
    results = processor.execute_analysis("""
        SELECT
            toYYYYMM(today() - number) as month,
            number as days_ago,
            number * 100 as metric
        FROM numbers(12)
    """)

    print(f"Analysis complete: {results['row_count']} rows")
    for row in results['data'][:3]:
        print(f"  Month: {row[0]}, Days ago: {row[1]}, Metric: {row[2]}")

Best Practices¶

Connection Management: Always close connections and cursors when done
Context Managers: Use with statements for automatic cleanup
Batch Processing: Use fetchmany() for large result sets
Error Handling: Wrap database operations in try-except blocks
Parameter Binding: Use parameterized queries when possible
Memory Management: Avoid fetchall() for very large datasets

Note

chDB’s DB-API 2.0 interface is compatible with most Python database tools
The interface provides Level 1 thread safety (threads may share modules but not connections)
Connection strings support the same parameters as chDB sessions
All standard DB-API 2.0 exceptions are supported

Warning

Always close cursors and connections to avoid resource leaks
Large result sets should be processed in batches
Parameter binding syntax follows format style: %s

Module Constants¶

API Level and Threading

chdb.dbapi.apilevel = '2.0'¶

str(object=’’) -> str str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to ‘utf-8’. errors defaults to ‘strict’.

chdb.dbapi.threadsafety = 1¶

int([x]) -> integer int(x, base=10) -> integer

Convert a number or string to an integer, or return 0 if no arguments are given. If x is a number, return x.__int__(). For floating-point numbers, this truncates towards zero.

If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in the given base. The literal can be preceded by ‘+’ or ‘-’ and be surrounded by whitespace. The base defaults to 10. Valid bases are 0 and 2-36. Base 0 means to interpret the base from the string as an integer literal. >>> int(‘0b100’, base=0) 4

chdb.dbapi.paramstyle = 'format'¶

str(object=’’) -> str str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to ‘utf-8’. errors defaults to ‘strict’.

Type Constants

chdb.dbapi.STRING = frozenset({247, 253, 254})¶