API Reference

Core Query Functions

chdb.query(sql, output_format='CSV', path='', udf_path='')[source]

Execute SQL query using chDB engine.

This is the main query function that executes SQL statements using the embedded ClickHouse engine. Supports various output formats and can work with in-memory or file-based databases.

Parameters:
  • sql (str) – SQL query string to execute

  • output_format (str, optional) – Output format for results. Defaults to “CSV”. Supported formats include:

    • “CSV” - Comma-separated values

    • “JSON” - JSON format

    • “Arrow” - Apache Arrow format

    • “Parquet” - Parquet format

    • “DataFrame” - Pandas DataFrame

    • “ArrowTable” - PyArrow Table

    • “Debug” - Enable verbose logging

  • path (str, optional) – Database file path. Defaults to “” (in-memory database). Can be a file path or “:memory:” for in-memory database.

  • udf_path (str, optional) – Path to User-Defined Functions directory. Defaults to “”.

Returns:

Query result in the specified format

  • str: For text formats like CSV, JSON

  • pd.DataFrame: When output_format is “DataFrame” or “dataframe”

  • pa.Table: When output_format is “ArrowTable” or “arrowtable”

  • chdb result object: For other formats

Raises:
  • ChdbError – If the SQL query execution fails

  • ImportError – If required dependencies are missing for DataFrame/Arrow formats

Examples

>>> # Basic CSV query
>>> result = chdb.query("SELECT 1, 'hello'")
>>> print(result)
"1,hello"
>>> # Query with DataFrame output
>>> df = chdb.query("SELECT 1 as id, 'hello' as msg", "DataFrame")
>>> print(df)
   id    msg
0   1  hello
>>> # Query with file-based database
>>> result = chdb.query("CREATE TABLE test (id INT)", path="mydb.chdb")
>>> # Query with UDF
>>> result = chdb.query("SELECT my_udf('test')", udf_path="/path/to/udfs")
chdb.sql(sql, output_format='CSV', path='', udf_path='')

Execute SQL query using chDB engine.

This is the main query function that executes SQL statements using the embedded ClickHouse engine. Supports various output formats and can work with in-memory or file-based databases.

Parameters:
  • sql (str) – SQL query string to execute

  • output_format (str, optional) – Output format for results. Defaults to “CSV”. Supported formats include:

    • “CSV” - Comma-separated values

    • “JSON” - JSON format

    • “Arrow” - Apache Arrow format

    • “Parquet” - Parquet format

    • “DataFrame” - Pandas DataFrame

    • “ArrowTable” - PyArrow Table

    • “Debug” - Enable verbose logging

  • path (str, optional) – Database file path. Defaults to “” (in-memory database). Can be a file path or “:memory:” for in-memory database.

  • udf_path (str, optional) – Path to User-Defined Functions directory. Defaults to “”.

Returns:

Query result in the specified format

  • str: For text formats like CSV, JSON

  • pd.DataFrame: When output_format is “DataFrame” or “dataframe”

  • pa.Table: When output_format is “ArrowTable” or “arrowtable”

  • chdb result object: For other formats

Raises:
  • ChdbError – If the SQL query execution fails

  • ImportError – If required dependencies are missing for DataFrame/Arrow formats

Examples

>>> # Basic CSV query
>>> result = chdb.query("SELECT 1, 'hello'")
>>> print(result)
"1,hello"
>>> # Query with DataFrame output
>>> df = chdb.query("SELECT 1 as id, 'hello' as msg", "DataFrame")
>>> print(df)
   id    msg
0   1  hello
>>> # Query with file-based database
>>> result = chdb.query("CREATE TABLE test (id INT)", path="mydb.chdb")
>>> # Query with UDF
>>> result = chdb.query("SELECT my_udf('test')", udf_path="/path/to/udfs")
chdb.to_arrowTable(res)[source]

Convert query result to PyArrow Table.

Converts a chDB query result to a PyArrow Table for efficient columnar data processing. Returns an empty table if the result is empty.

Parameters:

res – chDB query result object containing binary Arrow data

Returns:

pa.Table – PyArrow Table containing the query results

Raises:

ImportError – If pyarrow or pandas are not installed

Example

>>> result = chdb.query("SELECT 1 as id, 'hello' as msg", "Arrow")
>>> table = chdb.to_arrowTable(result)
>>> print(table.to_pandas())
   id    msg
0   1  hello

Connection and Session

chdb.connect(connection_string: str = ':memory:') Connection[source]

Create a connection to chDB background server.

This function establishes a connection to the chDB (ClickHouse) database engine. Only one open connection is allowed per process. Multiple calls with the same connection string will return the same connection object.

Parameters:

connection_string (str, optional) – Database connection string. Defaults to “:memory:”. Supported connection string formats:

Basic formats:

  • “:memory:” - In-memory database (default)

  • “test.db” - Relative path database file

  • file:test.db” - Same as relative path

  • “/path/to/test.db” - Absolute path database file

  • file:/path/to/test.db” - Same as absolute path

With query parameters:

Query parameter handling:

Query parameters are passed to ClickHouse engine as startup arguments. Special parameter handling:

  • “mode=ro” becomes “–readonly=1” (read-only mode)

  • “verbose” enables verbose logging

  • “log-level=test” sets logging level

For complete parameter list, see clickhouse local --help --verbose

Returns:

Connection – Database connection object that supports:

  • Creating cursors with Connection.cursor()

  • Direct queries with Connection.query()

  • Streaming queries with Connection.send_query()

  • Context manager protocol for automatic cleanup

Raises:

RuntimeError – If connection to database fails

Warning

Only one connection per process is supported. Creating a new connection will close any existing connection.

Examples

>>> # In-memory database
>>> conn = connect()
>>> conn = connect(":memory:")
>>>
>>> # File-based database
>>> conn = connect("my_data.db")
>>> conn = connect("/path/to/data.db")
>>>
>>> # With parameters
>>> conn = connect("data.db?mode=ro")  # Read-only mode
>>> conn = connect(":memory:?verbose&log-level=debug")  # Debug logging
>>>
>>> # Using context manager for automatic cleanup
>>> with connect("data.db") as conn:
...     result = conn.query("SELECT 1")
...     print(result)
>>> # Connection automatically closed

See also

Connection - Database connection class Cursor - Database cursor for DB-API 2.0 operations

See also

Session Management - Complete session management documentation

State Management

chdb.state.connect(connection_string: str = ':memory:') Connection[source]

Create a connection to chDB background server.

This function establishes a connection to the chDB (ClickHouse) database engine. Only one open connection is allowed per process. Multiple calls with the same connection string will return the same connection object.

Parameters:

connection_string (str, optional) – Database connection string. Defaults to “:memory:”. Supported connection string formats:

Basic formats:

  • “:memory:” - In-memory database (default)

  • “test.db” - Relative path database file

  • file:test.db” - Same as relative path

  • “/path/to/test.db” - Absolute path database file

  • file:/path/to/test.db” - Same as absolute path

With query parameters:

Query parameter handling:

Query parameters are passed to ClickHouse engine as startup arguments. Special parameter handling:

  • “mode=ro” becomes “–readonly=1” (read-only mode)

  • “verbose” enables verbose logging

  • “log-level=test” sets logging level

For complete parameter list, see clickhouse local --help --verbose

Returns:

Connection – Database connection object that supports:

  • Creating cursors with Connection.cursor()

  • Direct queries with Connection.query()

  • Streaming queries with Connection.send_query()

  • Context manager protocol for automatic cleanup

Raises:

RuntimeError – If connection to database fails

Warning

Only one connection per process is supported. Creating a new connection will close any existing connection.

Examples

>>> # In-memory database
>>> conn = connect()
>>> conn = connect(":memory:")
>>>
>>> # File-based database
>>> conn = connect("my_data.db")
>>> conn = connect("/path/to/data.db")
>>>
>>> # With parameters
>>> conn = connect("data.db?mode=ro")  # Read-only mode
>>> conn = connect(":memory:?verbose&log-level=debug")  # Debug logging
>>>
>>> # Using context manager for automatic cleanup
>>> with connect("data.db") as conn:
...     result = conn.query("SELECT 1")
...     print(result)
>>> # Connection automatically closed

See also

Connection - Database connection class Cursor - Database cursor for DB-API 2.0 operations

class chdb.state.sqlitelike.Connection(connection_string: str)[source]

Bases: object

close() None[source]

Close the connection and cleanup resources.

This method closes the database connection and cleans up any associated resources including active cursors. After calling this method, the connection becomes invalid and cannot be used for further operations.

Note

This method is idempotent - calling it multiple times is safe.

Warning

Any ongoing streaming queries will be cancelled when the connection is closed. Ensure all important data is processed before closing.

Examples

>>> conn = connect("test.db")
>>> # Use connection for queries
>>> conn.query("CREATE TABLE test (id INT)")
>>> # Close when done
>>> conn.close()
>>> # Using with context manager (automatic cleanup)
>>> with connect("test.db") as conn:
...     conn.query("SELECT 1")
...     # Connection automatically closed
cursor() Cursor[source]

Create a cursor object for executing queries.

This method creates a database cursor that provides the standard DB-API 2.0 interface for executing queries and fetching results. The cursor allows for fine-grained control over query execution and result retrieval.

Returns:

Cursor – A cursor object for database operations

Note

Creating a new cursor will replace any existing cursor associated with this connection. Only one cursor per connection is supported.

Examples

>>> conn = connect(":memory:")
>>> cursor = conn.cursor()
>>> cursor.execute("CREATE TABLE test (id INT, name String)")
>>> cursor.execute("INSERT INTO test VALUES (1, 'Alice')")
>>> cursor.execute("SELECT * FROM test")
>>> rows = cursor.fetchall()
>>> print(rows)
((1, 'Alice'),)

See also

Cursor - Database cursor implementation

query(query: str, format: str = 'CSV') Any[source]

Execute a SQL query and return the complete results.

This method executes a SQL query synchronously and returns the complete result set. It supports various output formats and automatically applies format-specific post-processing.

Parameters:
  • query (str) – SQL query string to execute

  • format (str, optional) – Output format for results. Defaults to “CSV”. Supported formats:

    • “CSV” - Comma-separated values (string)

    • “JSON” - JSON format (string)

    • “Arrow” - Apache Arrow format (bytes)

    • “Dataframe” - Pandas DataFrame (requires pandas)

    • “Arrowtable” - PyArrow Table (requires pyarrow)

Returns:

Query results in the specified format. Type depends on format

  • String formats return str

  • Arrow format returns bytes

  • dataframe format returns pandas.DataFrame

  • arrowtable format returns pyarrow.Table

Raises:
  • RuntimeError – If query execution fails

  • ImportError – If required packages for format are not installed

Warning

This method loads the entire result set into memory. For large results, consider using send_query() for streaming.

Examples

>>> conn = connect(":memory:")
>>>
>>> # Basic CSV query
>>> result = conn.query("SELECT 1 as num, 'hello' as text")
>>> print(result)
num,text
1,hello
>>> # DataFrame format
>>> df = conn.query("SELECT number FROM numbers(5)", "dataframe")
>>> print(df)
   number
0       0
1       1
2       2
3       3
4       4

See also

send_query() - For streaming query execution

send_query(query: str, format: str = 'CSV') StreamingResult[source]

Execute a SQL query and return a streaming result iterator.

This method executes a SQL query and returns a StreamingResult object that allows you to iterate over the results without loading everything into memory at once. This is ideal for processing large result sets.

Parameters:
  • query (str) – SQL query string to execute

  • format (str, optional) – Output format for results. Defaults to “CSV”. Supported formats:

    • “CSV” - Comma-separated values

    • “JSON” - JSON format

    • “Arrow” - Apache Arrow format (enables record_batch() method)

    • “dataframe” - Pandas DataFrame chunks

    • “arrowtable” - PyArrow Table chunks

Returns:

StreamingResult – A streaming iterator for query results that supports:

  • Iterator protocol (for loops)

  • Context manager protocol (with statements)

  • Manual fetching with fetch() method

  • PyArrow RecordBatch streaming (Arrow format only)

Raises:
  • RuntimeError – If query execution fails

  • ImportError – If required packages for format are not installed

Note

Only the “Arrow” format supports the record_batch() method on the returned StreamingResult.

Examples

>>> conn = connect(":memory:")
>>>
>>> # Basic streaming
>>> stream = conn.send_query("SELECT number FROM numbers(1000)")
>>> for chunk in stream:
...     print(f"Processing chunk: {len(chunk)} bytes")
>>> # Using context manager for cleanup
>>> with conn.send_query("SELECT * FROM large_table") as stream:
...     chunk = stream.fetch()
...     while chunk:
...         process_data(chunk)
...         chunk = stream.fetch()
>>> # Arrow format with RecordBatch streaming
>>> stream = conn.send_query("SELECT * FROM data", "Arrow")
>>> reader = stream.record_batch(rows_per_batch=10000)
>>> for batch in reader:
...     print(f"Batch shape: {batch.num_rows} x {batch.num_columns}")

See also

query() - For non-streaming query execution StreamingResult - Streaming result iterator

class chdb.state.sqlitelike.Cursor(connection)[source]

Bases: object

close() None[source]

Close the cursor and cleanup resources.

This method closes the cursor and cleans up any associated resources. After calling this method, the cursor becomes invalid and cannot be used for further operations.

Note

This method is idempotent - calling it multiple times is safe. The cursor is also automatically closed when the connection is closed.

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT 1")
>>> result = cursor.fetchone()
>>> cursor.close()  # Cleanup cursor resources
column_names() list[source]

Return a list of column names from the last executed query.

This method returns the column names from the most recently executed SELECT query. The names are returned in the same order as they appear in the result set.

Returns:

list – List of column name strings, or empty list if no query has been executed or the query returned no columns

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT id, name, email FROM users LIMIT 1")
>>> print(cursor.column_names())
['id', 'name', 'email']

See also

column_types() - Get column type information description - DB-API 2.0 column description

column_types() list[source]

Return a list of column types from the last executed query.

This method returns the ClickHouse column type names from the most recently executed SELECT query. The types are returned in the same order as they appear in the result set.

Returns:

list – List of ClickHouse type name strings, or empty list if no query has been executed or the query returned no columns

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT toInt32(1), toString('hello')")
>>> print(cursor.column_types())
['Int32', 'String']

See also

column_names() - Get column name information description - DB-API 2.0 column description

commit() None[source]

Commit any pending transaction.

This method commits any pending database transaction. In ClickHouse, most operations are auto-committed, but this method is provided for DB-API 2.0 compatibility.

Note

ClickHouse typically auto-commits operations, so explicit commits are usually not necessary. This method is provided for compatibility with standard DB-API 2.0 workflow.

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("INSERT INTO test VALUES (1, 'data')")
>>> cursor.commit()
property description: list

Return column description as per DB-API 2.0 specification.

This property returns a list of 7-item tuples describing each column in the result set of the last executed SELECT query. Each tuple contains: (name, type_code, display_size, internal_size, precision, scale, null_ok)

Currently, only name and type_code are provided, with other fields set to None.

Returns:

list – List of 7-tuples describing each column, or empty list if no SELECT query has been executed

Note

This follows the DB-API 2.0 specification for cursor.description. Only the first two elements (name and type_code) contain meaningful data in this implementation.

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT id, name FROM users LIMIT 1")
>>> for desc in cursor.description:
...     print(f"Column: {desc[0]}, Type: {desc[1]}")
Column: id, Type: Int32
Column: name, Type: String

See also

column_names() - Get just column names column_types() - Get just column types

execute(query: str) None[source]

Execute a SQL query and prepare results for fetching.

This method executes a SQL query and prepares the results for retrieval using the fetch methods. It handles the parsing of result data and automatic type conversion for ClickHouse data types.

Parameters:

query (str) – SQL query string to execute

Raises:

Exception – If query execution fails or result parsing fails

Note

This method follows DB-API 2.0 specifications for cursor.execute(). After execution, use fetchone(), fetchmany(), or fetchall() to retrieve results.

Note

The method automatically converts ClickHouse data types to appropriate Python types:

  • Int/UInt types → int

  • Float types → float

  • String/FixedString → str

  • DateTime → datetime.datetime

  • Date → datetime.date

  • Bool → bool

Examples

>>> cursor = conn.cursor()
>>>
>>> # Execute DDL
>>> cursor.execute("CREATE TABLE test (id INT, name String)")
>>>
>>> # Execute DML
>>> cursor.execute("INSERT INTO test VALUES (1, 'Alice')")
>>>
>>> # Execute SELECT and fetch results
>>> cursor.execute("SELECT * FROM test")
>>> rows = cursor.fetchall()
>>> print(rows)
((1, 'Alice'),)

See also

fetchone() - Fetch single row fetchmany() - Fetch multiple rows fetchall() - Fetch all remaining rows

fetchall() tuple[source]

Fetch all remaining rows from the query result.

This method retrieves all remaining rows from the current query result set starting from the current cursor position. It returns a tuple of row tuples with appropriate Python type conversion applied.

Returns:

tuple – Tuple containing all remaining row tuples from the result set. Returns empty tuple if no rows are available.

Warning

This method loads all remaining rows into memory at once. For large result sets, consider using fetchmany() to process results in batches.

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT id, name FROM users")
>>> all_users = cursor.fetchall()
>>> for user_id, user_name in all_users:
...     print(f"User {user_id}: {user_name}")

See also

fetchone() - Fetch single row fetchmany() - Fetch multiple rows in batches

fetchmany(size: int = 1) tuple[source]

Fetch multiple rows from the query result.

This method retrieves up to ‘size’ rows from the current query result set. It returns a tuple of row tuples, with each row containing column values with appropriate Python type conversion.

Parameters:

size (int, optional) – Maximum number of rows to fetch. Defaults to 1.

Returns:

tuple – Tuple containing up to ‘size’ row tuples. May contain fewer rows if the result set is exhausted.

Note

This method follows DB-API 2.0 specifications. It will return fewer than ‘size’ rows if the result set is exhausted.

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT * FROM large_table")
>>>
>>> # Process results in batches
>>> while True:
...     batch = cursor.fetchmany(100)  # Fetch 100 rows at a time
...     if not batch:
...         break
...     process_batch(batch)

See also

fetchone() - Fetch single row fetchall() - Fetch all remaining rows

fetchone() tuple | None[source]

Fetch the next row from the query result.

This method retrieves the next available row from the current query result set. It returns a tuple containing the column values with appropriate Python type conversion applied.

Returns:

Optional[tuple] – Next row as a tuple of column values, or None if no more rows are available

Note

This method follows DB-API 2.0 specifications. Column values are automatically converted to appropriate Python types based on ClickHouse column types.

Examples

>>> cursor = conn.cursor()
>>> cursor.execute("SELECT id, name FROM users")
>>> row = cursor.fetchone()
>>> while row is not None:
...     user_id, user_name = row
...     print(f"User {user_id}: {user_name}")
...     row = cursor.fetchone()

See also

fetchmany() - Fetch multiple rows fetchall() - Fetch all remaining rows

chdb.state.sqlitelike.to_arrowTable(res)[source]

Convert query result to PyArrow Table.

This function converts chdb query results to a PyArrow Table format, which provides efficient columnar data access and interoperability with other data processing libraries.

Parameters:

res – Query result object from chdb containing Arrow format data

Returns:

pyarrow.Table – PyArrow Table containing the query results

Raises:

ImportError – If pyarrow or pandas packages are not installed

Note

This function requires both pyarrow and pandas to be installed. Install them with: pip install pyarrow pandas

Warning

Empty results return an empty PyArrow Table with no schema.

Examples

>>> import chdb
>>> result = chdb.query("SELECT 1 as num, 'hello' as text", "Arrow")
>>> table = to_arrowTable(result)
>>> print(table.schema)
num: int64
text: string
>>> print(table.to_pandas())
   num   text
0    1  hello

DataFrame Integration

class chdb.dataframe.Table(*args: Any, **kwargs: Any)[source]

Bases:

Database API (DBAPI) Support

See also

DB-API 2.0 Interface - Complete DB-API 2.0 interface documentation

User-Defined Functions (UDF)

See also

User Defined Functions (UDF) - Complete UDF documentation and examples

Exception Handling

class chdb.ChdbError[source]

Bases: Exception

Base exception class for chDB-related errors.

This exception is raised when chDB query execution fails or encounters an error. It inherits from the standard Python Exception class and provides error information from the underlying ClickHouse engine.

The exception message typically contains detailed error information from ClickHouse, including syntax errors, type mismatches, missing tables/columns, and other query execution issues.

Variables:

args – Tuple containing the error message and any additional arguments

Examples

>>> try:
...     result = chdb.query("SELECT * FROM non_existent_table")
... except chdb.ChdbError as e:
...     print(f"Query failed: {e}")
Query failed: Table 'non_existent_table' doesn't exist
>>> try:
...     result = chdb.query("SELECT invalid_syntax FROM")
... except chdb.ChdbError as e:
...     print(f"Syntax error: {e}")
Syntax error: Syntax error near 'FROM'

Note

This exception is automatically raised by chdb.query() and related functions when the underlying ClickHouse engine reports an error. You should catch this exception when handling potentially failing queries to provide appropriate error handling in your application.

Abstract Base Classes

class chdb.rwabc.PyReader(data: Any)[source]

Bases: ABC

abstractmethod read(col_names: List[str], count: int) List[Any][source]

Read a specified number of rows from the given columns and return a list of objects, where each object is a sequence of values for a column.

Parameters:
  • col_names (List[str]) – List of column names to read.

  • count (int) – Maximum number of rows to read.

Returns:

List[Any] – List of sequences, one for each column.

class chdb.rwabc.PyWriter(col_names: List[str], types: List[type], data: Any)[source]

Bases: ABC

abstractmethod finalize() bytes[source]

Assemble and return the final data from blocks. Must be implemented by subclasses.

Returns:

bytes – The final serialized data.

abstractmethod write(col_names: List[str], columns: List[List[Any]]) None[source]

Save columns of data to blocks. Must be implemented by subclasses.

Parameters:
  • col_names (List[str]) – List of column names that are being written.

  • columns (List[List[Any]]) – List of columns data, each column is represented by a list.

Utilities

See also

Utilities - Complete utilities documentation and examples

Version Information

chdb.chdb_version = ('3', '7', '0')

Built-in immutable sequence.

If no argument is given, the constructor returns an empty tuple. If iterable is specified the tuple is initialized from iterable’s items.

If the argument is a tuple, the return value is the same object.

chdb.engine_version = 'b._chdb.connect.query.b'

str(object=’’) -> str str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to ‘utf-8’. errors defaults to ‘strict’.

chdb.__version__ = '3.7.0'

str(object=’’) -> str str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to ‘utf-8’. errors defaults to ‘strict’.