Quick Start

This guide will get you up and running with chDB in minutes.

Your First Query

Let’s start with the simplest example:

import chdb

# Your first query
result = chdb.query("SELECT 1 as id, 'Hello World' as message", "CSV")
print(result)

Output:

1,"Hello World"

Connection-Based API (Recommended)

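When you run more than one query, use the connection-based API. A connection keeps its session, including any tables you create, open until you close it. Its cursor supports row iteration as well as fetchone(), fetchmany(), and fetchall():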

import chdb

# Create a connection
conn = chdb.connect(":memory:")
cur = conn.cursor()

# Execute queries
cur.execute("SELECT number, toString(number) as str FROM system.numbers LIMIT 3")

# Fetch results
for row in cur:
    print(row)

# Clean up: close the cursor, then the connection
cur.close()
conn.close()

Output Formats

chDB supports multiple output formats for different use cases. The second argument to chdb.query() selects the format:

CSV (Default)

result = chdb.query("SELECT 1, 'test'", "CSV")
print(result)  # CSV string

DataFrame (Pandas)

import chdb

df = chdb.query("SELECT number, number*2 as doubled FROM numbers(5)", "DataFrame")
print(type(df))  # <class 'pandas.core.frame.DataFrame'>
print(df.head())

Arrow Table

table = chdb.query("SELECT number FROM numbers(1000)", "ArrowTable")
print(type(table))  # <class 'pyarrow.lib.Table'>
print(f"Rows: {len(table)}")

Pretty Format

result = chdb.query("""
    SELECT
        'Alice' as name, 25 as age
    UNION ALL
    SELECT 'Bob', 30
""", "Pretty")
print(result)

Working with Files

chDB can query 70+ file formats directly through the file() table function, which takes a path and a format name:

CSV Files

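# Query a local CSV file whose first row holds the column names
# (column_name is a placeholder for one of your columns)
result = chdb.query("""
    SELECT count(*), avg(column_name)
    FROM file('data.csv', 'CSVWithNames')
""")
print(result)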

JSON Files

# Query newline-delimited JSON (one object per line); field is a placeholder column
result = chdb.query("""
    SELECT * FROM file('data.json', 'JSONEachRow')
    WHERE field > 100
    LIMIT 10
""")
print(result)

Parquet Files

# Efficient querying of Parquet files
result = chdb.query("""
    SELECT department, sum(salary) as total_salary
    FROM file('employees.parquet', 'Parquet')
    GROUP BY department
    ORDER BY total_salary DESC
""")
print(result)

DataFrame Integration

Query pandas DataFrames in place by naming the DataFrame variable in the Python() table function:

import pandas as pd
import chdb

# Create sample DataFrame
df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'A', 'B'],
    'sales': [100, 200, 150, 300, 250],
    'region': ['North', 'South', 'North', 'South', 'North']
})

# Query the DataFrame using chDB
result = chdb.query("""
    SELECT
        product,
        region,
        sum(sales) as total_sales,
        avg(sales) as avg_sales
    FROM Python(df)
    GROUP BY product, region
    ORDER BY total_sales DESC
""", "DataFrame")

print(result)

Memory vs Persistent Storage

In-Memory (Default)

Suitable for ad-hoc analysis and intermediate results that do not need to outlive the process:

# All data stays in memory
result = chdb.query("""
    SELECT number, number * number AS squared
    FROM numbers(1000000)
    WHERE number % 1000 = 0
""")

Persistent Storage

For data that needs to persist between sessions, pass a path to chdb.connect(). chDB keeps the database files at that location:

# Create a persistent database
conn = chdb.connect("my_database.chdb")
cur = conn.cursor()

# Create and populate table
cur.execute("""
    CREATE TABLE IF NOT EXISTS users (
        id UInt32,
        name String,
        email String
    ) ENGINE = MergeTree() ORDER BY id
""")

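# ORDER BY id is a sort key, not a unique constraint:
# re-running this script inserts the same rows again
cur.execute("INSERT INTO users VALUES (1, 'Alice', 'alice@example.com')")
cur.execute("INSERT INTO users VALUES (2, 'Bob', 'bob@example.com')")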

# Query the persistent data
cur.execute("SELECT * FROM users ORDER BY id")
for row in cur:
    print(row)

conn.close()

Performance Tips

Use Connection Objects for Multiple Queries

import chdb

# Reuse one connection and cursor for a batch of queries
conn = chdb.connect()
cur = conn.cursor()

for i in range(100):
    cur.execute(f"SELECT {i} as iteration")
    result = cur.fetchone()

cur.close()
conn.close()

Error Handling

Failed queries raise chdb.ChdbError. Catch it to report SQL errors separately from unexpected failures:

import chdb

try:
    result = chdb.query("SELECT invalid_column FROM non_existent_table")
except chdb.ChdbError as e:
    print(f"Query error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Next Steps

Now that you’re familiar with the basics:

Explore the Examples for more advanced use cases
Check out User Defined Functions (UDF) for custom functions
Learn about Session Management for stateful operations
Review the API Reference for complete functionality