dataflow
- class qtypy.dataflow(*, n_threads=-1, n_rows=-1, enable_mt=True)
qtypy layer for qty::dataflow. Provides a high-level interface for defining, selecting, and querying columns from datasets, with optional multi-threaded execution.
- Parameters:
n_threads (int, optional) – Number of threads to use for processing (default is -1, meaning all available threads).
n_rows (int, optional) – Number of rows to process from the dataset (default is -1, meaning all rows).
enable_mt (bool, optional) – Enable multithreading (default is True).
- Variables:
dataset (object) – Dataset attached to the dataflow. Initially None.
columns (dict) – Mapping of column names to column definitions.
selections (dict) – Mapping of selection names to selection objects.
Examples
>>> from qtypy import dataflow
>>> df = dataflow()  # use all available threads
>>> df = dataflow(n_threads=8)  # use (up to) 8 threads
>>> df = dataflow(enable_mt=False)  # single-threaded
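The way the constructor arguments interact can be sketched as below. This is a minimal illustration under stated assumptions, not qtypy's implementation: `resolve_thread_count` is a hypothetical helper, and capping a positive request at the machine's CPU count is an assumption suggested by the "(up to) 8 threads" wording.

```python
import os


def resolve_thread_count(n_threads: int = -1, enable_mt: bool = True) -> int:
    """Hypothetical helper (not part of qtypy): map the arguments to a worker count."""
    if not enable_mt:
        return 1  # enable_mt=False means single-threaded
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    if n_threads < 0:
        return available  # -1 means all available threads
    # Assumption: a positive request is capped at what the machine offers.
    return max(1, min(n_threads, available))
```

Under these assumptions, `resolve_thread_count(n_threads=8)` never exceeds 8, and `resolve_thread_count(enable_mt=False)` is always 1 regardless of `n_threads`.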
- compute(columns)
Define additional columns in the dataflow.
- Parameters:
columns (dict) – A dictionary mapping column names (strings) to one of the following:
- qtypy.dataset.column – Existing quantity in dataset.
- qtypy.column.constant – Constant value of any C++ data type.
- qtypy.column.expression – JIT-compiled one-line C++ expression.
- qtypy.column.definition – Compiled C++ implementation of qty::column::definition<Ret(Args...)>.
- Returns:
The dataflow instance itself (self), which enables method chaining.
- Return type:
qtypy.dataflow
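The chaining behaviour described for compute() can be illustrated with a small stand-in class. This is a sketch only: `DataflowSketch` is hypothetical and is not the qtypy.dataflow implementation, and the placeholder strings stand in for real column objects such as qtypy.column.constant or qtypy.column.expression, whose constructors are not shown in this section.

```python
class DataflowSketch:
    """Stand-in for illustration only; not the qtypy.dataflow implementation."""

    def __init__(self):
        self.columns = {}  # mirrors the documented `columns` mapping

    def compute(self, columns):
        # Register each name -> definition pair, then return the same object
        # so that further calls can be chained.
        self.columns.update(columns)
        return self


# Placeholder strings stand in for real column objects.
df = (
    DataflowSketch()
    .compute({"x": "dataset column"})
    .compute({"two": "constant", "x_sq": "expression"})
)
print(sorted(df.columns))
```

Because each call returns the object it was invoked on, columns defined in separate compute() calls accumulate in the same `columns` mapping.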