dataflow

class qtypy.dataflow(*, n_threads=-1, n_rows=-1, enable_mt=True)

qtypy layer for qty::dataflow.

Provides a high-level interface for defining, selecting, and querying columns from datasets, with optional multi-threaded execution.

Parameters:
  • n_threads (int, optional) – Number of threads to use for processing (default is -1, meaning all available threads).

  • n_rows (int, optional) – Number of rows to process from the dataset (default is -1, meaning all rows).

  • enable_mt (bool, optional) – Enable multithreading (default is True). Set to False for single-threaded execution.

Variables:
  • dataset (object) – Dataset attached to the dataflow. Initially None.

  • columns (dict) – Mapping of column names to column definitions.

  • selections (dict) – Mapping of selection names to selection objects.

Examples

>>> df = dataflow()                 # use all available threads
>>> df = dataflow(n_threads=8)      # use (up to) 8 threads
>>> df = dataflow(enable_mt=False)  # single-threaded
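
The way n_threads and enable_mt combine in the examples above can be pictured with a small stand-alone helper. This is only a sketch, not qtypy's implementation: resolve_thread_count is a hypothetical name, treating os.cpu_count() as "all available threads" is an assumption, and so is capping the request at that number to model "up to".

```python
import os


def resolve_thread_count(n_threads=-1, enable_mt=True):
    """Hypothetical helper (not part of qtypy) mirroring the documented defaults."""
    # Assumption: "all available threads" means the CPU count reported by the OS.
    available = os.cpu_count() or 1
    if not enable_mt:
        # Multithreading disabled: always run single-threaded.
        return 1
    if n_threads < 0:
        # -1 (or any negative value) requests every available thread.
        return available
    # Assumption: a positive request is capped at what the machine offers.
    return max(1, min(n_threads, available))


all_threads = resolve_thread_count()              # dataflow()
up_to_eight = resolve_thread_count(n_threads=8)   # dataflow(n_threads=8)
single = resolve_thread_count(enable_mt=False)    # dataflow(enable_mt=False)
```

Under these assumptions, enable_mt=False takes precedence over any n_threads value.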

compute(columns)

Define additional columns in the dataflow.

Parameters:
  • columns (dict) – A dictionary mapping column names (strings) to one of the following:

      - qtypy.dataset.column – Existing quantity in the dataset.

      - qtypy.column.constant – Constant value of any C++ data type.

      - qtypy.column.expression – JIT-compiled one-line C++ expression.

      - qtypy.column.definition – Compiled C++ implementation of qty::column::definition<Ret(Args...)>.

Returns:

The dataflow itself, which enables method chaining.

Return type:

qtypy.dataflow
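
The chaining behaviour described for compute can be illustrated with a toy stand-in. This is a minimal sketch and not the qtypy class: ToyDataflow is hypothetical, and the dictionary values are placeholder strings and numbers standing in for qtypy.dataset.column, qtypy.column.constant, and qtypy.column.expression objects, whose constructors are not shown in this section.

```python
class ToyDataflow:
    """Hypothetical stand-in for illustration only; not qtypy.dataflow."""

    def __init__(self):
        # Mapping of column names to column definitions, as in the
        # documented `columns` variable.
        self.columns = {}

    def compute(self, columns):
        # Register every named column, then return self so that
        # further calls can be chained onto the result.
        self.columns.update(columns)
        return self


df = ToyDataflow()
result = (
    df.compute({"x": "placeholder for a dataset column"})
      .compute({"one": 1, "x_plus_one": "placeholder for an expression"})
)
```

Because each call returns the same object, result and df refer to one dataflow that holds all three columns.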