Basic bits#
from qtypy import dataflow
df = dataflow(multithreaded = True)
Computing columns#
A column can be computed in 4 ways:
dataset.columnwhose value is read in from an existing dataset.column.constantwhose value does not change throughout per-event.column.expression, or simply astr, that is jitted using Cling.column.definition, a user-compiled C++ computation unit (see API reference for more details).
from qtypy import dataset, column
df["x"] = dataset.column(key="x", dtype="ROOT::RVec<float>")
df["y"] = "x.size()"
df["invalid C++ identifier"] = column.constant(1) # ❌
Applying selections#
By default, successive selection operations are always compounded one after another:
(
df
.filter({"all" : "true"}) # cut = true, weight = 1.0
.filter({"a" : "a == true"}) # cut = (all && a), weight = 1.0
.weight({"w" : "w == true"}) # cut = (all && a), weight = (1.0 * w)
)
Branching & merging#
Diverging selections can be applied by specifying they be applied “at” a branching point:
a = df & {"a" : "a == true"}
a_and_b = df @ "a" & {"a_and_b" : "b == true"}
a_and_b = df @ "a" & {"a_and_b" : "c == true"}
Divergent selections can be subsequently re-merged similarly:
intersection = (df @ "a") & {"intersection" : "a_and_b && a_and_c"}
union = (df @ "a") & {"union" : "a_and_b || a_and_c"}
Histogram outputs#
from qtypy import hist
# from whatever the last applied selection was
hx = df >> hist("x", dtype="float", nx=100, xmin=0.0, xmax=1.0).fill("x")
df @ "base" >> hist("x", dtype="float", nx=100, xmin=0.0, xmax=1.0).fill("x")
Remember that data processing is triggered only once the result of a query is explicitly accessed:
hx.result() # TH1F
Multiple queries can be performed at once (of course). Query definitions can be “recycled” to be run at multiple selections, for example:
# multiple selections
selections = ["a", "b", "c"]
# define query without df
hx = hist("x", dtype="float", nx=100, xmin=0.0, xmax=1.0).fill("x")
# query at multiple selections
hxs = {
sel : df @ sel >> hx for sel in selections
}
Important
Accessing the result of a query triggers the event-loop (if needed). In order to prevent running it multiple times, make sure that you have specified all queries needed at a time before accessing any of their results!
Numpy array outputs#
x = df >> column.to_numpy("x", dtype="float")
x.result() # <class "numpy.ndarray">, dtype("float32")
Note
numpy.asarray() supports zero-copy readout of the std::vector only in the case of scalar elements.
For nested/arbitrary data types, a copy must be invoked when crossing the C++-to-Python boundary.