Conceptual overview#
Queryosity enables a dataflow model of data analysis, in which data (edges) flow between actions (nodes) to form a directed acyclic graph.

An action belongs to one of three categories, depending on the nature of the operation. A sub-graph of tasks within each category expresses the dependencies of each action on others, potentially of other categories, as inputs.
| Action | Description | Methods | Description | Task Graph | Inputs |
|---|---|---|---|---|---|
| Column | Quantity of interest | | Read a column. | Computation | |
| | | | Compute a column. | | |
| Selection | Boolean decision | | Apply a cut. | Cutflow | |
| | Floating-point decision | | Apply a statistical significance. | | |
| Query | Perform a query | | Define an output. | Experiment | |
| | | | Populate with column value(s). | | |
| | | | Run over selected entries. | | |
| | | | Get the result. | | |

All actions are first defined lazily, meaning that an action is not performed until its result is accessed by the user. Once the dataset is processed, the eagerness of actions in each entry is as follows:
- A query is performed only if its associated selection passes the cut.
- A selection is applied only if all prior cuts in the cutflow have passed.
- A column is evaluated only if it is needed to determine any of the above.
Columns#
- Column
  An action that holds a value of some data type `T`, to be updated for each entry.
- Independent column
  A column whose value does not depend on others.
- Dependent column
  A column whose value is evaluated from those of other columns as inputs.
A tower of dependent columns can be constructed to form the computation graph.

Only the minimum number of computations needed is performed for each entry:
- A column value is computed at most once per entry (and only if needed), then cached and never re-computed for that entry.
- A column value is not copied when used as an input for dependent columns (unless a conversion is needed).
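
The caching behaviour described above can be illustrated with a minimal Python sketch. This is not Queryosity's actual interface: the `Column` class, its `reset` and `value` methods, the `n_evaluations` counter, and the shared `entry` dictionary are all hypothetical names introduced for illustration only.

```python
from typing import Any, Callable, Dict, List


class Column:
    """A per-entry value, computed lazily and cached until the next entry.

    Hypothetical illustration only; not part of any real library.
    """

    def __init__(self, compute: Callable[..., Any], *inputs: "Column") -> None:
        self._compute = compute
        self._inputs = inputs
        self._value: Any = None
        self._cached = False
        self.n_evaluations = 0  # bookkeeping for this illustration only

    def reset(self) -> None:
        """Drop the cached value; called when moving to a new entry."""
        self._cached = False

    def value(self) -> Any:
        if not self._cached:
            # Input values are handed over by reference, not copied.
            self._value = self._compute(*(c.value() for c in self._inputs))
            self._cached = True
            self.n_evaluations += 1
        return self._value


entry: Dict[str, float] = {}  # the entry currently being processed

# Independent columns read directly from the entry.
x = Column(lambda: entry["x"])
y = Column(lambda: entry["y"])
# Dependent columns take other columns as inputs.
r2 = Column(lambda a, b: a * a + b * b, x, y)
doubled = Column(lambda v: 2 * v, r2)
unused = Column(lambda a: a + 1, x)  # defined, but never requested

columns = [x, y, r2, doubled, unused]
dataset = [{"x": 3.0, "y": 4.0}, {"x": 1.0, "y": 2.0}]

results: List[tuple] = []
for row in dataset:
    entry.clear()
    entry.update(row)
    for column in columns:
        column.reset()
    # r2 is requested twice per entry: once through doubled, once directly.
    results.append((doubled.value(), r2.value()))

print(results)
print(r2.n_evaluations, unused.n_evaluations)
```

In this sketch `r2` is requested twice per entry but evaluated only once per entry, and `unused` is never evaluated because nothing asks for its value.
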
Selections#
- Selection
  A scalar-valued column corresponding to a “decision” on an entry:
  - A boolean `cut` to determine if a query should be performed for the entry. A series of two or more cuts becomes their intersection, `and`.
  - A floating-point `weight` to assign a statistical significance to the entry. A series of two or more weights becomes their product, `*`.
A cutflow can contain the following types of connections between selections:
- Applying a selection from an existing node, which determines the order in which they are compounded.
- Branching selections by applying more than one selection from a common node.
- Merging two selections, e.g. taking the union/intersection of two cuts.
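
As a rough illustration of these three connection types, cuts can be modelled as plain predicates on an entry. The helper names (`apply_cut`, `merge_any`, `merge_all`) and the entry fields (`energy`, `eta`) below are hypothetical and do not correspond to Queryosity's own methods.

```python
from typing import Callable, Dict

Entry = Dict[str, float]
Cut = Callable[[Entry], bool]


def apply_cut(previous: Cut, new: Cut) -> Cut:
    """Apply a new cut from an existing node; the result is their intersection."""
    return lambda e: previous(e) and new(e)


def merge_any(a: Cut, b: Cut) -> Cut:
    """Merge two cuts by taking their union."""
    return lambda e: a(e) or b(e)


def merge_all(a: Cut, b: Cut) -> Cut:
    """Merge two cuts by taking their intersection."""
    return lambda e: a(e) and b(e)


def all_entries(e: Entry) -> bool:
    """Root node of the cutflow: every entry passes."""
    return True


# Applying: a cut compounded on top of the root node.
energetic = apply_cut(all_entries, lambda e: e["energy"] > 10.0)

# Branching: two different selections applied from the common node `energetic`.
central = apply_cut(energetic, lambda e: abs(e["eta"]) < 1.0)
forward = apply_cut(energetic, lambda e: abs(e["eta"]) > 2.0)

# Merging: union and intersection of the two branches.
central_or_forward = merge_any(central, forward)
central_and_forward = merge_all(central, forward)

example = {"energy": 20.0, "eta": 0.5}
print(central(example), forward(example), central_or_forward(example))
```
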

Selections constitute a specific type of column, so they are subject to the lazy-evaluation and value-caching behaviour of the computation graph. Additionally, the cutflow imposes the following rules:
- The cut at a selection is evaluated only if all previous cuts have passed.
- The weight at a selection is evaluated only if its cut has passed.
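
The two cutflow rules above can be sketched as a short-circuiting chain. The `Selection` class, the `calls` log, the `trace` helper, and the entry fields (`trig`, `w`, `pt`) are assumptions made for this illustration rather than Queryosity's interface, and per-entry caching is omitted for brevity.

```python
from typing import Callable, Dict, List, Optional, Tuple

Entry = Dict[str, float]
calls: List[Tuple[str, str]] = []  # records every cut/weight evaluation


class Selection:
    """One node of a linear cutflow (hypothetical illustration)."""

    def __init__(
        self,
        name: str,
        previous: Optional["Selection"] = None,
        cut: Optional[Callable[[Entry], bool]] = None,
        weight: Optional[Callable[[Entry], float]] = None,
    ) -> None:
        self.name = name
        self.previous = previous
        self.cut = cut
        self.weight = weight

    def evaluate(self, entry: Entry) -> Tuple[bool, float]:
        """Return (passed, weight) compounded along the chain up to this node."""
        if self.previous is None:
            passed, weight = True, 1.0
        else:
            passed, weight = self.previous.evaluate(entry)
        if not passed:
            return False, 0.0  # a previous cut failed: nothing is evaluated here
        if self.cut is not None:
            calls.append(("cut", self.name))
            if not self.cut(entry):
                return False, 0.0  # own cut failed: the weight is skipped
        if self.weight is not None:
            calls.append(("weight", self.name))
            weight *= self.weight(entry)  # weights compound as a product
        return True, weight


def trace(selection: Selection, entry: Entry):
    """Evaluate a selection and report which cuts/weights were actually run."""
    calls.clear()
    result = selection.evaluate(entry)
    return result, list(calls)


trigger = Selection("trigger", cut=lambda e: e["trig"] > 0)
scaled = Selection("scaled", previous=trigger, weight=lambda e: e["w"])
tight = Selection("tight", previous=scaled,
                  cut=lambda e: e["pt"] > 30.0, weight=lambda e: 0.5)

print(trace(tight, {"trig": 0.0, "w": 2.0, "pt": 50.0}))
print(trace(tight, {"trig": 1.0, "w": 2.0, "pt": 10.0}))
print(trace(tight, {"trig": 1.0, "w": 2.0, "pt": 50.0}))
```

When the first cut fails, only that cut appears in the log; when the cut at `tight` fails, its weight is never evaluated.
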
Queries#
- Query
  An action that outputs a result of some data type `T` after traversing the dataset.
  - It must be associated with a selection whose cut determines which entries to count.
    - (Optional) The result is populated with the weight taken into account.
  - How the query counts an entry is a user-implemented arbitrary action.
    - (Optional) The result is populated based on values of input columns.
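
A query of this kind might be sketched as follows. The `Query` base class, its `fill`, `result`, and `run` methods, the two example subclasses, and the `high_pt` selection are hypothetical names chosen for this illustration and should not be read as Queryosity's interface.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

Entry = Dict[str, float]
Selection = Callable[[Entry], Tuple[bool, float]]  # returns (passed, weight)
ColumnFn = Callable[[Entry], Any]


class Query:
    """Accumulates a result over the entries that pass a selection."""

    def __init__(self, selection: Selection, *columns: ColumnFn) -> None:
        self.selection = selection
        self.columns = columns

    def fill(self, weight: float, *values: Any) -> None:
        """User-implemented: how a single selected entry is counted."""
        raise NotImplementedError

    def result(self) -> Any:
        """User-implemented: the final output after the traversal."""
        raise NotImplementedError

    def run(self, dataset: Iterable[Entry]) -> Any:
        for entry in dataset:
            passed, weight = self.selection(entry)
            if not passed:
                continue  # the query is not performed for this entry
            # Input columns are evaluated only for selected entries.
            self.fill(weight, *(column(entry) for column in self.columns))
        return self.result()


class WeightedCount(Query):
    """Sums the weights of the selected entries; needs no input columns."""

    def __init__(self, selection: Selection) -> None:
        super().__init__(selection)
        self.total = 0.0

    def fill(self, weight: float, *values: Any) -> None:
        self.total += weight

    def result(self) -> float:
        return self.total


class WeightedSum(Query):
    """Sums one column's values, each multiplied by the entry weight."""

    def __init__(self, selection: Selection, column: ColumnFn) -> None:
        super().__init__(selection, column)
        self.total = 0.0

    def fill(self, weight: float, *values: Any) -> None:
        (value,) = values
        self.total += weight * value

    def result(self) -> float:
        return self.total


def high_pt(entry: Entry) -> Tuple[bool, float]:
    """Cut on pt; the weight is read only if the cut passes."""
    if entry["pt"] > 30.0:
        return True, entry["w"]
    return False, 0.0


dataset = [
    {"pt": 40.0, "w": 0.5},
    {"pt": 10.0, "w": 2.0},
    {"pt": 60.0, "w": 1.5},
]
n_selected = WeightedCount(high_pt).run(dataset)
sum_pt = WeightedSum(high_pt, lambda e: e["pt"]).run(dataset)
print(n_selected, sum_pt)
```

Here the second entry fails the cut, so neither query counts it; the first and third entries contribute with their respective weights.
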