dataset::source

class source : public queryosity::action

Custom dataset source.

Subclassed by queryosity::dataset::reader< DS >

Public Functions

virtual void parallelize(unsigned int concurrency) = 0

Inform the dataset of the level of parallelism, i.e. the number of slots that will process it concurrently.

inline virtual void initialize()

Initialize dataset processing.

virtual std::vector<std::pair<unsigned long long, unsigned long long>> partition() = 0

Determine dataset partition for parallel processing.

A non-empty partition MUST begin at entry 0 and list its sub-ranges in sorted, contiguous order, e.g.:

{{0,100},{100,200}, ..., {900,1000}}

If a dataset returns an empty partition, it relinquishes its control over the entry loop to another dataset with a non-empty partition.
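
The contract above can be illustrated with a minimal, standalone sketch. The helper names make_even_partition and is_valid_partition are hypothetical and not part of queryosity; the sketch only assumes the return type of partition(). It treats each pair as a half-open range and tolerates empty sub-ranges, which is an assumption the text above does not settle.

```cpp
#include <utility>
#include <vector>

using entry_range = std::pair<unsigned long long, unsigned long long>;
using partition_t = std::vector<entry_range>;

// Hypothetical helper: split [0, nentries) into at most nparts contiguous
// sub-ranges of near-equal size. Returns an empty partition if there is
// nothing to split.
partition_t make_even_partition(unsigned long long nentries,
                                unsigned int nparts) {
  partition_t parts;
  if (nentries == 0 || nparts == 0) return parts;
  // Never create more sub-ranges than there are entries.
  const unsigned long long n = nparts < nentries ? nparts : nentries;
  const unsigned long long base = nentries / n;
  const unsigned long long rem = nentries % n;
  unsigned long long begin = 0;
  for (unsigned long long i = 0; i < n; ++i) {
    // The first `rem` sub-ranges absorb one leftover entry each.
    const unsigned long long end = begin + base + (i < rem ? 1 : 0);
    parts.emplace_back(begin, end);
    begin = end;
  }
  return parts;
}

// Hypothetical helper: check that a partition starts at 0 and that every
// sub-range begins where the previous one ended. An empty partition passes,
// since it is a legal return value with its own meaning.
bool is_valid_partition(const partition_t& parts) {
  unsigned long long expected_begin = 0;
  for (const auto& range : parts) {
    if (range.first != expected_begin) return false;  // gap or overlap
    if (range.second < range.first) return false;     // reversed range
    expected_begin = range.second;
  }
  return true;
}
```

A partition() override could return the result of a helper like make_even_partition when the total number of entries is known up front.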
Attention

  • Non-empty partitions reported from multiple datasets need to be aligned to form a common denominator partition over which the dataset processing is parallelized. As such, they MUST have (1) at minimum, the same total number of entries, and (2) ideally, shared sub-range boundaries.

  • Any dataset reporting an empty partition MUST be able to fulfill dataset::source::execute() calls for any entry number as requested by the other datasets loaded in the dataflow.
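
One plausible reading of the "common denominator" requirement is to keep only the sub-range boundaries that both partitions share. The sketch below follows that reading; it is not necessarily how queryosity aligns partitions internally. The names boundaries and align_partitions are hypothetical, and both inputs are assumed to be valid partitions with strictly increasing boundaries.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <utility>
#include <vector>

using entry_range = std::pair<unsigned long long, unsigned long long>;
using partition_t = std::vector<entry_range>;

// Hypothetical helper: list the boundaries of a partition, e.g.
// {{0,100},{100,200}} -> {0, 100, 200}.
std::vector<unsigned long long> boundaries(const partition_t& parts) {
  std::vector<unsigned long long> edges;
  if (parts.empty()) return edges;
  edges.push_back(parts.front().first);
  for (const auto& range : parts) edges.push_back(range.second);
  return edges;
}

// Hypothetical helper: build a partition from the boundaries that appear in
// both inputs. An empty input defers entirely to the other partition.
partition_t align_partitions(const partition_t& a, const partition_t& b) {
  if (a.empty()) return b;
  if (b.empty()) return a;
  // Requirement (1): both partitions must span the same number of entries.
  if (a.back().second != b.back().second) {
    throw std::invalid_argument("partitions span different numbers of entries");
  }
  const auto edges_a = boundaries(a);
  const auto edges_b = boundaries(b);
  // Requirement (2): only shared boundaries survive, so fewer shared
  // boundaries means fewer sub-ranges to process in parallel.
  std::vector<unsigned long long> shared;
  std::set_intersection(edges_a.begin(), edges_a.end(),
                        edges_b.begin(), edges_b.end(),
                        std::back_inserter(shared));
  partition_t aligned;
  for (std::size_t i = 0; i + 1 < shared.size(); ++i) {
    aligned.emplace_back(shared[i], shared[i + 1]);
  }
  return aligned;
}
```

Under this reading, two partitions that share no interior boundaries collapse to a single sub-range, which is why shared boundaries are the ideal case.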

Returns:

Dataset partition

inline virtual void initialize(unsigned int slot, unsigned long long begin, unsigned long long end) override

Enter an entry loop.

Parameters:
  • slot[in] Thread slot number.

  • begin[in] First entry number processed.

  • end[in] One past the last entry number processed; the loop covers the half-open range [begin, end) and stops after entry end-1.

inline virtual void execute(unsigned int slot, unsigned long long entry) override

Process an entry.

Parameters:
  • slot[in] Thread slot number.

  • entry[in] Entry being processed.

inline virtual void finalize(unsigned int slot) override

Exit an entry loop.

Parameters:
  • slot[in] Thread slot number.

inline virtual void finalize()

Finalize processing the dataset.
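
To show how the member functions above fit together, here is a standalone sketch that drives a mock source through one sequential pass. Both logging_source and run_sequentially are hypothetical and do not derive from or use queryosity. The call order is an assumption that follows the order in which the members are documented, and everything runs on a single slot for simplicity.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical mock with the same member signatures as dataset::source.
// It only records the calls it receives.
struct logging_source {
  std::vector<std::string> log;

  void parallelize(unsigned int concurrency) {
    log.push_back("parallelize(" + std::to_string(concurrency) + ")");
  }
  void initialize() { log.push_back("initialize()"); }
  std::vector<std::pair<unsigned long long, unsigned long long>> partition() {
    log.push_back("partition()");
    return {{0ULL, 2ULL}, {2ULL, 3ULL}};  // three entries in two sub-ranges
  }
  void initialize(unsigned int slot, unsigned long long begin,
                  unsigned long long end) {
    log.push_back("initialize(" + std::to_string(slot) + ", " +
                  std::to_string(begin) + ", " + std::to_string(end) + ")");
  }
  void execute(unsigned int slot, unsigned long long entry) {
    log.push_back("execute(" + std::to_string(slot) + ", " +
                  std::to_string(entry) + ")");
  }
  void finalize(unsigned int slot) {
    log.push_back("finalize(" + std::to_string(slot) + ")");
  }
  void finalize() { log.push_back("finalize()"); }
};

// Hypothetical driver: process every sub-range in turn on slot 0.
// The ordering of parallelize(), initialize() and partition() is assumed.
template <typename Source>
void run_sequentially(Source& src) {
  src.parallelize(1);
  src.initialize();
  const auto parts = src.partition();
  const unsigned int slot = 0;
  for (const auto& range : parts) {
    src.initialize(slot, range.first, range.second);  // enter entry loop
    for (auto entry = range.first; entry < range.second; ++entry) {
      src.execute(slot, entry);  // entries in [begin, end)
    }
    src.finalize(slot);  // exit entry loop
  }
  src.finalize();
}
```

In a multithreaded run, each sub-range would presumably be handled by its own slot rather than sequentially on slot 0.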