LArSoft  v09_90_00
Liquid Argon Software toolkit - https://larsoft.org/
LArSoft data proxy infrastructure

Classes for implementation and customization of LArSoft proxies.

Modules

 Collection proxy infrastructure
 Infrastructure to define a proxy of collection data product.
 
 Proxy element infrastructure
 Infrastructure to describe the element of a proxy.
 
 Parallel data infrastructure
 Infrastructure for support of parallel data structures.
 
 Infrastructure for proxies as auxiliary
 Infrastructure to use a collection proxy as auxiliary data for another proxy.
 
 Associated data infrastructure
 Infrastructure for support of associated data.
 

Detailed Description

Classes for implementation and customization of LArSoft proxies.

This documentation section contains hints for the creation or customization of data product proxies. Creation and customization are two slightly different use cases.

Some implementation choices are also explained here.

Bug:
The current design is flawed in its support of (sub)proxies as direct elements of a proxy. The merged elements are implemented as base classes of the proxy, so that their potentially customized interface percolates to the proxy. Since a class that is already an indirect base can't also be used as a direct base without ambiguity, trying to merge a proxy causes all sorts of conflicts between base classes. The stub code withCollectionProxyAs() and related functions exhibit that problem in some (all?) usages. [the author hasn't tried to create a working combination for it]

The simplest new proxy

In its simplest form, a proxy may be created with no customization at all:

auto tracks = proxy::getCollection<std::vector<recob::Track>>
(event, tracksTag, proxy::withAssociated<recob::Hit>());

makes a proxy object called tracks which accesses a recob::Track collection data product and its associated hits, assuming that the tracks and their associations with hits were created by the same module (tracksTag).

Note
Only associations falling under some restricted assumptions can be used in a proxy.

From this, it is possible to access tracks and their hits:

for (auto trackInfo: tracks) {
  recob::Track const& track = *trackInfo; // access to the track
  double const startTheta = track.Theta();
  double const length = trackInfo->Length(); // access to track members
  // access to associated data (returns random-access collection-like object)
  auto const& hits = trackInfo.get<recob::Hit>();
  double charge = 0.0;
  for (auto const& hitPtr: hits) {
    charge += hitPtr->Integral();
  } // for hits
  mf::LogVerbatim("Info")
    << "[#" << trackInfo.index() << "] track ID=" << track.ID()
    << " (" << length << " cm, starting with theta=" << startTheta
    << " rad) deposited charge=" << charge
    << " with " << hits.size() << " hits";
} // for tracks

If instead of using trackInfo.get<recob::Hit>() you want to use something more friendly, like trackInfo.Hits(), then you need to create and customize a new proxy class.

Quirks of proxy usage (a.k.a. "C++ is not python")

There are a number of things one should remember when using proxies.

First, the type of the proxy collection, and the type of the proxy collection element, are not trivial. That is the reason why we use

auto tracks = proxy::getCollection<std::vector<recob::Track>>
(event, tracksTag, proxy::withAssociated<recob::Hit>());

instead of

proxy::CollectionProxy<
  std::vector<recob::Track>,
  proxy::details::AssociatedData<recob::Track, recob::Hit>
  >
  tracks = proxy::getCollection<std::vector<recob::Track>>
  (event, tracksTag, proxy::withAssociated<recob::Hit>());

and even more so

for (auto trackInfo: tracks)

instead of spelling out the full class name of the collection element.

More importantly, the type depends on which elements we merged into the collection proxy (in the example, proxy::details::AssociatedData reveals that we merged associated data). This means that a C++ function in general can't reliably take a proxy argument by specifying its type, and it needs to use a template parameter instead:

template <typename Track>
unsigned int nHitsOnView(Track const& track, geo::View view);

Also remember that member function templates are allowed, but they can't be virtual.

The second quirk, which yields a confusing compilation message (at least with GCC 6), is that calls to member function templates on objects of a template-dependent type need the template keyword for C++ to understand what's going on:

template <typename Track>
unsigned int nHitsOnView(Track const& track, geo::View view) {
  unsigned int count = 0U;
  for (art::Ptr<recob::Hit> const& hitPtr: track.template get<recob::Hit>())
    if (hitPtr->View() == view) ++count;
  return count;
} // nHitsOnView()

Here, track is a constant reference to type Track, which is a template type, so that when we ask for track.get<recob::Hit>() the compiler does not know that the object track of type Track has a method get() which is a template method, and it gets confused (in fact, it may think the expression might be a comparison, track.get < recob::Hit, and hilarity ensues). This is not true when the type of the object is immediately known:

auto tracks = proxy::getCollection<std::vector<recob::Track>>
  (event, tracksTag, proxy::withAssociated<recob::Hit>());
for (auto track: tracks) {
  unsigned int count = 0U;
  for (art::Ptr<recob::Hit> const& hitPtr: track.get<recob::Hit>())
    if (hitPtr->View() == view) ++count;
  mf::LogVerbatim("") << "Track ID=" << track->ID() << ": " << count
    << " hits on view " << view;
} // for

where tracks is a well-known (to the compiler) type, and track as well.

Customization of collection proxies

The "customization" of a collection proxy consists of writing classes and functions specific for a use case, to be used as components of a collection proxy in place of the standard ones.

The options of customization are numerous, and it is recommended that customization start from the code of an existing customized proxy implementing functionalities similar to the desired ones. In the same spirit, customization hints are not provided here, but rather in the documentation of the proxy::Tracks collection proxy.

Technical details

Overhead

The proxies have been developed with an eye on minimising the replication of information. The proxies are therefore light-weight objects relying on pointers to the original data. One exception is that an additional structure is created for each one-to-many association (e.g., from tracks to hits), which includes a number of entries proportional to the number of main elements (here, tracks).

In any case, copying proxies is not recommended: it is usually better to pass around a reference to them.

Since this interface (and implementation) is still in development, there might be flaws that make it non-performant. Please report any suspicious behaviour.

Interface substitution

A technique that is used in this implementation is to replace (or extend) the interface of an existing object. The documentation of file CollectionView.h includes a more in-depth description of it.

Iterator wrappers and "static polymorphism"

A widely used interface change is the substitution of the dereference operator of an iterator:

struct address_iterator: public iterator { // DON'T DO THIS (won't work)
  auto operator*() -> decltype(auto)
    { return std::addressof(iterator::operator*()); }
}; // address_iterator

There are two important pitfalls to be aware of in this specific case, well illustrated in this example.

If the caller tries to use e.g. ait->name() on an address_iterator ait (or other members, like ait[0]), they will be picked from the base class, and the redefined operator*() is ignored. This can be avoided with private inheritance, forcing the explicit implementation of everything we want to use, which will be at the very least an increment operator and a comparison one.

The second pitfall is that the base class methods return base class references. For example, *ait++ will call the inherited increment operator, which returns an object of type iterator, and the following dereference will be called on it, again bypassing the redefined dereference method. This means that, to implement the increment operator, it is not enough to import the inherited one (using iterator::operator++;).
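The second pitfall can be checked at compile time. The sketch below uses a hypothetical hand-written simple_iterator in place of iterator, and a naive_address_iterator that repeats the mistake on purpose; neither is part of LArSoft.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical minimal iterator over an array of integers.
struct simple_iterator {
  int* ptr = nullptr;
  int& operator*() const { return *ptr; }
  simple_iterator& operator++() { ++ptr; return *this; }
  simple_iterator operator++(int)
    { simple_iterator old = *this; ++ptr; return old; }
};

// Public inheritance with only the dereference operator replaced.
struct naive_address_iterator: public simple_iterator {
  int* operator*() const
    { return std::addressof(simple_iterator::operator*()); }
};

// A direct dereference uses the new operator and yields an address...
static_assert(std::is_same<
  decltype(*std::declval<naive_address_iterator&>()), int*
  >::value, "dereference returns an address");

// ...but the inherited post-increment returns the base class,
// so `*ait++` yields a reference to the value again.
static_assert(std::is_same<
  decltype(*std::declval<naive_address_iterator&>()++), int&
  >::value, "inherited increment bypasses the new dereference");

// Run-time view of the same effect: returns what `*ait++` evaluates to.
inline int valueFromPostIncrement() {
  int data[] = { 10, 20 };
  naive_address_iterator ait;
  ait.ptr = data;
  int* address = *ait;  // new dereference: the address of data[0]
  assert(address == data);
  int value = *ait++;   // base class dereference: the value of data[0]
  return value;
}
```

The same reasoning applies to the prefix form: ++ait returns a simple_iterator&, so *++ait also goes through the base class dereference.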

Wrapping a base_iterator in this way involves a lot of "boilerplate" code: the prefix increment operator will always be auto& operator++() { base_iterator::operator++(); return *this; }, the indexing operator will always be auto operator[](std::size_t i) -> decltype(auto) { return std::addressof(base_iterator::operator[](i)); }, etc. The usual solution is to derive the iterator class from one that implements the boilerplate. Unfortunately, part of the boilerplate depends on the derived class, so it can't appear in the base class as is.

With run-time polymorphism, the base iterator might declare an abstract value transformation method (transform()) and use it in its other methods; the virtual call mechanism then dispatches to the transform() method of the derived class. To obtain the same effect at compile time, the base class needs to know the transform() function in advance. Plugging it in as a non-type template argument (a function pointer) requires quite some gymnastics in predicting the right data type, especially the return type. A weird alternative is to pass the derived class to the base class as a template argument, so that the base class can call into the class deriving from it (this is known as the "curiously recurring template pattern"). The derived iterator looks like:

struct address_iterator: public iterator_base<address_iterator, iterator> {
  using iterator_base_t = iterator_base<address_iterator, iterator>;
  using iterator_base_t::iterator_base_t;
  static auto transform(iterator const& it) { return std::addressof(*it); }
};

and the weirdness is concentrated in the iterator_base:

template <typename FinalIter, typename WrappedIter>
class iterator_base: private WrappedIter {
  // access to this object as the wrapped iterator
  WrappedIter const& asWrapped() const
    { return static_cast<WrappedIter const&>(*this); }
  // access to this object as the final (derived) iterator
  FinalIter& asFinal() { return static_cast<FinalIter&>(*this); }
    public:
  iterator_base() = default;
  iterator_base(WrappedIter const& from): WrappedIter(from) {}
  FinalIter& operator++() { WrappedIter::operator++(); return asFinal(); }
  // transform() is static and receives the wrapped iterator
  auto operator*() const -> decltype(auto)
    { return FinalIter::transform(asWrapped()); }
  bool operator!= (iterator_base const& other) const
    { return asWrapped() != other.asWrapped(); }
}; // class iterator_base

With this class, it's possible to transform an iterator into an address_iterator, in a way similar to the one described in the "Interface substitution" section (some workarounds are needed because of private inheritance and to ensure that the iterator traits are correct).
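For reference, here is a self-contained sketch of the whole pattern that compiles on its own (C++14 or later). It makes a few assumptions: int_iterator is a hypothetical hand-written iterator standing in for iterator, the alias wrapped_t is introduced only to sidestep a name lookup issue caused by private inheritance, and sumThroughAddresses() is a made-up usage example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical minimal iterator to be wrapped (not a LArSoft class).
struct int_iterator {
  int const* ptr = nullptr;
  int const& operator*() const { return *ptr; }
  int_iterator& operator++() { ++ptr; return *this; }
  bool operator!=(int_iterator const& other) const { return ptr != other.ptr; }
};

// Inside classes derived from iterator_base, the plain name `int_iterator`
// would resolve to the privately inherited injected class name and be
// inaccessible; this namespace-scope alias avoids that.
using wrapped_t = int_iterator;

// Boilerplate base: FinalIter is the class deriving from this one.
template <typename FinalIter, typename WrappedIter>
class iterator_base: private WrappedIter {
  WrappedIter const& asWrapped() const
    { return static_cast<WrappedIter const&>(*this); }
  FinalIter& asFinal() { return static_cast<FinalIter&>(*this); }
    public:
  iterator_base() = default;
  iterator_base(WrappedIter const& from): WrappedIter(from) {}
  FinalIter& operator++() { WrappedIter::operator++(); return asFinal(); }
  auto operator*() const -> decltype(auto)
    { return FinalIter::transform(asWrapped()); }
  bool operator!=(iterator_base const& other) const
    { return asWrapped() != other.asWrapped(); }
}; // class iterator_base

struct address_iterator: public iterator_base<address_iterator, wrapped_t> {
  using iterator_base_t = iterator_base<address_iterator, wrapped_t>;
  using iterator_base_t::iterator_base_t;
  static auto transform(wrapped_t const& it) { return std::addressof(*it); }
};

// Sums the values reached through the addresses the iterator returns.
inline int sumThroughAddresses(int const* data, std::size_t n) {
  wrapped_t first, last;
  first.ptr = data;
  last.ptr = data + n;
  address_iterator it(first);
  address_iterator const end(last);
  int sum = 0;
  for (; it != end; ++it) {
    int const* address = *it; // dereferencing yields an address
    assert(address != nullptr);
    sum += *address;
  }
  return sum;
}
```

Only the operators needed by the loop are implemented here; a complete wrapper would also provide the postfix increment, operator==, and the iterator traits.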

Note
I learned afterward about the existence of boost::iterator_adaptor, which might provide similar functionality and also deal correctly with non-constant iterators. Worth considering.