Fili metrics are either named aggregations of Druid metrics or named expressions over other Fili metrics. They range from simple arithmetic to complex combinations of aggregations and post-aggregations.
There are two types of metrics:
1. **First order metrics** directly aggregate a Druid metric. For example, you might have two metrics, `page_views`
   and `additive_page_views`, which compute the `longSum` of their equivalent Druid metrics, `druid_page_views` and
   `druid_additive_page_views`.

2. **Higher order metrics** are defined in terms of other metrics. For example, you might have a `total_page_views`
   metric that is the sum of `page_views` and `additive_page_views`.
Fili relies on a `MetricDictionary` to resolve names into metrics. This suggests there are two pieces you need to
define:

1. The names of your metrics
2. The metrics themselves
You can name your metrics by implementing the `ApiMetricName` interface. The interface has two responsibilities:

1. It provides a formal name for the metric that can be used by other parts of the system (like the
   `BaseTableLoader`).
2. It determines whether the metric is valid for a given `TimeGrain`.
For example, consider the following enum:
```java
public enum ExampleApiMetricName implements ApiMetricName {
    PAGE_VIEWS,
    ADDITIVE_PAGE_VIEWS,
    TOTAL_PAGE_VIEWS;

    private final TimeGrain minimumGrain;

    ExampleApiMetricName() {
        this.minimumGrain = DefaultTimeGrain.DAY;
    }

    @Override
    public String getApiName() {
        return EnumUtils.enumJsonName(this);
    }

    @Override
    public boolean isValidFor(TimeGrain grain) {
        // Check whether the passed-in grain is at least as coarse as the metric's minimum grain.
        return grain.compareTo(minimumGrain) >= 0;
    }
    ...
}
```
This enum specifies that all metrics are valid for time grains at the day level and coarser (week, month, year, etc.).
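For instance, assuming `DefaultTimeGrain`'s natural ordering runs from finest to coarsest, validity checks against
the enum above would behave like this:

```java
// Hypothetical checks: DAY is the minimum grain for every metric in the enum above.
ExampleApiMetricName.PAGE_VIEWS.isValidFor(DefaultTimeGrain.HOUR);  // false: hour is finer than day
ExampleApiMetricName.PAGE_VIEWS.isValidFor(DefaultTimeGrain.MONTH); // true: month is coarser than day
```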
The `WikiApiMetricName` enum in the Fili-wikipedia-example provides a more complete example.
You also need to give Fili the names of the Druid metrics. This is done by implementing the `FieldName` interface in
much the same way as `ApiMetricName` (except that Druid metric names do not require a minimum time grain).
Implementing `FieldName` allows you to feed the Druid metric names into the `BaseTableLoader`, which uses them to
configure the physical tables. See Binding Resources for more information about loading tables. The
`WikiDruidMetricName` enum provides an example.
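A minimal sketch for the running example might look like the following (this assumes `FieldName`'s single `asName()`
accessor; adjust to the interface in your Fili version):

```java
// Names of the raw Druid metrics backing the first order metrics above.
public enum ExampleDruidMetricName implements FieldName {
    DRUID_PAGE_VIEWS,
    DRUID_ADDITIVE_PAGE_VIEWS;

    @Override
    public String asName() {
        return EnumUtils.enumJsonName(this);
    }
}
```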
Next, you need to write the code that builds the metrics and loads them into the `MetricDictionary` at Fili startup.
To do so, you need to implement the `MetricLoader` interface, which has a single method: `loadMetricDictionary`.

For example, suppose you want to register the three page view metrics introduced in Overview. Then the
`loadMetricDictionary` method may look something like this:
```java
private MetricMaker longSumMaker;
private MetricMaker sumMaker;
private List<MetricInstance> metricInstances;

@Override
public void loadMetricDictionary(MetricDictionary metricDictionary) {
    buildMetricMakers(metricDictionary);
    metricInstances = buildMetricInstances(metricDictionary);
    addToMetricDictionary(metricDictionary, metricInstances);
}
```
A `MetricMaker` knows how to construct a `LogicalMetric`. A `LogicalMetric` is a named Druid query plus a `Mapper`
for post-Druid processing. For example, the `longSumMaker` knows how to construct a `longSum` aggregation, while the
`sumMaker` knows how to construct an arithmetic post-aggregation using addition.

For the running example, a `longSumMaker` and a `sumMaker` are needed:
```java
private void buildMetricMakers(MetricDictionary metricDictionary) {
    longSumMaker = new LongSumMaker(metricDictionary);
    sumMaker = new ArithmeticMaker(metricDictionary, ArithmeticPostAggregationFunction.PLUS);
}
```
A `MetricInstance` knows how to use a `MetricMaker` to make a metric. In the running example, there are three
metrics: `page_views`, `additive_page_views`, and `total_page_views`. A `MetricInstance` is needed for each metric:
```java
private List<MetricInstance> buildMetricInstances(MetricDictionary metricDictionary) {
    return Arrays.<MetricInstance>asList(
            new MetricInstance(PAGE_VIEWS, longSumMaker, DRUID_PAGE_VIEWS),
            new MetricInstance(ADDITIVE_PAGE_VIEWS, longSumMaker, DRUID_ADDITIVE_PAGE_VIEWS),
            new MetricInstance(TOTAL_PAGE_VIEWS, sumMaker, ADDITIVE_PAGE_VIEWS, PAGE_VIEWS)
    );
}
```
Note that this is where you tie metrics to their dependencies. Since `page_views` and `additive_page_views` are both
first order metrics, they depend on their respective Druid metrics (`DRUID_PAGE_VIEWS` and
`DRUID_ADDITIVE_PAGE_VIEWS`). Meanwhile, `total_page_views` depends on `additive_page_views` and `page_views`.
Finally, the metrics need to be made and added to the `MetricDictionary`. In the example, this is handled by the
`addToMetricDictionary` method:
```java
private void addToMetricDictionary(MetricDictionary metricDictionary, List<MetricInstance> metrics) {
    metrics.stream().map(MetricInstance::make).forEach(metricDictionary::add);
}
```
The Fili wikipedia example has a sample metric loader called `WikiMetricLoader`.

Of course, Fili also needs to be told about the `MetricLoader` that you just defined. See Binding Resources for
details on how to do that.
Most custom metrics will be simple operations on metrics that already exist, using makers that already exist. In this
case, defining the new metric is as simple as adding the following line to your `buildMetricInstances` method (or
equivalent):

```java
new MetricInstance(NEW_METRIC_NAME, metricMaker, DEPENDENT, METRIC, NAMES)
```

and adding `NEW_METRIC_NAME` to your implementation of `ApiMetricName`.
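For example, a hypothetical `non_additive_page_views` metric, the difference between total and additive page views,
could reuse the existing `ArithmeticMaker` with `MINUS` (all names here are illustrative):

```java
// A maker built like sumMaker, but subtracting instead of adding.
MetricMaker differenceMaker = new ArithmeticMaker(metricDictionary, ArithmeticPostAggregationFunction.MINUS);
new MetricInstance(NON_ADDITIVE_PAGE_VIEWS, differenceMaker, TOTAL_PAGE_VIEWS, ADDITIVE_PAGE_VIEWS);
```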
See Built-in Metrics for a list of makers that come with Fili.
Sometimes you need more than what Fili provides out of the box. Perhaps you need to perform a calculation that cannot
be expressed in terms of other metrics, or you are working with a datatype that Druid does not support natively. In
such cases, you can define your own custom maker. As a running example, consider the `ArithmeticMaker`, which models
post-aggregation arithmetic.

First, you need to decide what kind of metric you want to define: first-order or higher-order.

1. If the metric is first-order, then you should extend `RawAggregationMetricMaker`. You will also likely have to
   add a custom Druid aggregation to your Druid cluster.
2. If the metric is higher-order, then you should extend `MetricMaker`.

`ArithmeticMaker` makes higher-order metrics, so it extends `MetricMaker`.
The bulk of the work in defining a custom maker is in overriding the `makeInner` method, which performs the actual
construction of the `LogicalMetric`:

```java
@Override
protected LogicalMetric makeInner(String metricName, List<String> dependentMetrics) {
    ...
}
```
`makeInner` generally performs the following steps:

1. **Merge Dependent Queries:** If there is more than one dependent metric, merge the queries of each dependent
   metric into a single query. This can be accomplished using the `MetricMaker::getMergedQuery` method. Since
   `ArithmeticMaker` takes at least two other metrics, its dependent metrics need to be merged:

    ```java
    TemplateDruidQuery mergedQuery = getMergedQuery(dependentMetrics);
    ```

    A `TemplateDruidQuery` is the scaffolding of a Druid query that knows how to merge with another
    `TemplateDruidQuery`.
2. **Build Aggregators and Post-Aggregators:** Construct the aggregations and post-aggregations the query depends
   on. In the case of `ArithmeticMaker`, the query consists of the aggregations performed by its dependent metrics,
   a field accessor for each aggregation, and a single arithmetic post-aggregation:

    ```java
    Set<Aggregation> aggregations = mergedQuery.getAggregations();

    // Creates a field-accessor post-aggregation for the aggregator in each dependent metric.
    List<PostAggregation> operands = dependentMetrics.stream()
            .map(this::getNumericField)
            .collect(Collectors.toList());

    PostAggregation arithmeticPostAgg = new ArithmeticPostAggregation(metricName, function, operands);
    ```
3. **Build the Inner Query:** Construct the inner query, if the metric requires query nesting. The `ArithmeticMaker`
   uses the inner query constructed by the `getMergedQuery` method. See `AggregationAverageMaker` for an example
   maker that builds a more interesting inner query.

    ```java
    TemplateDruidQuery innerQuery = mergedQuery.getInnerQuery();
    ```
4. **Build the TemplateDruidQuery:** Construct a `TemplateDruidQuery`. `ArithmeticMaker` constructs the following
   `TemplateDruidQuery`:

    ```java
    TemplateDruidQuery query = new TemplateDruidQuery(
            aggregations,
            Collections.singleton(arithmeticPostAgg),
            innerQuery,
            mergedQuery.getTimeGrain()
    );
    ```
5. **Build the Mapper:** Construct a `Mapper`. If a metric does not require post-Druid processing, then an instance
   of `NoOpResultSetMapper` should be used. The `ArithmeticMaker` uses a `ColumnMapper` that is injected at
   construction time as `resultSetMapper`. So all that needs to be done here is to construct a new version of
   `resultSetMapper` with the name of the metric being constructed:

    ```java
    ColumnMapper mapper = resultSetMapper.withColumnName(metricName);
    ```
6. **Build the LogicalMetric:** Construct and return the `LogicalMetric`:

    ```java
    return new LogicalMetric(query, mapper, metricName);
    ```
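Putting the pieces together, the whole method looks roughly like this (a sketch assembled from the steps above; the
actual `ArithmeticMaker` source may differ in detail, and `function` and `resultSetMapper` are fields supplied at
construction time):

```java
@Override
protected LogicalMetric makeInner(String metricName, List<String> dependentMetrics) {
    // 1. Merge the dependent metrics' queries into one.
    TemplateDruidQuery mergedQuery = getMergedQuery(dependentMetrics);

    // 2. Gather the merged aggregations and build the arithmetic post-aggregation over them.
    Set<Aggregation> aggregations = mergedQuery.getAggregations();
    List<PostAggregation> operands = dependentMetrics.stream()
            .map(this::getNumericField)
            .collect(Collectors.toList());
    PostAggregation arithmeticPostAgg = new ArithmeticPostAggregation(metricName, function, operands);

    // 3 & 4. Reuse the merged inner query and build the TemplateDruidQuery.
    TemplateDruidQuery query = new TemplateDruidQuery(
            aggregations,
            Collections.singleton(arithmeticPostAgg),
            mergedQuery.getInnerQuery(),
            mergedQuery.getTimeGrain()
    );

    // 5 & 6. Rename the injected mapper and build the LogicalMetric.
    ColumnMapper mapper = resultSetMapper.withColumnName(metricName);
    return new LogicalMetric(query, mapper, metricName);
}
```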
Mappers are subclasses of `ResultSetMapper` that allow you to perform post-Druid processing in a row-wise fashion.
Fili constructs the post-Druid workflow by iterating through each `LogicalMetric` and composing their `Mapper`s into
a function chain. When the Druid result comes in, the result set is passed through each link in the chain, in the
order of the metrics defined in the query.

To define a `Mapper`, you need to override two methods: `map(Result result, Schema schema)` and `map(Schema schema)`.
The first allows you to modify a single row in the result set. The second allows you to modify the result schema.
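As a minimal sketch, here is a hypothetical row filter that drops rows whose metric value is null and leaves the
schema untouched. The column lookup is illustrative, and this assumes that returning `null` from
`map(Result, Schema)` removes the row from the result set:

```java
public class FilterNullMapper extends ResultSetMapper {

    private final String metricName;

    public FilterNullMapper(String metricName) {
        this.metricName = metricName;
    }

    @Override
    protected Result map(Result result, Schema schema) {
        // Illustrative lookup; use your Fili version's Schema API to fetch the MetricColumn.
        MetricColumn column = (MetricColumn) schema.getColumn(metricName);
        // Drop the row when the metric has no value; otherwise pass it through unchanged.
        return result.getMetricValue(column) == null ? null : result;
    }

    @Override
    protected Schema map(Schema schema) {
        // A row filter needs no schema changes.
        return schema;
    }
}
```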
In order to allow `Result` processing in a (moderately) type-safe way, the `Result` class provides a variety of
methods for extracting the value of a metric column as the appropriate type:

- `getMetricValue`
- `getMetricValueAsNumber`
- `getMetricValueAsString`
- `getMetricValueAsBoolean`
- `getMetricValueAsJsonNode`

The first returns the metric value as an `Object`. The others cast the result to the appropriate type (`BigDecimal`
in the case of `getMetricValueAsNumber`).
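For instance, inside `map(Result result, Schema schema)`, pulling a numeric metric out of a row might look like this
(the `column` lookup is illustrative, as in the sketch above):

```java
BigDecimal views = result.getMetricValueAsNumber(column);  // typed access, cast to BigDecimal
Object rawViews = result.getMetricValue(column);           // untyped access, returned as Object
```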
- `NonNumericMetrics` contains simple sample mappers for each of the non-numeric metrics.
- `SketchRoundUpMapper` is an example of a mapper for numeric metrics.
- `RowNumMapper` is an example of a mapper that adds a column.
Complex (non-numeric) metrics are configured the same way as custom numeric metrics. Fili supports all native JSON
types: numbers, strings, and booleans are parsed into the corresponding Java types, while JSON objects and lists are
extracted from the Druid response as `JsonNode` instances. By default, Fili will pass the results from Druid on to
the user unchanged. If post-Druid processing is required, a `Mapper` can be added to the mapper workflow stage. See
Custom Metrics for details on how to add a `Mapper` to the workflow. If Druid returns a JSON `null`, then Fili will
parse it into the Java `null`.
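For instance, a `Mapper` working with a JSON-valued metric might read a field out of the `JsonNode` while guarding
against that `null` (the `complexColumn` lookup is illustrative; `path` and `asText` are standard Jackson methods):

```java
JsonNode node = result.getMetricValueAsJsonNode(complexColumn);
// Druid's JSON null arrives as a Java null, so guard before dereferencing.
String status = node == null ? null : node.path("status").asText();
```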