The following will guide you through standing up Fili in front of a Druid instance.
Before you begin, you will need:
- A working Druid cluster to serve as Fili’s backend.
- (Optional: Dimension Caching) A source of truth for loaded dimensions. See dimension loading for more details.
- (Optional: Dimension Caching) MDBM or Redis for storing dimension data, if the cardinalities of your dimensions are too high for an in-memory map.
The following is a bird’s eye view of the steps you must take to stand up a Fili instance.
The Fili Wikipedia example is where you will leverage the Fili library: it is where you configure your application-specific metrics, dimensions, and tables.
The bulk of the work is in configuring Fili’s metadata.
Next, several configuration files and scripts need to be tweaked:
bard__resource_binder = binder.factory.class.path
bard__dimension_backend = mdbm (use redis if you wish to use Redis for your dimension metadata, or memory if you wish to use an in-memory map)
bard__mdbm_location = dir/to/mdbm (Fili assumes this directory contains a dimensionCache folder)
bard__non_ui_broker = http://url/to/druid/broker
bard__ui_broker = http://url/to/druid/broker
bard__druid_coord = http://url/to/druid/coordinator
In your application’s pom.xml, find the fili.version tag and update it to point to the desired version of Fili rather than a snapshot.
Note that bard__non_ui_broker and bard__ui_broker are both set to the same broker URL. These parameters are artifacts of the project Fili was spun out of. Eventually, these two settings will be generalized into something useful for other projects; for now, you can safely treat them as the same.
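The settings above can be sanity-checked before you build. The following Python sketch is a hypothetical helper, not part of Fili: it parses simple key = value lines (not the full Java properties syntax) and reports missing keys, an unrecognized bard__dimension_backend, a missing bard__mdbm_location when mdbm is selected, and broker URLs that differ even though this guide sets them to the same value. The list of required keys is an assumption drawn from this guide.

```python
# Hypothetical pre-build check for the bard__ settings listed in this guide.
# The required keys and allowed backends are assumptions drawn from the text.
REQUIRED_KEYS = (
    "bard__resource_binder",
    "bard__dimension_backend",
    "bard__non_ui_broker",
    "bard__ui_broker",
    "bard__druid_coord",
)
DIMENSION_BACKENDS = {"mdbm", "redis", "memory"}


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_config(props):
    """Return a list of problems found in the parsed settings (empty if none)."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in props]
    backend = props.get("bard__dimension_backend")
    if backend is not None and backend not in DIMENSION_BACKENDS:
        problems.append(f"unknown dimension backend: {backend}")
    if backend == "mdbm" and "bard__mdbm_location" not in props:
        problems.append("bard__mdbm_location is required for the mdbm backend")
    if props.get("bard__ui_broker") != props.get("bard__non_ui_broker"):
        problems.append("bard__ui_broker and bard__non_ui_broker differ")
    return problems


if __name__ == "__main__":
    sample = """
    bard__resource_binder = binder.factory.class.path
    bard__dimension_backend = memory
    bard__non_ui_broker = http://url/to/druid/broker
    bard__ui_broker = http://url/to/druid/broker
    bard__druid_coord = http://url/to/druid/coordinator
    """
    print(check_config(parse_properties(sample)))  # prints []
```

Returning a list of problems, rather than raising on the first one, lets you see every misconfiguration in a single pass.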
Now that the integration app has been configured, build and deploy the WAR. Build the WAR by running mvn install on your application; the resulting WAR file will be under the target directory. Drop this WAR into the webapps directory of your Jetty instance.
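The build and deploy steps can be scripted. This Python sketch assumes Maven is on your PATH, that mvn install leaves exactly one WAR under target, and that you pass in the path of your Jetty instance’s deployment directory (webapps by default); the function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_war(project_dir):
    """Run 'mvn install' in the application directory (Maven must be on PATH)."""
    subprocess.run(["mvn", "install"], cwd=project_dir, check=True)


def deploy_war(project_dir, jetty_webapps_dir):
    """Copy the WAR produced under <project_dir>/target into Jetty's webapps dir.

    Assumes the build produced exactly one .war file; raises otherwise.
    Returns the path of the copied WAR.
    """
    wars = sorted(Path(project_dir, "target").glob("*.war"))
    if len(wars) != 1:
        raise RuntimeError(f"expected one WAR under target/, found {len(wars)}")
    destination = Path(jetty_webapps_dir) / wars[0].name
    shutil.copy2(wars[0], destination)
    return destination


# Example usage with hypothetical paths:
# build_war("/path/to/my-fili-app")
# deploy_war("/path/to/my-fili-app", "/path/to/jetty/webapps")
```

Refusing to guess when several WARs are present avoids silently deploying the wrong artifact.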
Dimensions in Fili fall into two categories: loaded and non-loaded. A loaded dimension is one whose values (and associated metadata) have been loaded into Fili. A non-loaded dimension is one that has been configured, but whose values and metadata have not been loaded into Fili. Fili can filter on dimension metadata and perform dimension joins only on loaded dimensions, although you can still query Druid using non-loaded dimensions. So Fili is quite useful even with non-loaded dimensions, but to unlock its full power, you should ensure that all of your dimensions are loaded.
To load a dimension, send two POST requests to Fili. The first, to /v1/cache/dimensions/<myDimension>/dimensionRows, updates the dimension’s values. The second, to /v1/cache/dimensions/<myDimension>, sets the datetime at which the dimension was successfully loaded; it essentially marks that the dimension rows were loaded successfully.
Consider the first request. Its payload is a JSON object containing a list of objects called dimensionRows, where each object in the list holds the data for a single value of the dimension:
{
  "dimensionRows": [
    {
      "dataField1": data,
      "dataField2": data
    }, {
      "dataField1": data,
      "dataField2": data
    },
    ...
  ]
}
A well-defined dimension has only two requirements: there must be a field that serves as the key for each dimension value (some sort of id field), and there must be a top-level lastUpdated field for the entire dimension that gives the date at which the dimension’s values were last updated.
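The payload structure and the key-field requirement can be captured in a small builder. In this Python sketch the key field is assumed to be named id, as in the gender example in this section; the function name is hypothetical.

```python
import json


def build_dimension_rows_payload(rows, key_field="id"):
    """Wrap dimension values in the {"dimensionRows": [...]} envelope as JSON.

    key_field is an assumption: use whichever field uniquely identifies a
    value of your dimension. Rows missing it are rejected before sending.
    """
    rows = list(rows)
    for index, row in enumerate(rows):
        if key_field not in row:
            raise ValueError(f"row {index} is missing key field {key_field!r}")
    return json.dumps({"dimensionRows": rows})


if __name__ == "__main__":
    print(build_dimension_rows_payload([
        {"id": "male", "description": "The visitor was of the male persuasion."},
    ]))
```

Validating locally means a malformed row is caught before any request reaches Fili.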
The second request’s payload is a very simple JSON object:
{
  "lastUpdated": "Roughly current date in ISO 8601 format"
}
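A helper for the second payload only needs a timestamp formatted per ISO 8601. This Python sketch is hypothetical: it defaults to the current UTC time written without an offset, in the form 2015-12-16T10:25:00; whether your deployment expects UTC is an assumption to check.

```python
import json
from datetime import datetime, timezone


def build_last_updated_payload(when=None):
    """Return the second request's JSON body with an ISO 8601 timestamp.

    Defaults to the current UTC time (an assumption), truncated to whole
    seconds and written without an offset, e.g. 2015-12-16T10:25:00.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    stamp = when.replace(microsecond=0, tzinfo=None).isoformat()
    return json.dumps({"lastUpdated": stamp})
```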
For example, suppose we have a dimension gender with three values: male, female, and unknown. The metadata consists of a field id and a field description. Then we might send the following payload to /v1/cache/dimensions/gender/dimensionRows:
{
  "dimensionRows": [
    {
      "id": "male",
      "description": "The visitor was of the male persuasion."
    }, {
      "id": "female",
      "description": "The visitor was of the female persuasion."
    }, {
      "id": "unknown",
      "description": "We don't know the gender of the visitor. Oh woe is us."
    }
  ]
}
Followed by:
{
  "lastUpdated": "2015-12-16T10:25:00"
}
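As a sketch of the two requests for the gender example, the following Python uses only the standard library’s urllib.request. The base URL http://localhost:9998 is an assumption; point it at your own Fili deployment. The sketch posts the lastUpdated body to /v1/cache/dimensions/<name>, the endpoint this guide’s non-loaded example posts lastUpdated to; that choice is an assumption to verify against your Fili version. The send parameter lets you swap in another HTTP client, or a stub for testing.

```python
import json
import urllib.request

# Assumed base URL of your Fili web service; change host and port as needed.
FILI_BASE_URL = "http://localhost:9998"


def post_json(url, body):
    """POST a JSON string; urlopen raises HTTPError on 4xx/5xx responses."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def load_dimension(name, rows, last_updated, base_url=FILI_BASE_URL, send=post_json):
    """Send the dimension rows, then mark the dimension as loaded.

    post_json raises on HTTP errors, so lastUpdated is only posted after the
    rows were accepted. The lastUpdated URL is an assumption (see above).
    """
    dimension_url = f"{base_url}/v1/cache/dimensions/{name}"
    send(f"{dimension_url}/dimensionRows", json.dumps({"dimensionRows": rows}))
    send(dimension_url, json.dumps({"lastUpdated": last_updated}))


if __name__ == "__main__":
    gender_rows = [
        {"id": "male", "description": "The visitor was of the male persuasion."},
        {"id": "female", "description": "The visitor was of the female persuasion."},
        {"id": "unknown", "description": "We don't know the gender of the visitor. Oh woe is us."},
    ]
    load_dimension("gender", gender_rows, "2015-12-16T10:25:00")
```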
Typically this is done by setting up a program that runs in the background, periodically grabs dimension metadata from the dimension source of truth, and pushes it into Fili.
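A background loader like the one just described can be sketched as a simple polling loop. The fetch_rows and push_dimension hooks are hypothetical and supplied by you: the first reads one dimension from your source of truth, and the second sends the two loading requests for it. A production loader would also need retries and scheduling suited to your environment.

```python
import logging
import time


def run_loader(fetch_rows, push_dimension, dimensions,
               interval_seconds=3600, max_runs=None, sleep=time.sleep):
    """Periodically copy dimension data from a source of truth into Fili.

    fetch_rows(name) -> list of row dicts read from your source of truth.
    push_dimension(name, rows) -> sends both loading requests for one dimension.
    Both hooks are hypothetical; max_runs=None loops forever.
    Returns the number of completed passes.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        for name in dimensions:
            try:
                push_dimension(name, fetch_rows(name))
            except Exception:
                # One failing dimension should not stop the others from loading.
                logging.exception("failed to load dimension %s", name)
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

Injecting sleep and max_runs keeps the loop testable without waiting in real time.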
You may not need dimension joins, or the ability to filter on dimension metadata, for one or more of your dimensions. In that case, you can make each such dimension a non-loaded dimension, configured as follows:
1. Configure the dimension to use the NoOpSearchProvider. See Configuring Dimensions for details on how to configure a dimension’s SearchProvider.
2. Send a JSON payload to /v1/cache/dimensions/<dimensionName> containing an identifier for the dimension (the name field in the example below) and a lastUpdated field with some date following the ISO 8601 specification.
For example, suppose we want to make gender non-loaded. Then, after configuring the Gender dimension with the NoOpSearchProvider and starting Jetty, we would send the following payload to /v1/cache/dimensions/gender:
{
  "name": "gender",
  "lastUpdated": "2015-12-16T00:00:00"
}
Making all of your dimensions non-loaded reduces the complexity of setup, so if you are primarily interested in rapidly setting up a Fili instance, you may wish to start that way. You can load your dimensions later, once you have verified that Fili meets your needs.
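If you take the quick route of making every dimension non-loaded, the per-dimension requests can be generated in a loop. This Python sketch only builds (url, body) pairs in the format of the gender example; sending them is left to whatever HTTP client you use (for example, urllib.request with method="POST" and a Content-Type: application/json header). The base URL is an assumption, and each dimension must already be configured with the NoOpSearchProvider.

```python
import json

# Assumed base URL of your Fili web service; change host and port as needed.
FILI_BASE_URL = "http://localhost:9998"


def non_loaded_requests(dimension_names, last_updated, base_url=FILI_BASE_URL):
    """Build one (url, json_body) pair per non-loaded dimension.

    Each dimension is assumed to be configured with the NoOpSearchProvider
    already; the body mirrors the gender example: a name plus lastUpdated.
    """
    return [
        (
            f"{base_url}/v1/cache/dimensions/{name}",
            json.dumps({"name": name, "lastUpdated": last_updated}),
        )
        for name in dimension_names
    ]
```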