Setup

The following will guide you through standing up Fili in front of a Druid instance.

Also see Troubleshooting.md

Prerequisites
High Level Steps
Fili Wikipedia Example
Configure Metadata
Configuration Files
Scripts
Build and Deploy the WAR
Dimension Loading

Prerequisites

Jetty
A working Druid cluster to serve as Fili’s backend.
(Optional: Dimension Caching) A source of truth for loaded dimensions. See dimension loading for more details.
(Optional: Dimension Caching) MDBM (or Redis) for storing dimension data, if the cardinalities of the dimensions are too high for an in-memory map.

High Level Steps

The following is a bird’s eye view of the steps you must take to stand up a Fili instance.

Clone the Fili Wikipedia example into a separate project.
Modify the dimension, metric, physical table and logical table information to fit the needs of your application.
Update the configuration files.
Build and deploy the WAR.
Set up the dimension loader.

Fili Wikipedia Example

The Fili Wikipedia example is where you use the Fili library and configure your application-specific metrics, dimensions, and tables.

Configure Metadata

The bulk of the work is in configuring Fili’s metadata:

Configuration Files

Next, several configuration files and scripts need to be tweaked:

In applicationConfig.properties the following properties need to be set:
- bard__resource_binder = binder.factory.class.path
- bard__dimension_backend = mdbm (redis if you wish to use Redis for your dimension metadata, memory if you wish to use an in-memory map)
  - (Optional: MDBM) bard__mdbm_location = dir/to/mdbm - Note that Fili assumes this directory contains a dimensionCache folder.
- bard__non_ui_broker = http://url/to/druid/broker
- bard__ui_broker = http://url/to/druid/broker
- bard__druid_coord = http://url/to/druid/coordinator
In pom.xml, find the fili.version tag and update it to point to the desired release version of Fili rather than a snapshot.

Note that both bard__non_ui_broker and bard__ui_broker are set to the same broker URL. These parameters are artifacts of the project Fili was spun out of. Eventually, these two settings will be generalized into something useful for other projects. For now, you can safely treat them as if they were the same.
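
Before building, it can help to confirm that none of the settings listed above were missed. The following is a minimal sketch, not part of Fili: the helper names are hypothetical, the parser only handles plain `key = value` lines rather than the full Java properties format, and it assumes bard__mdbm_location must be present whenever the backend is mdbm.

```python
# Property names are taken from the list above. Everything else here
# (function names, parsing rules) is an illustrative assumption.
REQUIRED = [
    "bard__resource_binder",
    "bard__dimension_backend",
    "bard__non_ui_broker",
    "bard__ui_broker",
    "bard__druid_coord",
]


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_settings(props):
    """Return the names of required settings that are absent or empty."""
    missing = [name for name in REQUIRED if not props.get(name)]
    # Assumption: the MDBM location is needed only for the mdbm backend.
    uses_mdbm = props.get("bard__dimension_backend") == "mdbm"
    if uses_mdbm and not props.get("bard__mdbm_location"):
        missing.append("bard__mdbm_location")
    return missing
```

Calling `missing_settings(parse_properties(text))` on the contents of applicationConfig.properties returns an empty list when every setting is present.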

Build and Deploy the WAR

Now that the integration app has been configured, build and deploy the WAR. Build the WAR by running mvn install on your application. You will then find a WAR file under the target directory. Drop this WAR into the webapps directory of your Jetty instance.

Dimension Loading

Dimensions in Fili fall into two categories: loaded and non-loaded. A loaded dimension is one whose values (and associated metadata) have been loaded into Fili. A non-loaded dimension is one that has been configured, but whose values and metadata have not been loaded into Fili. Fili can filter on dimension metadata and perform dimension joins only on loaded dimensions. However, you can still query Druid using non-loaded dimensions. Fili is therefore useful even with non-loaded dimensions, but to unlock its full power you should ensure that all of your dimensions are loaded.

To load a dimension, you need to load its dimension rows into Fili by sending two POST requests to /v1/cache/dimensions/<myDimension>/dimensionRows. The first request updates the dimension values. The second records the date and time at which the dimension rows were loaded, which marks the load as successful.

Consider the first request. The payload for each dimension is an object containing a list called dimensionRows. Each object in the list contains the data for a single value of the dimension:

{ 
    "dimensionRows": [ 
        { 
            "dataField1": data, 
            "dataField2": data
        }, {
            "dataField1": data, 
            "dataField2": data
        },
        ...
    ]
}

A well-defined dimension has only two requirements: each dimension value must have a field that serves as its key (some sort of id field), and the dimension as a whole must have a top-level lastUpdated field giving the date at which its values were last updated.
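
The two requirements can be checked before anything is sent. This is a hedged sketch rather than Fili's own validation: the function names are hypothetical, the key field is assumed to be called id, and `datetime.fromisoformat` accepts only a subset of ISO 8601 on older Python versions.

```python
from datetime import datetime


def rows_missing_key(dimension_rows, key_field="id"):
    """Return the indexes of rows that lack a non-empty key field."""
    return [
        index
        for index, row in enumerate(dimension_rows)
        if not row.get(key_field)
    ]


def is_iso_datetime(value):
    """Return True if value parses as an ISO 8601 date-time string."""
    try:
        datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return False
    return True
```

An empty result from `rows_missing_key` together with a true result from `is_iso_datetime` suggests the payloads satisfy both requirements.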

The second request is a very simple JSON object:

{
    "lastUpdated": "Roughly current date in ISO 8601 format"
}

For example, suppose we have a dimension gender with three values: male, female, and unknown. The metadata consists of a field id and a field description. Then we might send the following payload to /v1/cache/dimensions/gender/dimensionRows:

{
    "dimensionRows": [
        {
            "id": "male",
            "description": "The visitor was of the male persuasion."
        }, {
            "id": "female",
            "description": "The visitor was of the female persuasion."
        }, {
            "id": "unknown",
            "description": "We don't know the gender of the visitor. Oh woe is us."
        }
    ]
}

Followed by:

{
    "lastUpdated": "2015-12-16T10:25:00"
}

Typically this is done by setting up a program that runs in the background, periodically grabs dimension metadata from the dimension source of truth, and pushes it into Fili.
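
A background loader of this kind boils down to the two requests described above. The sketch below uses only the Python standard library. The function names, the base URL passed in by the caller, and the Content-Type header are assumptions, not part of Fili. Both requests go to the dimensionRows endpoint, as the text above states.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_payloads(rows, now=None):
    """Build the dimension-rows payload and the lastUpdated marker payload."""
    now = now or datetime.now(timezone.utc)
    rows_payload = {"dimensionRows": list(rows)}
    marker_payload = {"lastUpdated": now.strftime("%Y-%m-%dT%H:%M:%S")}
    return rows_payload, marker_payload


def post_json(url, payload):
    """POST a JSON body and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
    # urlopen raises on HTTP error statuses, which aborts the load.
    with urllib.request.urlopen(request) as response:
        return response.status


def load_dimension(base_url, dimension, rows, post=post_json, now=None):
    """Send the rows, then the marker; the marker is skipped if rows fail."""
    url = f"{base_url}/v1/cache/dimensions/{dimension}/dimensionRows"
    rows_payload, marker_payload = build_payloads(rows, now)
    statuses = [post(url, rows_payload)]
    statuses.append(post(url, marker_payload))
    return statuses
```

A scheduler (cron, for instance) could call `load_dimension` once per dimension after fetching rows from the source of truth. The `post` parameter exists so the HTTP call can be swapped out in tests.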

Non-Loaded Dimensions

You may not need dimension joins or metadata filtering for one or more of your dimensions. You can make such a dimension a non-loaded dimension. A non-loaded dimension is configured as follows:

Configure the dimension to use the NoOpSearchProvider. See Configuring Dimensions for details on how to configure a dimension’s SearchProvider.
Send a JSON payload to /v1/cache/dimensions/dimensionName containing just an identifier for the dimension (the name field in the example below), and a lastUpdated field with some date following the ISO 8601 specification.

For example, suppose we want to make gender non-loaded. Then, after configuring the Gender dimension with the NoOpSearchProvider and starting Jetty, we would send the following payload to /v1/cache/dimensions/gender:

    {
        "name": "gender",
        "lastUpdated": "2015-12-16T00:00:00"
    }
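
The same payload can be prepared programmatically. This sketch only builds the request object and does not send it. The function name is hypothetical, and the POST method and Content-Type header are assumptions, since the text above does not name an HTTP method for this endpoint.

```python
import json
import urllib.request


def non_loaded_request(base_url, dimension, last_updated):
    """Build (but do not send) the request marking a dimension non-loaded."""
    payload = {"name": dimension, "lastUpdated": last_updated}
    return urllib.request.Request(
        f"{base_url}/v1/cache/dimensions/{dimension}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",  # assumed method
    )
```

Passing the returned object to `urllib.request.urlopen` sends it once Jetty is running.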

Making all of your dimensions non-loaded reduces the complexity of setup. If you are primarily interested in rapidly setting up a Fili instance, you may wish to start this way and load your dimensions later, once you have verified that Fili will meet your needs.