Behemoth: Generating recommendations via Elasticsearch

Let's look at what it entails to roll your own recommendation engine with Elasticsearch.

A simple use case: Given a product being browsed, we want to provide suggestions based on what others have purchased in the past.

In order to accomplish this we will break down the steps required:

Index the past sale data, invoices or receipts in Elasticsearch.
Run a query that narrows down the results to a product that consumers are browsing and then returns suggestions for other items that are most frequently purchased together with it.

Here's an example of a sale but what's the correct structure for the JSON document which we should index into Elasticsearch, in order run the type of query which will yield the results for our use case?

One optimistic way to do this is to simply store the receipt itself as a JSON document:

But that just won't work because there isn't any query (known to me at least) that can make heads or tails of that one document alone and deliver aggregated results of items most frequently bought together.

What next?

We can break apart the receipt into (total # of lineitems) * (total # of lineitems-1) documents as pairs. This way when a new sale appears, similar pairs can be identified and the facet query can accurately count them because they are all separate/individual documents.

Now we can see that the documents (one lineitem pair per document) from Sale1 and Sale2 can finally be used by a facet query to suggest that Apples are most often bought together with oranges and vice-versa.

Behemoth