Behemoth: April 2013

Monday, April 29, 2013

Mac tools and tips

This list will surely grow:

TextWrangler
- Useful for comparing folders on a Mac via its "Search > Find Differences..." menu option.

Sunday, April 28, 2013

Generating recommendations via CouchDB - Part 1

I set out to build a solution in CouchDB with a simple use case in mind:
Given a product being browsed, we want to provide suggestions based on what others have purchased in the past.
I've already discussed strategies for breaking down a single sale/receipt/invoice in Generating recommendations via Elasticsearch - Part 1, so I decided to employ them for CouchDB as well. The difference is that instead of generating (total # of lineitems) * (total # of lineitems - 1) documents for each sale, I found it simpler to store the sale documents themselves and generate keys that represent product pairs via CouchDB's map function. If you're unfamiliar with the core concept of map/reduce, you can watch this five-minute screencast: Understanding map reduce with CouchDB.
Here's what the results of the map operation look like:
Applying the reduce function yields rows that tell us which products were bought together and how many times. This result is similar to the facets created by Elasticsearch.
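A minimal sketch of the idea, assuming each sale document carries a `lineitems` array of product names (an assumed field name): the map step emits one `[product, otherProduct]` key per ordered pair of distinct line items, and querying the view with the built-in `_count` reduce and `group=true` collapses identical keys into counts. It is simulated in TypeScript so it runs without a CouchDB server; in CouchDB itself the map would be a plain JavaScript `function (doc) { ... }` calling the global `emit`.

```typescript
// Hypothetical shape of a stored sale document (field names are assumptions).
interface Sale {
  _id: string;
  lineitems: string[];
}

// Key emitted by the map step: [product being browsed, product bought with it].
type PairKey = [string, string];

// Stand-in for a CouchDB map function; the emit callback plays the role of
// CouchDB's global emit(key, value).
function mapSale(doc: Sale, emit: (key: PairKey, value: number) => void): void {
  // One key per ordered pair of distinct line items in the same sale.
  for (const a of doc.lineitems) {
    for (const b of doc.lineitems) {
      if (a !== b) {
        emit([a, b], 1);
      }
    }
  }
}

// Stand-in for querying the view with the built-in _count reduce and
// group=true: identical keys are collapsed into one row with a count.
function groupCount(docs: Sale[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const doc of docs) {
    mapSale(doc, (key) => {
      const k = JSON.stringify(key);
      counts[k] = (counts[k] || 0) + 1;
    });
  }
  return counts;
}

// Illustrative sales, not real data.
const sales: Sale[] = [
  { _id: "sale1", lineitems: ["Apples", "Oranges", "Lemons"] },
  { _id: "sale2", lineitems: ["Apples", "Oranges"] },
];

const rows = groupCount(sales);
for (const key of Object.keys(rows)) {
  console.log(key, rows[key]); // e.g. ["Apples","Oranges"] 2
}
```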
So what's lacking for a complete implementation?
1. We need a query that fetches results from this map/reduce view for the product being browsed by a consumer, for example, a T-shirt (demo).
2. Next, we need to determine which paired products have the highest counts.
3. Finally, we need to retrieve each suggested product's ID in order to fetch more information about it for users to view.
  - Where and how to store this information in the current map/reduce view, so that we can fetch it as part of the query itself, is an open question for now.
  - Maybe it's something that can't be accomplished via CouchDB at all? Would it require an alternate implementation with secondary indexes, such as the ones Cloudant provides?
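Steps 1 and 2 could look roughly like the sketch below, assuming a grouped view keyed by `[product, otherProduct]` with a count as the value. The database, design document and view names (`sales`, `recs`, `pairs`) are hypothetical. The key range relies on CouchDB's collation rule that an object (`{}`) sorts after any string, so `startkey=[product]` and `endkey=[product, {}]` select every pair that starts with the browsed product. Step 3 is not addressed.

```typescript
// One row of a grouped CouchDB view response: { "key": [...], "value": n }.
interface PairRow {
  key: [string, string];
  value: number;
}

// Step 1: build the view query for the product being browsed.
// The database ("sales"), design doc ("recs") and view ("pairs") are hypothetical.
function pairsQueryUrl(baseUrl: string, product: string): string {
  const params: string[] = [
    "group=true", // apply the reduce once per distinct key
    // {} collates after every string, so this range covers all [product, *] keys.
    "startkey=" + encodeURIComponent(JSON.stringify([product])),
    "endkey=" + encodeURIComponent(JSON.stringify([product, {}])),
  ];
  return baseUrl + "/sales/_design/recs/_view/pairs?" + params.join("&");
}

// Step 2: rank co-purchased products by count and keep the top few.
function topSuggestions(rows: PairRow[], limit: number): string[] {
  return rows
    .slice() // copy so the caller's array isn't reordered
    .sort((x, y) => y.value - x.value)
    .slice(0, limit)
    .map((row) => row.key[1]);
}

// Made-up rows in the shape the grouped view would return.
const exampleRows: PairRow[] = [
  { key: ["T-shirt", "Cap"], value: 3 },
  { key: ["T-shirt", "Jeans"], value: 7 },
  { key: ["T-shirt", "Socks"], value: 5 },
];

console.log(pairsQueryUrl("http://localhost:5984", "T-shirt"));
console.log(topSuggestions(exampleRows, 2)); // Jeans (7), then Socks (5)
```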

Saturday, April 27, 2013

Why do most desktop blog editors suck?

First, what are they actually good for?
- You won't lose your content because of keyboard muscle memory, which differs between Windows and Mac users, or because of hotkey mappings, which often differ between browsers like Firefox and Safari. What am I talking about?
  - I'm talking about the tears I've shed when I tried to undo my last change (ctrl+z or cmd+z) inside Blogger/Blogspot's web editor and it blew away ALL of my content! Was it Safari, Firefox, or conflicting Mac and Windows key mappings that caused this? I don't know ... but it's an evil concoction that I don't care for.
- Worst of all is the mapping for the backspace key. It moves you back to the previous page in your browser while you're editing your blog, instead of simply deleting a character ... thus causing you to lose all your work!
- Just having the comfort of knowing that your content won't be lost because of a dumb keystroke ... is just about the only thing that these desktop blog editors are good for.
Onwards & upwards ... let's look at all the reasons desktop blog editors SUCK, which pretty much boils down to the fact that you still have to edit HTML by hand because the simplest use cases aren't handled automatically. What are these use cases? Images and embeds!
- Is it really that hard for the desktop-blog-editor manufacturing industry (yes, that's sarcasm) to grasp that folks might want their images to float to the right or left, with their text simply flowing around them? There's not a single tool that handles this via a WYSIWYG toolbar.
- Embeds mean different things to different people. For a developer, it means embedding code blocks (gists, for example) that get their own space, so that the surrounding content flows around them instead of clashing with or overlapping them.
- Seriously, the desktop blog editors out there today (in 2013) are just a big disappointment :(

Friday, April 26, 2013

Generating recommendations via Elasticsearch - Part 1

Let's look at what it entails to roll your own recommendation engine with Elasticsearch.

A simple use case: Given a product being browsed, we want to provide suggestions based on what others have purchased in the past.

To accomplish this, we can break it down into the following steps:

1. Index past sales data (invoices or receipts) in Elasticsearch.
2. Run a query that narrows the results down to the product a consumer is browsing, and then returns suggestions for the other items most frequently purchased together with it.

Here's an example of a sale. But what's the correct structure for the JSON document we should index into Elasticsearch in order to run the type of query that will yield the results for our use case?

One optimistic way to do this is to simply store the receipt itself as a JSON document:

But that just won't work because there isn't any query (known to me at least) that can make heads or tails of that one document alone and deliver aggregated results of items most frequently bought together.

We could generate one document for each item that a query might come in for, and then add the line items from various sales to it as a multi-value field. That field could be what we facet on to deliver aggregated results of the items most frequently bought together.

Let's keep our focus on Apples; the rest of the sibling line items will become recommendations for someone who comes looking for Apples:

Any additional sales will add to the elements recorded in lineitems:

But this doesn't work, because the facet query will report a count of 1 for Oranges, 1 for Lemons, etc. It will not count Oranges as 2, but rather as 1, inside the lineitems field. This is because the facet query counts the number of documents that contain the term "Oranges", not the number of times a term (such as Oranges) appears per document or field.
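To make the counting problem concrete, here is a rough model (not Elasticsearch's implementation) of the behavior described above: a term is counted once per matching document, however many times it repeats in a multi-valued field. The `ItemDoc` shape and the Sale1/Sale2 contents are illustrative assumptions.

```typescript
// Hypothetical per-item document: one document for "Apples" whose
// multi-valued lineitems field accumulates sibling items from every sale.
interface ItemDoc {
  item: string;
  lineitems: string[];
}

// Rough model of document-based facet counting: each term is counted at
// most once per document, however often it repeats in the field.
function docCountFacet(docs: ItemDoc[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const doc of docs) {
    const seen: Record<string, boolean> = {};
    for (const term of doc.lineitems) {
      if (!seen[term]) {
        seen[term] = true;
        counts[term] = (counts[term] || 0) + 1;
      }
    }
  }
  return counts;
}

// Apples' siblings from Sale1 (Oranges, Lemons) and Sale2 (Oranges).
const applesDoc: ItemDoc = {
  item: "Apples",
  lineitems: ["Oranges", "Lemons", "Oranges"],
};

console.log(docCountFacet([applesDoc])); // Oranges is counted once, not twice
```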

What next?

We can break the receipt apart into (total # of lineitems) * (total # of lineitems - 1) documents, one per ordered pair of line items. This way, when a new sale appears, similar pairs can be identified, and the facet query can count them accurately because they are all separate documents.
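A sketch of the split, with hypothetical field names (`sale`, `item`, `boughtWith`) for the pair documents. It assumes duplicate products within one receipt are collapsed first, so a receipt with n distinct line items yields n * (n - 1) documents.

```typescript
// Hypothetical document indexed per ordered pair of line items.
interface PairDoc {
  sale: string;
  item: string; // the product a shopper might be browsing
  boughtWith: string; // another product from the same sale
}

// Splits one receipt into n * (n - 1) pair documents, where n is the number
// of distinct line items (duplicates are collapsed first; an assumption).
function toPairDocs(sale: string, lineitems: string[]): PairDoc[] {
  const items = lineitems.filter((x, i) => lineitems.indexOf(x) === i);
  const docs: PairDoc[] = [];
  for (const item of items) {
    for (const boughtWith of items) {
      if (item !== boughtWith) {
        docs.push({ sale, item, boughtWith });
      }
    }
  }
  return docs;
}

const sale1Pairs = toPairDocs("Sale1", ["Apples", "Oranges", "Lemons"]);
console.log(sale1Pairs.length); // 6, i.e. 3 * (3 - 1)
```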

Now we can see that the documents (one line-item pair per document) from Sale1 and Sale2 can finally be used by a facet query to suggest that Apples are most often bought together with Oranges, and vice versa.
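A sketch of the query side, assuming the pair documents use the hypothetical `item` and `boughtWith` fields and that both are mapped as `not_analyzed`, so the `term` query and the facet see exact product names. The request body follows the terms facet syntax of Elasticsearch 0.90 (later versions replaced facets with aggregations); `facetCounts` is a local stand-in that computes the same ranking in memory, so the example runs without a cluster.

```typescript
// Hypothetical pair document: one per ordered pair of line items in a sale.
interface PairDoc {
  item: string; // the product being browsed
  boughtWith: string; // another product from the same sale
}

// Request body in Elasticsearch 0.90-style facet syntax. Assumes "item" and
// "boughtWith" are mapped as not_analyzed so exact product names match.
function suggestionQuery(product: string, size: number) {
  return {
    query: { term: { item: product } },
    facets: {
      bought_with: { terms: { field: "boughtWith", size: size } },
    },
    size: 0, // only the facet counts are needed, not the matching hits
  };
}

// In-memory stand-in for the facet: filter to the browsed product, count
// each partner across pair documents, and sort by count, highest first.
function facetCounts(docs: PairDoc[], product: string): [string, number][] {
  const counts: Record<string, number> = {};
  for (const d of docs) {
    if (d.item === product) {
      counts[d.boughtWith] = (counts[d.boughtWith] || 0) + 1;
    }
  }
  return Object.keys(counts)
    .map((k): [string, number] => [k, counts[k]])
    .sort((a, b) => b[1] - a[1]);
}

// Illustrative data: Sale1 = Apples, Oranges, Lemons; Sale2 = Apples, Oranges.
const saleItems: string[][] = [
  ["Apples", "Oranges", "Lemons"],
  ["Apples", "Oranges"],
];
const pairDocs: PairDoc[] = [];
for (const items of saleItems) {
  for (const a of items) {
    for (const b of items) {
      if (a !== b) pairDocs.push({ item: a, boughtWith: b });
    }
  }
}

console.log(JSON.stringify(suggestionQuery("Apples", 5)));
console.log(facetCounts(pairDocs, "Apples")); // Oranges: 2, then Lemons: 1
console.log(facetCounts(pairDocs, "Oranges")); // Apples: 2, then Lemons: 1
```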

Thursday, April 18, 2013

How to invoke an update handler with Cradle?

Invoking an update handler via the Cradle Node.js client should be a simple task, but I spent 10 minutes too many piecing together documentation on it, since it's not explicitly documented in the README.md file.

Based on the comments in the pull request where this functionality was added, here's a sample:
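Cradle's own call isn't reproduced here, and this sketch doesn't assume its method name or signature. Whatever client is used, the request that reaches CouchDB is the update handler endpoint: `PUT /{db}/_design/{ddoc}/_update/{func}/{docid}` to run the handler against an existing document, or `POST /{db}/_design/{ddoc}/_update/{func}` with no document ID. The helper below is hypothetical, as are the `sales`, `app` and `touch` names in the usage; it only builds the method and URL, and any query parameters show up inside the handler as `req.query`.

```typescript
// Method + URL for calling a CouchDB update handler directly over HTTP.
interface UpdateRequest {
  method: "PUT" | "POST";
  url: string;
}

// Hypothetical helper. With a document ID the handler receives that
// document (PUT); without one it receives null and may create a doc (POST).
function updateHandlerRequest(
  baseUrl: string,
  db: string,
  ddoc: string,
  handler: string,
  docId?: string,
  query: Record<string, string> = {}
): UpdateRequest {
  let url =
    baseUrl +
    "/" + encodeURIComponent(db) +
    "/_design/" + encodeURIComponent(ddoc) +
    "/_update/" + encodeURIComponent(handler);
  if (docId !== undefined) {
    url += "/" + encodeURIComponent(docId);
  }
  // Query-string parameters are passed to the handler as req.query.
  const qs = Object.keys(query)
    .map((k) => encodeURIComponent(k) + "=" + encodeURIComponent(query[k]))
    .join("&");
  if (qs) {
    url += "?" + qs;
  }
  return { method: docId !== undefined ? "PUT" : "POST", url };
}

const req1 = updateHandlerRequest("http://localhost:5984", "sales", "app", "touch", "sale1");
console.log(req1.method, req1.url);
```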
