Behemoth: 2013

Sunday, December 29, 2013

Dangers of mixing artifact versions across maven modules

Sometimes, when a quick fix is urgently required, we are tempted to branch off and make it happen on our own until the authors of the original branch release their latest version. I found myself in such a situation and decided to use the patched version only for a small~ish maven module/artifact known in my project as "commons". What could go wrong? Well, I ran into the following error at runtime:

java.lang.NoSuchMethodError: org.springframework.amqp.core.AmqpAdmin.declareQueue(Lorg/springframework/amqp/core/Queue;)Ljava/lang/String;

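Turns out that my "commons" module, built against the patched version, was expecting a slightly different method signature than what was available in my main project. The return type is part of what the JVM looks up at runtime, so a method that returns String and one that returns void do not match, even though the name and arguments are identical: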

// method signature from spring-amqp-1.3.0.BUILD-SNAPSHOT
String declareQueue(Queue queue);

// method signature from spring-amqp-1.2.0M1
void declareQueue(Queue queue);
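
To see why the two versions clash, it helps to decode the descriptor from the error message. Here is a minimal Python sketch of a descriptor reader; it is only an illustration and not part of any Spring or JVM tooling. The first descriptor is copied from the error above, and the second one is my own derivation of what the void variant would look like:

```python
# Single-character JVM type codes for primitives (plus V for void).
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def read_type(desc, pos):
    """Decode one type starting at desc[pos]; return (readable name, next position)."""
    dims = 0
    while desc[pos] == "[":  # each '[' adds one array dimension
        dims += 1
        pos += 1
    if desc[pos] == "L":  # object type, written as L<internal/name>;
        end = desc.index(";", pos)
        name = desc[pos + 1:end].replace("/", ".")
        pos = end + 1
    else:
        name = PRIMITIVES[desc[pos]]
        pos += 1
    return name + "[]" * dims, pos


def parse_descriptor(desc):
    """Split a method descriptor like '(Lfoo/Bar;)V' into (argument types, return type)."""
    close = desc.index(")")
    args, pos = [], 1
    while pos < close:
        arg, pos = read_type(desc, pos)
        args.append(arg)
    ret, _ = read_type(desc, close + 1)
    return args, ret


# Descriptor taken from the NoSuchMethodError above (what "commons" asked for).
expected = parse_descriptor("(Lorg/springframework/amqp/core/Queue;)Ljava/lang/String;")
# Assumed descriptor for the void variant (derived by hand, not from a log).
available = parse_descriptor("(Lorg/springframework/amqp/core/Queue;)V")

print(expected)
print(available)
print(expected == available)
```

Same name, same argument, different return type: as far as the runtime lookup is concerned these are two different methods.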

So ... lesson learned :) push the authors to release quicker ;)

Saturday, December 28, 2013

Maven plugins & extensions for working with WSDLs, XSDs and POJOs

My favorite plugin for generating a webservice from a WSDL file (WSDL-first-approach) is Apache CXF because it loops in Spring, JAX-WS and other standards and implementations to make it happen.

Before CXF became my BFF, I had gone looking for a maven archetype/template that would bring everything together for me in Eclipse. I did find some templates but none that used the WSDL-first-approach to generate a sample project for me. So I gave up on looking for archetypes and instead followed a mix of instructions from the following links:

Anyway, CXF is awesome for the following reasons:

It is easy to configure cxf-codegen-plugin for generating Java webservice code from a WSDL file.
It is easy to configure cxf-xjc-plugin for generating POJO (plain-old-java-object) classes from an XSD (schema) file.
Other maven extensions can be used to enhance the cxf-xjc-plugin's schema to POJO conversion process:

Let's say you want to use something like a builder-pattern which makes your written code more beautiful. You can do that by configuring the jaxb-fluent-api extension!
If some of your POJOs share the same methods, then not being able to categorize all those POJOs under a common interface can lead to duplicate code. For example, let's say you have a method named: performCommonTaskOnIncomingObject(ClassName argument) ... now you may need multiple such methods with different names because you'll need to change the argument's classname every time to match the object you're passing in!

So can we get the code-generator to implement interfaces on POJOs for us without messing with the schema? Yes!
Just configure the jaxb2-basics's inheritance-extension as you see fit.

Saturday, August 10, 2013

Setup continuous integration (CI) with Jenkins / Hudson for TestFlight

Steps taken between Nov 12, 2012 and Mar 7, 2013 to finally get it going:

Started with a tutorial.
Then went here to download Jenkins for mac.
I tried 5 builds via Jenkins and all of them failed. Solving a few configuration issues got me further, but then I hit an issue that has to do with keychains. Tried to follow this thread to figure out the appropriate action.
Decided to uninstall Jenkins using these instructions
Instead used a new installer that circumvents the keychain access issues.
I let it checkout my project into its workspace once at:
/Users/pulkitsinghal/.jenkins/jobs/my_project/workspace
But afterwards I deleted that workspace and soft-linked it with the actual directory which I already have setup for development:
cd /Users/pulkitsinghal/.jenkins/jobs/my_project
ln -s ~/dev/my_project/ ./workspace
And I completely turned off the git checkout process in the Jenkins job configuration just to be safe ... I should have been able to get away with simply ignoring submodules too, but why checkout when I don't need to.
Then I bumped my head on this:
fatal error: 'RestKit/RestKit.h' file not found
#import <RestKit/RestKit.h>
^
1 error generated.
It should have already been working based on the directions from the RestKit website: Add the following Header Search Paths (including the quotes):
"$(BUILT_PRODUCTS_DIR)/../../Headers"
But I suppose for some reason that meant one thing to Xcode and something entirely different to jenkins-xcode-plugin. So through trial and error and watching the logs, I figured out that I needed to add:
"$(BUILT_PRODUCTS_DIR)/../my_project/Build/Headers"
in order for jenkins-xcode-plugin to pick it up and work with it properly.
Ran into another road-block ... posted question to stackoverflow.
Then got stuck due to this issue.
- Agreed to pay $5 to fast-forward the resolution of this bug on Jenkins.
- While waiting ... I simply removed the following because the jenkins xcode plugin could not process it:

Monday, April 29, 2013

Mac tools and tips

This list will surely grow:

TextWrangler
- Useful for comparing folders on mac via its "Search > Find Differences..." menu option.

Sunday, April 28, 2013

Generating recommendations via CouchDB - Part 1

I set out to build a solution in CouchDB with a simple use case in mind:
Given a product being browsed, we want to provide suggestions based on what others have purchased in the past.
I've already discussed the strategies for breaking down a single sale/receipt/invoice in Generating recommendations via Elasticsearch - Part 1, so I decided to employ them for CouchDB as well. The difference being that instead of generating multiple documents for each sale:
(total # of lineitems) * (total # of lineitems-1)
I found it simpler to store the sale documents themselves and generate keys that represented product pairs, via the MAP function in CouchDB. If you're unfamiliar with the core concept of map/reduce, you can watch this five minute screencast: Understanding map reduce with CouchDB.
Here's what the results of the map operation look like:
Applying the REDUCE function yields rows that tell us which product was bought together with another and how many times. This result is similar to the facets created by Elasticsearch.
So what's lacking for a complete implementation?
1. We need a query that fetches results from this map/reduce view for the product being browsed by a consumer, for example: T-shirt (demo)
2. Next, we need to discern which products have the highest count.
3. Next, we need to retrieve the product's ID in order to fetch more information about it ... for the users to view.
  - Where & how to store this information in the current map/reduce view ... so that we may fetch it as part of a query itself ... is an open question for now.
  - Maybe it's something that cannot even be accomplished via CouchDB? Would it require an alternate implementation with secondary indexes such as the ones Cloudant provides?
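
To make the map, reduce and "highest count" steps concrete, here is a rough simulation in plain Python. It is not CouchDB code (a real view would be a JavaScript map function inside a design document), and the sale documents, the field name lineitems and the helper names are all assumptions for this sketch:

```python
from collections import Counter

# Hypothetical sale documents; the structure is assumed for illustration.
sales = [
    {"_id": "sale1", "lineitems": ["T-shirt", "Jeans", "Socks"]},
    {"_id": "sale2", "lineitems": ["T-shirt", "Jeans"]},
    {"_id": "sale3", "lineitems": ["T-shirt", "Cap"]},
]


def map_sale(doc):
    """Emit one ((product, other), 1) row per ordered pair, like a view's map step."""
    items = doc["lineitems"]
    for product in items:
        for other in items:
            if product != other:
                yield (product, other), 1


def reduce_rows(rows):
    """Sum the emitted values per key, like a grouped sum reduce."""
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    return totals


def recommendations(totals, product):
    """Steps 1 and 2: keep the rows for one product, highest count first."""
    rows = [(other, n) for (first, other), n in totals.items() if first == product]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


totals = reduce_rows(row for doc in sales for row in map_sale(doc))
print(recommendations(totals, "T-shirt"))
```

Step 3 (getting from the product name back to a product ID) is still the open question; this sketch only carries names around.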

Saturday, April 27, 2013

Why do most desktop blog editors suck?

First, what are they truly good for:
- You won't lose your content due to keyboard muscle-memory which differs between windows and mac users or because of hotkey mappings which often differ between browsers like Firefox and Safari. What am I talking about?
  - I'm talking about the tears I've shed when I tried to undo my last change via (ctrl+z or cmd+z) inside blogger/blogspot's web editor and it blew away ALL of my content! Was it safari, firefox, mac or windows conflicting key mappings that caused this? I don't know ... but it's an evil concoction that I don't care for.
- Worst of all is the mapping for the backspace key. It moves you back to the previous page in your browser when you are editing your blog instead of simply deleting a character ... thus causing you to lose all your work!
- Just having the comfort of knowing that your content won't be lost because of a dumb keystroke ... is just about the only thing that these desktop blog editors are good for.
Onwards & upwards ... let's look at all the reasons desktop blog editors SUCK ... which pretty much boils down to the fact that you still have to manually edit HTML because the simplest use cases aren't automatically handled. What are these use cases? Images and embeds!
- Is it really that hard for the desktop-blog-editor manufacturing industry (yes that is sarcasm) to grasp the fact that folks might want their images to float to the right or left and their content/text to simply flow around them? There's not a single tool that handles this via a WYSIWYG toolbar.
- Embeds mean different things to different people. For a developer, it means embedding code-blocks (gists for example) which will have their own space and your own content would flow around and not clash/overlap with them.
- Seriously, the desktop blog editors out there today (year 2013) are just a big disappointment :(

Friday, April 26, 2013

Generating recommendations via Elasticsearch - Part 1

Let's look at what it entails to roll your own recommendation engine with Elasticsearch.

A simple use case: Given a product being browsed, we want to provide suggestions based on what others have purchased in the past.

In order to accomplish this we will break down the steps required:

Index the past sale data, invoices or receipts in Elasticsearch.
Run a query that narrows down the results to a product that consumers are browsing and then returns suggestions for other items that are most frequently purchased together with it.

Here's an example of a sale, but what's the correct structure for the JSON document which we should index into Elasticsearch in order to run the type of query which will yield the results for our use case?

One optimistic way to do this is to simply store the receipt itself as a JSON document:

But that just won't work because there isn't any query (known to me at least) that can make heads or tails of that one document alone and deliver aggregated results of items most frequently bought together.

We could generate one document to represent each item for which a query might come in, and then add lineitems from various sales to it as a multi-value field. This data could be what we facet upon to deliver aggregated results of items most frequently bought together.

Let's keep our focus on Apples, and the rest of the sibling lineitems will become recommendations for someone who comes looking for Apples:

Any additional sales will add to the number of elements in recorded lineitems:

But this doesn't work because the facet query will report that there is one Orange, one Lemon etc. It will not count Oranges as 2 but rather as 1 inside the lineitems field. This is because the facet query is meant to count the number of documents that have the term "Oranges" but not the number of times a term (such as Oranges) appears per document or field.
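
The difference between "documents containing a term" and "occurrences of a term" is easy to show with a small Python simulation. This is not the Elasticsearch API, just the counting logic as I understand it, and the document contents are made up for the example:

```python
from collections import Counter

# Hypothetical "Apples" document after lineitems from two sales were added to it.
docs = [
    {"item": "Apples", "lineitems": ["Oranges", "Lemons", "Oranges", "Bananas"]},
]


def document_counts(docs, field):
    """Count how many documents contain each term (what the facet reports)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc[field]))  # set(): a term counts once per document
    return counts


def occurrence_counts(docs, field):
    """Count every appearance of each term (what we actually wanted)."""
    counts = Counter()
    for doc in docs:
        counts.update(doc[field])
    return counts


print(document_counts(docs, "lineitems")["Oranges"])
print(occurrence_counts(docs, "lineitems")["Oranges"])
```

With everything packed into a single document, the first count can never go above 1 for any term, no matter how many sales included Oranges.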

What next?

We can break apart the receipt into (total # of lineitems) * (total # of lineitems-1) documents as pairs. This way when a new sale appears, similar pairs can be identified and the facet query can accurately count them because they are all separate/individual documents.

Now we can see that the documents (one lineitem pair per document) from Sale1 and Sale2 can finally be used by a facet query to suggest that Apples are most often bought together with Oranges and vice-versa.
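
Here is a small Python sketch of the pair breakdown, assuming two made-up sales and hypothetical field names (item, bought_with). It only produces and counts the pair documents in memory; indexing them into Elasticsearch is left out:

```python
from collections import Counter
from itertools import permutations

# Hypothetical receipts; the real ones would carry prices, quantities and so on.
sale1 = ["Apples", "Oranges", "Lemons"]
sale2 = ["Apples", "Oranges"]


def pair_documents(lineitems):
    """Break one sale into n * (n - 1) documents, one ordered pair per document."""
    return [{"item": a, "bought_with": b} for a, b in permutations(lineitems, 2)]


def bought_with(documents, item):
    """Count the pair documents for one item, the way a facet counts documents."""
    return Counter(d["bought_with"] for d in documents if d["item"] == item)


documents = pair_documents(sale1) + pair_documents(sale2)
print(len(pair_documents(sale1)))        # 3 * 2 pair documents from Sale1
print(bought_with(documents, "Apples"))
print(bought_with(documents, "Oranges"))
```

Because each pair is its own document, a second sale with Apples and Oranges adds a second document, and the count goes up as expected.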

Thursday, April 18, 2013

How to invoke update handler with Cradle?

Invoking an update handler via a Cradle nodejs client should be a simple task, but I felt like I spent 10 minutes too many piecing together documentation on it, since it's not explicitly documented in the README.md file.

Based on the comments in the pull request where this functionality was added, here's a sample:

Wednesday, April 17, 2013

Wiring Vend POS with CouchDB - Part 2

Tuesday, March 19, 2013

Wiring Vend POS with CouchDB

Motivation:

Vend is a pretty awesome point-of-sale (POS) that resides in the cloud and has an even awesome~er developer API.
You can easily create and query views in CouchDB (common hosts: Cloudant and IrisCouch).
- This can be especially helpful if you want to build towards a Full Text Search capability on top of the Vend APIs.
  - You can use Lucene (Cloudant offers it out-of-the-box), or
  - ElasticSearch (connects to CouchDB via a concept known as "The River")

Steps:

Get all the products from your Vend store and upload them to CouchDB.
- Some standalone code can be written to pull all the products and upload them. There are many ways to do this, depending on what has the best utilities to cobble together something in a hurry: curl or Java or javascript (run via Node.JS) etc.
  - I'm partial to Node.JS because all of Vend's data is in JSON-format and there are some excellent libraries like Cradle which make it simpler to talk with CouchDB.
- For the upload to CouchDB, it should be possible to leverage the bulk document API and finish it off in one shot.
Going forward, keep everything up to date in real-time by utilizing Vend's webhooks to call into CouchDB's update handlers.
- Create a design document in CouchDB with update handlers that can parse the payload delivered by a Vend webhook like a product update.
  - When writing an update handler for receiving a product update, I mimicked the Stripe Kanso project.
    - I didn't really write the upload script yet. I simply tried to point Vend's product webhook at the URL for my update handler and I was able to get a document populated in CouchDB.
    - For an actual product update (not just a new one), I haven't quite figured out the code, it fails right now. Just need to spend more time reading up on update handlers, I suppose.
      - Because the Vend API sends the update via a payload that is form urlencoded, I was left wondering where to look for the data and how to grab it inside CouchDB's update handler .. but those questions were easily tackled when I found documentation on handling form-style-submission of data with CouchDB.
      - I did send an email to the CouchDB mailing-list to ask for some advice on actual updates:
        1) I have a 3rd-party-webhook API calling into my update handler and there's nothing I can do to make it pass the document ID in the URL.
        2) That means the CouchDB server cannot provide the update function with the most recent version of that document.
        3) But the request does provide a payload from which I can pull out the document ID ... but by this time I'm inside the update handler function.
        4) So my question is: If CouchDB did not provide a doc, is there still a way for me to:
        a) either, fetch the latest version of the doc myself?
        b) or, override the existing document with another document formed with my request payload ... without running into a revision conflict?
        
        I know that I can probably route the request through a proxy that parses the payload, sets the document ID onto the request URL and sends it on its way ... but I'd rather leave that as a last resort.
        
        Thoughts?
      - One more challenge to keep in mind is authentication. If anyone figures out the URL endpoint for the CouchDB and the respective update handler, they could make false additions and updates.
        
        It would be worth looking into any username/password BASIC authn provided by CouchDB to secure the update handler endpoint.
        
        A quick fix would be to provide a unique token (?token=ABCDEF123) as part of the endpoint URL you specify for the webhook (product.update: https://couchdb.com/myDB/_design/doc/_update/productHandler?token=ABCDEF123). Since query params are also encrypted when the request is made over HTTPS, your token would travel over a secure channel and then be validated on the CouchDB side to see if it matches what you expected.
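
The token check itself is tiny. Below is a Python sketch of just the comparison logic; inside CouchDB the equivalent check would have to live in the JavaScript update handler, which I haven't written yet. The function name is made up, and ABCDEF123 is only the placeholder from above (a real token should be long and random):

```python
import hmac
from urllib.parse import parse_qs, urlsplit

# Placeholder token from the example URL; assume a long random secret in practice.
EXPECTED_TOKEN = "ABCDEF123"


def token_is_valid(url, expected=EXPECTED_TOKEN):
    """Return True only if the URL carries a ?token= value matching the expected one."""
    query = parse_qs(urlsplit(url).query)
    supplied = query.get("token", [""])[0]  # a missing token becomes an empty string
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode(), expected.encode())


webhook_url = "https://couchdb.com/myDB/_design/doc/_update/productHandler?token=ABCDEF123"
print(token_is_valid(webhook_url))
print(token_is_valid("https://couchdb.com/myDB/_design/doc/_update/productHandler"))
```

Anything that fails the check should be rejected before the handler touches a document.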

Friday, March 1, 2013

Multiline config for Procfile and .env in Foreman

Can we spread out our configuration in a .env file for Foreman across multiple lines?
- At first I thought that this fix to Foreman allowed me to break long configurations across multiple lines the way that I wanted:
  JAVA_OPTS=-Xmx384m -Xss512k \
  -XX:+UseCompressedOops
  MAVEN_OPTS=-Xmx384m -Xss512k -XX:+UseCompressedOops
- But that's not the case, as I found out via trial & error. Now I think it's simply not supported.
Can we spread out our configuration in a Procfile for Foreman across multiple lines?
- Nope, trial & error resulted in the same deal as above.
- BUT ... I found a nifty clue here, which allowed me to set up a workaround like this:
  # contents of Procfile
  web:      ./webapp-runner.sh
  
  # contents of webapp-runner.sh
  java $JAVA_OPTS \
           -Djavax.net.ssl.trustStore="cacerts.jks" \
           -jar target/dependency/webapp-runner.jar --port $PORT target/*.war

Thursday, February 28, 2013

Correlating foreman version and features

How do I find out if the version of Foreman installed by Heroku-Toolbelt has the feature that I spotted on Foreman's GitHub website?
- What version are you running?
  $ foreman --version
  0.60.0
- Head on over to the changelog and search the web page with keywords of the feature you're looking for. For example: When was an export for launchd introduced? I found out that it was present here, here & here:
- All of these were mentioned in versions lower than 0.60.0 so I assume that I have it.
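
One thing to watch when comparing version numbers by eye or in a script: they have to be compared part by part as numbers, not as plain strings. A quick Python sketch, where 0.60.0 is my installed version from above and 0.9.0 is a purely hypothetical changelog version picked to show the pitfall:

```python
def parse_version(text):
    """Turn '0.60.0' into (0, 60, 0) so the parts compare numerically."""
    return tuple(int(part) for part in text.split("."))


installed = "0.60.0"      # from `foreman --version` above
introduced_in = "0.9.0"   # hypothetical version, for illustration only

has_feature = parse_version(introduced_in) <= parse_version(installed)
naive_string_check = introduced_in <= installed

print(has_feature)
print(naive_string_check)
```

The string comparison gets it wrong here because "9" sorts after "6", while the tuple comparison correctly treats 60 as larger than 9.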
