Monday, October 31, 2011

Hacking Weebly: PowWeb's Drag & Drop Builder

Background

  • What led to the Weebly hack-a-thon?
  • What had kept me from using Weebly in the first place?
  • What keeps my love & "ughhh" relationship with Weebly going?

Let's Hack!

  • How can images be replaced if the Weebly template does not allow it?
  • How can expandable/collapsible (+/-) areas be added?

Thursday, October 20, 2011

HTML5 vs. Native Mobile Apps

Technologies that enable HTML5 apps to either deploy to multiple mobile platforms or keep up with the native look:
  • PhoneGap
  • Sencha Touch
  • jqTouch
  • jQuery Mobile
    • There is a minor issue in iOS that doesn't properly set the width when changing orientations with these viewport settings, but this will hopefully be fixed in a future release.
    • It's not currently possible to deep link to an anchor (index.html#foo) on a page.

Monday, October 17, 2011

ElasticSearch and CouchDB: Match made in heaven?

  • In ElasticSearch (ES):
    each indexed document is given a version number. This version number can be supplemented with an external value (for example, if maintained in a database). To enable this functionality, version_type should be set to external.
    Sounds nice, primed for CouchDB, right? But:
    The value provided must be a numeric, long value greater than 0, and less than around 9.2e+18.
    The way CouchDB does versioning isn't numeric, because it joins two numbers with a hyphen to create a sequence/version, for example:
    1-1234567890
    So how is this handled in case of a CouchDB stream for ES?
  • How does ES facilitate the generation of an "_id" based on the data in the incoming document?
    • Does it allow taking the value of one field of that document? For example, there is a way to perform "_routing" via one of the incoming document's fields for distributed indexing across shards. So what about something similar for picking out the id?
    • Does it allow concatenating the values of multiple fields of that document?
  • Same question as the one above for CouchDB.
  • TBD
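
One possible workaround for the version question, sketched below, is to feed ES only the leading number of the CouchDB revision. This is purely an assumption on my part and not a description of how any CouchDB stream for ES actually behaves; the rev_to_version helper is a hypothetical name. Keep in mind that two conflicting revisions can share the same leading number, so this loses information.

```shell
#!/bin/sh
# Hypothetical helper: derive a numeric value from a CouchDB revision string.
# Assumption: revisions look like "N-suffix", as in the example above.
rev_to_version() {
  rev=$1
  generation=${rev%%-*}   # keep everything before the first hyphen
  case $generation in
    ''|*[!0-9]*)
      echo "no numeric prefix in revision: $rev" >&2
      return 1
      ;;
  esac
  echo "$generation"
}

rev_to_version "1-1234567890"   # prints 1
rev_to_version "42-abcdef"      # prints 42
```

The printed value is a plain positive integer, so it would fit the "numeric, long value greater than 0" requirement quoted above.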

Sunday, October 9, 2011

Scalability Madness

| Feature | CouchDB | CouchDB-Lucene | CouchIO / CouchOne / CouchBase | Solr | Elastic Search | MongoDB | BigCouch |
| Full-Text Search | No | Yes | ? | Yes | Yes | May Be with Photovoltaic | ? |
| Distribute-able | Yes | ? | ? | Yes | Yes | ? | ? |
| Distribute-ed | ? | ? | ? | No | Yes | ? | ? |
| Schema-less | Yes | ? | ? | ? | Yes | ? | ? |
| Tools for Importing Data | ? | ? | ? | Yes | ? | ? | ? |
| Comments | ? | Does it go Toe to Toe against all the features of Lucene exposed by Solr? | best for HTML5 dev? | it is distributable but not distributed. SolrCloud has very few features | Compass got punted to invent Elastic Search. | ? | ? |

Wednesday, October 5, 2011

Splitting up large XML data files for use with DIH in Solr

It is ridiculously beneficial to split up XML files if you will be using Solr's Data Import Handler (DIH) to process the data. I personally saw throughput go from 166 entries/minute to 30860 entries/minute (roughly 186 times faster) after splitting all the large XML data files into one file for every entity that is to become a Lucene document in Solr.

I tried this only on a whim, but the awk one-liner that let me experiment and get the desired results, which I found online, was:
awk '/<item>/{close("row"count".xml");count++}count{f="row"count".xml";print $0 > f}' *.xml

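So if your file looks something like:
<items>
  <item>
    <title>Item 1</title>
    <description>Description 1</description>
    ...
  </item>
  ...
  <item>
    <title>Item 20000</title>
    <description>Description 20000</description>
    ...
  </item>
</items>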

Then all the items from 1 to 19,999 will be divided up by this script into individual files named row1.xml, row2.xml ... row19999.xml and look like:
<item>
  <title>Item N</title>
  <description>Description N</description>
  ...
</item>
But the last (20,000-th) item will have a trailing tag:
  <item>
    <title>Item 20000</title>
    <description>Description 20000</description>
    ...
  </item>
</items>
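
To see this behaviour on a small scale before committing to a big run, the splitter can be tried on a tiny sample. The sketch below uses a made-up three-item file (sample.xml is a hypothetical name) in a scratch directory and the same awk command as above:

```shell
#!/bin/sh
# Sketch: run the splitter on a tiny, made-up sample to see
# which output file keeps the trailing tag.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical three-item sample in the same shape as the file above.
cat > sample.xml <<'EOF'
<items>
  <item>
    <title>Item 1</title>
  </item>
  <item>
    <title>Item 2</title>
  </item>
  <item>
    <title>Item 3</title>
  </item>
</items>
EOF

# Same awk command as above, pointed at the sample only.
awk '/<item>/{close("row"count".xml");count++}count{f="row"count".xml";print $0 > f}' sample.xml

ls row*.xml                 # lists row1.xml, row2.xml, row3.xml
grep -l 'items>' row*.xml   # prints row3.xml, the only file still carrying </items>
```

Only the last row file of the sample reports a match, which is the same situation as the 20,000-th item above.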

If you have processed 10 files, each with 20000 entries, using the splitter command mentioned above, then every file numbered 20000, 20000*2 ... 20000*10 will need to have the trailing tag deleted from it. To that end, the following script can be edited by providing the number of original files in the while loop's comparison statement:
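#!/bin/sh
# Delete the stray "items>" line from the last row file of each original file.
if [ $# -eq 0 ]
then
  echo "Error - Number missing from command line arguments"
  echo "Syntax : $0 number"
  echo " Use to strip the trailing tag from every number-th row file"
  exit 1
fi
n=$1   # entries per original file
i=1
# 10 is the number of original files; edit it to match your data
while [ "$i" -le 10 ]
do
  file="row$((i * n)).xml"
  echo "sed -ibak '/items>/d' $file"
  # -ibak edits in place and keeps a backup copy with a "bak" suffix
  sed -ibak '/items>/d' "$file"
  i=$((i + 1))
done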
Then run the script, passing it the number of the last entry in each original file (the 20000-th); it echoes each sed command as it runs:
./sanitize.sh 20000
sed -ibak '/items>/d' row20000.xml
sed -ibak '/items>/d' row40000.xml
sed -ibak '/items>/d' row60000.xml
sed -ibak '/items>/d' row80000.xml
sed -ibak '/items>/d' row100000.xml
sed -ibak '/items>/d' row120000.xml
sed -ibak '/items>/d' row140000.xml
sed -ibak '/items>/d' row160000.xml
sed -ibak '/items>/d' row180000.xml
sed -ibak '/items>/d' row200000.xml
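
If the original files do not all hold the same number of entries, the arithmetic above no longer finds the right files. An alternative sketch, which is not the script above, is to look for the tag by content instead. The strip_wrapper_lines function name and the demo directory are hypothetical:

```shell
#!/bin/sh
# Alternative sketch: find the affected row files by content
# instead of computing their numbers.
set -e

strip_wrapper_lines() {
  # $1 is a directory holding the row*.xml files (assumed layout).
  for f in "$1"/row*.xml; do
    if grep -q 'items>' "$f"; then
      echo "cleaning $f"
      # Same sed call as above: delete matching lines, keep a "bak" backup.
      sed -ibak '/items>/d' "$f"
    fi
  done
}

# Tiny made-up demo: two clean files and one with a trailing tag.
demo=$(mktemp -d)
printf '<item>\n</item>\n' > "$demo/row1.xml"
printf '<item>\n</item>\n' > "$demo/row2.xml"
printf '<item>\n</item>\n</items>\n' > "$demo/row3.xml"
strip_wrapper_lines "$demo"
```

This reads every row file once, so it will be slower than the arithmetic approach over hundreds of thousands of files, but it needs no hard-coded file count.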

Import Dynamic Fields from XML into Solr via DIH

Given an XML file that needs to be imported into Solr, you may often run into some uncommon data values that would be:
  • best grouped together under the banner of some dynamic field defined in schema.xml file,
  • with their mapping left up to the discretion of an admin tweaking the data-config.xml file, just before running DIH.

Off the cuff, one may be at a loss on how exactly to accomplish this ... and in doubt if it can even be done! You've seen this done for databases with Data Import Handler (DIH) but not with the XPath handler, or URL datasource, or File datasource for XML.

Well it can be done and here's an example:
  1. Let's say your XML file looks something like this:
    
      
        hammer
        tough and durable
        heavy
        2 inches
      
      
        nail
        sharp and thin
        hazard
        1 inch
      
    
    
  2. Now disposition, width, dangerLevel, height are pieces of data that you may not be able to plan ahead for, in your schema.xml file. So instead it makes sense to leave some wiggle room by defining somewhat predictable dynamic fields like so:
    
    
    
    Keep in mind that you have some flexibility and responsibility here in terms of choosing the type of the dynamic field ahead of time.
  3. Now when your customers or customer-facing admins, who handle the data-config.xml file, want to kick off DIH against an XML file that they know best, it will be quite easy for them to come up with something like the following on the spot:
    
    
    
    
    
    and still have an agreed upon well-oiled working index at the end of the day.

Monday, October 3, 2011

Embedding Videos in Joomla 1.7

  1. Log in as the administrator
  2. Hover over the Extensions drop-down and click on Extension Manager
  3. In the Install from URL section, paste the URL pointing to a zip file for the AllVideos Joomla plugin.
    • http://joomlaworks.googlecode.com/files/plg_jw_allvideos-v4.0_j1.5-1.7.zip
  4. Once you see a notification on screen for a successful installation, click on the Manage tab
  5. Locate the row that lists the AllVideos plugin and click on the red status icon in order to toggle it to enabled.
  6. Hover over the Extensions drop-down and click on Plugin Manager
  7. Locate the row that lists the AllVideos plugin and click on the title of the plugin itself.
  8. Configure the plugin based on your needs.

Sunday, October 2, 2011

Screencast Toolset

Best toolset that I've found for working on the Mac with screencasts:
  1. ScreenR for recording.
  2. Jing for recording.
  3. SimpleMovieX for merging.
    1. The videos merged using this tool will not work as intended on Vimeo or YouTube. They will stop at the very first location that was stitched together.
  4. Final Cut Pro for merging.
    1. The videos merged using this tool can be seamlessly uploaded to top providers like YouTube and everything in the video works as intended. But the content may show up as Public by default! So make sure to secure your content afterwards.