Wednesday, October 5, 2011

Import Dynamic Fields from XML into Solr via DIH

Given an XML file that needs to be imported into Solr, you may often run into some uncommon data values that would be:
  • best grouped together under the banner of some dynamic field defined in the schema.xml file,
  • with their mapping left up to the discretion of an admin tweaking the data-config.xml file, just before running DIH.

Off the cuff, you may be at a loss as to how exactly to accomplish this ... and in doubt whether it can even be done! You've seen this done for databases with the Data Import Handler (DIH), but not for XML with the XPathEntityProcessor reading from a URLDataSource or a FileDataSource.

Well, it can be done, and here's an example:
  1. Let's say your XML file looks something like this:
    
      
        hammer
        tough and durable
        heavy
        2 inches
      
      
        nail
        sharp and thin
        hazard
        1 inch
      
    
    
  2. Now disposition, width, dangerLevel and height are pieces of data that you may not be able to plan for ahead of time in your schema.xml file. So it makes sense to leave some wiggle room by defining somewhat predictable dynamic fields, like so:
    
    
    
    Keep in mind that you have some flexibility and responsibility here in terms of choosing the type of the dynamic field ahead of time.
  3. Your customers, or the customer-facing admins who handle the data-config.xml file, know their XML file best. When they are ready to kick off DIH against it, it will be quite easy for them to come up with something like the following on the spot:
    
    
    
    
    
    and still have an agreed upon well-oiled working index at the end of the day.
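
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact files from the original example: the root and record element names (items, item, name), the pairing of each value with an element name, the _s suffix, the string field type, and the file path are all made up for the sketch. Only the values and the four names disposition, width, dangerLevel and height come from the text above.

A possible input file (step 1):

```xml
<!-- Hypothetical input: element layout and value pairing are guesses -->
<items>
  <item>
    <name>hammer</name>
    <disposition>tough and durable</disposition>
    <dangerLevel>heavy</dangerLevel>
    <height>2 inches</height>
  </item>
  <item>
    <name>nail</name>
    <disposition>sharp and thin</disposition>
    <dangerLevel>hazard</dangerLevel>
    <width>1 inch</width>
  </item>
</items>
```

A possible dynamic field in schema.xml (step 2):

```xml
<!-- Assumes a "string" fieldType already exists in this schema.xml;
     any field whose name ends in _s is then accepted without further changes -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```

A possible data-config.xml (step 3):

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- forEach selects one Solr document per item element;
         the url below is a placeholder path -->
    <entity name="item"
            processor="XPathEntityProcessor"
            url="/path/to/items.xml"
            forEach="/items/item">
      <!-- Assumes the schema has a regular field called name -->
      <field column="name" xpath="/items/item/name"/>
      <!-- Each column name ends in _s, so it lands in the *_s dynamic field -->
      <field column="disposition_s" xpath="/items/item/disposition"/>
      <field column="dangerLevel_s" xpath="/items/item/dangerLevel"/>
      <field column="height_s" xpath="/items/item/height"/>
      <field column="width_s" xpath="/items/item/width"/>
    </entity>
  </document>
</dataConfig>
```

The point of the design is that only the column names in data-config.xml change from one XML file to the next. As long as each one matches the agreed dynamic field pattern, schema.xml can stay untouched. If your schema declares a uniqueKey, one of the field mappings would also need to populate it.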
