Reading Chapter 9: Scaling Solr from the book Solr 1.4 Enterprise Search Server before jumping into the world of Solr Cloud is essential for anyone who wants to understand what the embedded ZooKeeper can and cannot do. You have to know how to configure all the nuts and bolts in Solr manually before you can gain a natural understanding of what the automation does and does not take care of for you.
If you go through the basic exercises for Solr Cloud, you will come across Example B: Simple two shard cluster with shard replicas. The wording there can be misleading, depending on what you are looking to accomplish: it is not replication that is being set up. Instead, that example uses "replicas" to mean "copies", to demonstrate high search availability.
Here are the tried and tested steps for replication with a master-slave setup that will fit in with a ZooKeeper-managed Solr Cloud:
- If you've already done some work with Solr Cloud then you may want to start fresh by cleaning up any previous ZooKeeper configuration data in order to run this example exercise smoothly.
cd /trunk/solr/example/solr
rm -rf zoo_data
- Collection is ZooKeeper-oriented terminology for a group of Solr cores that share the same schema; it has nothing to do with the name of a Solr core itself. Let's keep this fact plain to see by editing the solr.xml file and providing an appropriate name for the core and collection:
<cores adminPath="/admin/cores" defaultCoreName="master1">
<core name="master1" instanceDir="." shard="shard1" collection="collection1"></core>
</cores>
- Navigate to the configuration directory for the example in the trunk and begin editing solrconfig.xml using your preferred text editor:
cd /trunk/solr/example/solr/conf
vi solrconfig.xml
- Uncomment and edit the replication requestHandler to be as follows:
<requestHandler name="/replication" class="solr.ReplicationHandler" >
<lst name="master">
<str name="enable">${enable.master:false}</str>
<str name="replicateAfter">commit</str>
<str name="replicateAfter">startup</str>
<str name="confFiles">schema.xml,stopwords.txt</str>
</lst>
<lst name="slave">
<str name="enable">${enable.slave:false}</str>
<str name="masterUrl">http://localhost:8983/solr/replication</str>
<str name="pollInterval">00:00:60</str>
</lst>
</requestHandler>
- Navigate out of the example directory and create another copy of it:
cd /trunk/solr/
cp -r example example2
- Edit the solr.xml file for the example2 directory:
- change the name of the core to indicate that it is a slave
- leave the name of the shard as-is to indicate which shard it is a replica of
- leave the name of the collection as-is because this slave core should join the same collection as its master in ZooKeeper config
cd /trunk/solr/example2/solr
vi solr.xml
<cores adminPath="/admin/cores" defaultCoreName="slave1">
<core name="slave1" instanceDir="." shard="shard1" collection="collection1"></core>
</cores>
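The manual edit above can also be scripted. The following is only a minimal sketch, assuming the solr.xml layout shown in this post; `rename_core` is a hypothetical helper name, not something that ships with Solr. It rewrites the core name in both places it appears and leaves the shard and collection attributes untouched, which is exactly what the slave needs.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a solr.xml with the core renamed.
# Assumes the layout used in this post (defaultCoreName on <cores>,
# name on <core>). Shard and collection attributes are not modified.
#   $1: path to solr.xml
#   $2: old core name (for example master1)
#   $3: new core name (for example slave1)
rename_core() {
  sed -e "s/defaultCoreName=\"$2\"/defaultCoreName=\"$3\"/" \
      -e "s/<core name=\"$2\"/<core name=\"$3\"/" "$1"
}
```

The function writes to standard output, so a typical use would be `rename_core solr.xml master1 slave1 > solr.xml.new`, followed by a review of the result and `mv solr.xml.new solr.xml`.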
- Start the master core. The Java system properties let us call this out as a master at startup:
cd /trunk/solr/example
java -Dbootstrap_confdir=./solr/conf -Denable.master=true -DzkRun -jar start.jar
- Start the slave core. The Java system properties let us call this out as a slave at startup:
cd /trunk/solr/example2
java -Djetty.port=7574 -DhostPort=7574 -Denable.slave=true -DzkHost=localhost:9983 -jar start.jar
- After starting the slave, look towards the end of its logs. You should see output confirming that replication is working:
INFO: Updating cloud state from ZooKeeper...
Sep 9, 2011 6:20:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
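Besides reading the logs, the ReplicationHandler can be queried over HTTP. The sketch below assumes the ports used in this post (8983 for the master, 7574 for the slave), that the handler is reachable at /solr/replication through the default core, and that curl is installed. The function names `replication_url` and `replication_query` are made up for illustration; `indexversion` and `details` are commands of the ReplicationHandler.

```shell
#!/bin/sh
# Hypothetical helper: build the URL for a ReplicationHandler command.
#   $1: host, $2: port, $3: command (for example indexversion or details)
replication_url() {
  printf 'http://%s:%s/solr/replication?command=%s&wt=json\n' "$1" "$2" "$3"
}

# Hypothetical helper: fetch the handler's response for a command.
# Requires curl and a running Solr node at the given host and port.
replication_query() {
  curl -s "$(replication_url "$1" "$2" "$3")"
}
```

With both nodes running, `replication_query localhost 8983 indexversion` reports the master's index version and generation, and `replication_query localhost 7574 details` reports the slave's view of replication. If the slave is in sync, the version it reports should match the master's.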
Sources:
- http://lucene.472066.n3.nabble.com/Solr-Cloud-is-replication-really-a-feature-on-the-trunk-td3317695.html
- http://lucene.472066.n3.nabble.com/Replication-setup-with-SolrCloud-Zk-td2952602.html