Behemoth: Setup Solr master-slave replication with ZooKeeper

Friday, September 9, 2011

Setup Solr master-slave replication with ZooKeeper

Reading Chapter 9, "Scaling Solr", from the book Solr 1.4 Enterprise Search Server before jumping into the world of Solr Cloud is essential for anyone who wants to understand what the embedded ZooKeeper can and cannot do. You have to know how to configure all the nuts and bolts in Solr manually before you can gain a natural understanding of what the automation does and does not take care of for you.

If you go through the basic exercises for Solr Cloud, you will come across Example B: Simple two shard cluster with shard replicas. The wording there can be misleading, depending on what you are looking to accomplish: it is not replication that is being set up. Instead, that example uses "replicas" to mean "copies", to demonstrate high search availability.

Here are the tried and tested steps for a master-slave replication setup that fits in with a ZooKeeper-managed Solr Cloud:

If you've already done some work with Solr Cloud then you may want to start fresh by cleaning up any previous ZooKeeper configuration data in order to run this example exercise smoothly.
```
cd /trunk/solr/example/solr
rm -rf zoo_data
```
Collection is ZooKeeper-oriented terminology for a group of Solr cores that share the same schema; it has nothing to do with the name of a Solr core itself. Let's make this plain to see by editing the solr.xml file and giving the core and the collection distinct, appropriate names:
```
<cores adminPath="/admin/cores" defaultCoreName="master1">
 <core name="master1" instanceDir="." shard="shard1" collection="collection1"></core>
</cores>
```
Navigate to the configuration directory for the example in the trunk and begin editing solrconfig.xml using your preferred text editor:
```
cd /trunk/solr/example/solr/conf
vi solrconfig.xml
```

Uncomment and edit the replication requestHandler to be as follows:

<requestHandler name="/replication" class="solr.ReplicationHandler" >
       <lst name="master">
         <str name="enable">${enable.master:false}</str>
         <str name="replicateAfter">commit</str>
         <str name="replicateAfter">startup</str>
         <str name="confFiles">schema.xml,stopwords.txt</str>
       </lst>
       <lst name="slave">
         <str name="enable">${enable.slave:false}</str>
         <str name="masterUrl">http://localhost:8983/solr/replication</str>
         <str name="pollInterval">00:00:60</str>
       </lst>
</requestHandler>

Navigate out of the example directory and create another copy of it:
```
cd /trunk/solr/
cp -r example example2
```
Edit the solr.xml file for the example2 directory:
1. change the name of the core to indicate that it is a slave
2. leave the name of the shard as-is to indicate which shard it is a replica of
3. leave the name of the collection as-is because this slave core should join the same collection as its master in ZooKeeper config
```
cd /trunk/solr/example2/solr
vi solr.xml
```
```
<cores adminPath="/admin/cores" defaultCoreName="slave1">
 <core name="slave1" instanceDir="." shard="shard1" collection="collection1"></core>
</cores>
```
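The three edits above amount to swapping the core name while leaving `shard` and `collection` untouched. A minimal shell sketch of doing that without an interactive editor, assuming the copied solr.xml still contains the `master1` names shown earlier; the function name `make_slave_config` is made up for this illustration:

```shell
# Hypothetical helper: rewrite the core name in a copied solr.xml.
# Assumes the file still uses the "master1" name from the master's config.
make_slave_config() {
  file="$1"
  tmp="${file}.tmp"
  # Replace only the name/defaultCoreName values; shard and collection stay as-is.
  sed -e 's/defaultCoreName="master1"/defaultCoreName="slave1"/' \
      -e 's/<core name="master1"/<core name="slave1"/' \
      "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage (path taken from the steps above):
#   make_slave_config /trunk/solr/example2/solr/solr.xml
```

Writing to a temporary file and moving it into place avoids relying on `sed -i`, whose syntax differs between platforms.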

Start the master core. The Java system properties on the command line mark it as a master at startup:

cd /trunk/solr/example
java -Dbootstrap_confdir=./solr/conf -Denable.master=true -DzkRun -jar start.jar

Start the slave core. The Java system properties on the command line mark it as a slave at startup:

cd /trunk/solr/example2
java -Djetty.port=7574 -DhostPort=7574 -Denable.slave=true -DzkHost=localhost:9983 -jar start.jar
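With both instances up, the ReplicationHandler can also be queried over HTTP. A small sketch, assuming the handler is reachable at `/solr/replication` on each port, as in the `masterUrl` configured earlier; `replication_url` is a hypothetical helper, and the `curl` calls are left as comments because they need the running instances:

```shell
# Hypothetical helper: build a ReplicationHandler URL from host, port and command.
replication_url() {
  printf 'http://%s:%s/solr/replication?command=%s\n' "$1" "$2" "$3"
}

# With both instances running (ports taken from the startup commands above):
#   curl -s "$(replication_url localhost 8983 indexversion)"   # master
#   curl -s "$(replication_url localhost 7574 details)"        # slave
```

`indexversion` and `details` are standard ReplicationHandler commands; the exact response format depends on the Solr build in use.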

After starting the slave, you should see lines towards the end of its log confirming that replication is working:

INFO: Updating cloud state from ZooKeeper...
Sep 9, 2011 6:20:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
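If the slave's console output is redirected to a file, the same check can be scripted. A minimal sketch, assuming the log contains the message from the excerpt above; the function name `slave_in_sync` and the file name `slave.log` are made up for illustration:

```shell
# Hypothetical helper: succeed if the log contains the sync message shown above.
slave_in_sync() {
  grep -q 'Slave in sync with master' "$1"
}

# Usage, assuming the slave was started with its output captured, e.g.
#   java ... -jar start.jar > slave.log 2>&1 &
#   slave_in_sync slave.log && echo "replication looks healthy"
```

This only looks for the one message quoted in this post; a slave that is still fetching an index for the first time may log different lines.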

Sources:

http://lucene.472066.n3.nabble.com/Solr-Cloud-is-replication-really-a-feature-on-the-trunk-td3317695.html
http://lucene.472066.n3.nabble.com/Replication-setup-with-SolrCloud-Zk-td2952602.html

10 comments:

Doctor, November 3, 2011 at 2:03 PM
Changes to solr.xml (issues with case-ing)

instancedir should be instanceDir ..(d should be in CAPS)

similarly,

adminpath ==> adminPath
defaultcorename ==> defaultCoreName ..

and btw .. ur post was really helpful ..

thanks,
doc

Doctor, November 3, 2011 at 4:56 PM
i m getting below error ... when slave tries to pull data from server ..

SEVERE: org.apache.solr.common.cloud.ZooKeeperException: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode

do you have any idea about this issue ??

Tiklup, November 3, 2011 at 5:47 PM
No I haven't run into this error before, I would highly recommend subscribing to and then posting to the Solr Mailing List.
I had written this blog as I was going through the steps myself as a way of keeping notes and I was using whatever the trunk had back then ... so whatever was available around Friday, September 9, 2011 is the source code that I ran this with without any issues.

Tiklup, November 4, 2011 at 6:08 AM
Fixed the source miss-capitalization issue so feel free to copy-paste from the blog.
How did I do this?
1) Post Option > Show HTML literally
2) Compose Mode (cut & paste from Edit HTML Mode)

Doctor, November 7, 2011 at 4:26 PM
hi pulkit,
i want one help from you ..
can you download latest trunk and try the steps
Replication is not working for me in SolrCloud.. i m not sure whats the problem ..

Tiklup, November 8, 2011 at 6:56 AM
1) Looks like you posted to Solr-Users without subscribing and accepting membership to the mailing list first.
2) This caused your question to initially lie around in moderation and when approved, your entire question's text got garbled!
3) Please try posting again as I am working on ElasticSearch research these days and don't have the heart to fire-up Solr again until I'm finished with this. If I come out of ES soon, then I'll try to help.

Anonymous, November 25, 2011 at 3:58 AM
@Doctor.

I got the same exception: ZkSolrResourceLoader does not support getConfigDir()

I am only learning ZK and SolrCloud (Greatly helped by the excellent article) but if you remove:

<str name="confFiles">schema.xml,stopwords.txt</str>

Then index replication works fine. I think that the idea is that the conf files are now managed by ZooKeeper and therefore replicating them doesn't seem to make sense. (Although this is just a guess at this point).