Zemberek is a very useful Turkish NLP library. iorixxx created and maintains a Zemberek plugin for Apache Lucene and Solr. However, I found the installation instructions somewhat confusing or outdated, and it took me some time to install the plugin properly. Here is how to install Zemberek on Solr 9.4.
- First, you’ll need to build the .jar files:
git clone https://github.com/iorixxx/lucene-solr-analysis-turkish.git
cd lucene-solr-analysis-turkish
mvn clean package
- Find all the .jar files in the lucene-solr-analysis-turkish folder.
- Create a folder under [solr installation directory]/modules. (For example, I use a Mac with Homebrew, so I created a “zemberek/lib” folder under “/usr/local/Cellar/solr/9.4.0/modules/”.) Then copy all the .jar files to the newly created folder.
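The find-and-copy steps above can be sketched as a small shell helper. This is only a rough sketch: the function name copy_jars is made up for illustration, and the paths in the example call are the ones from my Homebrew setup, so adjust them to your own installation.

```shell
# Hypothetical helper: copy every .jar found under a source tree
# into a destination folder, creating that folder if needed.
copy_jars() {
  src="$1"    # folder to search, e.g. the cloned repository
  dest="$2"   # target folder under [solr installation directory]/modules
  mkdir -p "$dest"
  find "$src" -type f -name '*.jar' -exec cp {} "$dest"/ \;
}

# Example call (paths are assumptions based on a Homebrew install):
# copy_jars lucene-solr-analysis-turkish /usr/local/Cellar/solr/9.4.0/modules/zemberek/lib
```

Keep the destination outside the folder you are searching, otherwise find may pick up the copies it has just made.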
- Then modify the solrconfig.xml in your solrHome/[core name] directory and add a line like this:
<lib dir="${solr.install.dir:../../../..}/modules/zemberek/lib/" regex=".*\.jar" />
- Now you can use Zemberek in your [core name] schema like this:
<filter class="org.apache.lucene.analysis.tr.Zemberek3StemFilterFactory" strategy="maxLength"/>
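The filter line only works inside an analyzer chain, so here is a minimal sketch of how a complete field type might look. The field type name text_tr_zemberek is made up for illustration, and the tokenizer and the other two filters are standard Solr factories that I am assuming make a reasonable Turkish chain; only the Zemberek filter line comes from the steps above.

```xml
<!-- Hypothetical field type; the name "text_tr_zemberek" is an assumption -->
<fieldType name="text_tr_zemberek" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Standard Solr tokenizer and Turkish-aware filters (assumed chain) -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <!-- The Zemberek stem filter from the step above -->
    <filter class="org.apache.lucene.analysis.tr.Zemberek3StemFilterFactory" strategy="maxLength"/>
  </analyzer>
</fieldType>
```

A field would then reference it with type="text_tr_zemberek". Check the plugin's README for the options it actually supports before relying on this chain.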
For more info: https://github.com/iorixxx/lucene-solr-analysis-turkish#turkishanalyzer-for-solr-users