ujjwal wahi
2015-03-23 20:58:23 UTC
Hello mb developers,
Here is my GSoC proposal.
Title: Finish implementation of SOLR Search
** Abstract
Last year Wieland Hoffmann started working on an Apache Solr based search
infrastructure. My task this year will be to make the Solr based search
production ready and, finally, to replace the existing search with it.
** Content
* Personal Details
Name: Ujjwal Wahi
Email: ***@gmail.com
IRC Nick: ujjwal
Github: http://github.com/ujjwalwahi
MusicBrainz Profile: https://musicbrainz.org/user/ujjwalwahi/
Location (City, Country and Time Zone): Delhi, India (UTC/GMT +5:30 hours)
* Introduction
I propose to complete the Apache Solr based search server and to deploy it
to production, replacing the existing search server.
* Project goal
The goal of this project is to make the Apache Solr based search server
production ready and to replace the existing search server with it.
The following subtasks are needed to achieve this goal:
1) The current search server applies boosting techniques. For example, it
boosts areas that are entire countries so that they are listed before
other areas of the same name. Similar boosting needs to be applied in the
Solr based search server. The related GitHub issue is:
# Port the boosting configuration[1]
2) The current Solr implementation uses the following basic analyzer:
<fieldType name="lowercasestring" class="solr.TextField"
           sortMissingLast="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
The existing search server, by contrast, has more advanced analysis. For
example, it treats '&' and 'and' interchangeably. The existing search
server's analysis needs to be ported. The related GitHub issue is:
# Port the analysis configuration[2]
3) Write an automated script to import data into Solr.
4) After the features listed above are implemented, the Solr based search
server needs to be deployed to a test server, where the community can test
it and find bugs. This can be any existing MusicBrainz test server, or I
can host it on Amazon Cloud Service.
5) Fix the bugs that the community finds while using the search server
deployed to the test server.
6) Deploy the Solr based search server to production, replacing the
existing search server. This involves installing Solr and the required
tools on the server and pointing the MusicBrainz server's search to Solr.
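As a rough illustration of the boosting described in subtask 1, the sketch
below re-ranks area results so that a country outranks other areas of the
same name. The `type` field, the base scores and the boost factor of 2.0 are
assumptions made up for this example; they are not taken from the existing
search server, and in Solr the same effect would more likely be expressed in
configuration than in application code.

```python
# Toy model of type-based boosting; field names and values are hypothetical.
COUNTRY_BOOST = 2.0  # assumed factor, not the real search server's value


def boosted_score(doc):
    """Multiply the base relevance score when the area is a country."""
    boost = COUNTRY_BOOST if doc.get("type") == "Country" else 1.0
    return doc["score"] * boost


def rank(docs):
    """Return documents ordered by boosted score, highest first."""
    return sorted(docs, key=boosted_score, reverse=True)


# Two areas with the same name and the same base score.
results = [
    {"name": "Georgia", "type": "Subdivision", "score": 1.0},
    {"name": "Georgia", "type": "Country", "score": 1.0},
]
ranked = rank(results)
print([doc["type"] for doc in ranked])
```

Without the boost the two entries would tie; with it, the country is listed
first.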
I propose to deploy the current implementation of the Solr based server to
a test server during the community bonding period, or earlier, so that
community members can see what has already been done.
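To make the analysis requirement in subtask 2 more concrete, here is a
minimal sketch of what treating '&' and 'and' interchangeably means. It is a
toy analyzer written in plain Python, not the existing search server's code
or a Solr filter; the `EQUIVALENTS` table and the tokenizing rule are
assumptions for illustration only.

```python
import re

# Hypothetical equivalence table; the real analyzer's rules may differ.
EQUIVALENTS = {"&": "and"}

# A token is either a run of word characters or a single '&'.
TOKEN_RE = re.compile(r"\w+|&")


def analyze(text):
    """Lowercase the text, split it into tokens and normalise equivalents."""
    tokens = TOKEN_RE.findall(text.lower())
    return [EQUIVALENTS.get(token, token) for token in tokens]


# Both spellings end up as the same token list, so either one can match.
print(analyze("Simon & Garfunkel") == analyze("Simon and Garfunkel"))
```

The important point is that the same analysis is applied both when indexing
and when querying, so both spellings are reduced to the same tokens.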
* Timeline
I am going to devote at least 8 to 9 hours per day to this project. I have
no other commitments during this summer.
Mar 28 - Apr 27 (4 weeks): Read more about Lucene, Solr, and boosting and
analysis in Solr. Brush up my Java and Python skills.
Community Bonding Period (4 weeks)
# Study boosting, analysis and tweaks in the existing search server.
# Discussions with the mentor and community.
Week 1 (May 25)
Start porting boosting configuration.
Week 2 (Jun 1)
Start porting analysis configuration.
Week 3 (Jun 8)
Continue adding missing features.
Week 4 (Jun 15)
Continue adding missing features and write a script in Python to automate
the indexing of the data.
Week 5 (Jun 22)
Deploy the project to the test server. (This would be a milestone for the
mid-term evaluation.)
Week 6 (Jun 28)
Mid-term evaluation and a short break.
Week 7-10 (Jul 6 - Aug 2)
Fix the bugs identified by the community while using the Solr search
server on the test server.
Week 11 (Aug 3)
Deploy the Solr search server to production, replacing the existing search
server.
Week 12 (Aug 10)
Work on the remaining bugs, if any.
Week 13 (Aug 17)
Improve the code, documentation and tests.
Note that this timeline is tentative.
* Deliverables
Expected outcomes of this project are:
# Bring the Solr based search on par with the existing search server by
adding missing features such as boosting, analysis and tweaks.
# Replace the existing search server with the Solr based search server.
# An automated script to index and re-index the data.
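A minimal sketch of what the indexing script could look like is shown below.
It only covers the last step, sending already prepared documents to Solr's
JSON update handler in batches; it does not use the sir library. The URL, the
core name `artist`, the batch size and the helper names are all assumptions
for this example.

```python
import json
import urllib.request

# Assumed local Solr URL and core name; adjust to the real deployment.
SOLR_UPDATE_URL = "http://localhost:8983/solr/artist/update?commit=true"


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_request(docs, url=SOLR_UPDATE_URL):
    """Build a POST request carrying the documents as a JSON array."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index(docs, batch_size=1000, send=urllib.request.urlopen):
    """Send documents to Solr in batches; return the number of batches."""
    sent = 0
    for batch in chunks(docs, batch_size):
        with send(build_request(batch)) as response:
            response.read()  # drain the response before the next batch
        sent += 1
    return sent


# Example call (needs a running Solr instance, so it is left commented out):
# index([{"id": "1", "name": "Example Artist"}])
```

Because re-sending a document with the same unique key overwrites the
previous version in Solr, the same function can be reused for re-indexing.
The `send` parameter exists only so the network call can be replaced in
tests.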
* Why Me?
To prepare for this project I have installed the MusicBrainz server, the
MusicBrainz search server and the Solr based search server on my local
system. I have imported the data into a Solr instance using the sir
library, so I now have a local MusicBrainz server that uses Solr as its
search server. I have read most of last year's discussions on the Solr
based search server and have a good understanding of the requirements.
* About Me
I am a first-year Master of Computer Applications (MCA) student at Bharati
Vidyapeeth's Institute of Computer Applications and Management (BVICAM),
New Delhi, India. I can code in Java, Python, C#, PHP and JavaScript.
Experience:
1. Worked for Mozilla's Testing and Automation team. Links to some issues I
have worked on are [3] and [4].
2. Have experience in developing and deploying large scale projects such
as a B2B portal [5] and an e-commerce site [6]. Coding was done in PHP and
JavaScript.
3. Developed a system for sending alerts and promotional and service
renewal mails to customers for [7], a B2B portal.
4. Developed an Online Ticket Booking System project using Java as part of
the completion of my Bachelor's degree.
5. Used Arachnode.net, a Lucene.NET based tool, to extract and list
information from a government site. The extracted information is displayed
at [8].
Please share your feedback.
[1] https://github.com/mineo/mbsssss/issues/2
[2] https://github.com/mineo/mbsssss/issues/1
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1117190
[4] https://github.com/mozilla/mozdownload/pull/234
[5] http://www.ficci-b2b.com/
[6] http://www.bazara2z.com/
[7] http://www.infobanc.com
[8] http://www.infobanc.com/policy-tracker
--
Regards,
Ujjwal Wahi