[mb-devel] Weeksio's Progress this week
Jeff Weeks
2015-06-29 19:00:48 UTC
Progress this week:

-Boosted Solr schema.xml version from 1.2 to 1.5.
-Edited sir to reflect
-I changed direction on how I'm porting the analysis from the old server to
the new one. Instead of porting the old Java code to Solr as is, I'm
determining the steps each analysis takes and defining those steps directly
in schema.xml. This should make future maintenance much easier.
-I've been making my way through each core: deciding which of the new field
types to use for each attribute, importing each core's data from the
database, fixing errors as they pop up, trying to improve the indexing, and
thinking ahead a bit about how we will use some of Solr's features. I'm
through Annotation, Area, Artist, CDStub, Editor...and the further I go the
quicker it goes, as each core covers more of the cases likely to appear in
future fields.
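
As a rough illustration of defining the analysis steps directly in
schema.xml, a field type could look like the sketch below. The name
text_example and the particular tokenizer and filters are assumptions for
illustration only, not the actual analysis chain being ported.

```xml
<!-- Hypothetical field type; the name and filter choices are illustrative. -->
<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer element with no type attribute applies at both index and query time. -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Each step the old Java analyzer performed then maps to one tokenizer or
filter element, so the chain can be read and changed without touching code.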

-Porting the Area, Artist, Label Boost configurations
-Add remaining entities to sir
-Continue working through cores


I'm still not sure why we need this...maybe I'm overlooking something. It
looks like tokens aren't stored, only analyzed/indexed; but if we use the
same analysis at query time as at index time (which it appears we do) the
indexed tokens retaining their accents will never be accessed. ...correct?

-_store fields:
I really don't understand the purpose of these...could someone explain?

-[SEARCH-371] <http://tickets.musicbrainz.org/browse/SEARCH-371>

Would an ICUTransformFilter using Greek-Latin and Cyrillic-Latin rule sets
do the trick? ...or are resources the primary concern?
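
For reference, a sketch of what that might look like, assuming the ICU
filters from Solr's analysis-extras contrib are on the classpath. The field
type name is hypothetical, and whether this actually resolves the ticket is
untested.

```xml
<!-- Hypothetical field type; requires the ICU analysis jars to be loaded. -->
<fieldType name="text_translit_example" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Each filter applies one ICU transliteration rule set, named by its id. -->
    <filter class="solr.ICUTransformFilterFactory" id="Greek-Latin"/>
    <filter class="solr.ICUTransformFilterFactory" id="Cyrillic-Latin"/>
  </analyzer>
</fieldType>
```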

-I noticed our analyzers don't use stop words. Is this something we want to
continue? A conservative list of English stop words seems like it would be
useful.
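
If we did want them, it would be roughly one extra filter in the analyzer
chain. This is only a sketch: the field type name is hypothetical, and
stopwords.txt is assumed to be a file in the core's conf directory with one
word per line.

```xml
<!-- Hypothetical field type showing where a stop filter would sit. -->
<fieldType name="text_stop_example" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Drops any token listed in stopwords.txt, ignoring case. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```

One thing to check before adopting this: a stop filter would also remove
those words from names that consist only of stop words.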

Personal Criticism:
-I've been really bad about getting my code up on GitHub...hope to get that
up today. I just haven't gotten into the habit of it...I'm so used to
working by myself on school projects.

Personal Observations:
-Solr is awesome! It's super powerful...and it's really easy to get bogged
down thinking about adding extras.