Post by Frederik "Freso" S. Olesen Post by Andre Wiethoff
But in the end you agree that the result of such a web crawl/clustering
algorithm/whatever should be stored in the database as final result (for
speedier access of the results) - if implemented at all? But perhaps we
should discuss at first whether the new data would be beneficial for the
users (or the database)...
In *a* db, sure, in *the* (MB) db, no. It is not objective data and it
would not be user generated. It would be far more reasonable to place it
in another (sub)project. See e.g., AcousticBrainz and CritiqueBrainz for
two MetaBrainz projects expanding on the MusicBrainz data without being
inserted directly into the MB site/data themselves. A
RecommendationBrainz or SimilarityBrainz (or, heck, maybe it could be
part of CritiqueBrainz?) would be a better fit for this.
(Also note that having it in a separate project does not mean it cannot
be used by/on MusicBrainz; e.g., CritiqueBrainz reviews are pulled in
for relevant MB releases/release groups.)
I see. So most probably I proposed this to the wrong project, then?
From that point of view, a recommendation engine (a recommendation
matrix, probably stored as a sparse matrix in a database) should also
not be part of MusicBrainz, but rather another project extending
MusicBrainz, right?
Post by Frederik "Freso" S. Olesen Post by Andre Wiethoff Post by Frederik "Freso" S. Olesen
There are already
projects (though I forget which, sorry) that group/cluster entities
based on relationships (which IIRC wasn't completely off), so it is
possible to do something like it with the data in MB already.
I wonder which relationships have been used to group/cluster the entities.
IIRC, all the relationships. The more times two entities are linked to
each other, the closer those two entities were. AFAIK, it's a fairly
simple heuristic, but given the amount of relationships in the MB db, it
should give reasonable results for most fairly well-known artists.
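That heuristic (counting how many relationships connect two entities)
could be sketched roughly as follows; the entity IDs and relationship
list are invented for illustration and are not real MB data:

```python
from collections import Counter

# Hypothetical relationship data: each tuple is one relationship
# linking two MB entities (IDs invented for illustration).
relationships = [
    ("artist-a", "artist-b"),
    ("artist-a", "artist-b"),
    ("artist-a", "artist-c"),
    ("artist-b", "artist-c"),
]

# Count how many times each pair is linked; under this heuristic,
# more links between two entities means they are "closer".
link_counts = Counter(frozenset(pair) for pair in relationships)

def closeness(x, y):
    """Number of relationships linking x and y (0 if none)."""
    return link_counts[frozenset((x, y))]

print(closeness("artist-a", "artist-b"))  # 2
```

In the real database this would of course run over the relationship
tables rather than a hard-coded list, and one would probably want to
weight different relationship types differently.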
Post by Andre Wiethoff Post by Frederik âFresoâ S. Olesen
also AcousticBrainz which can be used to cluster entities based on the
acoustic properties of their recordings.
I don't think that clustering on the acoustic properties will bring any
good results for now; I guess it will still take ten years until
something exists that produces results comparable to a human expert (or
even an advanced amateur)...
I wouldn't make it stand on its own, no. ABz is still very much in its
infancy and the tools and algorithms in Essentia are not yet up to par
with this massive 2+ million song dataset currently available in the ABz
database. However, ABz can give you ranges about whether a group does
mostly vocal or instrumental things, whether they're mostly high or low
BPM, whether they have a predominant mood, etc.
These aren't necessarily 100% accurate, but by combining similarity on
these values with relationship clustering, I think it may be possible to
get some interesting results (e.g., two artists with a lot of
relationships connecting them that additionally do mostly acoustic,
instrumental, happy+relaxed music are likely more similar than two
artists with no relationships connecting them where one does mostly
instrumental and the other mostly vocal stuff).
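A minimal sketch of that combination idea, assuming hypothetical
per-artist acoustic profiles (the feature names, values, and scoring
formula are all invented, not ABz/Essentia output):

```python
# Hypothetical per-artist acoustic profiles, e.g. aggregated from
# high-level track data; all values invented for illustration.
profiles = {
    "artist-a": {"instrumentalness": 0.90, "bpm": 80, "mood_happy": 0.70},
    "artist-b": {"instrumentalness": 0.85, "bpm": 85, "mood_happy": 0.75},
    "artist-c": {"instrumentalness": 0.10, "bpm": 140, "mood_happy": 0.20},
}

def acoustic_distance(a, b):
    """Crude distance between two profiles; BPM is scaled into ~[0, 1]
    so the features are roughly comparable."""
    pa, pb = profiles[a], profiles[b]
    return (abs(pa["instrumentalness"] - pb["instrumentalness"])
            + abs(pa["bpm"] - pb["bpm"]) / 200
            + abs(pa["mood_happy"] - pb["mood_happy"]))

def combined_score(a, b, rel_links):
    """More relationship links and a smaller acoustic distance
    both push the similarity score up."""
    return rel_links / (1.0 + acoustic_distance(a, b))

# Two acoustically close artists outscore two distant ones
# given the same number of relationship links.
print(combined_score("artist-a", "artist-b", 5))
print(combined_score("artist-a", "artist-c", 5))
```

This is only meant to show the shape of the idea; any serious attempt
would need proper normalisation and per-feature weighting.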
This is where we differ (but of course this depends on the definition of
the term "interesting" ;-))
I don't think that the relationship table will give sufficient
information to really find e.g. artists that are closely related (as
quite often only the band members are known). Combining it with a large
set of acoustic features, which are only probabilities of how "similar"
two songs are with respect to a given feature, will not improve the
result that much. I agree that you would get a list of songs (and
thereby artists) which are somewhat similar in the kind of music they
make, but this will not provide a (sorted) list of the most similar
artists.
So, if the base data from the relations is not good enough, adding the
acoustic properties will only allow grouping into very large groups like
the ones you mentioned, e.g. with/without vocals or fast/slow BPM.
Perhaps we should start by defining "similarity" first. Here is my try:
Similarity is the probability of a user also liking artist/song/etc. B
if they like artist/song/etc. A.
(This is a user-centric view of similarity; of course each individual
user would judge differently how similar two bands are, but this is
only a probability...)
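Under that definition, similarity is just the conditional probability
P(likes B | likes A), which could be estimated from listening data. A
tiny sketch with made-up users and artists:

```python
# Hypothetical listening data: which artists each user likes.
# Users and artists are invented for illustration.
likes = {
    "user1": {"A", "B"},
    "user2": {"A", "B"},
    "user3": {"A"},
    "user4": {"B", "C"},
}

def similarity(a, b):
    """Estimate P(user likes b | user likes a) from the data."""
    fans_of_a = [u for u, artists in likes.items() if a in artists]
    if not fans_of_a:
        return 0.0
    return sum(1 for u in fans_of_a if b in likes[u]) / len(fans_of_a)

# 2 of the 3 users who like A also like B.
print(similarity("A", "B"))
```

Note that this needs per-user listening data (e.g. scrobbles), which is
exactly what MB does not have today; with only relationships and
acoustic features one can at best approximate this quantity.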
Post by Frederik "Freso" S. Olesen Post by Andre Wiethoff
Please see the similarity results of the pages for the artist "Herbert
Grönemeyer".
I found the site using MusicBrainz data for its clustering, except it
isn't using just MusicBrainz data:
http://richseam.com/artist/m/02cskm
http://richseam.com/about-us has slightly more information on what they
are doing.
Thanks for the links!
This shows exactly why the relationships wouldn't work out, using the
example of Herbert Grönemeyer (one of Germany's big ones). The artist
who is so similar that I often can't tell them apart is Westernhagen,
who is listed on AllMusic and Amazon as related (BBC shows only four
related artists...). But analysing the connections on richseam shows
artists like John Smith (who doesn't seem to be a real artist), Charles
Aznavour (who is neither very similar, nor even singing in the same
language), Little Axe (blues!), ..., and at some point "Die
Fantastischen Vier" show up, who at least sing in the same language, but
do hip hop...
In the end there are actually a few who would match a bit, like Philipp
Poisel (via the relation "has played a concert with Grönemeyer", which
would be the only relation that would fulfill my definition of
similarity). But there is no sign of Westernhagen at all.
Just because two artists recorded their songs in the same studio doesn't
make them related...
Post by Frederik "Freso" S. Olesen
When/if we get access to scrobbles, that's a third data source that can
be added to the mix, but I really do not think we need it to get started
on a similarity/recommendation engine.
Probably I just don't know where to start creating a similarity
algorithm using only the above two feature sets (and my definition of
similarity), but please prove me wrong.
Anyway, building a recommendation engine based on the mentioned features
will absolutely not be possible (or at least not better than using some
random songs from "similar" artists, however "similar" is defined), as
there are far fewer relations on songs than on artists...
Something completely different: it seems that some audio fingerprints
are misdetected (meaning that one fingerprint has a bunch of results
with high scores, but not all of them the correct recording). I tested a
live version, but it also found the regular version, and in one case
even a cover by a different group. I assume that either an algorithm has
wrongly assigned the song's metadata to the recording, or a user has
entered wrong