Discussion:
[mb-devel] Works bot
Daniel Sobey
2015-03-04 13:41:43 UTC
Hi list,

I would like to write a bot for MusicBrainz that automatically links
recordings to works.
When I add a work to MusicBrainz I try to add all the existing recordings
to the work.
This takes time, and it is not possible to keep track of all the new
recordings that are added to the database afterwards.

What the bot would do is:
1. Look through the database for works.
2. Look at all the recordings linked to that work and get the name and list
of artists of each recording.
3. Find recordings with exactly the same name and list of artists and check
whether they already have a recording-to-work relationship.
4. Submit an edit to add the recording-to-work relationship with the same
flags as the reference recording, i.e. cover, live.
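
Steps 1 to 3 can be sketched roughly as follows. This is only a minimal in-memory illustration: the `Recording` class, its fields and `propose_links` are hypothetical names and do not mirror the MusicBrainz schema or any existing bot.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recording:
    """Hypothetical in-memory record; not the MusicBrainz schema."""
    mbid: str
    name: str
    artists: frozenset          # identifiers of the credited artists
    work: Optional[str] = None  # linked work identifier, None if unlinked


def propose_links(recordings):
    """Return (recording id, work id) pairs for unlinked exact matches."""
    # Steps 1-2: index linked recordings by exact name and artist list.
    works_by_key = defaultdict(set)
    for rec in recordings:
        if rec.work is not None:
            works_by_key[(rec.name, rec.artists)].add(rec.work)

    # Step 3: an unlinked recording with the same key is a candidate.
    proposals = []
    for rec in recordings:
        if rec.work is None:
            for work in sorted(works_by_key.get((rec.name, rec.artists), ())):
                proposals.append((rec.mbid, work))
    return proposals
```

The output is a list of proposals only; submitting edits (step 4) is deliberately left out.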

Rate limiting:
It should not look at works that have open edits or are younger than 7 days.
It should not look at recordings that have open edits or are younger than 7
days.
It should check whether any of the releases that a recording is linked to
has open edits or is younger than 7 days, and skip the recording if so.
It should not submit too many edits at one time.
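
These checks could look something like the sketch below. The `Entity` class, the function names and the cap of 20 edits per run are assumptions made for illustration; only the 7-day waiting period comes from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MIN_AGE = timedelta(days=7)   # waiting period from the proposal
MAX_EDITS_PER_RUN = 20        # assumed cap; no number has been decided


@dataclass
class Entity:
    """Hypothetical view of a work, recording or release."""
    created: datetime
    open_edits: int = 0


def is_settled(entity, now):
    """True when the entity has no open edits and is at least 7 days old."""
    return entity.open_edits == 0 and now - entity.created >= MIN_AGE


def may_link(work, recording, releases, now):
    """Apply the check to the work, the recording and all its releases."""
    return all(is_settled(e, now) for e in [work, recording, *releases])


def take_batch(candidates, limit=MAX_EDITS_PER_RUN):
    """Cap how many edits a single run submits."""
    return candidates[:limit]
```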


Currently this is in the idea stage with no code written yet.
All feedback welcome.

Regards,

Daniel
Nicolás Tamargo de Eguren
2015-03-04 14:16:05 UTC
I'm somewhat doubtful this can be made to work well, but I'd be happy to be
proven wrong :)

This should probably try to avoid classical music (where works and
recordings rarely share a title anyway, and false positives are
more likely).

Also: "4. submit an edit to add the recording to work relationship with the
same flags as the reference recording ie cover, live" - cover would seem to
make sense, since if it's a cover when an artist performs it, it will
probably always be one. But live seems like it should check for "live" in
the title or disambiguation comment, or for the recording being on a live
album - just because one recording is live, it doesn't mean all will be :)
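
A rough sketch of that rule, assuming the bot already knows each recording's title, disambiguation comment and whether it appears on a live album. `looks_live` and `flags_for` are hypothetical helper names, and the word match is naive: a studio recording titled "Live and Let Die" would be flagged, so this is a starting point rather than a safe test.

```python
import re

LIVE_WORD = re.compile(r"\blive\b", re.IGNORECASE)


def looks_live(title, disambiguation="", on_live_album=False):
    """Guess 'live' from the recording's own data, not the reference's."""
    return bool(
        on_live_album
        or LIVE_WORD.search(title)
        or LIVE_WORD.search(disambiguation)
    )


def flags_for(reference_flags, title, disambiguation="", on_live_album=False):
    """Copy 'cover' from the reference; decide 'live' per recording."""
    flags = set()
    if "cover" in reference_flags:
        flags.add("cover")
    if looks_live(title, disambiguation, on_live_album):
        flags.add("live")
    return flags
```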

Nicolás
Ian McEwen
2015-03-04 21:08:49 UTC
Post by Daniel Sobey
Hi list,
I would like to write a bot for musicbrainz that automatically links
recordings to works.
When I add a work to musicbrainz I try and add all the existing recordings
to the work.
This can take time and you cannot keep track of all new recordings that are
added to the database.
I think this would be useful, but please be sure to code conservatively
-- bots are getting more and more common in MB but we definitely don't
want this adding bad relationships that need to be cleaned up.
Post by Daniel Sobey
1. look through the database for works
You may want to internally track how recently you've checked a
particular work to try for reasonable coverage. Internal scripts for
updating coverart from Amazon work somewhat like this, where anything
newly-eligible for consideration is done first, then a limited number of
the least-recently-checked entities.
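
A sketch of that ordering, not the actual cover art script: `pick_works`, the `last_checked` mapping and the `stale_limit` default are all assumptions for illustration.

```python
def pick_works(last_checked, all_works, stale_limit=100):
    """Return never-checked works first, then the least recently checked.

    last_checked maps a work id to the time it was last examined (any
    comparable value); stale_limit is an assumed per-run cap.
    """
    fresh = [w for w in all_works if w not in last_checked]
    seen = [w for w in all_works if w in last_checked]
    stale = sorted(seen, key=lambda w: last_checked[w])[:stale_limit]
    return fresh + stale
```

After each run the bot would record the current time in `last_checked` for every work it examined, so that coverage rotates through the whole database.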
Post by Daniel Sobey
2. Look at all the recordings linked to that work and get the Name and list
of artists on the recording
3. Find recordings with the exact same name and list of artist and see if
they already have a recording to work relationship
This seems like a good start for this step, but also remember that
names can differ by small transformations (case, accents). reosarevok's
suggestion of avoiding classical if you can is good (though discerning
that can be hard). If you're working directly from a database (which
might be recommended -- the webservice will be quite slow for something
like this) then the musicbrainz_unaccent function (and things like
lower(), of course) may be useful.
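
Outside the database, a similar comparison key can be approximated in Python. This is not the musicbrainz_unaccent function and may not give the same results; `normalize_title` is a hypothetical helper built only on the standard `unicodedata` module.

```python
import unicodedata


def normalize_title(title):
    """Fold case and strip combining accents, for comparison only."""
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Collapse runs of whitespace so spacing differences do not matter.
    return " ".join(stripped.casefold().split())
```

Note that stripping combining marks also changes non-Latin scripts (for example voiced Japanese kana lose their voicing mark), so a key like this is probably only reasonable for Latin-script titles.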
Post by Daniel Sobey
4. submit an edit to add the recording to work relationship with the same
flags as the reference recording ie cover, live
it should not look at works with open edits or are younger than 7 days.
It should not look at recordings with any open edits or are younger than 7
days.
It should see if any of the releases that a recording is linked to has open
edits or is younger than 7 days.
it should not submit too many edits at one time.
http://wiki.musicbrainz.org/Code_of_Conduct/Bots will probably be
relevant here. I agree with the open-edits and age
requirements for works and recordings as well. 7 days might be overkill,
but it's a good conservative place to start.
Post by Daniel Sobey
Currently this is in the idea stage with no code written yet.
All feedback welcome.
Regards,
Daniel
_______________________________________________
MusicBrainz-devel mailing list
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-devel
Duke Yin
2015-03-05 01:12:13 UTC
Post by Daniel Sobey
2. Look at all the recordings linked to that work and get the Name and
list of artists on the recording
3. Find recordings with the exact same name and list of artist and see if
they already have a recording to work relationship
4. submit an edit to add the recording to work relationship with the same
flags as the reference recording ie cover, live

I hope you don't take my feedback negatively, but this plan is not going to
work. Besides classical, there are (at least) 2 common situations where
this automated plan will fail:

1) artists that are composers of soundtracks, where recording titles may be
identical but the works are actually different. Think of recordings/works
with titles like "battle music" or "opening" or "intermission". (This
problem obviously extends beyond instrumental music.)

Example problems:
http://musicbrainz.org/artist/92bb085a-2924-4479-b627-181a1835d2f5/recordings?filter.artist_credit_id=&filter.name=%E3%83%90%E3%83%88%E3%83%AB1

2) artists that "main" in multiple countries, sing in different languages
depending on the country, but title their songs in the same language - for
example, K-Pop artists that operate in both South Korea and Japan, but the
tracklists do not specify the lyrics language in the titles, and the titles
are identical despite different lyrics. (Let's also assume that you could
not use the artist credits as a "tiebreaker".)
Example problems:
http://musicbrainz.org/artist/e119e5ff-0de0-421c-a630-0516c6acede8/works
http://musicbrainz.org/artist/5e3bc4c7-adbe-40e0-b56e-57d755908d52/recordings

I can't imagine an automatic way of correctly relating the recordings and
works in these two common situations. I'm not sure you're left with much
of value if you actively avoid all the popular known situations where the
recording-work relationship can't be automatically determined.
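
One partial guard is to refuse any artist and title combination that already points at more than one work. The sketch below assumes a hypothetical input of (artist, title, work id) tuples taken from existing relationships. It only catches collisions that are already visible in the data and does nothing for the case where just one of the works has been entered so far.

```python
from collections import defaultdict


def ambiguous_keys(linked):
    """Return the (artist, title) keys that already map to several works.

    linked is an iterable of (artist, title, work_id) tuples taken from
    existing recording-work relationships (a hypothetical input shape).
    """
    works = defaultdict(set)
    for artist, title, work_id in linked:
        works[(artist, title)].add(work_id)
    return {key for key, ids in works.items() if len(ids) > 1}
```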
Jesse W
2015-03-05 17:20:40 UTC
I suggest that first writing a program to *analyse* the scope of the work
would be useful. Download the database, write a bunch of queries to
identify possible linkages, divide them into lots of sub-categories (by
the presence of tags, the presence of various external identifiers like
Wikidata or other IDs, the number of identical artists, track names,
works, etc.), and post your results. Then we'll have more detail to think
about when designing a bot that does the actual updates.
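
As an illustration of the kind of report meant here, the sketch below counts candidate links per sub-category. The dictionary keys (`has_tags`, `has_external_id`, `shared_artists`) and the bucket labels are hypothetical; the real categories would come out of the queries.

```python
from collections import Counter


def summarize(candidates):
    """Count candidate links per category for a read-only report.

    Each candidate is a dict with hypothetical boolean keys 'has_tags'
    and 'has_external_id' and an integer 'shared_artists'.
    """
    report = Counter()
    for c in candidates:
        bucket = (
            "tags" if c["has_tags"] else "no tags",
            "external id" if c["has_external_id"] else "no external id",
            "multi-artist" if c["shared_artists"] > 1 else "single artist",
        )
        report[bucket] += 1
    return report
```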

It certainly seems like something worth investigating further, though!

Jesse