Discussion:
[mb-devel] GSoC 2015 AcousticBrainz Ideas/Questions
Daniel Deng
2015-03-26 10:43:31 UTC
Hi,

I am interested in the data exploration idea for AcousticBrainz and have drafted some ideas and questions on how I would go about it. I would like to know if it seems like I have the right idea of what this project actually entails.

Thanks,

Daniel Deng

Summary:


Create an application that allows users to select music and generate various charts from a selection of filters, orderings, and groupings.

This project will consist of 3 main tasks.

1. Generate a database that combines relevant information from MusicBrainz and AcousticBrainz.

2. Create a database application that manages the ‘Library’ of music that the user has selected, returns the data needed for the chart requested by the user, and allows for updates by some administrator.

3. Create a front-end that interacts with the database application

A couple of questions I have:
How will this be integrated within the current MusicBrainz ecosystem? Will it be a desktop application like Picard or will it be web-based?


How is the AcousticBrainz data currently stored? Is it stored using some DBMS or is it just a collection of JSON files currently?


Additional details if interested:

Database

The database will consist of three entities: tracks, artists, and releases. All of these will be associated with the corresponding MBID on MusicBrainz. The purpose of having an artist and release entity is to make it easier to select all the tracks associated with one of them. Each track will contain the ‘summary’ information from the low-level data, such as average_loudness and key_strength, and all the categorical values from the high-level data (probabilities will be excluded).
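
A minimal sketch of how these three entities might be laid out, using SQLite only so that the example is self-contained. The table names, column names, and the single `genre` column are illustrative assumptions, not the actual MusicBrainz or AcousticBrainz schema, and the one-release-per-track link is a simplification.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption
# made for illustration, keyed by MBID as described above.
SCHEMA = """
CREATE TABLE artist (
    mbid TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE release (
    mbid TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    artist_mbid TEXT NOT NULL REFERENCES artist(mbid)
);
CREATE TABLE track (
    mbid TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    artist_mbid TEXT NOT NULL REFERENCES artist(mbid),
    release_mbid TEXT REFERENCES release(mbid),
    average_loudness REAL,  -- low-level summary value
    key_strength REAL,      -- low-level summary value
    genre TEXT              -- high-level categorical value
);
"""


def create_database(path=":memory:"):
    """Create the sketch schema and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def tracks_for_artist(conn, artist_mbid):
    """Return the MBIDs of all tracks linked to one artist, sorted."""
    rows = conn.execute(
        "SELECT mbid FROM track WHERE artist_mbid = ? ORDER BY mbid",
        (artist_mbid,),
    )
    return [row[0] for row in rows]
```

Keeping the artist and release MBIDs on each track row is what makes "select everything by this artist" a single indexed lookup.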

User Application

There will be three main steps to make a chart: creating the library of tracks to explore, processing the library, and choosing the desired chart.

First, the user will create the library by entering search strings for artists, releases, tracks, and tags. If this query is ambiguous, the user will be prompted to select among a list of possibilities. Alternatively, the user can just enter the MBID corresponding to what he wants. When an artist, release, or tag is entered, all the tracks corresponding to it are added to the library. Selecting the entire collection of tracks will also be possible.

Next, four processors are provided for the user: filtering, ordering, grouping, and attribute of interest. For filtering, the user can remove tracks with specified attributes; these conditions can be combined with AND or OR operators. For ordering, the user can specify an attribute to order the tracks by. For grouping, the user can specify an attribute that splits up the remaining tracks. For attribute of interest, the user can specify which attribute to place on the chart (this could just be a count of occurrences).

Finally, the user selects the desired chart; options could include a pie chart, bar graph, or histogram.

Database application

I don’t really think there is too much to say about this, but I may be wrong. Generating the SQL statements shouldn’t be too complicated, as the user application is rather restrictive. Essentially, the first three processors (filtering, ordering, and grouping) correspond to WHERE, ORDER BY, and GROUP BY.
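
As a rough sketch of that mapping, the function below turns a list of filter conditions, a grouping attribute, and an ordering choice into one parameterized SQL statement that counts tracks per group. The `track` table, the whitelisted column names, and the function name are all hypothetical. The conditions here describe the tracks to keep; the "remove" wording above would simply negate them.

```python
# Hypothetical whitelists: only these attributes and operators may appear in
# generated SQL, so user input is never interpolated unchecked.
ALLOWED_COLUMNS = {"genre", "mood", "bpm", "average_loudness"}
ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def build_chart_query(group_by, filters=(), combine="AND", order_by_count=False):
    """Return (sql, params) counting tracks per value of `group_by`.

    filters: iterable of (column, operator, value) tuples, joined by `combine`.
    """
    if combine not in ("AND", "OR"):
        raise ValueError("combine must be 'AND' or 'OR'")
    for column in [group_by] + [f[0] for f in filters]:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown attribute: {column!r}")

    clauses, params = [], []
    for column, operator, value in filters:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unknown operator: {operator!r}")
        clauses.append(f"{column} {operator} ?")  # value is bound, not inlined
        params.append(value)

    sql = f"SELECT {group_by}, COUNT(*) AS n FROM track"
    if clauses:
        sql += " WHERE " + f" {combine} ".join(clauses)      # filtering
    sql += f" GROUP BY {group_by}"                           # grouping
    sql += " ORDER BY n DESC" if order_by_count else f" ORDER BY {group_by}"
    return sql, params
```

For example, `build_chart_query("genre", [("bpm", ">=", 120)], order_by_count=True)` would give the data for a bar graph of genre counts among faster tracks.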
Alastair Porter
2015-03-26 23:12:57 UTC
Hi Daniel,
Thanks for sending an email about this. Here's some feedback.
The proposal does not have much detail. If you want to submit this you
should definitely add some more detail about how you think you will do each
step. It doesn't matter if you don't stick to this plan, but we are
interested in how you think you might approach the problems.

Some specific notes:

Post by Daniel Deng
> Create an application that allows users to select music and generate
> various charts from a selection of filters, orderings, and groupings.
> This project will consist of 3 main tasks.
> 1. Generate a database that combines relevant information from
> MusicBrainz and AcousticBrainz.
This is an interesting feature that we're currently missing in
AcousticBrainz. We have some metadata submitted by users, including
recording id, and often (but not always) text data. We have very basic
webservice access to MusicBrainz now on the AB website (
https://github.com/metabrainz/acousticbrainz-server/blob/master/acousticbrainz/views/data.py#L68)
but it would be much better to have a way to automatically copy relevant
data from MusicBrainz when a new track comes in.
Post by Daniel Deng
> 2. Create a database application that manages the ‘Library’ of music that
> the user has selected, returns the data needed for the chart requested by
> the user, and allows for updates by some administrator.
I'm not sure a library is necessary for this project. There is a lot of
value in being able to explore the data without restrictions, and I think
that forcing people to create a library from a search first is an
unnecessary step. However, searches should be reproducible, with nice URLs.
Post by Daniel Deng
> 3. Create a front-end that interacts with the database application
> A couple of questions I have:
> How will this be integrated within the current MusicBrainz ecosystem?
> Will it be a desktop application like Picard or will it be web-based?
A web application
Post by Daniel Deng
> How is the AcousticBrainz data currently stored? Is it stored using some
> DBMS or is it just a collection of JSON files currently?
This information is easily accessible on the AcousticBrainz website.

Post by Daniel Deng
> Database
> The database will consist of three entities: tracks, artists, and
> releases. All of these will be associated with the corresponding MBID on
> MusicBrainz. The purpose of having an artist and release entity is to make
> it easier to select all the tracks associated with one of them. Each track
> will contain the ‘summary’ information from the low-level data, such as
> average_loudness and key_strength, and all the categorical values from the
> high-level data (probabilities will be excluded).
I think the most interesting data that we have is the numerical data.
Annotating music is so subjective that people always disagree. Seeing the
extent to which the computer agrees or disagrees with a label is also
useful. I suggest you keep the probabilities.
Post by Daniel Deng
> User Application
> There will be three main steps to make a chart: creating the library of
> tracks to explore, processing the library, and choosing the desired chart.
> First, the user will create the library by entering search strings for
> artists, releases, tracks, and tags. If this query is ambiguous, the user
> will be prompted to select among a list of possibilities. Alternatively,
> the user can just enter the MBID corresponding to what he wants. When an
> artist, release, or tag is entered, all the tracks corresponding to it are
> added to the library. Selecting the entire collection of tracks will also
> be possible.
Selecting by text/fixed entities is easy. Have you thought about other
numerical types of searches? For example, I might want to search for tracks
that are Jazz with a probability of > 90% and have a BPM of 120-140.
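
A small sketch of that kind of combined search, written as a plain Python predicate over in-memory records. The field names (`genre`, `genre_probability`, `bpm`) and the sample records are hypothetical and do not reflect the actual AcousticBrainz JSON layout.

```python
def matches(track, genre, min_probability, bpm_range):
    """True if the track carries the genre label with enough confidence
    and its BPM falls inside the inclusive range."""
    low, high = bpm_range
    return (
        track["genre"] == genre
        and track["genre_probability"] > min_probability
        and low <= track["bpm"] <= high
    )


# Made-up sample records; the MBIDs are placeholders.
tracks = [
    {"mbid": "track-a", "genre": "jazz", "genre_probability": 0.95, "bpm": 128.0},
    {"mbid": "track-b", "genre": "jazz", "genre_probability": 0.60, "bpm": 130.0},
    {"mbid": "track-c", "genre": "jazz", "genre_probability": 0.97, "bpm": 90.0},
    {"mbid": "track-d", "genre": "rock", "genre_probability": 0.99, "bpm": 125.0},
]

# Jazz with probability above 90% and a BPM of 120-140.
selected = [t["mbid"] for t in tracks if matches(t, "jazz", 0.9, (120, 140))]
```

The same three conditions would become range and equality clauses in whichever search backend is chosen.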
Post by Daniel Deng
> Next, four processors are provided for the user: filtering, ordering,
> grouping, and attribute of interest. For filtering, the user can remove
> tracks with specified attributes; these conditions can be combined with
> AND or OR operators. For ordering, the user can specify an attribute to
> order the tracks by. For grouping, the user can specify an attribute
> that splits up the remaining tracks. For attribute of interest, the user
> can specify which attribute to place on the chart (this could just be a
> count of occurrences).
> Finally, the user selects the desired chart; options could include a pie
> chart, bar graph, or histogram.
Why have you chosen these three types of charts? What information do you
think they would show? (You should give some examples from the
AcousticBrainz database)
Post by Daniel Deng
> Database application
> I don’t really think there is too much to say about this, but I may be
> wrong. Generating the SQL statements shouldn’t be too complicated, as the
> user application is rather restrictive. Essentially, the first three
> processors (filtering, ordering, and grouping) correspond to WHERE,
> ORDER BY, and GROUP BY.
The database will be a big part of this project. AcousticBrainz is LARGE,
and will only grow. Currently, with almost 2 million tracks, there is over
100 GB of data. Efficiently searching both numerical and textual data is
important. I am interested to hear if you have any ideas about how to do
fast search. Some ideas that we've had include using Elasticsearch, or
indexes on the new jsonb datatype in PostgreSQL 9.4.

Thanks again for your proposal,
Alastair
_______________________________________________
MusicBrainz-devel mailing list
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-devel