[mb-devel] AcousticBrainz Project for GSoC 2015

Discussion:

Mathew Thomas

2015-03-22 06:37:16 UTC

Hey all,

I would like to discuss ideas with anyone who is mainly focused on the
AcousticBrainz project. I'm interested in helping the development of an app
to enable users to create sample datasets.
My idea is that by providing a properly varied dataset for each respective
genre and/or mood, the generated model can be made far more accurate. I
would very much like to discuss my ideas for selecting an appropriate data
set and possible additions to the machine learning algorithms used to
detect high level data. Please reply if AcousticBrainz is the project
you're working on.

Cai Kang

2015-03-23 03:46:08 UTC

Hi all,

I'm glad to take part in the discussion about the AcousticBrainz project. I
think it has a good future. In the era of big data, data only becomes
valuable once it has been processed systematically. Compared with image
datasets, music datasets are relatively scarce, so this is a good
opportunity for AcousticBrainz. What the AcousticBrainz project needs most
right now is a tagging tool. In my opinion, the tool should provide at
least two functions:

The first lets users upload basic information about a piece of music,
including its attributes, directly as a document. The tool then maps the
music to an ID if the user has not provided one. After that, it converts
the information into an input format that existing tools can understand.

The second lets users do tagging tasks online or offline. The tool
provides a tagging system: users tag a track, identified by its ID, while
they listen to it. The tagging attributes may include
"gender/genre/mood" and so on. In offline mode, the data is packaged and
uploaded to the server when the tagging task finishes. In online mode, the
data is sent to the server as soon as a song has been tagged.
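
A rough sketch of the two modes in Python, only to illustrate the idea.
The class name and the `send` callable are hypothetical; `send` stands in
for whatever upload mechanism the server would actually offer.

```python
class TaggingSession:
    """Collects tags for tracks and uploads them online or offline.

    `send` is a hypothetical callable that takes a list of tag records;
    it is an assumption here, not an existing AcousticBrainz API.
    """

    def __init__(self, send, online=True):
        self.send = send
        self.online = online
        self.pending = []  # records waiting for upload in offline mode

    def tag(self, track_id, attributes):
        record = {"id": track_id, "tags": dict(attributes)}
        if self.online:
            # Online mode: upload as soon as a song has been tagged.
            self.send([record])
        else:
            # Offline mode: keep the record until the task finishes.
            self.pending.append(record)

    def finish(self):
        # Offline mode: package everything and upload it in one batch.
        if self.pending:
            self.send(self.pending)
            self.pending = []
```

Calling `finish()` in online mode does nothing, because nothing is ever
queued there.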

These are my preliminary ideas. I look forward to further discussion.

Best regards,

Kang Cai

Alastair Porter

2015-03-23 09:50:49 UTC

Hi Mathew,
Thanks for writing.
We have had a lot of interest in building a dataset editor for
AcousticBrainz. You're welcome to continue submitting a proposal for this
task, but you might have a better chance of being accepted if you think of
a different task for AcousticBrainz.

Here are some other ideas we've had:

- An interactive system to explore the data that we already have in AB:
for example, find all of the songs that we say are in a certain key,
order these by tempo, and then group them by mood.
- A search system (which could be part of the above task) that lets you
search for tracks by their metadata or by extracted features. This could
use an existing search technology (e.g. Solr), or something custom-written
for the task. A similar task would be to place songs in an
n-dimensional similarity space to explore songs that are acoustically
similar.
- An investigation of the accuracy of AcousticBrainz compared to other
music databases. For example, MusicBrainz has many tags which represent
genres. This information is also available from services like Last.fm.
Lower-level information such as key and bpm is available from services such
as the Echo Nest.
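
To make the first idea a bit more concrete, here is a minimal sketch in
Python. The track records and their field names (`key`, `bpm`, `mood`) are
made up for illustration and are not the actual AcousticBrainz data format;
the function just filters by key, orders by tempo, and groups by mood.

```python
from collections import defaultdict

# Hypothetical records; the field names are assumptions, not the real AB schema.
TRACKS = [
    {"title": "Track A", "key": "C", "bpm": 120.0, "mood": "happy"},
    {"title": "Track B", "key": "C", "bpm": 90.0, "mood": "sad"},
    {"title": "Track C", "key": "G", "bpm": 100.0, "mood": "happy"},
    {"title": "Track D", "key": "C", "bpm": 100.0, "mood": "happy"},
]


def explore(tracks, key):
    """Return {mood: [titles]} for tracks in `key`, slowest tempo first."""
    in_key = [t for t in tracks if t["key"] == key]
    in_key.sort(key=lambda t: t["bpm"])
    groups = defaultdict(list)
    for t in in_key:
        groups[t["mood"]].append(t["title"])
    return dict(groups)


print(explore(TRACKS, "C"))
# {'sad': ['Track B'], 'happy': ['Track D', 'Track A']}
```

A real system would run this kind of query against the database rather than
an in-memory list, but the shape of the question is the same.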

If you're still interested in working on the dataset creator, we want to
make sure you understand the problem well before you send a proposal,
especially in the context of machine learning tasks. Some questions you
might want to consider:
- What is the functionality of the dataset creator? What are the main
models/concepts that will be present?
- What technologies are you thinking of using?
- Who gets to make a decision about why an item goes in a dataset? How do
you know you can trust them?
- What happens if someone intentionally adds bad data to a dataset? How do
you make sure that this doesn't happen, and if it does, that you can reverse it?
- Do you want to evaluate the datasets somehow? How will you allow people
to run an evaluation of their dataset to get an accuracy rating?
- How will you check if a high accuracy in a dataset is representative of
the accuracy of the model over the entire AcousticBrainz dataset?
- If you have many datasets that represent the same training task, how will
you choose which one is best?
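
As one possible starting point for the evaluation questions, here is a small
sketch of k-fold cross-validation in plain Python. The function names, the
toy one-feature dataset, and the nearest-neighbour trainer are all
assumptions for illustration; this is not how AcousticBrainz currently
trains or evaluates its models.

```python
def k_fold_accuracy(examples, train, k=5):
    """Mean accuracy over k folds.

    examples: list of (features, label) pairs.
    train: function that takes a list of examples and returns a
           predict(features) -> label function.
    """
    folds = [examples[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        if not held_out:
            continue  # fewer examples than folds
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        predict = train(training)
        correct = sum(1 for x, y in held_out if predict(x) == y)
        scores.append(correct / len(held_out))
    if not scores:
        raise ValueError("no examples to evaluate")
    return sum(scores) / len(scores)


def train_nearest_neighbour(training):
    """Toy trainer: label of the closest training example (one numeric feature)."""
    def predict(x):
        nearest = min(training, key=lambda ex: abs(ex[0] - x))
        return nearest[1]
    return predict


# Hypothetical dataset: tempo in bpm -> label.
EXAMPLES = [(60, "slow"), (65, "slow"), (70, "slow"),
            (150, "fast"), (160, "fast"), (170, "fast")]

print(k_fold_accuracy(EXAMPLES, train_nearest_neighbour, k=3))  # 1.0
```

Note that a high score here only says the model fits this dataset; it says
nothing yet about accuracy over the entire AcousticBrainz dataset, which is
the harder question above.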

Regards,
Alastair

_______________________________________________
MusicBrainz-devel mailing list
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-devel