Hi Matthew,
Thanks for writing.
We have had a lot of interest in building a dataset editor for
AcousticBrainz. You're welcome to continue submitting a proposal for this
task, but you might have a better chance of being accepted if you think of
a different task for AcousticBrainz.
Here are some other ideas we've had:
- An interactive system to explore the data that we already have in AB.
For example: what are all of the songs that we say are in a certain key?
Order these by tempo and then group them by mood.
- A search system (which could be part of the above task) that lets you
search for tracks by their metadata or by extracted features. This could
use an existing search technology (e.g. Solr), or something custom-written
for the task. A similar task would be to place songs in an n-dimensional
similarity space, so that you can explore songs that are acoustically
similar.
- An investigation of the accuracy of AcousticBrainz compared to other
music databases. For example, MusicBrainz has many tags which represent
genres. This information is also available from services like Last.fm.
Lower-level information such as key and BPM is available from services such
as the Echo Nest.
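
To make the first idea a bit more concrete, here is a minimal sketch of the
"songs in a certain key, ordered by tempo, grouped by mood" query. The
records and the field names (title, key, bpm, mood) are made up for
illustration and are not the actual AcousticBrainz data format.

```python
from collections import defaultdict

# Hypothetical records; the field names are assumptions for illustration,
# not the real AcousticBrainz schema.
TRACKS = [
    {"title": "A", "key": "C major", "bpm": 120.0, "mood": "happy"},
    {"title": "B", "key": "C major", "bpm": 90.0, "mood": "sad"},
    {"title": "C", "key": "A minor", "bpm": 100.0, "mood": "sad"},
    {"title": "D", "key": "C major", "bpm": 105.0, "mood": "happy"},
]


def explore(tracks, key):
    """Return titles of tracks in `key`, grouped by mood, slowest first."""
    # Filter by key, then sort by tempo so each mood group stays ordered.
    in_key = sorted(
        (t for t in tracks if t["key"] == key),
        key=lambda t: t["bpm"],
    )
    groups = defaultdict(list)
    for track in in_key:
        groups[track["mood"]].append(track["title"])
    return dict(groups)


print(explore(TRACKS, "C major"))
```

A real system would run this kind of query against a database or search
index rather than an in-memory list, but the filter, sort and group steps
stay the same.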
If you're still interested in working on the dataset creator, we want to
make sure you understand the problem well before you send a proposal,
especially in the context of machine learning tasks. Some questions you
might want to consider:
- What is the functionality of the dataset creator? What are the main
models/concepts that will be present?
- What technologies are you thinking of using?
- Who gets to make a decision about why an item goes in a dataset? How do
you know you can trust them?
- What happens if someone intentionally adds bad data to a dataset? How do
you make sure that this doesn't happen, and if it does, that you can
reverse it?
- Do you want to evaluate the datasets somehow? How will you allow people
to run an evaluation of their dataset to get an accuracy rating?
- How will you check if a high accuracy in a dataset is representative of
the accuracy of the model over the entire AcousticBrainz dataset?
- If you have many datasets that represent the same training task, how will
you choose which one is best?
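
As one possible starting point for the evaluation questions, here is a
rough sketch of k-fold cross-validation, which gives an accuracy figure for
a single dataset. The function names, the toy nearest-centroid classifier
and the example data are all assumptions for illustration. This is not how
AcousticBrainz trains or evaluates its models.

```python
import random


def k_fold_accuracy(items, train, k=5, seed=0):
    """Mean accuracy over k folds; `items` is a list of (features, label).

    Assumes len(items) >= k so that no fold is empty.
    """
    items = items[:]
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one being held out.
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        predict = train(training)
        correct = sum(1 for feats, label in held_out if predict(feats) == label)
        scores.append(correct / len(held_out))
    return sum(scores) / k


def train_nearest_centroid(training):
    """Toy classifier: predict the label whose mean feature vector is closest."""
    sums, counts = {}, {}
    for feats, label in training:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for d, value in enumerate(feats):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {
        label: [v / counts[label] for v in total] for label, total in sums.items()
    }

    def predict(feats):
        return min(
            centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(feats, centroids[label])
            ),
        )

    return predict


# Hypothetical two-feature dataset with two well-separated classes.
dataset = [((0.0 + i * 0.01, 0.0), "calm") for i in range(10)]
dataset += [((10.0 + i * 0.01, 10.0), "energetic") for i in range(10)]

print(k_fold_accuracy(dataset, train_nearest_centroid))
```

Note that this only measures accuracy within the dataset itself. It says
nothing about whether the result carries over to the rest of the
AcousticBrainz data, which is the harder question above.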
Regards,
Alastair
Post by Mathew Thomas
Hey all,
I would like to discuss ideas with anyone who is mainly focused on the
AcousticBrainz project. I'm interested in helping the development of an app
to enable users to create sample datasets.
My idea is that by providing a properly varied dataset for each respective
genre and/or mood, the generated model can be made far more accurate. I
would very much like to discuss my ideas for selecting an appropriate
dataset and possible additions to the machine learning algorithms used to
detect high-level data. Please reply if AcousticBrainz is the project
you're working on.
_______________________________________________
MusicBrainz-devel mailing list
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-devel