In this paper we describe a novel approach to applying text-based information retrieval techniques to music collections. We represent tracks with a joint vocabulary consisting of both conventional words, drawn from social tags, and audio muswords, which represent characteristics of automatically identified regions of interest within the signal. We build vector space and latent aspect models indexing words and muswords for a collection of tracks, and show experimentally that retrieval with these models is extremely well-behaved. We find in particular that retrieval performance remains good for tracks by artists unseen by our models in training, and even when tags for their tracks are extremely sparse.
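To make the idea of a joint vocabulary concrete, the following is a minimal sketch of a vector space model in which tag words and muswords are indexed as terms of the same kind. It is an illustration under stated assumptions, not the method evaluated in the paper: the toy tracks, the `mw_<id>` naming for muswords, the smoothed TF-IDF weighting, and the helper names `tfidf_vectors`, `cosine`, and `rank` are all hypothetical, and the latent aspect models are not covered.

```python
import math
from collections import Counter

# Hypothetical toy collection: each track is a bag of terms mixing social-tag
# words with audio muswords (written as mw_<id>; the naming is illustrative).
tracks = {
    "track_a": ["rock", "guitar", "mw_12", "mw_12", "mw_40"],
    "track_b": ["rock", "mw_12", "mw_40", "mw_40"],
    "track_c": ["jazz", "piano", "mw_7", "mw_93"],
    "track_d": ["mw_7", "mw_93", "mw_93"],  # a track with no tags at all
}

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors over the joint word/musword vocabulary.

    The smoothed IDF used here is an assumed weighting, chosen for simplicity.
    """
    n = len(docs)
    # Document frequency: number of tracks in which each term occurs.
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        vectors[name] = {
            term: count * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query_name, vectors):
    """Rank every other track by cosine similarity to the query track."""
    query = vectors[query_name]
    scores = [
        (name, cosine(query, vec))
        for name, vec in vectors.items()
        if name != query_name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

vectors = tfidf_vectors(tracks)
for name, score in rank("track_d", vectors):
    print(f"{name}: {score:.3f}")
```

In this toy collection `track_d` has no tags, yet it is still matched to `track_c` through the muswords they share, which is the kind of behaviour a joint vocabulary is meant to allow when tags are sparse.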
Chang Bae Moon, Jong Yeol Lee, Dong‐Seong Kim, Byeong Man Kim
Daniel Grzywczak, Grzegorz Gwardys
Ioannis Karydis, A. Nanopoulos, Apostolos N. Papadopoulos, Yannis Manolopoulos