JOURNAL ARTICLE

Computer Vision for Music Identification

Abstract

We describe how certain tasks in the audio domain can be effectively addressed using computer vision approaches. This paper focuses on the problem of music identification, where the goal is to reliably identify a song given a few seconds of noisy audio. Our approach treats the spectrogram of each music clip as a 2D image and transforms music identification into a corrupted sub-image retrieval problem. By employing pairwise boosting on a large set of Viola-Jones features, our system learns compact, discriminative, local descriptors that are amenable to efficient indexing. During the query phase, we retrieve the set of song snippets that locally match the noisy sample and employ geometric verification in conjunction with an EM-based "occlusion" model to identify the song that is most consistent with the observed signal. We have implemented our algorithm in a practical system that can quickly and accurately recognize music from short audio samples in the presence of distortions such as poor recording quality and significant ambient noise. Our experiments demonstrate that this approach significantly outperforms the current state-of-the-art in content-based music identification.
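The retrieval idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a spectrogram stored as a list of time frames, each a list of quantized band energies, and it uses a hand-picked bank of two-rectangle box filters (`FILTERS`, hypothetical) where the paper learns its filters by pairwise boosting. Exact hash lookup followed by a vote on time offset stands in for the indexing and geometric verification steps; the EM-based occlusion model is omitted. All function names here are illustrative.

```python
import random
from collections import Counter, defaultdict

def integral_image(img):
    """Summed-area table with one row and one column of zero padding."""
    rows, cols = len(img), len(img[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            sat[r + 1][c + 1] = sat[r][c + 1] + row_sum
    return sat

def box_sum(sat, r0, c0, r1, c1):
    """Sum of img[r][c] for r0 <= r < r1 and c0 <= c < c1, in O(1)."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]

# Hypothetical filter bank: (first_band, end_band_exclusive, width_in_frames).
# Each filter compares energy in the early half of a window with the late half.
FILTERS = [(0, 4, 2), (4, 8, 2), (8, 12, 2), (12, 16, 2),
           (0, 8, 4), (8, 16, 4), (0, 16, 4), (2, 14, 6)]
MAX_WIDTH = max(w for _, _, w in FILTERS)

def descriptor(sat, t):
    """Pack one bit per filter into a small integer descriptor for frame t."""
    bits = 0
    for f0, f1, w in FILTERS:
        half = w // 2
        early = box_sum(sat, t, f0, t + half, f1)
        late = box_sum(sat, t + half, f0, t + w, f1)
        bits = (bits << 1) | int(early > late)
    return bits

def frame_descriptors(spec):
    """Descriptors for every frame whose widest filter window fits in spec."""
    sat = integral_image(spec)
    return [(t, descriptor(sat, t)) for t in range(len(spec) - MAX_WIDTH + 1)]

def build_index(songs):
    """Map each descriptor value to the (song, frame) positions where it occurs."""
    index = defaultdict(list)
    for name, spec in songs.items():
        for t, d in frame_descriptors(spec):
            index[d].append((name, t))
    return index

def identify(index, query_spec):
    """Vote on (song, time offset); a true match lines up at a single offset."""
    votes = Counter()
    for t_q, d in frame_descriptors(query_spec):
        for name, t_db in index.get(d, ()):
            votes[(name, t_db - t_q)] += 1
    if not votes:
        return None
    (name, offset), count = votes.most_common(1)[0]
    return name, offset, count

# Toy data: random integer "energies" stand in for real spectrograms.
rng = random.Random(0)

def toy_spectrogram(frames, bands=16):
    return [[rng.randint(0, 255) for _ in range(bands)] for _ in range(frames)]

songs = {"song_a": toy_spectrogram(60), "song_b": toy_spectrogram(60)}
index = build_index(songs)
query = songs["song_a"][10:30]  # clean 20-frame excerpt starting at frame 10
result = identify(index, query)
print(result)
```

The query here is an uncorrupted excerpt, so every query frame matches the database at the same offset. With noisy audio some descriptor bits flip, which is why a practical system would need near-neighbor lookup (for example, probing descriptors within a small Hamming distance) and a more tolerant verification step than this exact-match vote.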

Keywords:
Computer science; Discriminative model; Spectrogram; Artificial intelligence; Boosting (machine learning); Search engine indexing; Music information retrieval; Speech recognition; Pattern recognition (psychology); Pairwise comparison; Identification (biology); Noise (video); Set (abstract data type); Computer vision; Image (mathematics)

Metrics

Cited By: 147
FWCI (Field Weighted Citation Impact): 9.71
Refs: 20
Citation Normalized Percentile: 0.98

Topics

Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Video Analysis and Summarization
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Computer Vision and digitised music sources

Fornés, Alicia

Journal: Zenodo (CERN European Organization for Nuclear Research), Year: 2023
JOURNAL ARTICLE

Computer vision for computer-aided microfossil identification

Adam P. Harrison

Journal: University of Alberta Library, Year: 2010