The number of online photos and videos is now on the scale of tens of billions. To organize, index, and retrieve such large-scale rich-media data, a system must employ scalable data management and mining algorithms. The research community needs to address large-scale problems rather than problems on small datasets that do not reflect real-life scenarios. This tutorial introduces key challenges in large-scale rich-media data mining and presents parallel algorithms for tackling them. We present our parallel implementations of Spectral Clustering (PSC), FP-Growth (PFP), Latent Dirichlet Allocation (PLDA), and Support Vector Machines (PSVM).
Masaru Kitsuregawa, Takahiko Shintani, Masahisa Tamura, Iko Pramudiono