Large-scale data-mining workflows are increasingly able to successfully predict new chemicals that possess a targeted functionality. The success of such materials-discovery approaches is nonetheless contingent upon three things: having the right data source to mine; adequate supercomputing facilities and machine-learning workflows to calculate or sample a large range of materials; and algorithms that suitably encode structure-function relationships as data-mining workflows that progressively shortlist data toward the prediction of a lead material for experimental validation. This talk shows how to meet these data-science requirements via 'chemistry-aware' natural language processing, image recognition and machine-learning developments, using case studies to showcase their successful application to data-driven materials discovery.