In the big data world, the Hadoop Distributed File System (HDFS) is one of the best-known file systems for storing very large volumes of data. HDFS manages and maintains the data in a distributed manner across the nodes of a cluster. This research discusses how real-time streaming data can be processed and stored in MongoDB and Hive. Big data analytics can be performed on data stored in HDFS using Apache Hive, Apache Tez, and Presto. Hive is a data warehouse system built on top of Hadoop. It provides a higher-level, SQL-like language (HiveQL) whose queries are translated into jobs for Hadoop's core processing component, MapReduce. The key benefits of this approach are that it can store and process large amounts of data and can handle millions of user requests concurrently. The system also scales by adding new nodes to the cluster. Integrating visualization tools with big data applications gives users the big picture: analytic reports let them view the insights drawn from the data.
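The idea of Hive translating a higher-level query into MapReduce work can be illustrated with a minimal, single-process Python sketch. It assumes a query of the form `SELECT event_type, COUNT(*) FROM events GROUP BY event_type` over hypothetical `(user_id, event_type)` records from a stream; the table, column, and function names are illustrative only. A real Hive job runs these map, shuffle, and reduce phases distributed across the cluster (or on Tez), not in one process as shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical streaming record: (user_id, event_type).
Record = Tuple[str, str]


def map_phase(records: Iterable[Record]) -> List[Tuple[str, int]]:
    # Emit (key, 1) for each record, keyed on the GROUP BY column (event_type).
    return [(event_type, 1) for _, event_type in records]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Sum the values for each key, which yields COUNT(*) per group.
    return {key: sum(values) for key, values in groups.items()}


def count_by_event_type(records: Iterable[Record]) -> Dict[str, int]:
    # Chain the three phases, mirroring the sketched GROUP BY query.
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    events = [("u1", "click"), ("u2", "view"), ("u1", "view"),
              ("u3", "click"), ("u2", "view")]
    print(count_by_event_type(events))  # {'click': 2, 'view': 3}
```

Splitting the work into independent map and reduce functions is what lets the framework run many copies of each in parallel on different nodes, which is why adding nodes increases capacity.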
Gireesh Babu C N¹, Manjunath T N²