Big Data, Hadoop and Business Intelligence
Posted by Anahita | Posted in Business Intelligence | Posted on 17-12-2011
Tags: Apache Hadoop, Big Data, Business Intelligence, Unstructured Data
1
I consider Hadoop as one of the technologies that creates a link between Big Data Analytics and Business Intelligence . In my previous posts I explained what Big Data means and what was the meaning of Unstructured Data. In this post I would like to introduce Hadoop, which makes it possible to gain business value from the Big Data.
Apache Hadoop is an open source project, providing software for reliable and scalable distributed computing. A simple programming model provides the ability for the distributed processing of large data sets. This is achieved by using a cluster of distributed processing and storage and so make it possible for Hadoop to easily scale up as required. Hadoop consists of three subprojects: Hadoop Common, Hadoop Distributed Files System (HDFS) and finally Hadoop MapReduce. Hadoop ecosystem of products also include derived technologies that could be used on their own or together to achieved the desired outcomes. Some of these related projects are Hive, Hbase, Zookeeper, etc For more details on each of the above projects, please visit http://hadoop.apache.org/
Core Hadoop is HDFS and MapReduce.
HDFS is Hadoop Distributed File System and is used as a utility in Hadoop projects to distribute data blocks to nodes in cluster which results in extremely fast computation.
MapReduce is an algorithm that makes it possible to perform parallel computing across the nodes in a cluster.
For Business Intelligence, one of the Hadoop projects, called Hive, is a data warehouse system for Hadoop compatible file systems (such as Apache HDFS or Apache HBase) and allows query, analysis and creating summary of of big data using a specific query language called Hive-QL.
Data is growing faster than ever and at the moment it doubles every year! This will become astronomical and out of hand soon as around 80% of this data is Unstructured Data. Projects like Apache Hadoop makes it possible to analyse the Big Data and related projects such as Hive will make equivalent data warehousing for further storage and analysis of relevant data.


This is the perfect way to break down this infrmotaoin.