
Big Data, Hadoop and Business Intelligence

Posted by Anahita | Posted in Business Intelligence | Posted on 17-12-2011


I consider Hadoop one of the technologies that link Big Data analytics and Business Intelligence. In my previous posts I explained what Big Data means and what Unstructured Data is. In this post I would like to introduce Hadoop, which makes it possible to gain business value from Big Data.

Apache Hadoop is an open source project that provides software for reliable, scalable, distributed computing. It uses a simple programming model to process large data sets in a distributed way across a cluster of machines, each of which contributes both processing and storage. Because work and data are spread across the cluster, Hadoop can scale up as required by adding machines. Hadoop consists of three subprojects: Hadoop Common, the Hadoop Distributed File System (HDFS) and Hadoop MapReduce. The wider Hadoop ecosystem also includes related projects, such as Hive, HBase and ZooKeeper, that can be used on their own or together to achieve the desired outcomes. For more details on each of these projects, please visit http://hadoop.apache.org/

The core of Hadoop is HDFS and MapReduce.

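HDFS, the Hadoop Distributed File System, splits files into large blocks and distributes (and replicates) those blocks across the nodes of a cluster. Because computation can then run on the nodes that already hold the data, very large data sets can be processed in parallel instead of being moved to a single machine, which is what makes the computation fast.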
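To make the idea of blocks and replicas concrete, here is a minimal sketch that splits a file into fixed-size blocks and assigns each block to several nodes. It is a simplified model under stated assumptions, not HDFS itself: the node names, the toy 128-byte block size and the round-robin placement are hypothetical (real HDFS blocks are tens to hundreds of megabytes, and the NameNode uses a rack-aware placement policy). The replication factor of 3 matches the HDFS default.

```python
# Simplified, hypothetical model of HDFS-style block splitting and replica
# placement. Real HDFS uses much larger blocks and a rack-aware placement
# policy decided by the NameNode; here placement is plain round-robin.

from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Split a file's contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes using round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        # Consecutive offsets modulo the node count are always distinct nodes.
        placement[block_id] = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
    return placement


if __name__ == "__main__":
    file_contents = b"x" * 300                      # a toy 300-byte "file"
    blocks = split_into_blocks(file_contents, block_size=128)
    cluster = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
    placement = place_replicas(len(blocks), cluster, replication=3)
    for block_id, block in enumerate(blocks):
        print(block_id, len(block), placement[block_id])
```

Keeping several copies of every block on different nodes means a job can still read the data if one node fails, and the scheduler has more than one node to choose from when it tries to run work next to the data.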

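MapReduce is a programming model that makes it possible to perform parallel computation across the nodes of a cluster: a map step processes each piece of the input independently and emits intermediate key/value pairs, and a reduce step combines all the values that share the same key.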
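The classic way to illustrate the model is a word count. The sketch below runs the map, shuffle and reduce phases in a single Python process so the data flow is easy to follow; it is not Hadoop's Java API, and in a real cluster each map and reduce call would run in parallel on different nodes. The function names and sample input are illustrative only.

```python
# Single-process simulation of the MapReduce programming model (word count).
# In Hadoop, many map and reduce tasks would run in parallel on the cluster;
# here the phases run one after another for clarity.

from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["big data and hadoop", "hadoop and hive"]  # each line stands in for a block
    print(word_count(splits))
```

Because the map step looks at one record at a time and the reduce step only needs the values for one key, both steps can be split across as many nodes as the cluster has.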

For Business Intelligence, one of the most useful Hadoop projects is Hive, a data warehouse system for data held in Hadoop-compatible storage (such as HDFS or HBase). Hive allows querying, analysing and summarising big data using a SQL-like query language called HiveQL.
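Part of Hive's appeal for BI is that a familiar aggregate query can be executed as MapReduce work behind the scenes. The sketch below is a conceptual illustration of that correspondence for a query like `SELECT region, SUM(amount) FROM sales GROUP BY region;`. The `sales` table, its columns and the rows are hypothetical, and this is not Hive's actual query planner: it only shows how the GROUP BY column becomes the map output key and the aggregate becomes the reduce step.

```python
# Conceptual sketch: how a HiveQL aggregation such as
#   SELECT region, SUM(amount) FROM sales GROUP BY region;
# corresponds to a map step and a reduce step. The table, columns and rows
# are hypothetical, and this is not Hive's real query planner.

from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # (region, amount)


def map_row(row: Row) -> Tuple[str, float]:
    """The GROUP BY column becomes the key; the aggregated column is the value."""
    region, amount = row
    return region, amount


def reduce_group(values: List[float]) -> float:
    """SUM(amount) is applied to all values that share a key."""
    return sum(values)


def group_by_sum(rows: List[Row]) -> Dict[str, float]:
    grouped: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        key, value = map_row(row)
        grouped[key].append(value)  # shuffle: collect values per key
    return {key: reduce_group(values) for key, values in grouped.items()}


if __name__ == "__main__":
    sales = [("north", 120.0), ("south", 80.0), ("north", 30.0)]
    print(group_by_sum(sales))
```

An analyst writing HiveQL never has to write these map and reduce functions by hand, which is what makes Hive a natural bridge between Hadoop and existing BI skills.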

Data is growing faster than ever: at the moment it roughly doubles every year. Because around 80% of this data is unstructured, the volume will soon become astronomical and unmanageable with traditional tools. Projects like Apache Hadoop make it possible to analyse Big Data, and related projects such as Hive provide data warehousing on top of it for further storage and analysis of the relevant data.
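To put "doubling every year" into numbers, the small calculation below compounds a starting volume by a factor of 2 per year and applies the 80% unstructured share. The starting volume of 10 TB is a hypothetical figure chosen only for illustration; the point is that doubling for ten years multiplies the volume by 2^10 = 1024.

```python
# Back-of-the-envelope projection, assuming data volume doubles every year
# and about 80% of it is unstructured (the figures quoted in the post).
# The 10 TB starting volume is hypothetical.


def projected_volume(initial: float, years: int, growth_factor: float = 2.0) -> float:
    """Volume after `years` of compound growth by `growth_factor` per year."""
    return initial * growth_factor ** years


def unstructured_share(volume: float, fraction: float = 0.8) -> float:
    """Portion of the volume that is unstructured."""
    return volume * fraction


if __name__ == "__main__":
    start_tb = 10.0  # hypothetical starting volume in terabytes
    for year in (1, 5, 10):
        total = projected_volume(start_tb, year)
        print(f"year {year}: {total:.0f} TB total, "
              f"{unstructured_share(total):.0f} TB unstructured")
```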

