Featured Post

Big Data, Hadoop and Business Intelligence

I consider Hadoop one of the technologies that creates a link between Big Data Analytics and Business Intelligence. In my previous posts I explained what Big Data means and what Unstructured Data is. In this post I would like to introduce Hadoop, which makes it possible to gain...

Read More

Big Data and BI

Posted by Anahita | Posted in Big Data, Business Intelligence | Posted on 05-10-2014


The main and most important characteristics of big data have been summarised in the three Vs: Velocity, Volume and Variety. It is data whose volume increases rapidly, across a variety of contexts, some of which are not possible to explore easily in the world of relational databases.

Business Intelligence is about providing data to the business so that actionable insight can be achieved in a timely manner. Considering the 3V nature of Big Data explained above, it is crucial to ask the right questions, and to find the correct way to collect and cleanse the data and make it available for further discovery.

Apache Hadoop is an ecosystem in which distributed commodity hardware, combined with the computational power of MapReduce and YARN, provides the essential ingredients for working with Big Data. MapReduce supplies the computational power for analysing unstructured data; it takes datasets of key-value pairs as both input and output.
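To make the key-value model concrete, here is a minimal word-count job written against Hadoop's standard Java MapReduce API. The mapper emits a (word, 1) pair for every word it sees, and the reducer sums the counts arriving for each word; the class name and input/output paths are placeholders of my own.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: for each input record (key = byte offset, value = line of text),
  // emit one (word, 1) pair per word.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce: all values for the same word arrive together; sum them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}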

Azure HDInsight is a cloud service that provides the Hadoop framework, combining Apache projects such as HDFS, MapReduce, Hive, Pig and Oozie.

The storage used by Azure HDInsight is Azure Blob storage. HDInsight clusters can be spun up when required and dropped after the computational tasks are completed, while Blob storage keeps the data after the clusters are dropped. Blob storage exposes an HDFS-compatible interface, and the Sqoop connectors can be used to import data from an Azure SQL database into HDFS, or to export data from HDFS into an Azure SQL database.
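As a small illustration of that HDFS-compatible interface, the sketch below lists files in a Blob storage container through the ordinary Hadoop FileSystem API. The account name, container name and path are placeholders of mine, and it assumes the storage key is already configured, as it is in an HDInsight cluster's core-site.xml.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlobContainer {
  public static void main(String[] args) throws Exception {
    // Placeholder account and container names; the wasb:// scheme is the
    // Blob storage driver that makes a container look like an HDFS file system.
    URI container = new URI("wasb://mycontainer@myaccount.blob.core.windows.net/");
    Configuration conf = new Configuration();

    FileSystem fs = FileSystem.get(container, conf);
    for (FileStatus status : fs.listStatus(new Path("/example/data"))) {
      System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
    }
  }
}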


For Business Intelligence, Microsoft Power Query for Excel provides the ability to import data from Azure HDInsight, or any HDFS source, into Excel. This enhances data discovery and blending by enabling access to a wider range of data sources.

Further to Power Query for Excel, the Microsoft Hive ODBC Driver can be used with other Microsoft Business Intelligence products, such as Excel, SSIS and SSRS, to provide an integrated solution.
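The ODBC driver itself is configured through Windows rather than code, but for a programmatic flavour of the same idea, here is a sketch using Hive's JDBC driver against HiveServer2 instead; I name JDBC plainly as the swapped-in route. The host name and credentials are placeholders; hivesampletable is the sample table HDInsight clusters ship with, but any Hive table would do.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    // Register the Hive JDBC driver (hive-jdbc jar must be on the classpath).
    Class.forName("org.apache.hive.jdbc.HiveDriver");

    // HiveServer2 JDBC URL; host, port and credentials are placeholders.
    String url = "jdbc:hive2://mycluster.example.com:10000/default";
    try (Connection conn = DriverManager.getConnection(url, "user", "password");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT devicemake, COUNT(*) AS cnt "
           + "FROM hivesampletable GROUP BY devicemake")) {
      while (rs.next()) {
        System.out.println(rs.getString("devicemake") + ": " + rs.getLong("cnt"));
      }
    }
  }
}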

 

Apache Sqoop

Posted by Anahita | Posted in Business Intelligence | Posted on 24-11-2013


Apache Sqoop transfers bulk data between Apache Hadoop and relational datastores. It is used for importing data into HDFS or related datastores such as HBase and Hive, and for bulk export of data from HDFS, HBase or Hive into relational databases such as HSQLDB.

Because Sqoop runs these transfers as parallel MapReduce jobs, it offers a far more efficient route to getting data where it can be analysed than hand-written import and export scripts.
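Sqoop is normally driven from the command line, but its Java entry point can also be invoked directly. The sketch below is the embedded equivalent of running "sqoop import" from the shell, assuming a Sqoop 1.x client on the classpath; the JDBC URL, credentials and table name are placeholders of mine.

import org.apache.sqoop.Sqoop;

public class SqoopImportExample {
  public static void main(String[] args) {
    // Equivalent to a `sqoop import` command line run.
    // Connection string, credentials and table name are placeholders.
    String[] importArgs = {
        "import",
        "--connect", "jdbc:mysql://dbhost/sales",
        "--username", "etl_user",
        "--password", "secret",
        "--table", "orders",
        "--target-dir", "/user/etl/orders",  // destination directory in HDFS
        "--num-mappers", "4"                 // parallel map tasks for the transfer
    };
    int exitCode = Sqoop.runTool(importArgs);
    System.exit(exitCode);
  }
}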


Big Data Solution: IBM InfoSphere BigInsights

Posted by Anahita | Posted in Technology | Posted on 20-01-2012


Big Data again, as this subject fascinates me. Looking into tools and technologies, I have already posted about the open source Apache Hadoop projects HDFS and MapReduce.

IBM offers two editions of its InfoSphere BigInsights product, both compatible with the Apache Hadoop ecosystem for handling big data.

The Basic Edition of InfoSphere BigInsights is a free-to-download edition that includes a fully integrated, compatible version of Apache Hadoop and related components. It comes with a web-based management console and the ability to integrate with IBM InfoSphere Warehouse, IBM Smart Analytics System and DB2 for Linux, UNIX and Windows. It is complete with Jaql, an SQL-like query language for both structured and non-traditional data types.

The Enterprise Edition supports structured, semi-structured and unstructured data at massive scale-out, while running on commonly available hardware. It adds enterprise-class management, including job management, and security features such as Active Directory/LDAP authentication.

More details on the editions and pricing are available from the IBM website.

Big EDW!

Posted by Anahita | Posted in Agile, Business Intelligence, Data Warehouse, Technology | Posted on 09-01-2012


Big Data is changing the way we need to look at Enterprise Data Warehousing. Previously I posted about big data in Big Data – Volume, Variety and Velocity! I also posted about the supporting Apache Hadoop projects, such as HBase and Hive, in Big Data, Hadoop and Business Intelligence. Today I want to introduce a new concept, or rather an original idea: Big EDW! Yes, Business Intelligence and Data Warehousing will also have to turn into Big BI and Big EDW!

So what makes up the fabric of Big EDW and Big BI Analytics? The answer is the ability to analyse and make sense of Big Data, which covers not only the roughly 20% of structured data that organisations keep in their relational and dimensional databases, but also the vast remaining 80% of unstructured data scattered across digital and web documents such as Microsoft Word, MS Excel, MS PowerPoint, MS Visio and MS Project files, web data such as social media, wikis and websites, and other formats such as pictures, videos and log files. I have posted about the meaning of unstructured data previously in On Unstructured Data.

Traditionally, an Enterprise Data Warehouse is a centralised Business Intelligence system containing the ETL programs required to extract data from various sources, transform it and load it into a well-designed dimensional model. Front-end BI tools, such as reporting, analytics and dashboards, are then used on their own or integrated with the organisation's intranet to give the right users timely access to relevant information for analysis and decision-making activities.

Big Data does not quite fit into this model, for three main reasons: the volume, variety and velocity of its change and growth. Big EDW will need to break some traditional data warehousing concepts, but once that is done, it will create value many times over.

Big EDW should have the ability to be quick and agile in dealing with Big Data. It has to make many new data sources, in high volume, available for quick access, and enhanced design patterns or new use cases will have to emerge to make this possible. These patterns and use cases should rely on more intelligent and faster methods of providing the relevant data when required. That could be achieved in many ways: dimensional modelling; advanced mathematical and statistical methods such as bootstrap and jackknife sampling, which give more accurate approximations of the mean, median, variance, percentiles and standard deviation of big data (a minimal bootstrap sketch follows below); and the Apache Hadoop projects MapReduce, HDFS, HiveQL (the Hive query language) and HBase, which play an essential role. New central monitoring tools should be developed and embedded within the Big EDW to handle big data metadata such as social media sources, text analysis, sensor analysis, search ranking and so on. Parallel machine learning and data mining, explored recently through projects such as Apache Mahout and Hadoop-ML, combined with Complex Event Processing (CEP) and faster SDLC and project methodologies such as agile Scrum for handling the Big EDW life cycle, are also becoming standard in the realm of Big EDW.
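To illustrate the bootstrap idea mentioned above: resample the observed data with replacement many times, and the spread of the resampled statistic approximates its sampling distribution. Here is a minimal Java sketch estimating a 95% confidence interval for the mean of a small, hypothetical metric sample; the numbers are invented for illustration.

import java.util.Arrays;
import java.util.Random;

public class BootstrapMean {
  public static void main(String[] args) {
    // Hypothetical sample of some metric drawn from a much larger data set.
    double[] sample = {12.1, 9.8, 11.4, 10.2, 13.7, 8.9, 10.8, 11.9, 9.5, 12.6};
    int resamples = 10_000;
    Random rnd = new Random(42);
    double[] means = new double[resamples];

    // Bootstrap: draw with replacement, same size as the original sample,
    // and record the mean of each resample.
    for (int i = 0; i < resamples; i++) {
      double sum = 0;
      for (int j = 0; j < sample.length; j++) {
        sum += sample[rnd.nextInt(sample.length)];
      }
      means[i] = sum / sample.length;
    }

    // The 2.5th and 97.5th percentiles of the resampled means give an
    // approximate 95% confidence interval for the true mean.
    Arrays.sort(means);
    System.out.printf("95%% CI for the mean: [%.2f, %.2f]%n",
        means[(int) (0.025 * resamples)], means[(int) (0.975 * resamples)]);
  }
}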

Note that the phrase “Big EDW” is not used anywhere else; it is the name I thought could fit the growth of the EDW into a system that can also accommodate and manage Big Data!