Featured Post

Big EDW!

Big Data is changing the way we need to look at Enterprise Data Warehousing. Previously I posted about big data  in Big Data – Volume, Variety and Velocity!. I also posted about the supporting projects from Apache Hadoop, such as Hbase and Hive in Big Data, Hadoop and Business Intelligence. Today...


Big Data and BI

Posted by Anahita | Posted in Big Data, Business Intelligence | Posted on 05-10-2014



The main and most important characteristics of big data have been summarised in the three Vs: Velocity, Volume and Variety. It is data whose volume grows fast and which comes in a variety of contexts, some of which are not possible to explore easily in the world of relational databases.

Business Intelligence is about providing data to the business so that actionable insight can be achieved in a timely manner. Considering the 3V nature of Big Data as explained above, it is crucial to ask the right questions and to find the correct way to collect and cleanse the data and make it available for further discovery.

Apache Hadoop is an ecosystem in which distributed commodity hardware, combined with the computational power of MapReduce and YARN, provides the essential ingredients for working with Big Data. MapReduce provides the computational power for analysing unstructured data. It uses datasets of key-value pairs as both input and output.
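To make the key-value idea concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps in plain Python. It is not Hadoop code: the function names and the word-count example are my own illustration, and a real cluster would spread each phase across many nodes.

```python
from collections import defaultdict


def word_count_mapper(offset, line):
    """Map: take one (offset, line) pair and emit (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    """Reduce: take one word with all its counts and emit (word, total)."""
    yield word, sum(counts)


def run_job(records, mapper, reducer):
    """Run map, shuffle and reduce in sequence on a single machine."""
    # Map phase: every input pair can produce any number of output pairs.
    mapped = []
    for key, value in records:
        mapped.extend(mapper(key, value))

    # Shuffle phase: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce phase: one reducer call per key, in sorted key order.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Illustrative input: (line offset, line text) pairs.
lines = [(0, "big data big insight"), (1, "data volume")]
result = run_job(lines, word_count_mapper, word_count_reducer)
print(result)
```

Both the input (offset, line) and the output (word, count) are key-value pairs, which is what lets one job's output feed the next job.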

Azure HDInsight is the cloud service that provides the Hadoop framework, combining Apache projects such as HDFS, MapReduce, Hive, Pig and Oozie.

The storage used by Azure HDInsight is Azure Blob storage. HDInsight clusters can be created when required and dropped after the computational tasks are completed. Blob storage keeps the data after the HDInsight clusters are dropped, and it exposes an HDFS-compatible interface. The Sqoop connectors can be used to import data from an Azure SQL database into HDFS or to export data from HDFS to an Azure SQL database.



For Business Intelligence, Microsoft Power Query for Excel provides the ability to import data from Azure HDInsight or any HDFS into Excel. This enhances data discovery and blending by enabling access to a wider range of data sources.

In addition to Power Query for Excel, the Microsoft Hive ODBC Driver can be used with Microsoft Business Intelligence products such as Excel, SSIS and SSRS to provide an integrated solution.


Apache Tez

Posted by Anahita | Posted in Big Data | Posted on 28-12-2013



Apache Tez, part of the Stinger Initiative, is a Hadoop framework for near real-time big data processing. As opposed to MapReduce, which was created for bulk data processing, Tez provides a powerful interactive framework for running queries in Apache Hive and Apache Pig, providing faster response times and higher throughput.

Apache Tez is a Hadoop data processing framework that uses a DAG (Directed Acyclic Graph) to execute complex tasks. This means Tez models a data processing job as a data flow graph, much as Pig Latin scripts do: the edges of the graph represent data flows, and the vertices hold the operators, that is, the logic that modifies or moves the data. At execution time on the cluster, Tez turns the logical graph into a physical one, applying parallelism at the vertices to scale to the amount of data to be processed.
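The data flow graph idea can be sketched in a few lines of plain Python. This is not the Tez API: the vertex names, operators and sample rows are hypothetical, and the sketch runs the vertices one after another on one machine, where Tez would run each vertex as parallel tasks on the cluster.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical operators. Each vertex receives a list of upstream results.
def read_orders(inputs):
    return [("apples", 3), ("pears", 5), ("apples", 2)]


def keep_large(inputs):
    (rows,) = inputs
    return [(item, qty) for item, qty in rows if qty >= 3]


def total_by_item(inputs):
    (rows,) = inputs
    totals = {}
    for item, qty in rows:
        totals[item] = totals.get(item, 0) + qty
    return totals


def build_report(inputs):
    large, totals = inputs
    return {
        item: {
            "total": total,
            "large_orders": sum(1 for name, _ in large if name == item),
        }
        for item, total in totals.items()
    }


# Vertices hold the processing logic.
vertices = {
    "read": read_orders,
    "filter": keep_large,
    "sum": total_by_item,
    "report": build_report,
}

# Edges are data flows, written as vertex -> list of upstream vertices.
edges = {
    "read": [],
    "filter": ["read"],
    "sum": ["read"],
    "report": ["filter", "sum"],
}


def run_dag(vertices, edges):
    """Run every vertex once, after all of its upstream vertices."""
    results = {}
    for name in TopologicalSorter(edges).static_order():
        inputs = [results[upstream] for upstream in edges[name]]
        results[name] = vertices[name](inputs)
    return results


results = run_dag(vertices, edges)
print(results["report"])
```

Here "filter" and "sum" both depend only on "read", so a real engine is free to run them at the same time. That freedom is what the graph model buys over a fixed map-then-reduce sequence.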



Apache Sqoop

Posted by Anahita | Posted in Business Intelligence | Posted on 24-11-2013




Apache Sqoop transfers bulk data between Apache Hadoop and relational datastores. Sqoop is used for importing data into HDFS or related datastores such as HBase or Hive. It is also used for bulk export of data from HDFS, or from similar datastores such as Hive and HBase, into relational databases such as HSQLDB.

Because Sqoop runs its transfers as parallel jobs on the cluster, it offers an efficient way to bring relational data into Hadoop for analysis.
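As a rough illustration of the two directions, the sketch below only assembles Sqoop 1 command lines in Python and prints them; it does not run Sqoop. The JDBC URL, user name, table names and HDFS paths are placeholders I made up, so substitute your own.

```python
import shlex


def sqoop_import_args(jdbc_url, username, table, target_dir, num_mappers=4):
    """Build the command line for importing one table into an HDFS directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # prompt for the password rather than passing it on the command line
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


def sqoop_export_args(jdbc_url, username, table, export_dir):
    """Build the command line for exporting an HDFS directory into a table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",
        "--table", table,
        "--export-dir", export_dir,
    ]


# Placeholder connection details, for illustration only.
url = "jdbc:hsqldb:hsql://dbhost/sales"
import_cmd = sqoop_import_args(url, "analyst", "ORDERS", "/data/orders")
export_cmd = sqoop_export_args(url, "analyst", "ORDER_TOTALS", "/data/order_totals")

print(shlex.join(import_cmd))
print(shlex.join(export_cmd))
```

The argument lists can be passed to `subprocess.run` on a machine where Sqoop is installed.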


What is Machine Learning?

Posted by Anahita | Posted in Business Intelligence, Technology | Posted on 03-02-2013



Computers and statisticians can both use data, but they go about it in completely different ways. Statistics is about using data to enable humans to identify patterns and gain insight. On the other hand, statistical and mathematical models and methods can be applied to produce tools and methodologies for computers. The machine then uses these to perform the required tasks.

When we teach computers to give us insight into data, we teach them to extract information through algorithms that identify patterns in a mass of noise. These algorithms, also known as Pattern Recognition Algorithms, are also used to automate tasks, and they enable us to train machines to put data into certain contexts using a training set of data.

There are two main types of problems that are solved through machine learning: classification and regression.
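The difference between the two problem types can be shown with a tiny sketch in plain Python. The methods chosen here (a least-squares line for regression and a nearest-centroid rule for classification) are simple stand-ins picked for illustration, and the training data is invented.

```python
def fit_line(xs, ys):
    """Regression: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_centroids(points, labels):
    """Classification training: average the training points of each label."""
    sums = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict_label(centroids, point):
    """Classification: return the label whose centroid is closest to the point."""
    px, py = point
    return min(
        centroids,
        key=lambda label: (centroids[label][0] - px) ** 2
        + (centroids[label][1] - py) ** 2,
    )


# Regression predicts a number: here the invented data follows y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted_y = slope * 5 + intercept

# Classification predicts a category from a labelled training set.
training_points = [(1, 1), (2, 1), (8, 9), (9, 8)]
training_labels = ["browser", "browser", "buyer", "buyer"]
centroids = fit_centroids(training_points, training_labels)
predicted_label = predict_label(centroids, (7, 8))

print(predicted_y, predicted_label)
```

In both cases the machine learns from a training set, but regression answers "how much?" while classification answers "which group?".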

In future posts I will introduce you to some of the methods used in machine learning and their real life applications in Big Data.

Big Data Applications in Online Retail

Posted by Anahita | Posted in Business Analytics, Business Intelligence | Posted on 27-01-2013



This is the first of a series of posts where I simply list some applications of big data analytics in various industries and related business opportunities.

In retail, especially the online retail market, business growth and profitability are directly connected to the customer.

Marketing campaigns are only successful if they can achieve what they intended: get customer attention, sell products and keep the business relationship active.

The data that a customer produces when visiting a retail website is kept in unstructured log files. Every single move, every single click, all basket items added and removed, all saved items, all page visits, all product views are recorded. When there is a marketing campaign, an interaction with the customer such as a video, picture or promotion creates further interaction to change the normal patterns of behaviour, prompting the web site visitor to respond to the campaign. How this change of behaviour is measured is not just about the success or failure of the campaign, but also about how the individual customers responded to it. This can give insight into the effectiveness of the campaign which could be used instructively for future marketing initiatives.

Another application of big data analytics is to adjust and align marketing activities with sales goals by targeting the right customers and channels at the right time to convey the right message.

Big Data Analytics provides a new way to look at data that’s huge in volume, not saved in a structured format and subject to unpredictable and constant change!

Big Data Infographic

Posted by Anahita | Posted in Business Analytics, Business Intelligence | Posted on 26-01-2013



Taming Big Data | A Big Data Infographic
Via: Wikibon Big Data

Social CRM KPIs – Share of Voice

Posted by Anahita | Posted in Social CRM | Posted on 27-12-2012



Knowing what is being said about a brand on social media is essential for an organisation's strategic decision making as part of its marketing and sales strategies. It gives the organisation the ability to monitor the brand, become aware of customer service issues, and generate new sales leads through suitable marketing campaigns that identify and target customers correctly, while also helping to enhance the customer experience.

One of the Key Performance Indicators (KPIs) is "Share of Voice", or SoV for short. It compares the brand's mentions with the competitors' mentions in social media, such as Facebook, Twitter, message boards, blogs and review sites. The formula is "Total number of the brand's mentions / Total number of all competitors' mentions".
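As a small worked example, the formula above can be written as a Python function. The mention counts are invented. The optional flag covers a common variant, which is an addition of mine rather than part of the formula above: dividing by all mentions in the market, including the brand's own.

```python
def share_of_voice(brand_mentions, competitor_mentions, include_brand_in_total=False):
    """Return the brand's Share of Voice as a ratio.

    competitor_mentions maps each competitor name to its mention count.
    By default the denominator is the competitors' total, as in the formula
    above. With include_brand_in_total=True the brand's own mentions are added
    to the denominator (an assumed variant, not from the formula above).
    """
    denominator = sum(competitor_mentions.values())
    if include_brand_in_total:
        denominator += brand_mentions
    if denominator == 0:
        raise ValueError("no mentions to compare against")
    return brand_mentions / denominator


# Invented counts for one month of monitoring.
competitors = {"Competitor A": 200, "Competitor B": 280}
sov = share_of_voice(120, competitors)
sov_of_market = share_of_voice(120, competitors, include_brand_in_total=True)

print(f"SoV against competitors: {sov:.0%}")
print(f"SoV of all mentions: {sov_of_market:.0%}")
```

Whichever denominator is used, it should stay the same from one reporting period to the next so that the trend remains comparable.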

Windows Azure 90-Day Free Trial

Posted by Anahita | Posted in Business Intelligence, Technology | Posted on 26-12-2012



You can get a 90-day free trial of Windows Azure. It gives you 750 hours of Cloud Services (750 small compute hours), 35 GB of storage with 50M transactions, 1 DU SQL Database with 1 DU of Web Business Edition, 20 GB of outbound data transfer with unlimited inbound transfer, and 10 Web Sites and Mobile Services, which stay free after the 90-day trial.

Big Data Analytics Presentation

Posted by Anahita | Posted in Business Analysis, Business Intelligence | Posted on 09-12-2012



A weekend effort to put together a simple summary of Big Data, concentrating on its application in customer-driven industries such as retail.

Click on the link to download this presentation.

Big Data PowerPoint Presentation


Hadoop Explained!

Posted by Anahita | Posted in Business Analytics, Technology | Posted on 09-12-2012
