
In-Memory Technology and Big Data

Posted by Anahita | Posted in Business Intelligence, Data Warehouse | Posted on 14-05-2012



In my previous blogs I wrote about Big Data and related keywords and technologies such as unstructured data, Hadoop HDFS, MapReduce, etc. In this post I look at what “in-memory technology” brings to analysing big data.

Business Intelligence is all about getting the right information to the right people at the right time, so they can make timely decisions that help the business achieve its goals, such as higher service efficiency, better customer experience, and higher product quality.

Dealing with Big Data creates many challenges, but above all there is the velocity challenge. Velocity is the time lag between when the data is created and when the business can look at it and analyse it in order to correct behaviour or make related decisions.

In many cases the business cannot afford to wait for data to be consolidated in a data mart or data warehouse, or for aggregates to become available after the OLAP cubes are processed. There are cases where the information is required in “real time”, and this is where in-memory technology becomes important.

So what is in-memory technology? In short, it is when the data is stored in memory instead of on hard disk. The old 4 GB memory ceiling was removed with the introduction of 64-bit operating systems, and given that RAM is now relatively cheap, huge amounts of data (terabytes, i.e. thousands of gigabytes) can be held in memory and processed in real time. Having the data in memory means faster access to the very data that is required in real time.
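The jump from the 4 GB ceiling to 64-bit addressing can be checked with simple arithmetic (a quick illustration, not specific to any vendor):

```python
# Address space reachable by 32-bit vs 64-bit pointers.
GIB = 2**30  # one gibibyte in bytes

addressable_32bit = 2**32  # bytes a 32-bit pointer can reach
addressable_64bit = 2**64  # bytes a 64-bit pointer can reach

print(addressable_32bit // GIB)  # 4  -> the old 4 GB limit
print(addressable_64bit // GIB)  # 17179869184 GiB, far beyond any physical RAM
```

In practice the installed RAM, not the address space, is the limit on 64-bit systems, which is why falling RAM prices matter as much as the wider pointers.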

In summary, I have explained what in-memory technology means and why it is now a viable option for business intelligence. The real benefit of in-memory technology is the real-time availability of data for Operational BI scenarios, where huge numbers of transactions must be monitored and analysed in real time. This is very appealing to financial services for monitoring financial transactions, to call centre staff for real-time fraud detection while talking to customers, and to service companies that need to act quickly as the demand on their service capacity changes.
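The Operational BI idea above can be sketched as running aggregates kept in memory, so each incoming transaction is screened the moment it arrives rather than after a batch load. This is only an illustrative sketch; the account names and threshold are hypothetical:

```python
from collections import defaultdict

# In-memory running aggregates per account; no disk round trip per transaction.
totals = defaultdict(float)
counts = defaultdict(int)

ALERT_THRESHOLD = 10_000.0  # hypothetical fraud-screening limit

def process(account, amount):
    """Update in-memory aggregates and flag the transaction in real time."""
    totals[account] += amount
    counts[account] += 1
    return totals[account] > ALERT_THRESHOLD  # True -> raise an alert

process("acc-1", 6_000.0)            # under the limit, no alert
alert = process("acc-1", 5_000.0)    # running total now 11 000 -> alert
print(alert)                         # True
```

A real system would shard these aggregates across nodes and expire old entries, but the principle is the same: the state the analyst needs is already in RAM.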

In-memory technology is available from various vendors in products such as Microsoft SQL Server 2012 xVelocity and SAP HANA in SAP BusinessObjects BI 4.0. These solutions vary in nature and come with different capabilities and features, but they all exploit recent advances in hardware and software, such as in-memory storage and massively parallel processing, to reduce the gap between the data and the processor, remove bottlenecks and increase operational productivity. Implementing the technology via these vendors promises substantially faster query analysis, faster decision making with real-time data and, finally, a change in the way organisations access and make use of the massive information available to them.
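A key idea behind in-memory analytic engines of this kind is columnar storage: each column lives in its own contiguous array, so an aggregate over one column never touches the others. A minimal sketch of the layout difference (toy data, not any vendor's actual format):

```python
# Row-oriented layout: each record is a dict; an aggregate walks every record.
rows = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 200.0},
]

# Column-oriented layout: each column is one contiguous list,
# so SUM(amount) scans a single array and nothing else.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [120.0, 80.0, 200.0],
}

total_rows = sum(r["amount"] for r in rows)
total_cols = sum(columns["amount"])
print(total_rows == total_cols == 400.0)  # same answer, different memory layout
```

On real hardware the columnar version also compresses far better, which is part of how terabytes of data fit in RAM in the first place.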



Big EDW!

Posted by Anahita | Posted in Agile, Business Intelligence, Data Warehouse, Technology | Posted on 09-01-2012



Big Data is changing the way we need to look at Enterprise Data Warehousing. Previously I posted about big data in Big Data – Volume, Variety and Velocity!. I also posted about the supporting projects from Apache Hadoop, such as HBase and Hive, in Big Data, Hadoop and Business Intelligence. Today I want to introduce a new concept, or better said an original idea: Big EDW! Yes, Business Intelligence and Data Warehousing will also have to turn into Big BI and Big EDW!

So what makes up the fabric of Big EDW and Big BI Analytics? The answer is the ability to analyse and make sense of Big Data, which covers not only the roughly 20% of structured data that organisations keep in their relational and dimensional databases, but also the vast remaining 80% of unstructured data scattered across digital and web documents such as Microsoft Word, Excel, PowerPoint, Visio and Project files, as well as web data such as social media, wikis and web sites, and other formats such as pictures, videos and log files. I have posted about the meaning of unstructured data previously in On Unstructured Data.

Traditionally, an Enterprise Data Warehouse is a centralised Business Intelligence system containing the ETL programs required to access various data sources, transform the data and load it into a well-designed dimensional model. Front-end BI tools such as reporting, analytics and dashboards are then used on their own, or integrated with the organisation's intranet, to give the right users timely access to relevant information for analysis and decision making.
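The classic extract–transform–load flow into a dimensional model can be sketched in a few lines. The tables, products and amounts below are purely illustrative:

```python
# Toy extract -> transform -> load pass into a star-schema-style structure.
source_orders = [
    ("2012-01-09", "Widget", "150.00"),  # extracted as raw strings
    ("2012-01-09", "Gadget", "99.50"),
]

dim_product = {}   # dimension: product name -> surrogate key
fact_sales = []    # fact table: (date, product_key, amount)

for order_date, product, amount in source_orders:
    # Transform: surrogate-key lookup/creation and type conversion.
    key = dim_product.setdefault(product, len(dim_product) + 1)
    # Load: append the conformed row to the fact table.
    fact_sales.append((order_date, key, float(amount)))

print(fact_sales)  # [('2012-01-09', 1, 150.0), ('2012-01-09', 2, 99.5)]
```

Real ETL adds staging, slowly changing dimensions and error handling, but the shape of the work, keys out of dimensions and measures into facts, is the same.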

Big Data does not quite fit into this model for three main reasons: the volume, variety and velocity of its change and growth. Big EDW will need to break some of the traditional data warehousing concepts, but once that is done, it will create value that is orders of magnitude greater.

Big EDW should be quick and agile in dealing with Big Data. It has to provide fast access to the many newly available, high-volume data sources. Enhanced design patterns or new use cases will have to emerge to make this possible, using more intelligent and faster methods of providing the relevant data when required. This could be achieved by many techniques, from dimensional modelling to advanced mathematical/statistical methods such as bootstrap and jackknife sampling, which provide more accurate approximations of the mean, median, variance, percentiles and standard deviation of big data. Apache Hadoop plays an essential role with projects such as MapReduce, HDFS, Hive (HiveQL) and HBase. New central monitoring tools should be developed and embedded within the Big EDW to handle big data metadata such as social media sources, text analysis, sensor analysis, search ranking, etc. Parallel machine learning and data mining, recently explored in projects such as Apache Mahout and Hadoop-ML, combined with Complex Event Processing (CEP), alongside faster SDLC and project methodologies such as agile Scrum for handling the Big EDW life cycle, are also becoming standard in the realm of Big EDW.
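The bootstrap sampling mentioned above can be sketched with the standard library alone: draw a manageable sample once, then resample it with replacement many times to approximate a statistic of a dataset too big to scan repeatedly. The distribution parameters and resample count below are arbitrary choices for the illustration:

```python
import random
import statistics

random.seed(42)

# A sample drawn once from a (notionally huge) dataset.
sample = [random.gauss(100, 15) for _ in range(500)]

def bootstrap_means(data, n_resamples=1000):
    """Bootstrap: resample with replacement, recording the mean each time."""
    k = len(data)
    return [statistics.mean(random.choices(data, k=k)) for _ in range(n_resamples)]

means = bootstrap_means(sample)

# An approximate 95% interval straight from the percentiles of the resamples.
ordered = sorted(means)
lo, hi = ordered[25], ordered[975]
print(lo < statistics.mean(sample) < hi)
```

The same resampling trick yields intervals for medians, variances or percentiles with no change to the loop, which is why it suits big-data approximation so well.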

Note that the phrase “Big EDW” is not used anywhere else; it is the name I thought could fit the growth of EDW into a system that can also accommodate and manage Big Data!