Featured Post

Big Data and BI

The main and most important characteristics of big data have been summarised in the three Vs: Velocity, Volume and Variety. It is data whose volume grows fast and which spans a variety of contexts, some of which are not easy to explore in the world of relational databases. Business Intelligence...


Big Data – Examples of Unstructured Data

Posted by Anahita | Posted in Business Analysis, Business Intelligence | Posted on 21-01-2012



Big Data has become a reality that cannot be ignored. In one of my previous posts, I explained why the adjective big sits before data to create big data. I mentioned that big does not refer only to volume, but also to variety and velocity of growth. Big data is not only big in size; it also grows fast, and it covers a variety of data sources that go beyond the boundaries of existing relational systems such as CRM and ERP.

I thought a post giving some examples of unstructured big data might interest many readers, so I have listed some below. I may update this post in future as I come across more examples.

  • Detailed machine-generated data, such as equipment logs and RFID tags
  • Sensor-generated data, such as in manufacturing, metering and condition monitoring
  • Web-related data, such as visitors, hits, keywords, times, etc.
  • Social media data, such as Twitter and Facebook comments, feedback, etc.
  • Books, journals and text-based documents
  • Scanned records
  • Audio files
  • Video files
  • GIS and map-related data files
  • Body of email messages
  • Web page contents, such as static pages, blogs and wikis
  • Image data, such as spatial and AutoCAD images

Big Data Solution: IBM InfoSphere BigInsights

Posted by Anahita | Posted in Technology | Posted on 20-01-2012



Big Data again, as this subject fascinates me! While looking into tools and technologies, I have already posted about the open source Apache Hadoop projects HDFS and MapReduce.


IBM offers two editions of InfoSphere BigInsights, both compatible with the Apache Hadoop ecosystem, for handling big data.

The Basic Edition of InfoSphere BigInsights is a free download that includes a fully integrated and compatible version of Apache Hadoop and related components. It comes with a web-based management console and the ability to integrate with IBM InfoSphere Warehouse, IBM Smart Analytics System and DB2 for Windows, Unix and Linux. It also includes Jaql, a SQL-like query language for both structured and non-traditional data types.

The Enterprise Edition supports structured, semi-structured and unstructured data, and massive data scale-out, while running on commonly available hardware. It adds enterprise-class management, including job management, and security features including Active Directory/LDAP authentication.

More details on the editions and pricing are available from the IBM website.






Product Management by Reality – Agile BI

Posted by Anahita | Posted in Agile, Business Intelligence | Posted on 17-01-2012



Agile focuses on delivering value. Business Intelligence is also about providing the means to deliver value. But how is value defined? Value may differ between organisational units within an organisation, between roles within the same unit, or even over time. So delivering value via business intelligence is not a static process. It involves change, and this is where Agile BI comes into the picture.

In my opinion, first of all, Business Intelligence should not be considered a project. Let me explain: Business Intelligence is about making sense of organisational data through a well-trusted delivery model that best suits the user's domain and type, backed by the supporting underlying infrastructure. Now it is simple: data changes, people change, processes change, businesses merge or separate, systems merge or separate, teams combine, groups divide, processes are streamlined to accommodate all these changes, new data enters the cycle, and as a result some data ceases to be important or to have any value.

This is why Business Intelligence and Management Information are no longer a programme; they form a product group, and in fact an evolving product group, with subgroups for handling the many layers that Management Information covers. This is a portfolio of products that requires project or programme management for continuous improvement!

Agile can handle this complexity, because it concentrates on delivering higher value in repeated short time frames and keeps reviewing the product items, adding what the whole organisation recognises as value. In short, Agile BI is Management of MI Requirements by Reality!





Big EDW!

Posted by Anahita | Posted in Agile, Business Intelligence, Data Warehouse, Technology | Posted on 09-01-2012



Big Data is changing the way we need to look at Enterprise Data Warehousing. Previously I posted about big data in “Big Data – Volume, Variety and Velocity!”. I also posted about supporting Apache Hadoop projects, such as HBase and Hive, in “Big Data, Hadoop and Business Intelligence”. Today I want to introduce a new concept, or rather an original idea: Big EDW! Yes, Business Intelligence and Data Warehousing will also have to turn into Big BI and Big EDW!

So what makes up the fabric of Big EDW and Big BI Analytics? The answer is the ability to analyse and make sense of Big Data, which covers not only the 20% of data that is structured and kept in organisations' relational and dimensional databases, but also the vast remaining 80% of unstructured data scattered across digital and web documents such as Microsoft Word, MS Excel, MS PowerPoint, MS Visio and MS Project files, as well as web data such as social media, wikis and web sites, and other formats such as pictures, videos and log files. I have previously posted about the meaning of unstructured data in “On Unstructured Data”.

Traditionally, an Enterprise Data Warehouse is a centralised Business Intelligence system containing the ETL programs required to extract data from various sources, transform it and load it into a well-designed dimensional model. Front-end BI access tools, such as reporting, analytics and dashboards, are then used on their own or integrated with the organisation's intranet, to give the right users timely access to relevant information for analysis and decision-making activities.
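To make the extract, transform and load steps concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in warehouse. The source rows, the table names (dim_customer, fact_sales) and the function names are hypothetical, chosen only for illustration; a real EDW would use dedicated ETL tooling and a proper date dimension rather than a bare yyyymmdd key.

```python
import sqlite3
from datetime import date

# Hypothetical source rows, standing in for an extract from a CRM/ERP system.
SOURCE_ROWS = [
    {"customer": " Acme Ltd ", "order_date": "2012-01-09", "amount": "120.50"},
    {"customer": "acme ltd", "order_date": "2012-01-10", "amount": "79.50"},
    {"customer": "Globex", "order_date": "2012-01-10", "amount": "200.00"},
]


def extract():
    """Extract: a real job would query the source systems; here we copy sample rows."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: standardise customer names and convert text fields to proper types."""
    return [
        {
            "customer": r["customer"].strip().title(),
            "order_date": date.fromisoformat(r["order_date"]),
            "amount": float(r["amount"]),
        }
        for r in rows
    ]


def load(conn, rows):
    """Load: add each customer to the dimension once, then insert facts by surrogate key."""
    cur = conn.cursor()
    for r in rows:
        cur.execute("INSERT OR IGNORE INTO dim_customer (name) VALUES (?)", (r["customer"],))
        cur.execute("SELECT customer_key FROM dim_customer WHERE name = ?", (r["customer"],))
        customer_key = cur.fetchone()[0]
        date_key = int(r["order_date"].strftime("%Y%m%d"))  # simple smart date key
        cur.execute(
            "INSERT INTO fact_sales (customer_key, date_key, amount) VALUES (?, ?, ?)",
            (customer_key, date_key, r["amount"]),
        )
    conn.commit()


def build_warehouse():
    """Create a tiny star schema (one dimension, one fact table) and run the ETL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE fact_sales (
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            date_key INTEGER,
            amount REAL
        );
    """)
    load(conn, transform(extract()))
    return conn


if __name__ == "__main__":
    conn = build_warehouse()
    query = """SELECT c.name, SUM(f.amount) FROM fact_sales f
               JOIN dim_customer c USING (customer_key)
               GROUP BY c.name ORDER BY c.name"""
    for name, total in conn.execute(query):
        print(f"{name}: {total:.2f}")
```

The design choice worth noting is the surrogate key: the fact table refers to customers by customer_key rather than by name, so once the transform step standardises spellings, two variants of the same customer collapse into one dimension row.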

Big Data does not quite fit into this model, for three main reasons: volume, variety, and velocity of change and growth. Big EDW will need to break some traditional data warehousing concepts, but once that is done, it can create value many times over.

Big EDW should be able to deal with Big Data quickly and in an agile way. It has to make many new data sources available for quick access, in high volume. Enhanced design patterns or new use cases have to emerge to make this possible, and they should use more intelligent and faster methods of providing the relevant data when it is required. This could be achieved by many methods, such as dimensional modelling and advanced statistical techniques such as bootstrap and jackknife resampling, which estimate how accurate sample-based approximations of the mean, median, variance, percentiles and standard deviation of big data are. Apache Hadoop plays an essential role, with projects such as MapReduce, HDFS, Hive (with its SQL-like HiveQL) and HBase. New central monitoring tools should be developed and embedded within the Big EDW to handle big data metadata, such as social media sources, text analysis, sensor analysis, search ranking, etc. Parallel machine learning and data mining, recently explored through projects such as Apache Mahout and Hadoop-ML, combined with Complex Event Processing (CEP), together with faster SDLC and project methodologies such as agile Scrum for handling the Big EDW life cycle, are also becoming standard in the realm of Big EDW.
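Bootstrap and jackknife resampling, mentioned above, can be sketched in a few lines of standard-library Python. This is an illustrative sketch rather than a production estimator: the sample below is hypothetical (randomly generated), and in a Big EDW setting the statistic would be computed over a sample drawn from the full data set. Both functions return an estimate of the standard error of whatever statistic is passed in, such as the mean or the median.

```python
import random
import statistics


def bootstrap_se(data, stat, n_resamples=1000, seed=0):
    """Bootstrap: draw n_resamples samples *with replacement* (each the size of
    data), recompute the statistic on each, and return the standard deviation
    of those replicates as a standard-error estimate."""
    rng = random.Random(seed)  # seeded so results are reproducible
    n = len(data)
    replicates = [stat(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(replicates)


def jackknife_se(data, stat):
    """Jackknife: recompute the statistic n times, each time leaving out one
    observation, then scale the spread of those leave-one-out values:
    SE = sqrt((n - 1) / n * sum((theta_i - theta_bar) ** 2))."""
    data = list(data)
    n = len(data)
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = sum(leave_one_out) / n
    return ((n - 1) / n * sum((t - theta_bar) ** 2 for t in leave_one_out)) ** 0.5


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical sample standing in for a sample of a much larger data set.
    sample = [rng.gauss(100, 15) for _ in range(500)]
    for name, stat in [("mean", statistics.mean), ("median", statistics.median)]:
        print(f"{name}: {stat(sample):.2f}  "
              f"bootstrap SE {bootstrap_se(sample, stat):.3f}  "
              f"jackknife SE {jackknife_se(sample, stat):.3f}")
```

The jackknife needs exactly n recomputations and is deterministic, while the bootstrap's cost is set by n_resamples. The jackknife is known to give unreliable variance estimates for non-smooth statistics such as the median, which is one reason the bootstrap is often preferred for medians and percentiles.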

Note that the phrase “Big EDW” is, to my knowledge, not used anywhere else; it is the name I thought could fit the growth of an EDW into a system that can also accommodate and manage Big Data!








Agile Ball Point Game

Posted by Anahita | Posted in Agile | Posted on 03-01-2012



Here is a game that will help teams practise Scrum before the first real sprint! This game also highlights the values of the Agile Manifesto in a simple, practical and quick manner: collaboration, feedback from previous learning and a working solution!

Enjoy and play at your next Project Chartering Session!

ERP, BI and UML 2.0

Posted by Anahita | Posted in Business Analysis, Business Intelligence | Posted on 03-01-2012



Enterprise Resource Planning (ERP) systems are organisational platforms that coordinate organisational processes and their supporting data, providing cohesive and timely services by integrating HR, Finance, Manufacturing, Supply Chain and Customer Services as core activities. They can be extended further to cover other areas such as Project Management, Asset and Maintenance Management, etc. ERP systems are normally supported by a Relational Database Management System (RDBMS).

Business Intelligence systems are created to give organisations timely decision-making power. More and more organisations use Business Intelligence to gain the ability to access the correct information in a format that is easy to understand and analyse, or even in the form of applications that answer specific business queries.

As both ERP and BI systems are complex and support a vast number of business processes and related information, the white paper “Extending UML 2 Activity Diagrams with Business Intelligence Objects” by Veronika Stefanov, Beate List and Birgit Korherr suggests an extension to the OMG (Object Management Group) UML 2.0 Activity Diagram for Business Intelligence and Data Warehousing, to model and communicate the relationship between business processes in ERP systems and the related Business Intelligence objects. The paper introduces a BI profile that defines the new object stereotypes “DataRepository”, “DataObject” and “PresentationObject”. “DataRepository” covers “OperationalDataStore”, “DataWarehouse” and “Datamart”; “DataObject” covers “Entity” and “Fact”; and finally “PresentationObject” covers “Report” and “InteractiveAnalysis”. To view the extended metamodel, see Fig 2.0 in the above paper.

With the corresponding notation for each extended stereotype, the profile is a powerful tool for modelling BI objects in UML 2.0 activity diagrams. In the above diagram, the diamonds are the notation for “Fact” objects, which represent multidimensional data models, showing “Customer” and “Policy Transactions”, as well as the “insurance company” Data Warehouse.
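As an illustration only, the stereotype hierarchy of the BI profile described above can be mirrored as a Python class hierarchy, with each specialised stereotype subclassing its general one. Mapping UML stereotype specialisation onto subclassing is an assumption made for this sketch, not something the paper defines, and the BIObject base class and the general_stereotype helper are hypothetical names.

```python
class BIObject:
    """Hypothetical base for activity-diagram object nodes tagged with a BI stereotype."""

    def __init__(self, name: str):
        self.name = name

    @property
    def stereotype(self) -> str:
        # The stereotype is simply the class name in this sketch.
        return type(self).__name__

    def __repr__(self) -> str:
        return f"«{self.stereotype}» {self.name}"


# General stereotype DataRepository and its specialisations.
class DataRepository(BIObject): pass
class OperationalDataStore(DataRepository): pass
class DataWarehouse(DataRepository): pass
class Datamart(DataRepository): pass

# General stereotype DataObject and its specialisations.
class DataObject(BIObject): pass
class Entity(DataObject): pass
class Fact(DataObject): pass

# General stereotype PresentationObject and its specialisations.
class PresentationObject(BIObject): pass
class Report(PresentationObject): pass
class InteractiveAnalysis(PresentationObject): pass


def general_stereotype(obj: BIObject) -> str:
    """Return which of the three general stereotypes an object node falls under."""
    for general in (DataRepository, DataObject, PresentationObject):
        if isinstance(obj, general):
            return general.__name__
    raise TypeError(f"{obj!r} has no general BI stereotype")


if __name__ == "__main__":
    # Object nodes mirroring the insurance example described above.
    nodes = [Fact("Customer"), Fact("Policy Transactions"), DataWarehouse("Insurance company")]
    for node in nodes:
        print(node, "->", general_stereotype(node))
```

Using subclassing means a tool walking a diagram can ask one question, isinstance(node, DataObject), to find every entity and fact, which loosely mirrors how the profile groups the specialised stereotypes.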