Courtesy of ebizQ
Building BI with structured (relational and legacy) and unstructured data (Hadoop,
Companies are experimenting more and more with petabytes of
unstructured data like social media entries, customer support
emails, complaints, suggestion boxes, etc. They are increasingly
storing and processing this data with Hadoop, MapReduce and NoSQL
distributed databases like MongoDB, Cassandra, Riak, etc.
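To give a feel for what MapReduce does with text like support emails, here is a minimal, single-process sketch of the idea in Python: a map step emits a (word, 1) pair per word, a shuffle step groups the pairs by word, and a reduce step sums each group. This is not Hadoop's API (Hadoop jobs run these steps in parallel across a cluster); the sample emails and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample of unstructured customer support emails.
EMAILS = [
    "Widget X arrived broken, very disappointed",
    "Love widget X, but shipping was slow",
    "Shipping slow again and the widget was broken",
]

def map_phase(doc):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in doc.lower().replace(",", " ").split():
        yield word, 1

def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(docs):
    # On a Hadoop cluster the map tasks would run in parallel on many
    # machines; in this sketch they simply run one after another.
    pairs = chain.from_iterable(map_phase(d) for d in docs)
    return reduce_phase(shuffle_phase(pairs))

if __name__ == "__main__":
    counts = word_count(EMAILS)
    print(counts["broken"], counts["shipping"], counts["slow"])
```

Counting words is the classic first MapReduce example; the same map/shuffle/reduce shape scales to counting complaint keywords across millions of emails.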
On the other hand, you have relational databases holding lots and
lots of structured data, such as transactional data, in humongous
data warehouses! Data warehousing efforts in many organizations are
maturing rapidly, and with storage getting less and less expensive
every year, more data can be stored and used.
How do you build unified Business Intelligence that combines all
of the older transactional, structured data with unstructured but,
nevertheless, very valuable data?
Pentaho seems to have come up with some very slick answers!
They have tools that simply pick up Hadoop data and merge it
with other structured and unstructured data from relational
databases, data warehouses and newer distributed databases that
store unstructured data, like MongoDB and Cassandra!
The nice thing about this approach and toolset is that you don't
need to muck around with highly technical stuff like MapReduce to
be able to do Business Intelligence.
Merging legacy transactional data with unstructured data may
help organizations answer questions like "What are our top regions
for sales of widget X and what do people say about our product or
service from that region?" or "How do negative perceptions of
product X correlate with the changes in sales of product X in
Being able to merge these two kinds of data quickly is crucial,
and it is a HUGE pain point in Business Intelligence these days.
I am sure that once you have ways of combining structured and
unstructured data, all kinds of fancy Business Intelligence become possible.
Pentaho has done a terrific job of leading the market in these
kinds of efforts! Very slick and useful!
Now that companies like Google, Facebook, Yahoo and others have
made tremendous contributions to storing and retrieving
unstructured data efficiently and with constant availability,
using distributed computing and the cloud with Hadoop and
MapReduce, these technologies are ready to be used commercially in other
It is a much-needed bridge, and Pentaho is building it!