
The Big Elephant

The Big Mammoth

Friday, February 6, 2015

Analytics overview


Analytics


Have you ever thought about the data that is being generated every day on the internet in the form of tweets, blogs, emails, web pages, photos, videos, etc.?

During the last decade we have experienced a data explosion. From the start of 2003 to the start of this year, human civilization has generated approximately 12 exabytes of aggregated information.

In July 2011, Facebook’s 750 million users worldwide were uploading approximately 100 terabytes of data to the social media platform every day.

This data may not seem useful to you, but to the world’s business visionaries, leaders and marketers, this tremendous growth of data has given rise to what may prove to be the most substantial commercial opportunity since the emergence of the Internet: the ability to better understand consumers, seamlessly match “right-time” offers to their needs, and optimize the management of profitable, long-term customer relationships.

Unsurprisingly, many minds are working to capitalize on this new market potential, and efforts across the globe to consolidate and aggregate information from different sources into meaningful data are gaining focus.

This demand for real-time, rules-driven, customer-oriented data, obtained by digging into mountains of information, gave birth to Analytics.

What is Analytics?

Analytics is a broad term describing a variety of statistical and analytical techniques used to develop models that provide meaningful reporting on past trends and help predict future events and behavior.

The science of analytics is concerned with extracting useful properties of data, typically from large databases. These databases can contain structured information (tables, predefined data structures) or unstructured information such as PDFs, documents, videos, emails and images. Because the technology to dig into unstructured data is not yet mature, efforts to provide full-spectrum Analytics have already suffered a setback.

Analytics can be broadly divided into two steps: Data Integration as the foundation, followed by Prediction.

Data Integration: This step involves collecting data from different sources, storing it in one data center, and then applying various techniques to derive meaningful information such as trends, behavior and patterns.
(e.g. 70% of mobile application developers feel Apple’s iOS is too closed to experiment with)

Prediction: Prediction is the forecasting, modeling and simulation that explore current patterns and then guide future action. In short, it is about using historical patterns in data to predict future events.
(e.g. based on the previous data, Google launched Android and made it open source)
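To make “using historical patterns of data to predict future events” concrete, here is a minimal sketch that fits an ordinary least-squares straight line to a short history and extrapolates it one period ahead. The linear-trend model, the function names and the sample numbers are assumptions for illustration only; real forecasting usually needs richer models (seasonality, uncertainty), so treat this as the simplest instance of the idea.

```python
from typing import Sequence


def fit_linear_trend(values: Sequence[float]) -> tuple[float, float]:
    """Fit y = slope * t + intercept by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two historical points")
    ts = range(n)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, values))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return slope, intercept


def forecast(values: Sequence[float], steps_ahead: int = 1) -> float:
    """Extrapolate the fitted trend `steps_ahead` periods past the last observation."""
    slope, intercept = fit_linear_trend(values)
    return slope * (len(values) - 1 + steps_ahead) + intercept


# Hypothetical monthly sign-ups (illustrative numbers, not real data).
history = [100, 110, 120, 130]
print(forecast(history))  # → 140.0
```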

What can Analytics do?

One should not underestimate the potential of Analytics; it touches the whole spectrum of the business world. There are numerous use cases where Analytics can play a crucial role in redefining business success.

Analytics can help in making pivotal decisions across the complete management hierarchy, including but not limited to the following:

Strategic decisions set the long-term direction for an organization, a product, or an initiative, and result in guidelines. These decisions are based on past trends in customer preferences, on gaps in the market that suggest ideas for new products, and on how the market is evolving.

Operational decisions focus on a specific project or process and translate the strategy into guidelines for action, such as rules for determining an optimal price. For instance, Analytics can provide network usage per city, which can in turn guide where to expand infrastructure.
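As a rough sketch of the network-usage example, assuming raw usage arrives as hypothetical (city, bytes) records and that a per-city capacity threshold is known, aggregating per city and flagging the busiest cities could look like this:

```python
from collections import defaultdict
from typing import Iterable


def usage_per_city(records: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Sum bytes transferred per city from raw (city, bytes) usage records."""
    totals: dict[str, int] = defaultdict(int)
    for city, nbytes in records:
        totals[city] += nbytes
    return dict(totals)


def cities_over_capacity(totals: dict[str, int], capacity_bytes: int) -> list[str]:
    """Return cities whose total usage exceeds a (hypothetical) capacity, busiest first."""
    busy = [city for city, total in totals.items() if total > capacity_bytes]
    return sorted(busy, key=lambda c: totals[c], reverse=True)


# Illustrative records; city names and volumes are made up.
records = [("Pune", 700), ("Delhi", 400), ("Pune", 500), ("Delhi", 300), ("Goa", 100)]
totals = usage_per_city(records)
print(totals)                             # → {'Pune': 1200, 'Delhi': 700, 'Goa': 100}
print(cities_over_capacity(totals, 600))  # → ['Pune', 'Delhi']
```

The cities returned first are the natural candidates for additional infrastructure.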

Tactical decisions repeat frequently and can occur in high volume. Examples are what price to charge a specific customer for a seat on a particular flight or for a room in a specific hotel on a specific night, or which ads to serve to a particular netizen. These decisions are based on current trends. E.g. a person watching a football match on a video-hosting site is served an ad for sportswear.
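A tactical decision such as the sportswear ad can be sketched as a lookup in a rules table that maps the category of content being watched right now to an ad category. The categories, the `AD_RULES` table and the fallback below are hypothetical; real ad servers combine many more signals.

```python
# Hypothetical rule set: category of content currently being watched -> ad category.
AD_RULES: dict[str, str] = {
    "football": "sportswear",
    "cooking": "kitchenware",
    "travel": "hotels",
}
DEFAULT_AD = "generic"  # hypothetical fallback when no rule matches


def choose_ad(content_category: str) -> str:
    """Pick an ad category for the content a user is watching right now."""
    return AD_RULES.get(content_category.lower(), DEFAULT_AD)


print(choose_ad("Football"))  # → sportswear
print(choose_ad("news"))      # → generic
```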

What can Analytics do? The list is endless:

  • Specific Advertisement
  • Customer retention / engagement
  • Market research / customer behavior analysis
  • Website content optimization
  • Offer optimization
  • Cross-channel touch point optimization
  • Portfolio optimization
  • New Market forecasting
  • New Product / service Idea

To illustrate the case of Specific Advertisement: data availability now allows advertisers, agencies and publishers to optimize ad delivery, evaluate campaign results, improve site selection and retarget ads to other sites. It is also improving the value of media to brands by delivering their advertising to better-qualified prospects, making ads more efficient and more valuable and providing a more compelling user experience.

What are the Challenges? 

By now, you may be thinking, “Wow! Analytics is so great; I should get it for my business.” But the billion-dollar question is how to get analytics specific to your business. Ad delivery houses, ISPs, content generators, entrepreneurs, venture capitalists and government organizations are all looking for analytical data that suits their requirements, complies with government policy, is concrete and is preferably available in real time. There are two options: develop in-house or buy. Developing in-house requires a very large investment and often duplicates work that someone else is already doing.

Ironically, no one is fulfilling these market needs: there is a big untapped market and no BIG player.

Why is that? Although Analytics looks like a lucrative opportunity, it comes with tough challenges, and these are hindering the growth of a potentially massive market. Some of the challenges are highlighted below:

Rules-driven integration of disparate data sets: The collection, analysis and segmentation of digital data demands the aggregation and anonymization of virtually all data, challenging marketers’ fundamental ability to draw distinct insights from consumers’ cross-channel interactions.

Improved operating infrastructures: Though substantial process and data structure challenges also exist, a major barrier now inhibiting wider marketing data optimization resides within the marketing organization itself, which is characterized by rigid “silos” and a paucity of data-savvy marketing operations, IT and sales talent.

Strong network of data-centric technology: The fastest and most efficient data aggregation, analysis and throughput solutions require a strong ecosystem of data-centric technologies. Geographically redundant clouds for collecting data and multi-source data mining techniques are still evolving.

Using the data / Marketing data governance: While organizations have long employed policy experts to advise on the regulatory ramifications of data utilization, many are coming to see marketing data governance (defining the “rules of the road” for assigning distinct data sources to different promotional tasks) as equally important.

Expertise issues: This skill is in the shortest supply. It is difficult to find individuals who understand the analytical techniques and know enough about business issues to marry one to the other, i.e. to correlate similar aspects of different data sets and aggregate them into a single body of knowledge.

I am a Geek. What's in it for me?

Now it’s the turn of the geeks out there: “Enough jargon! How do I develop it?” So let me explain the process step by step.

The first step is to collect data from different data sources (end-user browsers, routers, switches, load balancers, servers, etc.) in one single place (a data center). Start by writing plug-ins or agents that are part of, or reside alongside, each data source. These plug-ins and agents send periodic stats (over HTTP) to one central data center (a web service in the cloud). Periodic sampling of normal stats and real-time reporting of critical data are crucial features of these agents.
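Below is a minimal sketch of such an agent, assuming a hypothetical collector endpoint (`COLLECTOR_URL`) that accepts JSON over HTTP POST and a caller-supplied rule for what counts as critical. Normal samples are buffered and shipped once per interval; critical samples are sent immediately. The sender and clock are injectable so the logic can be exercised without a network, as the usage lines at the end do.

```python
import json
import time
import urllib.request
from typing import Callable

# Hypothetical collector endpoint of the central data center's web service.
COLLECTOR_URL = "http://collector.example.com/stats"


def http_post_json(payload: dict) -> None:
    """Send one JSON payload to the (hypothetical) collector over HTTP POST."""
    req = urllib.request.Request(
        COLLECTOR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        resp.read()


class StatsAgent:
    """Buffers normal stats and reports them periodically; reports critical ones at once."""

    def __init__(self, source_id: str, interval_s: float,
                 is_critical: Callable[[str, float], bool],
                 send: Callable[[dict], None] = http_post_json,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.source_id = source_id
        self.interval_s = interval_s
        self.is_critical = is_critical
        self.send = send
        self.clock = clock
        self.buffer: list[dict] = []
        self.last_flush = clock()

    def record(self, metric: str, value: float) -> None:
        sample = {"metric": metric, "value": value, "ts": self.clock()}
        if self.is_critical(metric, value):
            # Critical data bypasses the buffer and is reported in real time.
            self.send({"source": self.source_id, "critical": True, "samples": [sample]})
        else:
            self.buffer.append(sample)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        # Periodic report: ship buffered samples once the interval has elapsed.
        now = self.clock()
        if self.buffer and now - self.last_flush >= self.interval_s:
            self.send({"source": self.source_id, "critical": False, "samples": self.buffer})
            self.buffer = []
            self.last_flush = now


# Usage with a fake sender and fake clock (no network needed).
sent: list[dict] = []
fake_time = [0.0]
agent = StatsAgent("router-1", interval_s=60,
                   is_critical=lambda m, v: m == "cpu" and v > 90,
                   send=sent.append, clock=lambda: fake_time[0])
agent.record("cpu", 40)         # normal: buffered
agent.record("cpu", 95)         # critical: sent immediately
fake_time[0] = 61.0
agent.record("latency_ms", 12)  # interval elapsed: buffered batch of 2 is sent
print(len(sent))  # → 2
```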

The second step is to store this data in a database. A conventional database is not a good choice with millions of columns and billions of rows. One option is to store structured data in a NoSQL store such as Cassandra and unstructured data (files, pictures, documents, etc.) in HDFS.
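Wide-column stores such as Cassandra are usually modeled around the queries you will run. A common pattern for time-series stats is to partition by (source, time bucket) and keep rows ordered by timestamp inside each partition. The sketch below only illustrates that data-model idea in plain Python; it does not use a Cassandra client, and the day-sized bucket and all names are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(source_id: str, ts: datetime) -> tuple[str, str]:
    """Bucket rows by (source, UTC day) so that no single partition grows without bound.
    Expects a timezone-aware timestamp."""
    return source_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


class BucketedStore:
    """In-memory stand-in for a wide-column table: partition key -> rows kept in time order."""

    def __init__(self) -> None:
        self.partitions: dict[tuple[str, str], list[tuple[datetime, dict]]] = defaultdict(list)

    def insert(self, source_id: str, ts: datetime, row: dict) -> None:
        rows = self.partitions[partition_key(source_id, ts)]
        rows.append((ts, row))
        # Keep the partition ordered by timestamp, as a clustering column would.
        rows.sort(key=lambda r: r[0])

    def query(self, source_id: str, day: str) -> list[dict]:
        """Read one source's rows for one day, in time order (a single-partition read)."""
        return [row for _, row in self.partitions.get((source_id, day), [])]


store = BucketedStore()
t1 = datetime(2015, 2, 6, 10, 0, tzinfo=timezone.utc)
t2 = datetime(2015, 2, 6, 9, 0, tzinfo=timezone.utc)
store.insert("player-42", t1, {"qoe": 3.9})
store.insert("player-42", t2, {"qoe": 4.5})
print(store.query("player-42", "2015-02-06"))  # → [{'qoe': 4.5}, {'qoe': 3.9}]
```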

The third step is the most complex and is the backbone of analytics: digging into the database and correlating the data collected from different sources according to a specific rule set. Let’s take a simple example. The first data source (a video player plug-in) reports the QoE of the video it plays along with the user’s IP address, IPa. The second data source (a Content Delivery Network [CDN]) reports the content (a live football match) of the video watched by IPa (assuming only the CDN knows the video content). By correlating the data from these two sources, one can learn the quality with which user IPa is receiving the “Live football match”. After correlation, the process aggregates the data over multiple users, different time frames and different locations to generate a pattern or trend.
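The video-player/CDN example can be sketched as a join on the client IP followed by an aggregation over users. The record fields, the sample IPs (taken from a documentation-only address range) and the assumption of exactly one CDN record per IP in the time window are all hypothetical simplifications.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical reports: the player plug-in knows QoE and client IP; only the CDN knows the content.
player_reports = [
    {"ip": "203.0.113.5", "qoe": 4.0},
    {"ip": "203.0.113.7", "qoe": 2.0},
    {"ip": "203.0.113.9", "qoe": 4.5},
]
cdn_reports = [
    {"ip": "203.0.113.5", "content": "Live football match"},
    {"ip": "203.0.113.7", "content": "Live football match"},
    {"ip": "203.0.113.9", "content": "Cooking show"},
]


def correlate(player: list[dict], cdn: list[dict]) -> list[dict]:
    """Join the two sources on client IP (assumes one CDN record per IP in this window)."""
    content_by_ip = {r["ip"]: r["content"] for r in cdn}
    return [
        {"ip": r["ip"], "content": content_by_ip[r["ip"]], "qoe": r["qoe"]}
        for r in player
        if r["ip"] in content_by_ip  # drop player reports with no matching CDN record
    ]


def average_qoe_by_content(joined: list[dict]) -> dict[str, float]:
    """Aggregate the correlated records over many users into one QoE figure per content item."""
    scores: dict[str, list[float]] = defaultdict(list)
    for r in joined:
        scores[r["content"]].append(r["qoe"])
    return {content: mean(vals) for content, vals in scores.items()}


joined = correlate(player_reports, cdn_reports)
print(average_qoe_by_content(joined))  # → {'Live football match': 3.0, 'Cooking show': 4.5}
```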
Several techniques and tools can be used to perform this correlation and aggregation, including:
  • MapReduce
  • Indexing
  • ETL (e.g. Kettle)
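To show the shape of MapReduce without a Hadoop cluster, here is a single-process sketch of its three phases (map, shuffle, reduce) computing total viewing minutes per content item. It is not the Hadoop API; the function names and records are illustrative.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(record: dict) -> Iterator[tuple[str, int]]:
    """Map: emit (key, value) pairs; here (content, minutes watched)."""
    yield record["content"], record["minutes"]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key; here, total minutes."""
    return key, sum(values)


def run_job(records: Iterable[dict]) -> dict[str, int]:
    pairs = (pair for record in records for pair in mapper(record))
    return dict(reducer(k, vs) for k, vs in shuffle(pairs).items())


views = [
    {"content": "Live football match", "minutes": 30},
    {"content": "Cooking show", "minutes": 10},
    {"content": "Live football match", "minutes": 45},
]
print(run_job(views))  # → {'Live football match': 75, 'Cooking show': 10}
```

In a real cluster the map and reduce calls run in parallel on many machines and the shuffle moves data over the network; the per-phase logic stays the same.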


The fourth step is to store the aggregated information in a structured database (RDBMS) with predefined tables and columns. Stored procedures (e.g. PL/SQL) can be written in the RDBMS to support complex queries. Supporting ad-hoc queries is imperative.
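As a small sketch of this step, the code below uses SQLite (from Python's standard library) as a stand-in for the production RDBMS. SQLite has no stored procedures, so rather than PL/SQL the ad-hoc query is a parameterized SQL statement. The `qoe_daily` table, its columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of pre-aggregated results: one row per day, city and content item.
conn.execute(
    """
    CREATE TABLE qoe_daily (
        day     TEXT NOT NULL,
        city    TEXT NOT NULL,
        content TEXT NOT NULL,
        avg_qoe REAL NOT NULL,
        viewers INTEGER NOT NULL,
        PRIMARY KEY (day, city, content)
    )
    """
)
conn.executemany(
    "INSERT INTO qoe_daily VALUES (?, ?, ?, ?, ?)",
    [
        ("2015-02-05", "Pune", "Live football match", 3.5, 1200),
        ("2015-02-05", "Delhi", "Live football match", 4.1, 900),
        ("2015-02-06", "Pune", "Live football match", 2.9, 1500),
    ],
)


def adhoc_query(conn: sqlite3.Connection, content: str, min_viewers: int) -> list[tuple]:
    """Ad-hoc query: per-day average QoE for one content item, weighted by viewers."""
    return conn.execute(
        """
        SELECT day, ROUND(SUM(avg_qoe * viewers) / SUM(viewers), 2) AS weighted_qoe
        FROM qoe_daily
        WHERE content = ? AND viewers >= ?
        GROUP BY day
        ORDER BY day
        """,
        (content, min_viewers),
    ).fetchall()


print(adhoc_query(conn, "Live football match", 1000))
```

Because the filter values are bound as parameters, the portal can pass user-chosen values into queries like this without building SQL strings by hand.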

The final step is to provide a GUI portal that shows trends and patterns as graphs by querying the RDBMS. Generating graphs on the fly from ad-hoc queries is what everyone wants. Some tools worth attention here are:
  • Tableau
  • Google Charts
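One way the portal could feed Google Charts is to serve query results in the DataTable JSON format (a `cols` list plus `rows` whose cells are `{"v": value}` objects), which the browser can pass to `google.visualization.DataTable`. The helper name and labels below are assumptions; only the JSON shape follows the Google Charts format.

```python
import json


def to_google_datatable(rows: list[tuple[str, float]], x_label: str, y_label: str) -> str:
    """Convert (label, value) query rows into Google Charts DataTable JSON (cols/rows/c/v)."""
    table = {
        "cols": [
            {"id": "x", "label": x_label, "type": "string"},
            {"id": "y", "label": y_label, "type": "number"},
        ],
        "rows": [{"c": [{"v": x}, {"v": y}]} for x, y in rows],
    }
    return json.dumps(table)


# Hypothetical query result, e.g. from the ad-hoc query in the previous step.
payload = to_google_datatable([("2015-02-05", 3.5), ("2015-02-06", 2.9)], "Day", "Avg QoE")
print(payload)
```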


No matter which techniques or tools you use, there are some key points to remember while developing any Analytics solution:

  • Appropriate sampling of data
  • Anonymization
  • Reporting critical events
  • Central storage (very large database)
  • Correlation of different data metrics
  • Aggregation from different sources
  • Data mining techniques (ETL, MapReduce, indexing)
  • Report generation
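Two of these points, sampling and anonymization, can be sketched briefly. The code assumes a hypothetical secret key and uses a keyed hash (HMAC-SHA256) to replace IP addresses with stable tokens, so records can still be correlated. Strictly, this is pseudonymization rather than full anonymization: anyone holding the key could recompute tokens for every IPv4 address. Sampling hashes the user ID so that a given user is either always in or always out of the sample.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept out of source code and rotated.
ANON_KEY = b"replace-with-a-secret-key"


def pseudonymize_ip(ip: str, key: bytes = ANON_KEY) -> str:
    """Replace an IP with a keyed hash: the same IP always maps to the same token,
    so records can still be correlated, but the raw address is not stored."""
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def in_sample(user_id: str, rate: float) -> bool:
    """Deterministic sampling: keep a user iff their hash maps below `rate` in [0, 1),
    so all events of a sampled user are kept together."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use 53 bits so the division is exact and the bucket is strictly below 1.0.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate


token_a = pseudonymize_ip("203.0.113.5")
token_b = pseudonymize_ip("203.0.113.5")
print(token_a == token_b)  # → True

users = [f"user-{i}" for i in range(10_000)]
sampled = [u for u in users if in_sample(u, 0.1)]
print(len(sampled))  # about 10% of the users, i.e. roughly 1,000
```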


As you may have noticed, all the above steps are part of Data Integration (step 1 of Analytics) only. As Prediction is very subjective, let’s leave it to management.

  
