Introduction to BigData


We are going to deliver a series of tutorials on the following concepts, one by one:

  1. BigData
  2. Hadoop
  3. Hadoop Ecosystem
  4. Cloud
  5. Amazon Web Services
  6. Google Cloud Platform
  7. Microsoft Azure
  8. BigData with Cloud
  9. Spring Cloud – Cloud Foundry
  10. Spring Hadoop Module
  11. Spark With Hadoop

First we will start with BigData basics, then move on to Hadoop and Cloud, and finally discuss how to use BigData solutions with Cloud platforms. We will cover the different BigData and Cloud platform solutions available in the current market, such as Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure, IBM Bluemix, Pivotal Cloud Foundry, Yahoo Cloud Platform etc.

Finally, we will discuss how to develop applications using the Spring Cloud and Spring Hadoop modules. BigData and Cloud are both really big subjects, so it may take some time to discuss all these concepts in detail with real-time examples. Please bear with us.

In this post, the first of the series, we are going to discuss BigData basics.

Post’s Table Of Contents

  • Introduction
  • BigData Introduction
  • What is BigData
  • BigData Characteristics
  • Why Data is Important
  • Why Big Data is so Important
  • BigData: Data Formats
  • BigData Advantages
  • BigData Solutions
  • BigData Use Cases

BigData Introduction

We are now living in the Big Data era.

A few years ago, systems, organizations and applications used mostly Structured Data (that is, data in the form of rows and columns). It was easy to use relational databases (RDBMS) and older tools to store, manage, process and report on this data.

Recently, however, the nature of data has changed. Systems, organizations and applications are generating huge amounts of data in a variety of formats at a very fast rate.

That means the data is no longer simple Structured Data (not just rows and columns). Much of it has no fixed format: it is just raw data. It is very difficult, and sometimes not possible, to use older technologies, traditional relational databases and tools to store, manage, process, analyze and report on this kind of data.

Then how do we solve this problem? This is where BigData solutions come into the picture.

BigData solutions are designed to handle exactly these problems.

Let us start by understanding what BigData is and how important it is in our lives.

What is BigData

There is no single straightforward definition of BigData. However, we will try to answer this question in different ways.

In simple words, Big Data is a technique for solving data problems that cannot be solved using traditional databases and tools.

Put another way, BigData does not just mean a huge amount of data. BigData means a huge amount of data generated at a very fast rate in different formats.

Big Data is a technique to store, process, manage, analyze and report on a huge amount of varied data, at the required speed and within the required time, to allow real-time analysis and reaction.

BigData is data that has the following three characteristics:

  • Extremely Large Volumes of Data
  • Extremely High Velocity of Data
  • Extremely Wide Variety of Data

BigData Characteristics

The following three are known as “BigData Characteristics”:

  • Volume
  • Velocity
  • Variety

  1. Volume:
    Volume means “how much data is generated”. Nowadays, organizations, people and systems are generating or receiving vast amounts of data, from TB (terabytes) to PB (petabytes) to EB (exabytes) and more.

    bigdata-3vs-volume

  2. Velocity:
    Velocity means “how fast data is produced”. Nowadays, organizations, people and systems are generating huge amounts of data at a very fast rate.

    bigdata-3vs-velocity

  3. Variety:
    Variety means “different forms of data”. Nowadays, organizations, people and systems are generating huge amounts of data at a very fast rate in different formats. We will discuss the different data formats in detail later in this post.

    bigdata-3vs-variety

BigData refers to the 3V (VVV) paradigm:

bigdata-3vs

The three “Vs” paradigm (Volume, Velocity, Variety) of Big Data was defined by Doug Laney in 2001.

If our organization’s data fits this 3Vs paradigm, we have a BigData problem, so we should use BigData solutions to solve it.

The 3Vs paradigm alone is not enough to get better value from our BigData. There is another V (the 4th V), which is the most important one for every BigData problem.

4th V : Veracity

Veracity means “the quality, correctness or accuracy of the captured data”. Of the 4Vs, it is the most important V for any BigData solution, because without correct data there is no point in storing large amounts of data at a fast rate and in different formats. The data should deliver correct business value.

bigdata-4thv-veracity

So this 4th V answers the following questions:

How accurate is that data in predicting business value?
Do the results of a big data analysis actually make sense?

BigData 4Vs in simple terminology:
V (Volume) : The amount of data
V (Variety) : The number of types of data
V (Velocity) : The speed at which data is generated and processed
V (Veracity) : The correctness of data
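
As a rough illustration of the four terms above, the following Python sketch computes one simple number for each V over a tiny batch of events. The sample events, the `summarize_4vs` function and the validation flag are all hypothetical and exist only for this example; real BigData platforms measure these things at a very different scale and in their own ways.

```python
from collections import Counter

# Hypothetical sample of incoming events:
# (arrival time in seconds, format label, raw payload, passed validation?)
EVENTS = [
    (0.0, "row", "1001,alice,42.50", True),
    (0.5, "log", "09:20:01,054 INFO  [org.jboss.modules] (main) JBoss Modules version 1.3", True),
    (1.0, "xml", "<order id='7'><total>19.99</total></order>", True),
    (2.0, "row", "1002,,-1", False),  # missing name and negative amount
]


def summarize_4vs(events):
    """Return a rough Volume/Velocity/Variety/Veracity summary for a batch."""
    # Volume: total size of the raw payloads in bytes.
    volume_bytes = sum(len(payload.encode("utf-8")) for _, _, payload, _ in events)
    # Velocity: events per second over the observed time window.
    elapsed = events[-1][0] - events[0][0]
    velocity = len(events) / elapsed if elapsed > 0 else float("nan")
    # Variety: number of distinct formats seen.
    formats = Counter(fmt for _, fmt, _, _ in events)
    # Veracity: fraction of events that passed validation.
    valid = sum(1 for _, _, _, ok in events if ok)
    return {
        "volume_bytes": volume_bytes,
        "velocity_per_sec": velocity,
        "variety": len(formats),
        "veracity": valid / len(events),
    }


summary = summarize_4vs(EVENTS)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this toy batch, one of the four events fails validation, so the veracity figure drops to 0.75 even though volume, velocity and variety look healthy. That is the sense in which the 4th V qualifies the other three.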

Why Data is Important

We are living in the Data Era, or Information Era. Data is the most important factor for all organizations, for the following reasons or benefits:

  • Data is useful in Decision Making
  • To know Customer Preferences so that Organizations can improve their Business
  • Getting the Right Information for Business
  • By analyzing Data, We can optimize our systems.
  • More Data, More Analysis, More Results, More Profits.
  • Data is effective in improving Business Value
  • Data Analysis provides Customer Likes and Dislikes information
  • And More.

Why BigData is so Important

Nowadays, Big Data is very important for organizations and companies from medium-size to large-size, because it enables them to gather, store, manage and manipulate extremely large amounts of data, arriving at extremely high velocity and in an extremely wide variety of formats:

  • At the right speed
  • At the right time
  • To get the required Business Value

By following the Big Data 4Vs paradigm, we get a lot of benefits, as shown below:

bigdata-4vs-businessvalue

By using the BigData 4Vs paradigm, organizations can get many benefits by answering “What, Who, When, Where, How” kinds of questions:

  • What business decisions need to be made?
  • What insight can we derive from the information?
  • How accurate is that data in predicting business value?
  • Who could benefit from the information that we are capturing?
  • When do they need to know in order to make a more informed decision?
  • How to improve our business value?
  • How to improve our profits?
  • Where do we have more Profits?

BigData: Data Formats

In the BigData 3V paradigm, one V refers to Variety. It means generating or receiving data in different formats.

In the Data Era, people, systems, devices and organizations are generating or receiving the following types of data formats:

  • Structured Data
    Structured Data means data that is in the form of rows and columns, so it is very easy to store in relational databases.

    In simple words, anything that can be stored in the form of rows and columns is Structured Data.

    For example: relational database data (online subscriptions, transactional data etc).

    structured-data

  • Semi-Structured Data
    Semi-Structured Data means data that is formatted in some way, but not in the form of rows and columns. It is possible to store it in relational databases, but it is rather complex to manage and performs poorly.

    For example:

    1. Log Files
      In log files, columns are separated by whitespace characters (characters used to align things horizontally or vertically, for instance a space, a tab or a newline).

      Observe the following JBoss Server log file:

      
      09:20:01,054 INFO  [org.jboss.modules] (main) JBoss Modules version 1.3
      09:20:01,652 INFO  [org.jboss.as.process.Host Controller.status] (main) JBAS012017: Starting process 'Host Controller'
      09:20:05,079 INFO  [org.jboss.as.process.Server: myserver.status] (ProcessController-threads - 10) JBAS012017: Starting process 'Server: myserver'
      17:01:58,833 INFO  [org.jboss.as.process] (Shutdown thread) JBAS012016: Shutting down process controller
      17:02:03,408 INFO  [org.jboss.as.process.Host Controller.status] (Shutdown thread) JBAS012018: Stopping process 'Host Controller'
      17:02:15,246 INFO  [org.jboss.as.process.Server: myserver.status] (ProcessController-threads - 9) JBAS012018: Stopping process 'Server: myserver'
      17:03:02,990 INFO  [org.jboss.as.process.Server:myserver.status] (reaper for Server: myserver) JBAS012010: Process 'Server: myserver' finished with an exit status of 0
      17:03:13,170 INFO  [org.jboss.as.process.Host Controller.status] (reaper for Host Controller) JBAS012010: Process 'Host Controller' finished with an exit status of 0
      17:03:13,195 INFO  [org.jboss.as.process] (Shutdown thread) JBAS012015: All processes finished; exiting
      

      If we observe the above log file, the first column (containing the timestamp) is separated by whitespace from the second column (containing the logging level). It is semi-formatted, not fully formatted, text.

    2. XML Documents
      Observe the following XML document. It is also semi-formatted, using XML start and end tags.

      xml-doc

  • Un-Structured Data
    Un-Structured Data means data that is not formatted in any way. It does not fit into the rows and columns of a relational database.

    For example: audio files, videos, text typed by call centre executives, photos, sensor data, web data, mobile data, GPS data, social media data etc. are Un-Structured Data.

    If we open any image file (for instance, a JPEG file) in a text editor, we see only binary data, which has no readable format at all.
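
To make the semi-structured log example above more concrete, here is a minimal Python sketch that turns one of the JBoss log lines into named fields. The field layout (time, level, category in square brackets, thread in parentheses, message) is an assumption inferred only from the sample lines shown in this post, and the names `LOG_LINE` and `parse_log_line` are hypothetical. It is not an official JBoss log parser.

```python
import re

# Assumed layout, inferred from the sample lines only:
# HH:MM:SS,mmm LEVEL [category] (thread) message
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<category>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


sample = (
    "09:20:01,652 INFO  [org.jboss.as.process.Host Controller.status] "
    "(main) JBAS012017: Starting process 'Host Controller'"
)

record = parse_log_line(sample)
print(record["time"], record["level"], record["category"])

# A plain whitespace split breaks the category in two, because it contains a space.
print(sample.split()[2:4])
```

The last line shows why such data is only semi-structured: the columns are visible to a human reader, but a naive split on whitespace does not recover them, so some format-specific parsing is needed before the data can be stored as rows and columns.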

Nowadays, people, machines, devices, organizations and the Internet are generating Multi-Structured Data, that is, a combination of Structured, Semi-Structured and Un-Structured Data. It is very difficult to store and manage this kind of data using only traditional technologies, databases and tools.

multi-structure-data

This is where Big Data solutions help, by solving this problem in an efficient and cost-effective way.

BigData Advantages

If we use BigData solutions to store, manage, process and report on our data, we get the following benefits:

  • Store data of all types and sizes at low cost
  • Efficiently store, process and manage our data
  • Provide a cost-effective way to manage our data
  • Provide better-performing solutions
  • Provide highly scalable solutions
  • Produce the right business value
  • Increase productivity
  • Increase profits

BigData Solutions

The following is a list of the most popular BigData solutions available in the market:

  • Apache Hadoop BigData Solution
  • Amazon Web Services (AWS) BigData Solutions
  • Google Cloud BigData Solutions
  • Microsoft BigData Solutions
  • Cloudera BigData Solutions
  • IBM BigData Solutions
  • Oracle BigData Solutions

BigData Use Cases

Most organizations are using or moving to BigData, so it is not possible to list all of them here.

We will mention only some popular organizations that are using and benefiting from Big Data solutions.

  • Facebook
    Facebook is one of the most popular social networking websites. Worldwide, around 1000 million users use the Facebook application. It collects around 500 TB (terabytes) of data per day from user subscriptions, likes, posts, relationship information, audio, videos, pictures etc.

  • Google
    Google also uses its BigData Cloud Platform to manage the data of applications such as Gmail, Google+, Google Search Engine, YouTube etc.

  • Aadhaar India
    In India, UIDAI (Unique Identification Authority of India) manages all Aadhaar card information. It also uses BigData solutions to manage that huge amount of data.

  • RedBus
    RedBus is India’s largest online bus ticket and hotel booking organization. It also uses BigData solutions to manage its huge amount of data at a very high traffic rate.

  • eBay and Amazon
    Two world-famous online shopping giants, eBay and Amazon, also use BigData solutions to manage their customer data, product information etc.

  • Airline Industry
    A lot of airlines (for example, British Airways, Singapore Airlines etc.) today use BigData solutions to store and manage their aircraft and customer information.

  • Yahoo
    Yahoo also uses its BigData Cloud Platform solutions to manage the data of applications such as Yahoo Mail, Yahoo Search Engine, Flickr etc.

  • Safari Books Online
    Safari Books Online is an online subscription service through which individuals and organizations access online books, tutorials and videos.

  • New York Stock Exchange
    The New York Stock Exchange is one of the most famous stock exchanges in the world. It generates about 5 TB (terabytes) of data per day.
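
To put the daily volumes quoted above in perspective, the following small Python sketch converts them into yearly totals. It assumes decimal units (1 PB = 1000 TB) and a 365-day year. The daily figures are simply the ones mentioned in this post, not fresh measurements, and `yearly_volume_pb` is a hypothetical helper name.

```python
TB_PER_PB = 1000  # assumption: decimal (SI) units, 1 PB = 1000 TB


def yearly_volume_pb(tb_per_day, days=365):
    """Convert a daily volume in TB into a yearly volume in PB."""
    return tb_per_day * days / TB_PER_PB


# Daily volumes as quoted in this post (TB per day).
daily_volumes = [("Facebook", 500), ("New York Stock Exchange", 5)]

for name, tb_per_day in daily_volumes:
    print(f"{name}: about {yearly_volume_pb(tb_per_day):.1f} PB per year")
```

At 500 TB per day, a single year adds up to roughly 182.5 PB, which is the Volume side of the 3Vs in practice.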

That’s all for this introduction to BigData. We will discuss more BigData concepts and Hadoop basics in the coming posts.

Please drop me a comment if you like my post or have any issues/suggestions.

Comments

  1. Shubhangi karbhari says:

    Thanks Ram,
    It really helpful…
    Awesome job you are doing

  2. Padam Jain says:

    Thanks a lot Rambabu, this is a really good article to start with for big data!! Really appreciate your good work.

  3. Dhivakar says:

    Hi Ram ,

    Thanks for good article that show good start for Bigdata.

  4. Victor Rojas says:

    Hi, some time ago I read papers on big data, and the summary you wrote is very good. Thanks for the post.

  5. Suleman ALowooja says:

    Hi Rambu,

    Complements of the season. I am pretty new to BigData and Hadoop, though I am currently studying it as a module for my Masters. I stumbled across your write up and I must say you really shed some light on the topic. This is quite interesting and your analysis brings fantastic points across.

    I shall be using some of your pointers in my write up.

    Much apreciated.

    Sol Alowooja

  6. Rajendra Babu says:

    Hi Rambabu
    We are waiting for the latest posts on BigData, When we can see the latest posts on BigData and also please provide the url for the latest posts if it’s already posted by you.

    Thanks
    Rajendra

  7. ila tewari says:

    Very clearly written, thanks a lot.

  8. Bhuwan says:

    Awesome sir very niceee 🙂

  9. shalini says:

    this big data information is really nice and informative and it is very useful thanks for sharing this exclusive information

  10. gixhub says:

    Nice article..thanks..also go the help https://goo.gl/YBLlqz

  11. Nnada says:

    How about Apache Storm (Real Time data processing) ?

    1. Rambabu says:

      Apache Storm and Apache Spark are both real-time data processing and data streaming components. They are part of the Hadoop 2.x ecosystem. We will discuss these two components with suitable examples in the coming posts.

  12. Abhi says:

    This is great help for people like myself who was thinking about learning BigData. Great help thank you. Can I know more about your company and whether you provide real project based experience and knowledge as well besides theoretical knowledge?

    1. Rambabu says:

      Hi Abhi, thanks. We are going to deliver a complete end-to-end, step-by-step Hadoop 2 tutorial with real-time examples, from simple to advanced. We will also cover how to set up and install Hadoop for all BigData service providers. Finally, we will cover how to build BigData Hadoop solutions using the Cloud (Google, AWS, Azure etc). Please go through my coming posts and post your comments if you have queries.

  13. Amit says:

    Hello Ram,

    Thanks for posts on big data….

    Will you please advice on the steps how to start with installation, hardware requirements ?

    1. Rambabu says:

      Sure, we’re going to deliver a separate setup/installation guide for each BigData Hadoop provider. We are going to start by developing programs using Cloudera. Please wait a few more days.

  14. aswini says:

    Sir, can you tell about data formats or raw data formats for clarity on format..

    1. Rambabu says:

      Sure, will update that section with some simple examples to make it more clear for beginners.

  15. Sreenivasula Reddy B says:

    Nice Article. Big Data introduction explanation is very clear and step by step.
    I’m waiting for future articles on Big Data solutions. Please keep post for us.

  16. vinod says:

    Nice Article.. Good introduction about Big Data. Could you please elaborate the technology stack in BigData . Once again thanks for nice article.

  17. jeyaraj says:

    Hello Ram your article was so nice. It gives good and neat explanation about big data. I am doing my MSc Computer Science. I am new to big data. So how can i improve myself in BigData solutions. What are the prerequisites?

    1. Rambabu says:

      Thanks Raj.
      If you have time, go through my future posts. Otherwise, read one good book on BigData Hadoop.
