Thursday 12 October 2017

HADOOP BIG DATA SIMPLIFIED





Today we live in the age of Big Data, where data volumes have outgrown the storage & processing capabilities of a single machine, and the variety of data formats that need to be analyzed has increased tremendously.

This brings 2 fundamental challenges:

How to store and work with huge volumes & varieties of data
How to analyze these vast data points & use them for competitive advantage
Hadoop addresses both challenges. It is based on Google's research papers on the Google File System and MapReduce, and was created by Doug Cutting and Mike Cafarella; Cutting named the framework after his son’s yellow stuffed toy elephant.

So what is Hadoop? It is a framework made up of:

HDFS – the Hadoop Distributed File System
A distributed computation tier based on the MapReduce programming model
A cluster of low-cost commodity servers connected together, on which both tiers run
A master node (NameNode) that keeps track of where each block of data is stored
DataNodes that store & process the data
JobTracker & TaskTracker to manage & monitor the jobs (replaced by YARN from Hadoop 2 onwards)
Let us see why Hadoop has become so popular today.

Over the last decade, data computations were scaled up by increasing the computing power of a single machine, that is, by adding processors & RAM, but this approach had physical limits.
As data grew beyond these capabilities, an alternative was required to handle the storage requirements of organizations like eBay (10 PB), Facebook (30 PB), Yahoo (170 PB) and JPMC (150 PB).
With a typical disk data transfer rate of 75 MB/sec, it was impractical to process such huge data sets on a single machine.
Scalability was limited by the physical size of the machine, with little or no fault tolerance.
Additionally, organizations are collecting data in a variety of formats that traditional databases cannot readily analyze.
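
To put the 75 MB/sec figure in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it assumes decimal units (1 PB = 10**15 bytes), takes the 10 PB quoted for eBay as the data size, and assumes the data is spread perfectly evenly over the disks. The function name and the 1000-disk cluster are made up for illustration.

```python
# Back-of-the-envelope scan times at the 75 MB/sec disk rate quoted above.
# Assumptions: decimal units (1 PB = 10**15 bytes) and perfect parallelism.

DISK_RATE_BYTES_PER_SEC = 75 * 10**6


def read_time_seconds(data_bytes, disks=1):
    """Time to scan data_bytes when it is spread evenly over `disks` disks."""
    return data_bytes / (DISK_RATE_BYTES_PER_SEC * disks)


ten_pb = 10 * 10**15
single_disk_days = read_time_seconds(ten_pb) / 86400
cluster_hours = read_time_seconds(ten_pb, disks=1000) / 3600

print(f"1 disk:     {single_disk_days:.0f} days")
print(f"1000 disks: {cluster_hours:.1f} hours")
```

Under these assumptions a single disk needs about 1543 days (over four years) just to read the data once, while 1000 disks reading in parallel need about 37 hours.
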
How Hadoop addresses these challenges:

Data is split into blocks of 64 or 128 MB, and each block is stored on 3 machines by default to ensure data availability & reliability
Many machines connected in a cluster work in parallel for faster crunching of data
If any one machine fails, the work is assigned to another automatically
MapReduce breaks complex tasks into smaller chunks to be executed in parallel
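
The ideas in this list can be sketched in a few lines of plain Python. This is a toy simulation, not Hadoop code: the function names, the node names and the round-robin replica placement are all made up for illustration (real HDFS splits by bytes and uses a rack-aware placement policy), and the map step runs sequentially here, whereas Hadoop runs it in parallel on the nodes holding each block.

```python
from collections import defaultdict


def split_into_blocks(records, block_size):
    """Cut the input into fixed-size blocks (HDFS does this by bytes, e.g. 128 MB)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (hypothetical round-robin policy)."""
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {word: sum(counts) for word, counts in groups.items()}


lines = [
    "big data needs big storage",
    "hadoop stores big data",
    "hadoop processes data in parallel",
]
blocks = split_into_blocks(lines, block_size=1)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])

# Each block is mapped independently, so this loop is the part a cluster parallelizes.
mapped = [pair for block in blocks for pair in map_phase(block)]
word_counts = reduce_phase(shuffle(mapped))

print(placement)
print(word_counts)
```

Because every block lives on three nodes in this sketch, losing any single node still leaves two copies of each block, which is what allows the work to be reassigned to another machine.
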
Benefits of using Hadoop as a Big data platform are:

Cheap storage – commodity servers to decrease the cost per terabyte
Virtually unlimited scalability – new nodes can be added without any changes to existing data, providing the ability to process any amount of data with no archiving necessary
Speed of processing – massively parallel processing reduces processing time
Flexibility – schema-less, so it can store any data format, structured or unstructured (audio, video, text, CSV, PDF, images, logs, clickstream data, social media)
Fault tolerant – any node failure is covered by another node automatically
Over time, many products & components have been added around Hadoop, so it is now called an ecosystem. Examples include:
Hive – SQL-like interface
Pig – data flow scripting language (Pig Latin) for ETL-style work, comparable in purpose to commercial tools like Ab Initio & Informatica
HBase – column-oriented database on top of HDFS
Flume – collection & movement of streaming data, such as logs or credit card transactions, into Hadoop
Sqoop – bulk data transfer between an RDBMS and HDFS
ZooKeeper – coordination service for the distributed components of the cluster
More such products are being added all the time by companies like Cloudera, Hortonworks, Yahoo etc.

How some of the world leaders are using Hadoop:
Chevron collects large amounts of seismic data to find where it can get more resources
JPMC uses it to store more than 150 PB of data and over 3.5 billion user log-ins for fraud detection
eBay uses it for real-time analysis and search of 9 PB of data, with 97 million active buyers and over 200 million items on sale
Nokia uses it to store data from phone service logs to analyze how people interact with apps and their usage patterns
Walmart uses it to analyze customer behaviour across over 200 million customer visits a week
UC Irvine Health hospitals store 9 million patient records spanning 22 years to build patient surveillance algorithms
Hadoop may not replace the existing data warehouses, but it is becoming the number 1 choice for Big Data platforms with a strong price/performance ratio.
