Friday 13 October 2017

What Comes Under Big Data?

Big data involves the data produced by different devices and applications. Given below are some of the fields that come under the umbrella of Big Data.
  • Black Box Data : The black box is a component of helicopters, airplanes, and jets. It captures the voices of the flight crew, recordings from microphones and earphones, and performance information about the aircraft.
  • Social Media Data : Social media such as Facebook and Twitter hold information and the views posted by millions of people across the globe.
  • Stock Exchange Data : The stock exchange data holds information about the ‘buy’ and ‘sell’ decisions that customers make on shares of different companies.
  • Power Grid Data : The power grid data holds information about the power consumed by a particular node with respect to a base station.
  • Transport Data : Transport data includes model, capacity, distance and availability of a vehicle.
  • Search Engine Data : Search engines retrieve lots of data from different databases.

Thus Big Data is characterized by huge volume, high velocity, and a wide variety of data. The data falls into three types.
  • Structured data : Relational data.
  • Semi Structured data : XML data.
  • Unstructured data : Word, PDF, Text, Media Logs.
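
As a rough illustration of the three types above, the following Python sketch reads one small sample of each. The sample values (names, cities, the log line) are invented for the example and are not taken from any real dataset.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: relational-style rows with a fixed set of columns.
csv_text = "id,name,city\n1,Asha,Vizag\n2,Ravi,Hyderabad\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: XML carries its own tags, and fields may be missing.
xml_text = "<users><user id='1'><name>Asha</name></user><user id='2'/></users>"
root = ET.fromstring(xml_text)
names = [user.findtext("name") for user in root.findall("user")]

# Unstructured: free text has no schema; any structure must be extracted.
log_line = "13/Oct/2017 GET /index.html took 120 ms"
words = log_line.split()

print(rows[0]["city"])  # a column looked up by name
print(names)            # the second user has no <name>, so it comes back as None
print(len(words))       # only a raw token count is available without parsing rules
```

The point of the sketch is that the first sample can be queried by column, the second by tag (with gaps), and the third only after you decide how to interpret the text.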

Benefits of Big Data

Big data is critical to our lives and is emerging as one of the most important technologies in the modern world. The following are just a few benefits that are well known to all of us:
  • Using the information kept in social networks like Facebook, marketing agencies are learning about the response to their campaigns, promotions, and other advertising mediums.
  • Using social media information such as the preferences and product perception of their consumers, product companies and retail organizations are planning their production.
  • Using data on the previous medical history of patients, hospitals are providing better and quicker service.

Big Data Technologies

Big data technologies are important in providing more accurate analysis, which may lead to more concrete decision-making resulting in greater operational efficiencies, cost reductions, and reduced risks for the business.
To harness the power of big data, you need an infrastructure that can manage and process huge volumes of structured and unstructured data in real time and can protect data privacy and security.
There are various technologies in the market from different vendors including Amazon, IBM, Microsoft, etc., to handle big data. While looking into the technologies that handle big data, we examine the following two classes of technology:

Operational Big Data

This includes systems like MongoDB that provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored.
NoSQL Big Data systems are designed to take advantage of new cloud computing architectures that have emerged over the past decade to allow massive computations to be run inexpensively and efficiently. This makes operational big data workloads much easier to manage, cheaper, and faster to implement.
Some NoSQL systems can provide insights into patterns and trends based on real-time data with minimal coding and without the need for data scientists and additional infrastructure.

Analytical Big Data

This includes systems like Massively Parallel Processing (MPP) database systems and MapReduce that provide analytical capabilities for retrospective and complex analysis that may touch most or all of the data.
MapReduce provides a new method of analyzing data that is complementary to the capabilities provided by SQL, and a system based on MapReduce can be scaled up from single servers to thousands of high- and low-end machines.
These two classes of technology are complementary and frequently deployed together.
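
To make the MapReduce model mentioned above more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a word count. It is plain Python, not the Hadoop API; the function names and the sample input lines are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # In a real cluster each line (or block of lines) could be mapped on a
    # different machine; here everything runs in one process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase only on its own key, both steps can be spread across many machines, which is what makes the model scale.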

Operational vs. Analytical Systems

               | Operational      | Analytical
Latency        | 1 ms - 100 ms    | 1 min - 100 min
Concurrency    | 1000 - 100,000   | 1 - 10
Access Pattern | Writes and Reads | Reads
Queries        | Selective        | Unselective
Data Scope     | Operational      | Retrospective
End User       | Customer         | Data Scientist
Technology     | NoSQL            | MapReduce, MPP Database

Big Data Challenges

The major challenges associated with big data are as follows:
  • Capturing data
  • Curation
  • Storage
  • Searching
  • Sharing
  • Transfer
  • Analysis
  • Presentation
To address the above challenges, organizations normally rely on enterprise servers.

Thursday 12 October 2017

HADOOP BIG DATA SIMPLIFIED





Today we live in the age of Big data, where data volumes have outgrown the storage & processing capabilities of a single machine, and the variety of data formats that need to be analyzed has increased tremendously.

This brings 2 fundamental challenges:

  • How to store and work with huge volumes & varieties of data
  • How to analyze these vast data points & use them for competitive advantage
Hadoop fills this gap by overcoming both challenges. Hadoop is based on research papers from Google & it was created by Doug Cutting, who named the framework after his son’s yellow stuffed toy elephant.

So What is Hadoop? It is a framework made up of:

  • HDFS – the Hadoop Distributed File System
  • A distributed computation tier programmed with MapReduce
  • Low-cost commodity servers connected together into a cluster
  • A master node (NameNode) that keeps track of where data is stored and controls the cluster
  • Data Nodes to store & process the data
  • JobTracker & TaskTracker to manage & monitor the jobs
Let us see why Hadoop has become so popular today.

  • Over the last decade, data computations were scaled by increasing the computing power of a single machine (more processors & more RAM), but this approach had physical limitations
  • As data grew beyond these capabilities, an alternative was required to handle the storage requirements of organizations like eBay (10 PB), Facebook (30 PB), Yahoo (170 PB), JPMC (150 PB)
  • With a typical disk data transfer rate of 75 MB/sec, it was impossible to process such huge data sets on a single machine
  • Scalability was limited by physical size, with no or limited fault tolerance
  • Additionally, organizations keep adding data in various formats for analysis, which is not possible with traditional databases
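
The 75 MB/sec figure above can be turned into a rough back-of-the-envelope estimate. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and perfectly even parallelism across disks; the 1000-disk cluster is a hypothetical number chosen only for comparison.

```python
PB = 10**15  # bytes, decimal units (assumption)


def scan_time_days(data_bytes, mb_per_sec=75, disks=1):
    """Days needed to read data_bytes once, split evenly across `disks`."""
    bytes_per_sec = mb_per_sec * 10**6 * disks
    return data_bytes / bytes_per_sec / 86_400  # 86,400 seconds per day


one_disk = scan_time_days(10 * PB)                    # one disk, 10 PB
thousand_disks = scan_time_days(10 * PB, disks=1000)  # hypothetical cluster

print(f"1 disk:     {one_disk:,.0f} days")
print(f"1000 disks: {thousand_disks:.2f} days")
```

Under these assumptions a single disk needs roughly 1,543 days (over four years) just to read 10 PB once, while 1000 disks reading in parallel need about a day and a half. Real clusters will not reach this ideal, but the gap shows why the work has to be spread across many machines.
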
How Hadoop addresses these challenges:

  • Data is split into blocks of 64 or 128 MB, and each block is stored on 3 machines by default to ensure data availability & reliability
  • Many machines connected in a cluster work in parallel for faster crunching of data
  • If any one machine fails, its work is assigned to another automatically
  • MapReduce breaks complex tasks into smaller chunks to be executed in parallel
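
The block splitting and replication described above can be sketched with a little arithmetic. The example assumes a 128 MB block size and 3 copies of each block; the node names and the round-robin placement are simplifications for illustration only, since real HDFS placement also takes racks into account.

```python
import math


def hdfs_footprint(file_mb, block_mb=128, replication=3):
    """Return (number of blocks, total block replicas, raw MB stored)."""
    blocks = math.ceil(file_mb / block_mb)
    return blocks, blocks * replication, file_mb * replication


def place_blocks(num_blocks, nodes, replication=3):
    """Toy round-robin placement: each block goes to 3 distinct nodes."""
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


blocks, replicas, raw_mb = hdfs_footprint(1000)  # a 1000 MB file
nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster
placement = place_blocks(blocks, nodes)

# Simulate losing one machine: every block should still have a copy left.
survivors = {
    b: [n for n in holders if n != "node2"] for b, holders in placement.items()
}
print(blocks, replicas, raw_mb)
print(min(len(holders) for holders in survivors.values()))
```

A 1000 MB file becomes 8 blocks and 24 block replicas, taking about 3000 MB of raw disk. After the simulated failure of one node, every block still has at least two copies, which is the basis of the fault tolerance in the list above.
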
Benefits of using Hadoop as a Big data platform are:

  • Cheap storage – commodity servers to decrease the cost per terabyte
  • Virtually unlimited scalability – new nodes can be added without any changes to existing data, providing the ability to process any amount of data with no archiving necessary
  • Speed of processing – tremendous parallel processing to reduce processing time
  • Flexibility – schema-less, can store any data format – structured & unstructured (audio, video, texts, csv, pdf, images, logs, clickstream data, social media)
  • Fault tolerant – any node failure is covered by another node automatically
Over time, multiple products & components were added to Hadoop, so it is now called an ecosystem. Examples include:
  • Hive – SQL-like interface
  • Pig – data-flow scripting language, comparable to commercial tools such as Ab Initio and Informatica
  • HBase – column-oriented database on top of HDFS
  • Flume – ingestion of real-time streaming data, such as logs and credit card transactions
  • Sqoop – bulk data transfer between RDBMS and HDFS
  • ZooKeeper – coordination service for the distributed components of Hadoop
More such products are being added all the time by companies like Cloudera, Hortonworks, Yahoo, etc.

How some of the world leaders are using Hadoop:
  • Chevron collects large amounts of seismic data to find where it can get more resources
  • JPMC uses it to store more than 150 PB of data and over 3.5 billion user log-ins for fraud detection
  • eBay uses it for real-time analysis and search of 9 PB of data, with 97 million active buyers and over 200 million items on sale
  • Nokia uses it to store data from phone service logs to analyze how people interact with apps and their usage patterns
  • Walmart uses it to analyze customer behaviour across more than 200 million customer visits a week
  • UC Irvine Health hospitals are storing 9 million patient records spanning 22 years to build patient surveillance algorithms
Hadoop may not replace the existing data warehouses, but it is becoming the number 1 choice for Big data platforms with a strong price/performance ratio.

Tuesday 10 October 2017

The Right Skills for the Job?

                          
Creating jobs and increasing productivity are at the top of the agenda for policymakers across the world. Knowledge accumulation and skills are recognized as central in this process. More-educated workers not only have better employment opportunities, earn more, and have more stable and rewarding jobs, but they are also more adaptable and mobile. Workers who acquire more skills also make other workers and capital more productive and, within the firm, they facilitate the adaptation, adoption, and ultimately invention of new technologies. This is crucial to enable economic diversification and productivity growth, and ultimately to raise the standard of living of the population.

The book offers new ideas on how to build and upgrade job-relevant skills, focusing on three types of training programs relevant for individuals who are leaving the formal general schooling system or are already in the labor market:

1. pre-employment technical and vocational education and training (TVET); 
2. on-the-job training (OJT); and 
3. training-related active labor market programs (ALMPs). 

Several previous studies have discussed some of the flaws in current systems and outlined options for reform. As a consequence, there has been a shift away from investment in pre-vocational training courses toward programs that improve access to, and the quality of, general secondary education. There have also been calls to encourage stronger involvement of the private sector in the provision of training, together with increased emphasis on the quality and relevance of the content. One result has been a push to rethink the governance and financing arrangements of training institutions. But overall, policies at these three levels of the training system remain disconnected, and there has not been an integrated framework linking them to the market and government failures that need to be addressed.

This book makes two important contributions. First, it takes an in-depth look at the types of market and government failures that can result in underinvestment in training or the supply of skills that are not immediately relevant to the labor market. Second, building on the analysis of the limitations of both markets and governments and the results of case studies and recent impact evaluations, it develops new ideas to improve the design and performance of current training systems.
