Big Data and Hadoop

BTH Industrial Training Program

What is Big Data?

Big data is a collection of datasets so large or complex that they cannot be processed using traditional computing techniques. It is not a single technique or tool; rather, it has become a complete subject that involves various tools, techniques and frameworks.

What Comes Under Big Data?

Big data involves the data produced by different devices and applications. Below are some of the fields that come under the umbrella of Big Data.

  • Black Box Data - A black box is a component of helicopters, airplanes, jets, etc. It captures the voices of the flight crew, recordings from microphones and earphones, and performance information about the aircraft.
  • Social Media Data - Social media sites such as Facebook and Twitter hold information and views posted by millions of people across the globe.
  • Stock Exchange Data - Stock exchange data holds information about the ‘buy’ and ‘sell’ decisions that customers make on the shares of different companies.
  • Power Grid Data - Power grid data holds information about the power consumed by a particular node with respect to a base station.
  • Transport Data - Transport data includes the model, capacity, distance and availability of a vehicle.
  • Search Engine Data - Search engines retrieve large amounts of data from different databases.

What is Hadoop?

Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers. It's at the center of an ecosystem of big data technologies that are primarily used to support advanced analytics initiatives, including predictive analytics, data mining and machine learning. Hadoop systems can handle various forms of structured and unstructured data, giving users more flexibility for collecting, processing and analyzing data than relational databases and data warehouses provide.
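Hadoop's classic processing model is MapReduce: a map step turns input records into key/value pairs, the framework groups those pairs by key (the shuffle), and a reduce step combines each group. The sketch below imitates the three steps in plain, single-process Python for a word count. It does not use any Hadoop API, and the function names are illustrative only; on a real cluster the map and reduce steps would run in parallel on many nodes over data stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine every count collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over all input lines in a single process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    documents = ["big data needs big clusters", "Hadoop stores big data"]
    print(word_count(documents))
    # {'big': 3, 'data': 2, 'needs': 1, 'clusters': 1, 'hadoop': 1, 'stores': 1}
```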

Content:

  • Introduction to Big data
  • Introduction to Hadoop
  • Hive
  • Pig
  • Use Cases

The features of Hadoop and Big Data are as follows:

1. Data Processing
Data processing features involve the collection and organization of raw data to produce meaningful information. Data modeling takes complex data sets and displays them in a visual diagram or chart, making them digestible and easy to interpret for users who rely on that data to make decisions.
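As a small illustration of collecting raw data, organizing it and preparing it for a chart, the sketch below assumes hypothetical raw records of the form `region,amount`. It skips malformed lines, totals the amounts per region and renders a simple text bar chart. The record format and the function names are assumptions made for this example.

```python
from typing import Dict, Iterable


def summarize_sales(raw_lines: Iterable[str]) -> Dict[str, float]:
    """Organize raw 'region,amount' records into a total per region."""
    totals: Dict[str, float] = {}
    for line in raw_lines:
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 2 or not parts[0]:
            continue  # skip records that are not exactly a region and an amount
        region, amount_text = parts
        try:
            amount = float(amount_text)
        except ValueError:
            continue  # skip records whose amount is not a number
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def text_chart(totals: Dict[str, float], width: int = 20) -> str:
    """Render the totals as a horizontal bar chart, largest first."""
    if not totals:
        return ""
    peak = max(totals.values())
    rows = []
    for region, value in sorted(totals.items(), key=lambda item: item[1], reverse=True):
        length = round(width * value / peak) if peak > 0 else 0
        rows.append(f"{region:<8}{'#' * length} {value:g}")
    return "\n".join(rows)


if __name__ == "__main__":
    raw = ["north,10", "south,5.5", "north,2", "bad line", "east,abc"]
    print(text_chart(summarize_sales(raw)))
```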


2. Predictive Applications
Identity management (or identity and access management) is the organizational process for controlling who has access to your data. Identity management functionality manages identifying data for everything that has access to a system, including individual users, computer hardware and software applications.
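One common way to implement the access control described above is role-based access control: each identity (a user, a device or an application) holds roles, and each role grants permissions. The sketch below assumes that model; the role names, permission strings and the `ROLE_PERMISSIONS` table are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical role-to-permission table, for illustration only.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "analyst": {"read:sales"},
    "etl_job": {"read:sales", "write:sales"},
    "sensor": {"write:telemetry"},
}


@dataclass
class Identity:
    """Anything that can access the system: a user, a device or an application."""
    name: str
    kind: str
    roles: Set[str] = field(default_factory=set)


def is_allowed(identity: Identity, permission: str) -> bool:
    """Grant access only if one of the identity's roles includes the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in identity.roles)


if __name__ == "__main__":
    alice = Identity("alice", "user", {"analyst"})
    loader = Identity("nightly-loader", "application", {"etl_job"})
    print(is_allowed(alice, "read:sales"))    # True
    print(is_allowed(alice, "write:sales"))   # False
    print(is_allowed(loader, "write:sales"))  # True
```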
Fraud analytics involves a variety of fraud detection functionalities. Too many businesses are reactive when it comes to fraudulent activity: they deal with the impact rather than proactively preventing it. Data analytics tools can support fraud detection by offering repeatable tests that can run on your data at any time, so you will know if anything is amiss. These tests also give you wider coverage of your data as a whole, rather than relying on spot checks of individual financial transactions. Analytics can serve as an early warning tool that quickly and efficiently identifies potentially fraudulent activity before it has a chance to affect your business at large.
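A simple example of a repeatable test is a z-score check: compare each new transaction amount with an account's historical amounts and flag any amount that lies more than a chosen number of standard deviations from the mean. The sketch below assumes that approach and a default threshold of 3. It is a minimal illustration; production fraud analytics typically combines many such signals.

```python
import math
import statistics
from typing import List, Tuple


def flag_unusual(history: List[float], new_amounts: List[float],
                 threshold: float = 3.0) -> List[Tuple[float, float]]:
    """Return (amount, z_score) for each new amount that is more than
    `threshold` standard deviations away from the historical mean."""
    mean = statistics.mean(history)  # raises StatisticsError if history is empty
    stdev = statistics.pstdev(history)
    flagged = []
    for amount in new_amounts:
        if stdev == 0:
            # With no historical variation, any change from the mean counts as extreme.
            z = 0.0 if amount == mean else math.copysign(math.inf, amount - mean)
        else:
            z = (amount - mean) / stdev
        if abs(z) > threshold:
            flagged.append((amount, z))
    return flagged


if __name__ == "__main__":
    past = [20, 25, 22, 18, 30, 24, 21, 19]
    for amount, z in flag_unusual(past, [23, 500]):
        print(f"review transaction of {amount}: z = {z:.1f}")
```

Because the test is a plain function over the data, it can be scheduled to run on every new batch rather than only during occasional audits.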


3. Analytics
Big Data analytics tools offer a variety of analytics packages and modules to give users options. Risk analytics, for example, is the study of the uncertainty surrounding any given action. It can be used in combination with forecasting to minimize the negative impacts of future events. Risk analytics allows users to mitigate these risks by clearly defining and understanding their organization’s tolerance for and exposure to risk.
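To make "tolerance for and exposure to risk" concrete, the sketch below uses historical value at risk (VaR), one widely used risk measure that the text above does not name specifically. It reports the loss that past returns did not exceed in a given fraction of periods and compares that figure with a stated tolerance. The returns, confidence level and tolerance are assumed inputs, and no forecasting is performed.

```python
import math
from typing import List


def historical_var(returns: List[float], confidence: float = 0.95) -> float:
    """Historical value at risk: the loss (as a positive fraction) that was
    not exceeded in `confidence` of the past periods."""
    if not returns:
        raise ValueError("need at least one historical return")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    losses = sorted(-r for r in returns)  # a negative return is a positive loss
    index = math.ceil(confidence * len(losses)) - 1
    return max(losses[index], 0.0)


def within_tolerance(returns: List[float], tolerance: float,
                     confidence: float = 0.95) -> bool:
    """Compare measured exposure with the organization's stated risk tolerance."""
    return historical_var(returns, confidence) <= tolerance


if __name__ == "__main__":
    daily_returns = [0.01, -0.02, 0.015, -0.05, 0.003, -0.01, 0.02, -0.03, 0.005, 0.0]
    print(historical_var(daily_returns, 0.9))
    print(within_tolerance(daily_returns, tolerance=0.04, confidence=0.9))
```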

4. Reporting Features

Reporting functions keep users informed about the state of their business. Real-time reporting gathers data minute by minute and relays it to users, typically in an intuitive dashboard format. This allows users to make quick decisions in time-constrained situations and to be better prepared and more competitive in fast-moving markets.
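Minute-by-minute reporting usually means aggregating a stream of events into fixed one-minute windows. The sketch below assumes events arrive as `(unix_timestamp, value)` pairs in timestamp order and emits each minute's total as soon as the next minute starts. `MinuteAggregator` is a hypothetical helper, not part of any particular dashboard product.

```python
from typing import Optional, Tuple


class MinuteAggregator:
    """Sums values per one-minute window, assuming events arrive in timestamp order."""

    def __init__(self) -> None:
        self.current_minute: Optional[int] = None
        self.total = 0.0

    def add(self, ts: float, value: float) -> Optional[Tuple[int, float]]:
        """Add one event; return (minute_start, total) when a minute has just closed."""
        minute = int(ts // 60) * 60  # start of the event's minute, in seconds
        finished = None
        if self.current_minute is not None and minute != self.current_minute:
            finished = (self.current_minute, self.total)
            self.total = 0.0
        self.current_minute = minute
        self.total += value
        return finished

    def flush(self) -> Optional[Tuple[int, float]]:
        """Close the open minute, for example at shutdown."""
        if self.current_minute is None:
            return None
        result = (self.current_minute, self.total)
        self.current_minute, self.total = None, 0.0
        return result


if __name__ == "__main__":
    aggregator = MinuteAggregator()
    for ts, value in [(0, 5.0), (30, 3.0), (61, 2.0), (125, 4.0)]:
        done = aggregator.add(ts, value)
        if done is not None:
            print("minute starting at", done[0], "total", done[1])
    print("last minute", aggregator.flush())
```

Out-of-order events would need buffering or a grace period before a window is closed; this sketch omits that for brevity.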

5. Security Features
Keeping your system safe is crucial to a successful business, so Big Data analytics tools should offer features that protect data and control access. One such feature is single sign-on (SSO), an authentication service that gives each user a single set of login credentials for accessing multiple applications. It authenticates end-user permissions and eliminates the need to log in multiple times during the same session. It can also log and monitor user activities and accounts to keep track of who is doing what in the system.
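The sketch below illustrates the SSO idea in miniature, under stated assumptions: a hypothetical identity provider signs a short-lived token once after login, and any application sharing the same secret verifies that token instead of asking the user to log in again, recording each access in an audit log. It uses an HMAC signature from Python's standard library. Real SSO deployments rely on standards such as SAML or OpenID Connect and on proper key management, so treat this only as a model of the flow.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import List, Optional

# Hypothetical secret shared by the identity provider and the applications (demo only).
SECRET = b"demo-only-secret"
AUDIT_LOG: List[str] = []


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def issue_token(user: str, ttl_seconds: int = 3600) -> str:
    """Identity provider: after one successful login, sign the user's identity."""
    payload = json.dumps({"sub": user, "exp": int(time.time()) + ttl_seconds}).encode()
    return base64.urlsafe_b64encode(payload).decode() + "." + _sign(payload)


def verify_token(token: str, app: str) -> Optional[str]:
    """Any application: accept the token without a new login if the signature
    is valid and it has not expired, then record the access."""
    try:
        encoded, signature = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(encoded)
    except ValueError:  # malformed token; binascii.Error is a subclass of ValueError
        return None
    if not hmac.compare_digest(_sign(payload).encode(), signature.encode()):
        return None
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        return None
    AUDIT_LOG.append(f"{claims['sub']} accessed {app}")
    return claims["sub"]


if __name__ == "__main__":
    token = issue_token("alice")
    for app in ("reports", "dashboard"):
        print(app, "->", verify_token(token, app))
    print(AUDIT_LOG)
```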


Join BTH

Our strategy is simple: to create a place where the best researchers and most promising students can achieve their full potential.

Apply now or call 9158211119.