Hadoop


What Is Apache Hadoop?

The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.


1.    Download and install Hadoop from the Apache website.
       To start or stop Hadoop, run the following scripts from the Hadoop installation directory (a and b start the services, c and d stop them):

a.  bin/start-dfs.sh

b.  bin/start-mapred.sh

c.  bin/stop-mapred.sh

d.  bin/stop-dfs.sh

2.       Check http://localhost:50030/jobtracker.jsp for the JobTracker service. One node should be available to execute MapReduce jobs.
          Check http://localhost:50070/dfshealth.jsp for the NameNode service.

3.       Start PuTTY and log in to the Hadoop server: haddop-user/hadoop

4.       Configuration file: hadoop-site.xml. Set properties there if required.

The following settings are necessary to configure HDFS:

Property            Value format                  Example
fs.default.name     protocol://servername:port    hdfs://master.ora.org:8000
dfs.data.dir        pathname                      /home/username/hdfs/data
dfs.name.dir        pathname                      /home/username/hdfs/name

5.  A one-time format of HDFS is required before Hadoop can use the file system:

$ bin/hadoop namenode -format
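
As a rough illustration of step 4, the three HDFS settings can be written out in the configuration/property/name/value layout that Hadoop configuration files use. This is only a sketch: the function name build_site_xml is hypothetical, and the values are the example values from the settings above, not ones that will fit every cluster.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Example values copied from the HDFS settings in step 4; adjust for your cluster.
HDFS_SETTINGS = {
    "fs.default.name": "hdfs://master.ora.org:8000",
    "dfs.data.dir": "/home/username/hdfs/data",
    "dfs.name.dir": "/home/username/hdfs/name",
}


def build_site_xml(settings):
    """Return a Hadoop-style configuration document as a string (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # Re-parse with minidom only to get indented, readable output.
    raw = ET.tostring(root, encoding="unicode")
    return minidom.parseString(raw).toprettyxml(indent="  ")


if __name__ == "__main__":
    print(build_site_xml(HDFS_SETTINGS))
```

The printed text can be compared against an existing hadoop-site.xml, or saved as one, before running the format command.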





What Is MapReduce?
MapReduce is the Hadoop component that lets a programmer split a large file into chunks, process the chunks in parallel, group the intermediate results by key, and store the combined output.
As a loose SQL analogy, the map step resembles selecting a key and a value from each row, while the reduce step resembles a GROUP BY clause with an aggregate function such as COUNT or SUM.
A MapReduce job splits the input data into independent chunks, which the map tasks process in a completely parallel manner. The framework sorts the outputs of the maps, which then become the input to the reduce tasks. Typically both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.
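
The scheduling and re-execution behaviour described above can be pictured with a small toy model. This is not how Hadoop is implemented; it is a sketch in which the helper names (run_with_retries, run_map_phase, make_flaky_task) and the max_attempts limit of 4 are all assumptions made for illustration, and the failures are simulated.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_retries(task, chunk, max_attempts=4):
    """Run task(chunk), re-executing it when it raises, up to max_attempts times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk), attempt
        except Exception as exc:  # a real framework also has to handle lost nodes
            last_error = exc
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def run_map_phase(task, chunks, max_attempts=4):
    """Schedule one task per independent chunk and collect results in chunk order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_with_retries, task, c, max_attempts) for c in chunks]
        return [f.result() for f in futures]


def make_flaky_task(failures):
    """Build a hypothetical map task that fails a set number of times per chunk."""
    seen = {}

    def task(chunk):
        seen[chunk] = seen.get(chunk, 0) + 1
        if seen[chunk] <= failures.get(chunk, 0):
            raise IOError(f"simulated failure on {chunk!r}")
        return len(chunk.split())  # the "work": count the words in the chunk

    return task


if __name__ == "__main__":
    chunks = ["Hello World Bye World", "Hello World Bye Data"]
    task = make_flaky_task({chunks[1]: 2})  # the second chunk fails twice
    for chunk, (words, attempts) in zip(chunks, run_map_phase(task, chunks)):
        print(f"{chunk!r}: {words} words after {attempts} attempt(s)")
```

The point of the sketch is that the job author only writes the task; retrying and collecting results happen in the surrounding framework code.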

Let's take an example to understand a MapReduce task that counts how often each word occurs in two files:
File 1 :
Hello World Bye World
File 2 :
Hello World Bye Data

Each map emits one key-value pair of the form < <word>, 1> for every word it reads.

The first map emits:
< Hello, 1>
< World, 1>
< Bye, 1>
< World, 1>
The second map emits:
< Hello, 1>
< World, 1>
< Bye, 1>
< Data, 1>

Each map's output is then sorted by key and aggregated locally (the job of a combiner).

The output of the first map:
< Bye, 1>
< Hello, 1>
< World, 2>
The output of the second map:
< Bye, 1>
< Data, 1>
< Hello, 1>
< World, 1>

The final output of the reduce job is:
< Bye, 2>
< Data, 1>
< Hello, 2>
< World, 3>
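
The word-count walkthrough can be reproduced with a few lines of plain Python. This is a minimal single-process sketch, not Hadoop code: the function names map_words, combine and reduce_counts are made up for illustration, the two input lines are "Hello World Bye World" and "Hello World Bye Data", and words are compared case-sensitively.

```python
from collections import Counter
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Local aggregation: sum counts per word within one map's output, sorted by key."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def reduce_counts(combined_outputs):
    """Reduce step: sum the counts for each word across all map outputs."""
    totals = Counter()
    for word, count in chain.from_iterable(combined_outputs):
        totals[word] += count
    return sorted(totals.items())


if __name__ == "__main__":
    files = ["Hello World Bye World", "Hello World Bye Data"]
    combined = [combine(map_words(line)) for line in files]
    for word, count in reduce_counts(combined):
        print(f"< {word}, {count}>")
```

Run as a script, it prints < Bye, 2>, < Data, 1>, < Hello, 2> and < World, 3>, one pair per line.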
