Monday, June 29, 2015

Hadoop Archives (har) - Creating and Reading HAR



A quick post that explains the following, with samples:
  • Creating a HAR file
  • Listing the contents of a HAR file
  • Reading a file stored within a HAR


Listed below is the input directory structure in HDFS that I'll be using to create the HAR.

hadoop fs -ls /bejoyks/test/har/source_files/*
Found 2 items
-rw-r--r--   3 hadoop supergroup         22 2015-06-29 20:25 /bejoyks/test/har/source_files/srcDir01/file1.tsv
-rw-r--r--   3 hadoop supergroup         22 2015-06-29 20:25 /bejoyks/test/har/source_files/srcDir01/file2.tsv
Found 2 items
-rw-r--r--   3 hadoop supergroup         22 2015-06-29 20:25 /bejoyks/test/har/source_files/srcDir02/file3.tsv
-rw-r--r--   3 hadoop supergroup         22 2015-06-29 20:25 /bejoyks/test/har/source_files/srcDir02/file4.tsv


CLI Command to create a HAR

Syntax
hadoop archive -archiveName <archiveName>.har -p <ParentDirHDFS> -r <ReplicationFactor> <childDir01> <childDir02> <DestinationDirectoryHDFS>

Command Used
hadoop archive -archiveName tsv_daily.har -p /bejoyks/test/har/source_files -r 3 srcDir01 srcDir02 /bejoyks/test/har/destination
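
The command above can be wrapped in a small script when archives are created on a schedule. The sketch below is hypothetical (the date-stamped archive name and the variable names are my own); the paths match the example, so adjust them for your cluster. The actual `hadoop archive` invocation is left commented out since it needs a live cluster.

```shell
# Hypothetical daily-archive wrapper around the command used above.
# A date-stamped name (e.g. tsv_2015-06-29.har) keeps one HAR per day.
DAY="2015-06-29"
ARCHIVE_NAME="tsv_${DAY}.har"
SRC_PARENT="/bejoyks/test/har/source_files"
DEST_DIR="/bejoyks/test/har/destination"

# Uncomment to run against a cluster:
# hadoop archive -archiveName "$ARCHIVE_NAME" -p "$SRC_PARENT" -r 3 srcDir01 srcDir02 "$DEST_DIR"
echo "archiveName: $ARCHIVE_NAME"
```

Note that `hadoop archive` launches a MapReduce job under the hood, so the cluster must have MapReduce available, and -r sets the replication factor of the archive's part files.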


LISTING DIRS and FILES in HAR
Syntax
hadoop fs -ls har://<AbsolutePathOfHarFile>

Command Used and Output
Command 01 :
hadoop fs -ls har:///bejoyks/test/har/destination/tsv_daily.har
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2015-06-29 20:39 har:///bejoyks/test/har/destination/tsv_daily.har/srcDir01
drwxr-xr-x   - hadoop supergroup          0 2015-06-29 20:39 har:///bejoyks/test/har/destination/tsv_daily.har/srcDir02

Command 02 :
hadoop fs -ls har:///bejoyks/test/har/destination/tsv_daily.har/srcDir01
Found 2 items
-rw-r--r--   3 hadoop supergroup         22 2015-06-29 20:39 har:///bejoyks/test/har/destination/tsv_daily.har/srcDir01/file1.tsv
-rw-r--r--   3 hadoop supergroup         22 2015-06-29 20:39 har:///bejoyks/test/har/destination/tsv_daily.har/srcDir01/file2.tsv
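
These listings can also be consumed by scripts. A minimal sketch, assuming the output format shown above (the `count_files` helper is my own, not part of Hadoop): count the plain files inside the archive without extracting it, using the recursive `-ls -R` flag.

```shell
# count_files: counts plain files (permission string starting with '-')
# in `hadoop fs -ls` output read from stdin. Directory entries (lines
# starting with 'd') and the "Found N items" header are ignored.
count_files() {
  grep -c '^-'
}

# Usage against a cluster (not run here):
#   hadoop fs -ls -R har:///bejoyks/test/har/destination/tsv_daily.har | count_files
```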

READING a File within a HAR
hadoop fs -text har:///bejoyks/test/har/destination/tsv_daily.har/srcDir01/file2.tsv
file2    row1
file2    row2
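
Beyond viewing, a file can be copied out of the archive with the same har:// URI, and the `-text` output can be piped like any other stream. A small sketch, with a hypothetical helper of my own (`second_col`) that pulls the value column out of the .tsv rows shown above:

```shell
# second_col: prints the second tab-separated column from stdin,
# e.g. the value column of the .tsv rows shown above.
second_col() {
  cut -f2
}

# Copy a file out of the archive into plain HDFS (requires a cluster):
#   hadoop fs -cp har:///bejoyks/test/har/destination/tsv_daily.har/srcDir01/file2.tsv /tmp/file2.tsv
# Pipe the archived file's contents through the helper:
#   hadoop fs -text har:///bejoyks/test/har/destination/tsv_daily.har/srcDir01/file2.tsv | second_col
```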

** Common mistakes while reading a HAR file

Always use the har:// URI when reading a HAR file.
Since we are used to listing directories/files in HDFS without a URI scheme, we might follow the same pattern here. But HAR files don't work without the har:// prefix: if listed without the URI, you'll see the HAR's internal metadata files instead, something like below.

hadoop fs -ls /bejoyks/test/har/destination/tsv_daily.har
Found 3 items
-rw-r--r--   5 hadoop supergroup        277 2015-06-29 20:39 /bejoyks/test/har/destination/tsv_daily.har/_index
-rw-r--r--   5 hadoop supergroup         23 2015-06-29 20:39 /bejoyks/test/har/destination/tsv_daily.har/_masterindex
-rw-r--r--   3 hadoop supergroup         88 2015-06-29 20:39 /bejoyks/test/har/destination/tsv_daily.har/part-0
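
A simple guard can make scripts robust to this mistake. The `har_uri` helper below is a hypothetical sketch of my own: it prefixes the har:// scheme onto an absolute HDFS path that points at (or into) a .har archive, and passes other paths through untouched.

```shell
# har_uri: prefix the har:// scheme onto an absolute HDFS path that
# points at (or inside) a .har archive; other paths pass through as-is.
har_uri() {
  case "$1" in
    har://*)       printf '%s\n' "$1" ;;        # already scheme-qualified
    *.har|*.har/*) printf 'har://%s\n' "$1" ;;  # add the har:// scheme
    *)             printf '%s\n' "$1" ;;        # not an archive path
  esac
}

# Example (against the archive above):
#   hadoop fs -ls "$(har_uri /bejoyks/test/har/destination/tsv_daily.har)"
```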
