Tuesday, May 22, 2012

How to recover deleted files from HDFS / Enable trash in HDFS

If you enable trash in HDFS, a file removed with rmr stays available in the trash for some period, so you can recover files that were deleted accidentally. To enable HDFS trash,
set fs.trash.interval > 0

 
This specifies how long a deleted file remains available in the trash. A second property, fs.trash.checkpoint.interval, specifies the checkpoint interval: at every such interval the NameNode (NN) checks the trash dir and deletes all files older than the specified fs.trash.interval. For example, say you have your
fs.trash.interval as 60 mins and fs.trash.checkpoint.interval as 30 mins; then a check is performed every 30 mins, and it deletes all files that are more than 60 mins old.
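
The arithmetic in that example can be sketched in a few lines of Python. This is only a simplified model of the description above, not Hadoop's actual implementation: the function name purge_time is made up for illustration, and it assumes that checks run at exact multiples of the checkpoint interval and that a file is removed only once it is strictly older than fs.trash.interval.

```python
def purge_time(deleted_at_min, trash_interval_min=60, checkpoint_interval_min=30):
    """Return the minute of the first periodic check that removes the file.

    Hypothetical helper: models checks at minutes 0, 30, 60, ... and removes
    a file once it is strictly older than trash_interval_min.
    """
    if trash_interval_min <= 0:
        raise ValueError("trash is disabled unless fs.trash.interval is > 0")
    if checkpoint_interval_min <= 0:
        raise ValueError("checkpoint interval must be positive in this model")
    # The file becomes exactly trash_interval_min old at this minute.
    expiry = deleted_at_min + trash_interval_min
    # First check that falls strictly after the expiry minute.
    return (expiry // checkpoint_interval_min + 1) * checkpoint_interval_min


# File deleted at minute 10: still kept at the checks at 30 and 60,
# removed at the check at minute 90 (it is then 80 mins old).
print(purge_time(10))  # 90
```

So with these settings a file can sit in the trash for noticeably longer than 60 mins, depending on where the deletion falls between two checks.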

fs.trash.checkpoint.interval should be equal to or less than fs.trash.interval

The value of fs.trash.interval is specified in minutes.

fs.trash.interval should be set on the client node as well as on the Name Node. On the Name Node it has to be present for checkpointing purposes. The value on the client node decides whether an rmr issued from that client removes the file completely from HDFS or moves it to the trash.
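
As a rough illustration, the two settings can be checked against the rules above and rendered as Hadoop-style property entries with a short Python script. The function name trash_properties is hypothetical, and the 60 and 30 minute values are just the example figures from this post. Both properties appear in Hadoop's core-default.xml, so the output would normally be merged into core-site.xml on the client and the Name Node, but check this against your own distribution.

```python
import xml.etree.ElementTree as ET


def trash_properties(trash_interval_min, checkpoint_interval_min):
    """Validate the two trash settings and return them as a <configuration> XML string.

    Hypothetical helper; it only builds text and does not touch a cluster.
    """
    if trash_interval_min <= 0:
        raise ValueError("fs.trash.interval must be > 0 to enable trash")
    if not 0 < checkpoint_interval_min <= trash_interval_min:
        raise ValueError(
            "fs.trash.checkpoint.interval should be equal to or less than fs.trash.interval"
        )
    config = ET.Element("configuration")
    settings = (
        ("fs.trash.interval", trash_interval_min),
        ("fs.trash.checkpoint.interval", checkpoint_interval_min),
    )
    for name, value in settings:
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)  # values are in minutes
    return ET.tostring(config, encoding="unicode")


print(trash_properties(60, 30))
```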

The trash dir by default is /user/X/.Trash, where X is the name of the user who deleted the file.
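
To actually recover a file you move it back out of the trash dir. The sketch below builds the trash path and the matching move command. It assumes that recently deleted files sit under a Current subdirectory of the trash dir with their full original path preserved, and that the file has not yet been rolled into a checkpoint or purged. The helper names, the user alice and the sample path are made up for illustration.

```python
import posixpath
import shlex


def trash_location(original_path, user):
    """Return where a deleted file is expected to sit in the user's trash.

    Assumes the layout /user/<user>/.Trash/Current/<original absolute path>.
    """
    if not original_path.startswith("/"):
        raise ValueError("original_path must be an absolute HDFS path")
    return posixpath.join("/user", user, ".Trash", "Current", original_path.lstrip("/"))


def restore_command(original_path, user):
    """Build the 'hadoop fs -mv' command that moves the file back to its original path."""
    src = trash_location(original_path, user)
    return "hadoop fs -mv {} {}".format(shlex.quote(src), shlex.quote(original_path))


print(trash_location("/data/logs/app.log", "alice"))
# /user/alice/.Trash/Current/data/logs/app.log
print(restore_command("/data/logs/app.log", "alice"))
# hadoop fs -mv /user/alice/.Trash/Current/data/logs/app.log /data/logs/app.log
```

The script only prints the command; run it yourself after checking that the file is really there, for example by listing the trash dir first.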

Comments:

  1. * Will this configuration be available in hdfs-site.xml?


    * I ran an rmr command in HDFS by mistake, and my DFS usage is gradually decreasing as the files are being deleted. Is there any way to stop the execution of that command?
