How to work with distributed cache in Hadoop

I am trying to use the distributed cache in my MapReduce program. In the main method I add the cache file:

Configuration conf = new Configuration();

Job job = new Job(conf, "example");

DistributedCache.addCacheFile(new URI("/user/vinay/card.txt"), conf);

The file /user/vinay/card.txt exists in my HDFS.

I am referring to this file in the setup method:

public void setup(Context context) throws IOException, InterruptedException{

    Configuration conf = context.getConfiguration();

    Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);

}

The cacheFiles array is always null. I first ran this on a single-node Hadoop cluster, but then read somewhere that this setup prevents the distributed cache from working. I then ran the same code in pseudo-distributed mode, and it still does not work.

Apr 20, 2018 in Big Data Hadoop by Shubham

1 answer to this question.

The problem with your code is the order of the calls: you create the conf object, then create the job with conf as a parameter, and only afterwards add the file to the distributed cache. The Job takes its own copy of the configuration when it is constructed, so a change made to conf after that point is not reflected in the job.

Instead, create the conf object first, then add the file to the distributed cache, and create the job last:

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/user/vinay/card.txt"), conf);
Job job = new Job(conf, "example");
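
To see why the order matters, here is a minimal sketch that runs without Hadoop. SimpleConf and SimpleJob are hypothetical stand-ins written for this illustration, not Hadoop classes, and the key "cache.files" is an invented name. The only behaviour they model is that the job takes a copy of the configuration when it is constructed.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCopyDemo {

    /** Hypothetical stand-in for a configuration: a plain key/value store. */
    static class SimpleConf {
        private final Map<String, String> values = new HashMap<>();

        SimpleConf() {}

        /** Copy constructor: takes a snapshot of the other configuration. */
        SimpleConf(SimpleConf other) {
            values.putAll(other.values);
        }

        void set(String key, String value) {
            values.put(key, value);
        }

        String get(String key) {
            return values.get(key);
        }
    }

    /** Hypothetical stand-in for a job that copies its configuration on construction. */
    static class SimpleJob {
        private final SimpleConf conf;

        SimpleJob(SimpleConf conf) {
            this.conf = new SimpleConf(conf); // later changes to the caller's conf are not seen
        }

        SimpleConf getConfiguration() {
            return conf;
        }
    }

    /** Returns the cache entry visible to the job for the given ordering. */
    static String cacheFilesSeenByJob(boolean addBeforeJob) {
        SimpleConf conf = new SimpleConf();
        if (addBeforeJob) {
            conf.set("cache.files", "/user/vinay/card.txt"); // invented key, for illustration only
        }
        SimpleJob job = new SimpleJob(conf);
        if (!addBeforeJob) {
            conf.set("cache.files", "/user/vinay/card.txt"); // too late: the job already holds a copy
        }
        return job.getConfiguration().get("cache.files");
    }

    public static void main(String[] args) {
        System.out.println("added after job:  " + cacheFilesSeenByJob(false)); // prints null
        System.out.println("added before job: " + cacheFilesSeenByJob(true));  // prints /user/vinay/card.txt
    }
}
```

In the same spirit, if the job has to be created first, passing job.getConfiguration() instead of conf to DistributedCache.addCacheFile should also work, because that is the copy the job actually uses.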
answered Apr 20, 2018 by kurt_cobain
