Apache Spark Questions | Edureka Community

0 votes

1 answer

Internal work of Spark

Spark revolves around the concept of a ...READ MORE

Oct 11, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 1,032 views

0 votes

1 answer

Persistence Levels in Spark

Spark has various persistence levels to store ...READ MORE

Jun 8, 2018 in Apache Spark by kurt_cobain
• 9,350 points • 6,278 views

0 votes

1 answer

Efficient way to read specific columns from parquet file in spark

As Parquet is a column-based storage ...READ MORE

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,350 points • 8,160 views
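Because Parquet stores each column separately, a query that needs only a few columns can skip the rest. The sketch below is a plain-Python model of that layout, not Parquet or Spark code. The names `to_columnar` and `read_columns` are hypothetical. In PySpark, the corresponding pattern is `spark.read.parquet(path).select("id", "name")`.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
ColumnStore = Dict[str, List[Any]]

def to_columnar(rows: List[Row]) -> ColumnStore:
    """Pivot row records into one list per column.

    Toy stand-in for Parquet's column-oriented layout; assumes every row
    has the same keys so the column lists stay aligned.
    """
    store: ColumnStore = {}
    for row in rows:
        for name, value in row.items():
            store.setdefault(name, []).append(value)
    return store

def read_columns(store: ColumnStore, wanted: List[str]) -> ColumnStore:
    """Return only the requested columns; the others are never read."""
    return {name: store[name] for name in wanted}

rows = [
    {"id": 1, "name": "a", "payload": "x" * 100},
    {"id": 2, "name": "b", "payload": "y" * 100},
]
store = to_columnar(rows)
print(read_columns(store, ["id", "name"]))  # {'id': [1, 2], 'name': ['a', 'b']}
```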

0 votes

1 answer

When not to use foreachPartition and mapPartitions?

With mapPartitions() or foreachPartition(), you can only ...READ MORE

Apr 30, 2018 in Apache Spark by Data_Nerd
• 2,390 points • 7,497 views

0 votes

1 answer

In what kind of use cases has Spark outperformed Hadoop in processing?

I can list some but there can ...READ MORE

Sep 19, 2018 in Apache Spark by zombie
• 3,790 points • 1,300 views

0 votes

1 answer

What happens to RDD when one of the nodes goes down?

Whenever a node goes down, Spark knows ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 1,939 views

0 votes

1 answer

How to stop INFO messages displaying on Spark console?

Just do the following: Edit your conf/log4j.properties file ...READ MORE

Aug 21, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 2,483 views

0 votes

2 answers

Which cluster type should I choose for Spark?

Spark is agnostic to the underlying cluster ...READ MORE

Aug 21, 2018 in Apache Spark by zombie
• 3,790 points • 2,180 views

0 votes

1 answer

Functions of Spark SQL?

Spark SQL is capable of: Loading data from ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 1,656 views

0 votes

1 answer

Does Spark provide the storage layer too?

No, it doesn’t provide a storage layer, but ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 1,612 views

0 votes

1 answer

Ways to create RDD in Apache Spark

There are two popular ways using which ...READ MORE

Jun 19, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 4,211 views

0 votes

1 answer

What do we mean by an RDD in Spark?

The full form of RDD is a ...READ MORE

Jun 18, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 4,164 views

+1 vote

1 answer

Getting null values in Spark DataFrame while reading data from HBase

Can you share the screenshots for the ...READ MORE

Jul 31, 2018 in Apache Spark by kurt_cobain
• 9,350 points • 2,458 views

0 votes

1 answer

What is the difference between Apache Spark SQLContext vs HiveContext?

Spark 2.0+: Spark 2.0 provides native window functions ...READ MORE

May 26, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 4,765 views

+1 vote

3 answers

Which cluster type should I choose for Spark?

In my opinion, start with a standalone ...READ MORE

Jun 27, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 1,811 views

0 votes

1 answer

How to stop messages from being displayed on spark console?

In your log4j.properties file you need to ...READ MORE

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,350 points • 5,769 views

0 votes

1 answer

How is RDD in Spark different from Distributed Storage Management? Can anyone help me with this?

Some of the key differences between an RDD and ...READ MORE

Jul 26, 2018 in Apache Spark by zombie
• 3,790 points • 1,744 views

0 votes

1 answer

What makes Spark faster than MapReduce?

Let's first look at mapper-side differences. Map ...READ MORE

Jul 27, 2018 in Apache Spark by Neha
• 6,300 points • 1,700 views

0 votes

1 answer

How to convert an RDD object to a DataFrame in Spark

SQLContext has a number of createDataFrame methods ...READ MORE

May 30, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 4,110 views

0 votes

1 answer

Difference between Spark ML & Spark MLlib package

org.apache.spark.mllib is the old Spark API while ...READ MORE

Jul 5, 2018 in Apache Spark by Shubham
• 13,490 points • 2,454 views

0 votes

1 answer

How to get Spark dataset metadata?

There are a bunch of functions that ...READ MORE

Apr 26, 2018 in Apache Spark by kurt_cobain
• 9,350 points • 5,089 views

0 votes

2 answers

Parquet Files Advantages

Parquet is a columnar format supported by ...READ MORE

Jul 4, 2018 in Apache Spark by zombie
• 3,790 points • 2,344 views

0 votes

1 answer

PySpark Config?

Mainly, we use SparkConf because we need ...READ MORE

Jul 26, 2018 in Apache Spark by kurt_cobain
• 9,350 points • 983 views

0 votes

1 answer

How can I compare the elements of the RDD using MapReduce?

You have to use the comparison operator ...READ MORE

May 24, 2018 in Apache Spark by Shubham
• 13,490 points • 3,589 views

0 votes

1 answer

Spark streaming with Kafka dependency error

Your error is with the version of ...READ MORE

Jul 5, 2018 in Apache Spark by Shubham
• 13,490 points • 1,468 views

0 votes

1 answer

Getting error while connecting zookeeper in Kafka - Spark Streaming integration

I guess you need to provide this kafka.bootstrap.servers ...READ MORE

May 24, 2018 in Apache Spark by Shubham
• 13,490 points • 2,937 views

+1 vote

1 answer

Can anyone explain what is RDD in Spark?

RDD is a fundamental data structure of ...READ MORE

May 24, 2018 in Apache Spark by Shubham
• 13,490 points • 2,872 views

0 votes

1 answer

What is Sliding Window?

Sliding Window controls transmission of data packets ...READ MORE

May 28, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 2,658 views
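In Spark Streaming, a windowed operation is described by two parameters: a window length and a sliding interval, both multiples of the batch interval. In PySpark these are the `windowDuration` and `slideDuration` arguments of `DStream.window`. The sketch below is a plain-Python model that measures both parameters in batches. It is not the Spark Streaming API, and `windowed_sums` is a hypothetical name.

```python
from typing import List, Sequence

def windowed_sums(batches: Sequence[List[int]], window_length: int, slide_interval: int) -> List[int]:
    """Sum the records in each sliding window of micro-batches.

    Both parameters are counted in batches: a window covers the last
    `window_length` batches and a new window is emitted every
    `slide_interval` batches. Early windows cover fewer batches because
    there is no data before the stream starts.
    """
    if window_length <= 0 or slide_interval <= 0:
        raise ValueError("window_length and slide_interval must be positive")
    sums: List[int] = []
    # `end` is the (exclusive) index of the batch at which each window closes.
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        window = batches[max(0, end - window_length):end]
        sums.append(sum(sum(batch) for batch in window))
    return sums

# Six one-record batches, window of 3 batches sliding every 2 batches.
batches = [[1], [2], [3], [4], [5], [6]]
print(windowed_sums(batches, window_length=3, slide_interval=2))  # [3, 9, 15]
```

Consecutive windows overlap whenever the window length exceeds the sliding interval. In this example, batch 3 is counted in both the second and third windows.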

0 votes

2 answers

map() and flatMap()

map(): Return a new distributed dataset formed by ...READ MORE

Jul 4, 2018 in Apache Spark by zombie
• 3,790 points • 1,248 views
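The difference is easiest to see on lines of text. `map` produces exactly one output per input, while `flatMap` lets each input produce zero or more outputs and flattens them. The sketch below models both with plain Python lists. It is not PySpark, and the function names are hypothetical. With PySpark, the equivalent calls would be `rdd.map(lambda s: s.split())` and `rdd.flatMap(lambda s: s.split())`.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def map_like(data: List[T], f: Callable[[T], U]) -> List[U]:
    """Like RDD.map: exactly one output element per input element."""
    return [f(x) for x in data]

def flat_map_like(data: List[T], f: Callable[[T], Iterable[U]]) -> List[U]:
    """Like RDD.flatMap: f returns zero or more elements, and the results are flattened."""
    return [y for x in data for y in f(x)]

lines = ["hello world", "", "spark"]
print(map_like(lines, str.split))       # [['hello', 'world'], [], ['spark']]
print(flat_map_like(lines, str.split))  # ['hello', 'world', 'spark']
```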

+1 vote

1 answer

Kafka Feature

Here are some of the important features of ...READ MORE

Jun 7, 2018 in Apache Spark by Data_Nerd
• 2,390 points • 2,018 views

0 votes

1 answer

Cache tables in Apache Spark SQL

Caching the tables puts the whole table ...READ MORE

May 4, 2018 in Apache Spark by Data_Nerd
• 2,390 points • 3,514 views

0 votes

1 answer

Minimizing Data Transfers in Spark

Minimizing data transfers and avoiding shuffling helps ...READ MORE

Jun 19, 2018 in Apache Spark by Data_Nerd
• 2,390 points • 1,517 views
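One common way to reduce shuffle traffic is to aggregate inside each partition before data crosses the network. `reduceByKey` does this map-side combining; `groupByKey` does not. The sketch below is a plain-Python model in which partitions are lists of key/value pairs. It counts how many records would be shuffled in each case. No Spark is involved, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, int]

def shuffle_without_combine(partitions: List[List[Pair]]) -> int:
    """groupByKey-style: every (key, value) record is sent across the shuffle."""
    return sum(len(p) for p in partitions)

def combine_locally(partition: List[Pair]) -> Dict[str, int]:
    """Sum values per key inside one partition (a map-side combine)."""
    totals: Dict[str, int] = {}
    for key, value in partition:
        totals[key] = totals.get(key, 0) + value
    return totals

def shuffle_with_combine(partitions: List[List[Pair]]) -> int:
    """reduceByKey-style: only one record per distinct key per partition is sent."""
    return sum(len(combine_locally(p)) for p in partitions)

partitions = [
    [("a", 1), ("a", 1), ("b", 1)],
    [("a", 1), ("b", 1), ("b", 1), ("b", 1)],
]
print(shuffle_without_combine(partitions))  # 7
print(shuffle_with_combine(partitions))     # 4
```

The savings grow with the number of repeated keys per partition. When every key is unique, local combining shuffles just as many records as no combining at all.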

0 votes

1 answer

How does an RDD persist data in Spark?

There are two methods to persist the ...READ MORE

Jun 18, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 1,521 views

0 votes

1 answer

What is Spark Piping?

Spark provides a pipe() method on RDDs. ...READ MORE

May 31, 2018 in Apache Spark by kurt_cobain
• 9,350 points • 2,286 views
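`RDD.pipe(command)` sends each partition's elements to an external process, one element per line on stdin. The lines that process prints become the elements of the new RDD. The sketch below models that contract for a single partition using `subprocess`; it is not the Spark implementation. `pipe_partition` and `UPPER` are hypothetical names, and the "external script" is the current Python interpreter, so the example does not depend on any shell tool being installed.

```python
import subprocess
import sys
from typing import List

def pipe_partition(partition: List[str], command: List[str]) -> List[str]:
    """Feed one partition's elements to an external command, one element per line,
    and return the lines it prints."""
    result = subprocess.run(
        command,
        input="".join(element + "\n" for element in partition),
        capture_output=True,
        text=True,
        check=True,  # raise if the external command exits with a non-zero status
    )
    return result.stdout.splitlines()

# Stand-in external program: upper-cases every line it reads from stdin.
UPPER = [sys.executable, "-c", "import sys\nfor line in sys.stdin: print(line.rstrip('\\n').upper())"]

partitions = [["spark", "pipe"], ["rdd"]]
print([pipe_partition(p, UPPER) for p in partitions])  # [['SPARK', 'PIPE'], ['RDD']]
```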

0 votes

1 answer

Akka in Spark

Spark uses Akka basically for scheduling. All ...READ MORE

May 31, 2018 in Apache Spark by Data_Nerd
• 2,390 points • 2,229 views

0 votes

1 answer

Which is better in terms of speed, Shark or Spark?

Spark is a framework for distributed data ...READ MORE

Jun 26, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 1,030 views

0 votes

1 answer

How to import the dependencies of Spark MLlib into an Eclipse project?

I would recommend you create & build ...READ MORE

May 31, 2018 in Apache Spark by Shubham
• 13,490 points • 2,116 views

0 votes

1 answer

Spark standalone client mode

spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client --master spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT ...READ MORE

Jun 20, 2018 in Apache Spark by Ashish
• 2,650 points • 1,228 views

0 votes

1 answer

Spark Driver roles

A Spark driver (aka an application’s driver ...READ MORE

Jun 21, 2018 in Apache Spark by Ashish
• 2,650 points • 1,182 views

0 votes

1 answer

Why is collect in SparkR slow?

It's not the collect() that is slow. ...READ MORE

May 3, 2018 in Apache Spark by Data_Nerd
• 2,390 points • 3,148 views

0 votes

1 answer

Is there any way to uncache RDD?

An RDD can be uncached using unpersist(). So, use ...READ MORE

May 30, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 1,824 views

0 votes

1 answer

How to set keys & access tokens for Twitter Spark streaming?

Either you have to create a Twitter4j.properties ...READ MORE

May 24, 2018 in Apache Spark by Shubham
• 13,490 points • 1,899 views

0 votes

1 answer

Is it mandatory to start Hadoop to run a Spark application?

No, it is not mandatory, but there ...READ MORE

Jun 14, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 946 views

0 votes

1 answer

Convert the given Spark RDD object to a Spark DataFrame.

You can create a DataFrame from the ...READ MORE

Jun 6, 2018 in Apache Spark by Shubham
• 13,490 points • 1,282 views

0 votes

1 answer

Can I read a CSV represented as a string into Apache Spark?

You can use the following command. This ...READ MORE

May 3, 2018 in Apache Spark by kurt_cobain
• 9,350 points • 2,706 views

0 votes

1 answer

What is Shark?

Shark is a tool, developed for people ...READ MORE

Jun 8, 2018 in Apache Spark by kurt_cobain
• 9,350 points • 1,118 views

0 votes

1 answer

start-master and start-all?

sbin/start-master.sh : Starts a master instance on ...READ MORE

May 7, 2018 in Apache Spark by kurt_cobain
• 9,350 points • 2,433 views

0 votes

1 answer

How to get the number of elements in each partition?

rdd.mapPartitions(iter => Array(iter.size).iterator, true) This command will ...READ MORE

May 8, 2018 in Apache Spark by kurt_cobain
• 9,350 points • 2,356 views
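The Scala one-liner in this answer works because `mapPartitions` calls the supplied function once per partition with an iterator over that partition's elements. Returning a single-element iterator that holds the size therefore yields one count per partition. The sketch below is a plain-Python model of that behavior, not PySpark. `map_partitions` and `count_elements` are hypothetical names. A PySpark analogue would be `rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()`.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def map_partitions(partitions: List[List[T]], f: Callable[[Iterator[T]], Iterator[U]]) -> List[List[U]]:
    """Call f once per partition with an iterator over that partition (like mapPartitions)."""
    return [list(f(iter(p))) for p in partitions]

def count_elements(it: Iterator[T]) -> Iterator[int]:
    """Consume the partition's iterator and emit a single element: its size."""
    yield sum(1 for _ in it)

partitions = [[1, 2, 3], [4, 5], []]
print(map_partitions(partitions, count_elements))  # [[3], [2], [0]]
```

Counting by consuming the iterator avoids materializing the whole partition in memory. Empty partitions still report 0 because the generator always yields exactly once.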

0 votes

1 answer

How does partitioning work in Spark?

By default a partition is created for ...READ MORE

May 31, 2018 in Apache Spark by nitinrawat895
• 11,380 points • 1,347 views

0 votes

1 answer

Parquet File

Parquet is a columnar file format supported ...READ MORE

Jun 4, 2018 in Apache Spark by Data_Nerd
• 2,390 points • 1,159 views
