Primary keys in Apache Spark

Question

I have successfully established a JDBC connection between Spark and PostgreSQL. I am trying to insert some data into my database using append mode, but I need to specify an id for each DataFrame.Row. Is there any other way to do this?

ravikiran · Answer 1 · Jul 11, 2019

from pyspark.sql.functions import monotonically_increasing_id
df.withColumn("id", monotonically_increasing_id()).show()

Make sure the second argument of df.withColumn is the function call monotonically_increasing_id(), not the bare function name monotonically_increasing_id.
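
Note that the generated IDs are unique and increasing, but not consecutive. The Spark documentation describes the current layout as the partition index in the upper 31 bits and the row position within the partition in the lower 33 bits. The following is a minimal pure-Python sketch of that layout, not Spark's actual code; the function name sketch_monotonic_ids and the partition sizes are hypothetical and chosen only for illustration.

```python
# Pure-Python illustration (no Spark needed) of the ID layout documented for
# monotonically_increasing_id(): partition index in the upper 31 bits,
# row position within the partition in the lower 33 bits.
# sketch_monotonic_ids is a hypothetical helper, not part of PySpark.

def sketch_monotonic_ids(rows_per_partition):
    """Return the IDs produced under the documented layout.

    rows_per_partition: assumed list of row counts, one entry per partition.
    """
    ids = []
    for partition_index, row_count in enumerate(rows_per_partition):
        base = partition_index << 33  # each partition starts at a multiple of 2**33
        ids.extend(base + offset for offset in range(row_count))
    return ids


# Two partitions holding 3 and 2 rows respectively.
example_ids = sketch_monotonic_ids([3, 2])
print(example_ids)  # [0, 1, 2, 8589934592, 8589934593]
```

Because the numbering starts again from 0 in every job, IDs generated in one append can collide with keys already stored by an earlier append. If that matters, one option is to let PostgreSQL generate the key (for example with a SERIAL or identity column) and leave the id column out of the DataFrame being written.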

