how create distance vector in pyspark Euclidean distance

0 votes

Hi,  I want to implementation this image in pyspark. Please help me or tell me the code. Thanks

Oct 16, 2020 in Apache Spark by dani
• 160 points
4,674 views

1 answer to this question.

+1 vote

Hi@dani,

You can find the euclidean distance using the available PySpark module as shown below.

import math._
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.ml.linalg.Vectors

//input two vectors of length n, but must be equal length
//output euclidean distance between the vectors
val euclideanDistance = udf { (v1: Vector, v2: Vector) =>
    sqrt(Vectors.sqdist(v1, v2))
}

But if you want to use your own module then I suggest you create a mathematical expression and it with python language.

Hope this helps!

Join PySpark training online today to know more about Pyspark.

Thanks.

answered Oct 16, 2020 by MD
• 95,460 points
Hi, What's difference between call a function from Python in Spark and execute it or converting that function to pyspark code and executing it? Will the run time be different?
for example we can call kmeans function of python in spark or using of pyspark kmeans library ? thanks

Hi@dani,

You can think in this way that PySpark will make the operation easier only for Spark. Whereas if you use the Kmeans function of python in spark, then you have to import more modules, more operations, etc.

Related Questions In Apache Spark

0 votes
5 answers

How to change the spark Session configuration in Pyspark?

You aren't actually overwriting anything with this ...READ MORE

answered Dec 14, 2020 in Apache Spark by Gitika
• 65,770 points
125,593 views
0 votes
1 answer

How to add third party java jars for use in PySpark?

You can add external jars as arguments ...READ MORE

answered Jul 4, 2018 in Apache Spark by nitinrawat895
• 11,380 points

edited Nov 19, 2021 by Sarfaraz 8,691 views
0 votes
1 answer

How to create RDD from parallelized collection in scala?

Hi, You can check this example in your ...READ MORE

answered Jul 4, 2019 in Apache Spark by Gitika
• 65,770 points
1,601 views
0 votes
1 answer

How to create RDD from existing RDD in scala?

scala> val rdd1 = sc.parallelize(List(1,2,3,4,5))                           -  Creating ...READ MORE

answered Feb 29, 2020 in Apache Spark by anonymous
1,423 views
0 votes
1 answer

How to create RDD from an external file source in scala?

Hi, To create an RDD from external file ...READ MORE

answered Jul 4, 2019 in Apache Spark by Gitika
• 65,770 points
1,763 views
0 votes
1 answer

How to create scala project in intellij?

You have to install Intellij with scala plugin. ...READ MORE

answered Jul 5, 2019 in Apache Spark by Jimmy
2,282 views
0 votes
1 answer

Spark: How can i create temp views in user defined database instead of default database?

You can try the below code: df.registerTempTable(“airports”) sqlContext.sql(" create ...READ MORE

answered Jul 14, 2019 in Apache Spark by Ishan
4,487 views
0 votes
1 answer

How to call the Debug Mode in PySpark?

As far as I understand your intentions ...READ MORE

answered Jul 26, 2019 in Apache Spark by ravikiran
• 4,620 points
6,168 views
+2 votes
14 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

answered Apr 5, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar 88,751 views
0 votes
1 answer

How to create multiple producers in apache kafka?

Hi@akhtar, To create multiple producer you have to ...READ MORE

answered Feb 6, 2020 in Apache Spark by MD
• 95,460 points
4,157 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP