49557/spark-foldbykey-doubt
val a = spark.sparkContext.parallelize(Array(("a",1),("a",2),("b",2)))
val b = a.foldByKey(1)(_+_)
scala> b.collect
res2: Array[(String, Int)] = Array((b,3), (a,5))
Can someone tell me why the value for a is 5 and not 4?
Please have a look below for your reference.
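(a,1) (a,2) => foldByKey(1)(_+_) => (a,1+1)+(a,2+1) => 2+3 = 5
(b,2) => foldByKey(1)(_+_) => (b,2+1) = 3
According to that logic, the value is 5. Note that foldByKey applies the zero value once per key in each partition, not once per element, so this calculation holds when (a,1) and (a,2) sit in different partitions.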
val a= spark.sparkContext.parallelize(Array(("a",1),("a",2),("b",2),("a",2)))
a: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:23
scala> val b =a.foldByKey(1)(_+_).collect
b: Array[(String, Int)] = Array((b,3), (a,7))
Hey, @Sitaram,
If you follow the calculation in the above answer, the result would be Array((b,6),(a,10)).
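Both outputs in this thread, (a,5) and (a,7), are consistent with the zero value being added once per key per partition rather than once per element. Here is a minimal sketch in plain Scala (no Spark needed) that imitates this behaviour. The helper simulateFoldByKey is hypothetical and is not Spark's implementation, and the partition layouts are an assumption: two partitions, split the way parallelize would slice these small arrays evenly.

```scala
object FoldByKeySketch {
  // Hypothetical helper (not a Spark API): imitates foldByKey semantics.
  // Inside each partition, the fold for a key starts from `zero`;
  // the per-partition results are then merged with the same function.
  def simulateFoldByKey(partitions: Seq[Seq[(String, Int)]], zero: Int)
                       (op: (Int, Int) => Int): Map[String, Int] = {
    // Step 1: fold every partition on its own, seeding each key with `zero`.
    val perPartition: Seq[Map[String, Int]] = partitions.map { part =>
      part.foldLeft(Map.empty[String, Int]) { case (acc, (k, v)) =>
        acc.updated(k, op(acc.getOrElse(k, zero), v))
      }
    }
    // Step 2: merge the partial results per key (no extra zero here).
    perPartition.flatten.groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).reduce(op)
    }
  }

  def main(args: Array[String]): Unit = {
    // First example, assumed layout: [(a,1)] and [(a,2),(b,2)].
    // a = (1+1) + (1+2) = 5, b = 1+2 = 3
    val first = Seq(Seq(("a", 1)), Seq(("a", 2), ("b", 2)))
    println(simulateFoldByKey(first, 1)(_ + _))

    // Second example, assumed layout: [(a,1),(a,2)] and [(b,2),(a,2)].
    // a = (1+1+2) + (1+2) = 7, b = 1+2 = 3
    val second = Seq(Seq(("a", 1), ("a", 2)), Seq(("b", 2), ("a", 2)))
    println(simulateFoldByKey(second, 1)(_ + _))

    // Same data in a single partition: the zero is used once per key,
    // so a = 1+1+2 = 4, which is the value the question expected.
    val single = Seq(Seq(("a", 1), ("a", 2), ("b", 2)))
    println(simulateFoldByKey(single, 1)(_ + _))
  }
}
```

In a real Spark shell you can check the actual layout with a.glom().collect, or fix the partition count by passing numSlices as the second argument to parallelize. Because the result depends on partitioning, the zero value is normally chosen as the neutral element of the function (0 for addition), which gives the same answer for any layout.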