Hi guys,
I'm a beginner at PySpark, and I'm working with a dataset where I'd like to find the top three countries with the highest number of COVID cases. I've googled around for a solution, and in theory it should work, but for some reason it doesn't (the error is pasted below).
Note that my dataframe has two columns, Country and total_count (where total_count is the total number of confirmed COVID cases).
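In case it helps, here's a tiny made-up dataframe that matches my schema (the country names and numbers are placeholders, not my real data):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# toy rows with the same two columns as my real data;
# some countries appear more than once, which is why I'm using groupBy
df_covid_3 = spark.createDataFrame(
    [("US", 1000000), ("US", 1200000), ("India", 900000), ("Brazil", 800000)],
    ["Country", "total_count"],
)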
Here's my code:
from pyspark.sql.functions import desc
Top_by_Country = df_covid_3.groupBy('Country').max().select(['total_count'])
Top_by_Country.orderBy(desc("total_count"))
Here's the error I get:
AnalysisException: cannot resolve '`total_count`' given input columns: [Country, max(total_count)];
'Project ['total_count]
+- Aggregate [Country#11261], [Country#11261, max(total_count#9842) AS max(total_count)#11923]
   +- Project [Country#11261, total_count#9842]
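From the error it looks like, after groupBy().max(), the aggregated column is literally named max(total_count) rather than total_count, so I'm guessing I either have to select it under that name or give it an alias, maybe something like this (just my guess from reading the error, not sure it's the right or idiomatic way):

from pyspark.sql.functions import max as spark_max, desc

Top_by_Country = (
    df_covid_3.groupBy('Country')
    .agg(spark_max('total_count').alias('total_count'))  # keep the original column name
    .orderBy(desc('total_count'))
    .limit(3)  # top three countries
)
Top_by_Country.show()

Is that actually what the error means, or am I misunderstanding it?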
Any help would be appreciated
