GCP Dataproc

+2 votes

How can I create a Google Cloud Storage bucket to use with my Google Cloud cluster, and copy a PySpark application into that bucket in my project? The PySpark app has been shared from a Cloud Storage bucket: gs://training/root.py. Can you please walk me through the steps, with pictures if possible?

Nov 26, 2019 by Deepthi
• 140 points
1,200 views
So if I have understood correctly: you need a Cloud Storage bucket that your Google Cloud cluster can use, and you want to copy the PySpark application into that bucket?
Correct me if I am wrong.

Thank you!

1 answer to this question.

0 votes

Hey @Deepthi, you could do this:

Follow the steps below to prepare your environment.

  1. Set up your project. If you have not already, create a project with the Cloud Dataproc, Compute Engine, and Cloud Storage APIs enabled, and install the Cloud SDK on your local machine.

    • Select or create a GCP project.

    • Make sure that billing is enabled for your Google Cloud Platform project.

    • Enable the Cloud Dataproc, Compute Engine, and Cloud Storage APIs.

    • Install and initialize the Cloud SDK.
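
    If you prefer the command line, the APIs from step 1 can also be enabled with gcloud. This is just a sketch; `my-project-id` is a placeholder for your own project ID:

    ```shell
    # Enable the three APIs this walkthrough needs.
    # Replace my-project-id with your actual project ID.
    gcloud services enable \
        dataproc.googleapis.com \
        compute.googleapis.com \
        storage.googleapis.com \
        --project=my-project-id
    ```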

  2. Create a Cloud Storage bucket. You need a Cloud Storage bucket to hold your data. If you do not have one ready to use, create a new bucket in your project.

    1. In the GCP Console, go to the Cloud Storage Browser page.

    2. Click Create bucket.

    3. In the Create bucket dialog, specify the following attributes:

      • A unique bucket name.

      • A storage class.

      • A location where bucket data will be stored.

    4. Click Create.
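
    As an alternative to the Console steps above, the bucket can be created from the command line with `gsutil mb`. A sketch; the project ID, storage class, location, and bucket name below are all placeholders you should replace:

    ```shell
    # Create a bucket in project "my-project-id" with the Standard storage
    # class, located in the US multi-region. Bucket names must be globally
    # unique, so pick your own.
    gsutil mb -p my-project-id -c STANDARD -l US gs://my-unique-bucket-name/
    ```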

  3. Set local environment variables. On your local machine, set your GCP project ID, the name of the Cloud Storage bucket you will use, and the name and zone of your Cloud Dataproc cluster. You will create the cluster in the next step.

    PROJECT=project-id
    BUCKET_NAME=bucket-name
    CLUSTER=cluster-name
    ZONE=zone  # for example, "us-west1-a"

  4. Create a Cloud Dataproc cluster. Run the command below to create a single-node Cloud Dataproc cluster in the specified Compute Engine zone.

    gcloud dataproc clusters create $CLUSTER \
        --project=${PROJECT} \
        --zone=${ZONE} \
        --single-node
    
  5. Copy the shared PySpark application (gs://training/root.py) to your own Cloud Storage bucket.

    gsutil cp gs://training/root.py gs://${BUCKET_NAME}
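
Once root.py is in your bucket, you would typically run it on the cluster with `gcloud dataproc jobs submit pyspark`, roughly like this. A sketch, not the only way: it assumes the CLUSTER and BUCKET_NAME variables from step 3 are still set, and that "us-west1" is the region containing the zone you created the cluster in:

    ```shell
    # Submit the copied PySpark application to the Cloud Dataproc cluster.
    # Adjust --region to match the zone you chose for the cluster.
    gcloud dataproc jobs submit pyspark gs://${BUCKET_NAME}/root.py \
        --cluster=${CLUSTER} \
        --region=us-west1
    ```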

For more info, refer to https://cloud.google.com/dataproc/docs/tutorials/gcs-connector-spark-tutorial

answered Nov 26, 2019 by Karan
• 19,610 points
