In Amazon Data Pipeline, how do I make sure only one instance of a pipeline is running at any time?


I have a pipeline with two tasks. Task 2 depends on Task 1, and maxActiveInstances is set to 1 for both tasks. Despite this dependency, Task 2 sometimes runs at the same time as Task 1. For example, if Task 2 takes too long and the scheduled start time of the pipeline's next execution is reached, Task 1 of the next execution starts while Task 2 is still running. The same thing happens when backfilling.

Since these two tasks interfere with each other, I don't want them to run at the same time under any circumstances. Ideally, I'd want only one instance of the pipeline (not of individual tasks) to run at a time, but I can't figure out how to do that.

Here's what the pipeline looks like with uninteresting parts replaced with ...:

{
  "objects": [
    {
      "period": "15 Minutes",
      "name": "Every 15 minutes",
      "id": "DefaultSchedule",
      "type": "Schedule",
      "startAt": "FIRST_ACTIVATION_DATE_TIME"
    },
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "...",
      "role": "...",
      "pipelineLogUri": "...",
      "scheduleType": "cron",
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "maxActiveInstances": "1",
      "name": "Default",
      "id": "Default"
    },
    {
      "name": "CopyTablesActivity",
      "id": "CopyTablesActivity",
      "workerGroup": "dp01",
      "type": "ShellCommandActivity",
      "command": "..."
    },
    {
      "name": "CreateReportsActivity",
      "id": "CreateReportsActivity",
      "workerGroup": "dp01",
      "type": "ShellCommandActivity",
      "command": "...",
      "dependsOn": {
        "ref": "CopyTablesActivity"
      }
    }
  ],
  "parameters": [...]
}
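To see why a per-activity maxActiveInstances does not prevent the overlap, here is a simplified Python model of the schedule above. It is a sketch under stated assumptions, not Data Pipeline's actual scheduler: each run is scheduled every `period` minutes, maxActiveInstances only serializes instances of the *same* activity, and CreateReportsActivity waits for the same run's CopyTablesActivity. The durations are made-up inputs, and `simulate` and `overlaps` are hypothetical helper names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Interval:
    name: str
    start: float
    end: float


def simulate(period: float, d_copy: float, d_report: float, runs: int) -> list[Interval]:
    """Simplified model (an assumption, not the real scheduler):
    - run k of each activity is scheduled at k * period;
    - maxActiveInstances = 1 serializes instances of the same activity only;
    - CreateReports of run k waits for CopyTables of run k (dependsOn).
    """
    intervals: list[Interval] = []
    copy_end = report_end = float("-inf")
    for k in range(runs):
        # CopyTables waits for its scheduled time and for its own previous instance.
        copy_start = max(k * period, copy_end)
        copy_end = copy_start + d_copy
        # CreateReports waits for this run's CopyTables and its own previous instance.
        report_start = max(copy_end, report_end)
        report_end = report_start + d_report
        intervals.append(Interval(f"CopyTables#{k}", copy_start, copy_end))
        intervals.append(Interval(f"CreateReports#{k}", report_start, report_end))
    return intervals


def overlaps(intervals: list[Interval]) -> list[tuple[str, str]]:
    """Return (CopyTables, CreateReports) pairs whose time intervals intersect."""
    copies = [i for i in intervals if i.name.startswith("CopyTables")]
    reports = [i for i in intervals if i.name.startswith("CreateReports")]
    return [
        (c.name, r.name)
        for c in copies
        for r in reports
        if c.start < r.end and r.start < c.end
    ]


if __name__ == "__main__":
    # 15-minute schedule, 5-minute copy, 20-minute report.
    print(overlaps(simulate(15, 5, 20, 2)))  # → [('CopyTables#1', 'CreateReports#0')]
```

In this model, CopyTables of run 1 starts at minute 15 while CreateReports of run 0 runs until minute 25. Nothing ties an activity to the *other* activity of a different run, which matches the behaviour described in the question.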

Sep 19, 2018 in AWS by bug_seeker

1 answer to this question.


On CopyTablesActivity, you could set the lateAfterTimeout attribute to 5 minutes or so, and then add an onLateAction attribute that references a Terminate action. The idea is that if CopyTablesActivity hasn't finished within 5 minutes, it is terminated instead of being left running. For example, the CopyTablesActivity object could look like this:

{
  "name": "CopyTablesActivity",
  "id": "CopyTablesActivity",
  "workerGroup": "dp01",
  "lateAfterTimeout": "5 minutes",
  "type": "ShellCommandActivity",
  "onLateAction": { "ref": "DefaultAction1" },
  "command": "..."
}

Then define DefaultAction1 like this:

{
  "name": "TerminateTasks",
  "id": "DefaultAction1",
  "type": "Terminate"
}
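If the pipeline definition is kept in a file, the same change can be applied programmatically. The Python sketch below uses a hypothetical helper (`add_late_termination` is not part of any AWS SDK). It adds lateAfterTimeout and onLateAction to one activity and appends the Terminate object if it is missing. It assumes the definition is valid JSON, so placeholders such as `"parameters": [...]` from the question would need real values first.

```python
import json


def add_late_termination(definition: dict, activity_id: str,
                         timeout: str = "5 minutes",
                         action_id: str = "DefaultAction1") -> dict:
    """Return a patched copy of a pipeline definition (hypothetical helper).

    The activity with id `activity_id` gets lateAfterTimeout and an
    onLateAction that references a Terminate object with id `action_id`.
    """
    patched = json.loads(json.dumps(definition))  # deep copy of plain JSON data
    objects = patched["objects"]
    activity = next((o for o in objects if o.get("id") == activity_id), None)
    if activity is None:
        raise KeyError(f"no object with id {activity_id!r}")
    activity["lateAfterTimeout"] = timeout
    activity["onLateAction"] = {"ref": action_id}
    # Add the Terminate action only once, even if the helper is run repeatedly.
    if not any(o.get("id") == action_id for o in objects):
        objects.append({"name": "TerminateTasks", "id": action_id, "type": "Terminate"})
    return patched


if __name__ == "__main__":
    pipeline = {
        "objects": [
            {"name": "CopyTablesActivity", "id": "CopyTablesActivity",
             "workerGroup": "dp01", "type": "ShellCommandActivity", "command": "..."},
        ]
    }
    print(json.dumps(add_late_termination(pipeline, "CopyTablesActivity"), indent=2))
```

The helper returns a copy rather than editing in place, so the original definition stays untouched if the patched result is rejected.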

See this link for more information: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-terminate.html

answered Sep 19, 2018 by Priyaj