
How to Run a Workflow Multiple Times with Different Inputs (Using Apache Beam or Native Workflow Features)?


I’m building a workflow with Google Cloud Workflows, and I want to run the same workflow multiple times with different input values. While researching this, I found that Apache Beam can be used to launch multiple workflow executions from Python. However, I’m not sure whether that is the best approach, or whether the workflow itself can iterate over several inputs.

Here’s an excerpt of my workflow definition, which currently handles a single input:

main:
  params: [input]
  steps:
    # Resolve the project and the Cloud Run job to execute.
    - init:
        assign:
          - project_id: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          - gcp_env: ${sys.get_env("GCP_ENV")}
          - job_name: ${sys.get_env("GCP_ENV") + "-shiXXX-XXXXX-XXX"}
          - job_location: asia-northeast1
    # Run the Cloud Run job in parallel branches, one per loop iteration.
    - parallel_translate:
        parallel:
          for:
            # Note: range bounds are inclusive on both ends.
            range: ${[0, input.count]}
            value: i
            steps:
              - runJob:
                  call: googleapis.run.v1.namespaces.jobs.run
                  args:
                    name: ${"namespaces/" + project_id + "/jobs/" + job_name}
                    location: ${job_location}
                    body:
                      overrides:
                        containerOverrides:
                          env:
                            # Pass the input path to the container.
                            - name: INPUT_FILE_PATH
                              value: '${input.inputPath}'
                  result: jobRunResponse

This works fine when I run it with one input, for example:

{"count": 1, "inputPath": "XXXXXXXXXX-XXXXXX"}

But now, I want to run the workflow with multiple inputs, like this:

[
  {"count": 1, "inputPath": "XXXXXX-XXX1"},
  {"count": 1, "inputPath": "XXXXXX-XXX2"},
  {"count": 1, "inputPath": "XXXXX2-XXX1"}
]

I’ve considered using Apache Beam to launch these workflows from Python, but I’m wondering if there is a simpler or more efficient way to handle this directly within the workflow itself.
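
For comparison, here is a minimal sketch of what launching one execution per input from plain Python (without Beam) might look like. It assumes the `google-cloud-workflows` client library is installed and that credentials are already configured. The helper names `launch_all` and `make_launcher` are hypothetical, introduced only for illustration. The fan-out logic takes the launcher as a parameter, so it can be tried with a stand-in function before touching a real project.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def launch_all(
    inputs: Iterable[dict],
    launch: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Call `launch` once per input and return the results in input order."""
    # The execution argument is passed to Workflows as a JSON-encoded string.
    arguments = [json.dumps(item) for item in inputs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `arguments` in its results.
        return list(pool.map(launch, arguments))


def make_launcher(project: str, location: str, workflow: str) -> Callable[[str], str]:
    """Build a launcher that starts one workflow execution per call.

    Assumption: the google-cloud-workflows package is available. It is
    imported lazily so that launch_all can be used without it.
    """
    from google.cloud.workflows import executions_v1

    client = executions_v1.ExecutionsClient()
    parent = f"projects/{project}/locations/{location}/workflows/{workflow}"

    def launch(argument: str) -> str:
        execution = executions_v1.Execution(argument=argument)
        response = client.create_execution(parent=parent, execution=execution)
        # Return the full resource name of the started execution.
        return response.name

    return launch
```

With the list of inputs shown above, the call would be something like `launch_all(inputs, make_launcher(project_id, "asia-northeast1", workflow_name))`, where `project_id` and `workflow_name` are placeholders for the real values. Each input becomes a separate execution of the unchanged single-input workflow.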

Questions:

Is it possible to modify the workflow so it can iterate over multiple inputs directly?
Is using Apache Beam to trigger multiple workflow executions from Python a good approach for this, or are there better ways to parallelize or batch workflow executions in Google Cloud?
Are there best practices for handling such scenarios in Google Cloud Workflows that I should be aware of?

Any insights or suggestions would be greatly appreciated!


