Run Commands
Read the PPS series >

Spout PPS

Ingest streaming data using a spout pipeline.

April 4, 2024

Spec #

This is a top-level attribute of the pipeline spec.

{
  "pipeline": {...},
  "transform": {...},
  "spout": {
  \\ Optionally, you can combine a spout with a service:
  "service": {
    "internal_port": int,
    "external_port": int
    }
  },
  ...
}

Attributes #

AttributeDescription
serviceAn optional field that is used to specify how to expose the spout as a Kubernetes service.
internal_portUsed for the spout’s container.
external_portUsed for the Kubernetes service that exposes the spout.

Behavior #

  • Does not have a PFS input; instead, it consumes data from an outside source.
  • Can have a service added to it. See Service.
  • Its code runs continuously, waiting for new events.
  • The output repo, pfs/out is not directly accessible. To write into the output repo, you must to use the put file API call via any of the following:
    • pachctl put file
    • A HPE ML Data Management SDK (for golang or Python )
    • Your own API client.
  • HPE ML Data Management CLI (PachCTL) is packaged in the base image of your spout as well as your authentication information. As a result, the authentication is seamless when using PachCTL.

Diagram #

spout-tldr

When to Use #

You should use the spout field in a HPE ML Data Management Pipeline Spec when you want to read data from an external source that is not stored in a HPE ML Data Management repository. This can be useful in situations where you need to read data from a service that is not integrated with HPE ML Data Management, such as an external API or a message queue.

Example scenarios:

  • Data ingestion: If you have an external data source, such as a web service, that you want to read data from and process with HPE ML Data Management, you can use the spout field to read the data into HPE ML Data Management.

  • Real-time data processing: If you need to process data in real-time and want to continuously read data from an external source, you can use the spout field to read the data into HPE ML Data Management and process it as it arrives.

  • Data integration: If you have data stored in an external system, such as a message queue or a streaming service, and you want to integrate it with data stored in HPE ML Data Management, you can use the spout field to read the data from the external system and process it in HPE ML Data Management.

Example #

{
  "pipeline": {
    "name": "my-spout"
  },
    "spout": {
  },
  "transform": {
    "cmd": [ "go", "run", "./main.go" ],
    "image": "myaccount/myimage:0.1",
    "env": {
        "HOST": "kafkahost",
        "TOPIC": "mytopic",
        "PORT": "9092"
    }
  }
}
💡

For a first overview of how spouts work, see our spout101 example.