Spout PPS

Spec

This is a top-level attribute of the pipeline spec.

{
  "pipeline": {...},
  "transform": {...},
  "spout": {
    // Optionally, you can combine a spout with a service:
    "service": {
      "internalPort": int,
      "externalPort": int,
      "ip": string,
      "type": string
    }
  },
  ...
}

Attributes

Attribute      Description
service        An optional field used to specify how to expose the spout as a Kubernetes service.
internalPort   The port on the spout's container.
externalPort   The port on the Kubernetes service that exposes the spout.
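
For example, a spout that also exposes a service might declare its ports as follows; the port numbers and service type shown here are illustrative assumptions, not defaults:

"spout": {
  "service": {
    "internalPort": 8000,
    "externalPort": 30080,
    "type": "NodePort"
  }
}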

Behavior

  • Does not have a PFS input; instead, it consumes data from an outside source.
  • Can have a service added to it. See Service.
  • Its code runs continuously, waiting for new events.
  • The output repo, pfs/out, is not directly accessible. To write into the output repo, you must use the put file API call via any of the following:
    • pachctl put file
    • An HPE Machine Learning Data Management SDK (for Go or Python)
    • Your own API client.
  • The HPE Machine Learning Data Management CLI (pachctl) and your authentication information are packaged in your spout's base image, so authentication is seamless when your spout code calls pachctl. A minimal sketch of such a loop follows this list.
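
The following is a minimal sketch of spout user code, assuming pachctl is available in the image as described above and the external source is stubbed out; the repo name my-spout, the master branch, and the /events/ path are illustrative, not required.

package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
	"time"
)

func main() {
	for {
		// Stand-in for reading one event from the external source.
		record := []byte(fmt.Sprintf("event at %s\n", time.Now().UTC().Format(time.RFC3339)))

		// Stage the event in a local file.
		local := fmt.Sprintf("/tmp/%d.txt", time.Now().UnixNano())
		if err := os.WriteFile(local, record, 0o644); err != nil {
			log.Printf("staging event failed: %v", err)
			continue
		}

		// Push the file into the spout's output repo with `pachctl put file`.
		// "my-spout" matches the pipeline name, so it is also the output repo.
		target := fmt.Sprintf("my-spout@master:/events/%d.txt", time.Now().UnixNano())
		if out, err := exec.Command("pachctl", "put", "file", target, "-f", local).CombinedOutput(); err != nil {
			log.Printf("pachctl put file failed: %v: %s", err, out)
		}

		// A real spout would block on its source instead of polling on a timer.
		time.Sleep(time.Minute)
	}
}

Because pachctl and your authentication information are baked into the spout image, this put file call needs no extra credentials.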

Diagram

[Diagram: spout-tldr]

When to Use

You should use the spout field in an HPE Machine Learning Data Management pipeline spec when you want to read data from an external source that is not stored in an HPE Machine Learning Data Management repository. This is useful when you need to read data from a service that is not integrated with HPE Machine Learning Data Management, such as an external API or a message queue.

Example scenarios:

  • Data ingestion: If you have an external data source, such as a web service, that you want to read data from and process with HPE Machine Learning Data Management, you can use the spout field to read the data into HPE Machine Learning Data Management.

  • Real-time data processing: If you need to process data in real-time and want to continuously read data from an external source, you can use the spout field to read the data into HPE Machine Learning Data Management and process it as it arrives.

  • Data integration: If you have data stored in an external system, such as a message queue or a streaming service, and you want to integrate it with data stored in HPE Machine Learning Data Management, you can use the spout field to read the data from the external system and process it in HPE Machine Learning Data Management.

Example

{
  "pipeline": {
    "name": "my-spout"
  },
    "spout": {
  },
  "transform": {
    "cmd": [ "go", "run", "./main.go" ],
    "image": "myaccount/myimage:0.1",
    "env": {
        "HOST": "kafkahost",
        "TOPIC": "mytopic",
        "PORT": "9092"
    }
  }
}
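
Assuming the spec above is saved as spout.json, you would typically create the pipeline with:

pachctl create pipeline -f spout.json

HPE Machine Learning Data Management then creates the my-spout output repo and starts the spout's container, which runs continuously and writes data into that repo as it arrives.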
Tip