Create a Managed Inference Job
Create a Managed Inference Job interactively with the CLI, then confirm it.
Create a Managed Inference Job from the CLI. You answer a few prompts, then CosmicAC deploys the model behind an OpenAI-compatible endpoint.
Prerequisites
You need the following before you start.
- A running CosmicAC deployment. See Installation.
- The CosmicAC CLI installed and configured. See Install the CLI.
Create the job
Start the interactive job setup.
cosmicac jobs createSelect Managed Inference as the job type, then set these fields. The Job configuration reference describes every field.
- Job name — the name for the job.
- Tags — one or more comma-separated tags.
- Location — the region where the job runs, for example
usorIN. - GPU type — the GPU to use.
- GPU count — the number of GPUs.
- GPU driver — an optional GPU driver.
- CPU cores per GPU — optional CPU cores per GPU.
- Memory GB per GPU — optional memory per GPU in GB.
- Model — the model to serve. Select a listed model, or enter a Hugging Face model ID to bring your own.
- Endpoint name — the name for the endpoint.
- Replica count — the number of replicas.
- Require auth header — whether callers must send an API key. See Create an API key.
You can serve any model that vLLM supports. Browse the Hugging Face model hub or the vLLM supported models list to find one.
CosmicAC creates the job and prints its ID.
Confirm the job
List your jobs to confirm the job was created.
cosmicac jobs listThe job appears in the table with its ID, name, tags, and status.