Create a Managed Inference Job

Create a Managed Inference Job from the CLI. You answer a few prompts, then CosmicAC deploys the model behind an OpenAI-compatible endpoint.

You need the following before you start.

Start the interactive job setup.

cosmicac jobs create

Select Managed Inference as the job type, then set these fields. The Job configuration reference describes every field.

Job name — the name for the job.
Tags — one or more comma-separated tags.
Location — the region where the job runs, for example us or IN.
GPU type — the GPU to use.
GPU count — the number of GPUs.
GPU driver — an optional GPU driver.
CPU cores per GPU — optional CPU cores per GPU.
Memory GB per GPU — optional memory per GPU in GB.
Model — the model to serve. Select a listed model, or enter a Hugging Face model ID to bring your own.
Endpoint name — the name for the endpoint.
Replica count — the number of replicas.
Require auth header — whether callers must send an API key. See Create an API key.

You can serve any model that vLLM supports. Browse the Hugging Face model hub or the vLLM supported models list to find one.

CosmicAC creates the job and prints its ID.

List your jobs to confirm the job was created.

cosmicac jobs list

The job appears in the table with its ID, name, tags, and status.