Bright Vision Technologies | Engineering

Model Serving Engineer

Remote (Contiguous United States) | Posted today

Bright Vision Technologies is seeking a Model Serving Engineer to design, build, and operate high-performance, reliable inference platforms for large machine learning models, with a focus on the systems engineering aspects of AI deployment.

Location: Remote (Contiguous United States)

Responsibilities

  • Design and operate model serving platforms supporting diverse workloads including LLMs, vision models, and recommendation systems.
  • Optimize inference performance using continuous batching, paged attention, speculative decoding, and request multiplexing.
  • Implement multi-tenant routing, rate limiting, and quality-of-service policies across model endpoints.
  • Build autoscaling and capacity management systems that balance latency, throughput, and cost.
  • Tune GPU utilization, memory management, and KV cache strategies for LLM serving workloads.
  • Integrate model serving with API gateways, identity systems, and observability platforms.
  • Implement caching, prompt deduplication, and response reuse strategies where appropriate.
  • Drive end-to-end observability including latency histograms, queue dynamics, GPU utilization, and error tracking.
  • Develop deployment workflows including canary releases, shadow testing, and automated rollback.
  • Operate incident response for high-availability AI services and drive durable reliability improvements.
  • Collaborate with ML and product teams to support new model releases and capability rollouts.
  • Implement security controls including request signing, content filtering, and abuse detection at the serving layer.
  • Document operational procedures, performance characteristics, and tuning guidance for internal teams.
  • Stay current with AI serving research and translate advances into production capabilities.

Requirements

  • Open-source contributions to model serving infrastructure.
  • Experience with multi-region or globally distributed AI serving.
  • Familiarity with model quantization, distillation, and compression techniques.
  • Exposure to FinOps for AI workloads and cost-efficient serving design.
  • Experience supporting external-facing AI APIs at scale.

Additional Information

  • Candidates must be willing to work directly as a full-time W2 employee of Bright Vision Technologies.
  • No new H-1B sponsorship is available, but H-1B transfers are supported for qualified candidates.
  • A technical coding assessment is mandatory for applicants.
  • The role is a full-time, remote, in-house position with no third-party client or vendor involvement.

Location: Remote (Contiguous United States)
Category: Engineering
Source: himalayas
Posted: today
