Hyred · Data

Data Engineer (Data Pipelines & RAG)

Remote · Posted today

A versatile Data & AI Engineer role at a fast-growing Property Tech AI company, focusing on building and maintaining data pipelines for Gen AI applications, with responsibilities spanning data modeling, AI integration, observability, and automation.

Location: Remote

Responsibilities

  • Automate data ingestion from diverse sources including unstructured documents, tables, charts, and drawings.
  • Own the chunking strategy, embedding, and indexing of data for retrieval by RAG/agent systems.
  • Build, test, and maintain robust ETL/ELT workflows using Spark (batch & streaming).
  • Define and implement logical/physical data models and schemas; develop schema mappings and data dictionaries.
  • Instrument data pipelines to surface real-time context into LLM prompts.
  • Implement prompt engineering and RAG for workflows within the real estate/construction industry vertical.
  • Implement monitoring, alerting, and logging for data quality, latency, and errors.
  • Apply access controls and data privacy safeguards (e.g., Unity Catalog, IAM).
  • Develop automated testing, versioning, and deployment using Azure DevOps, GitHub Actions, and Prefect/Airflow.
  • Maintain reproducible environments with infrastructure as code (Terraform, ARM templates).

Requirements

  • 5 years in Data Engineering or a similar role, with 12-24 months of experience building pipelines for unstructured data extraction, OCR-based document processing, cloud-native solutions, and chunking and indexing for RAG/Gen AI applications.
  • Proficiency in Python, dlt for ETL/ELT pipelines, DuckDB or equivalent tools, and DVC for large file management.
  • Solid SQL skills and experience with relational databases; familiarity with non-relational column-based databases.
  • Familiarity with Prefect or similar tools (e.g., Azure Data Factory).
  • Proficiency with Azure ecosystem and services in production.
  • Familiarity with RAG indexing, chunking, and storage across file types.
  • Strong DevOps and CI/CD experience (CircleCI / Azure DevOps).
  • Experience deploying ML artifacts using MLflow, Docker, or Kubernetes.

Benefits

  • Fast-growing, revenue-generating proptech startup.
  • Flat, no-BS environment with high autonomy.
  • Steep learning opportunities in enterprise production use-cases.
  • Remote work with quarterly meet-ups.
  • Exposure to multi-market, multi-cultural clients.

Additional Information

  • Early-stage startup environment that requires wearing many hats and working outside your comfort zone, with direct impact in production.

Location

Remote

Category

Data

Company

Hyred

Source

himalayas

Posted

today
