HyredData
Data Engineer (Data Pipelines & RAG)
Remote · Posted today
A versatile Data & AI Engineer role at a fast-growing Property Tech AI company, focusing on building and maintaining data pipelines for Gen AI applications, with responsibilities spanning data modeling, AI integration, observability, and automation.
Location: Remote
Responsibilities
- Automate data ingestion from diverse sources including unstructured documents, tables, charts, and drawings.
- Own the chunking strategy, embedding, and indexing of data for retrieval by RAG/agent systems.
- Build, test, and maintain robust ETL/ELT workflows using Spark (batch & streaming).
- Define and implement logical/physical data models and schemas; develop schema mappings and data dictionaries.
- Instrument data pipelines to surface real-time context into LLM prompts.
- Implement prompt engineering and RAG for workflows within the real estate (RE) and construction industry vertical.
- Implement monitoring, alerting, and logging for data quality, latency, and errors.
- Apply access controls and data privacy safeguards (e.g., Unity Catalog, IAM).
- Develop automated testing, versioning, and deployment using Azure DevOps, GitHub Actions, and Prefect/Airflow.
- Maintain reproducible environments with infrastructure as code (Terraform, ARM templates).
Requirements
- 5 years of experience in Data Engineering or a similar role, including 12-24 months building pipelines for unstructured data extraction, OCR-based document processing, cloud-native solutions, and chunking and indexing for RAG/Gen AI applications.
- Proficiency in Python, dlt for ETL/ELT pipelines, DuckDB or equivalent tools, and DVC for large file management.
- Solid SQL skills and experience with relational databases; familiarity with non-relational column-based databases.
- Familiarity with Prefect or similar orchestration tools (e.g., Azure Data Factory).
- Proficiency with the Azure ecosystem and its services in production.
- Familiarity with RAG indexing, chunking, and storage across file types.
- Strong DevOps and CI/CD experience (CircleCI / Azure DevOps).
- Experience deploying ML artifacts using MLflow, Docker, or Kubernetes.
Benefits
- Fast-growing, revenue-generating proptech startup.
- Flat, no-BS environment with high autonomy.
- Steep learning curve through enterprise production use cases.
- Remote work with quarterly meet-ups.
- Exposure to multi-market, multi-cultural clients.
Additional Information
- Early-stage startup environment that requires wearing many hats and working outside your comfort zone, with direct impact in production.