Capstone Integrated Solutions
Senior Data Engineer (AWS)
Remote
Posted 4 days ago
Capnexus is seeking a Senior AWS Data Engineer to lead data architecture, pipeline development, and data integrations on a platform that uses generative AI to automate and modernize enterprise workflows.
Location: Remote
Responsibilities
- Participate in data discovery workshops to inventory source systems including property management platforms, marketing channels, and CRM data, and translate findings into data lake architecture requirements.
- Design and implement a multi-zone enterprise data lake on Amazon S3 (raw, conformed, enriched, aggregated) with ingest, cleansing, and business layers including schema versioning, checksum validation, business rule validation, and quarantine/notify workflows on failure.
- Build batch and streaming data ingestion pipelines using AWS Glue, Amazon Kinesis, and containerized ingestion applications across CDP, marketing, and property management data sources.
- Write PySpark and Python ETL code for AWS Glue jobs to transform, cleanse, and enrich data at scale; apply Apache Iceberg table format for ACID-compliant, schema-evolving data lake tables.
- Implement data transformation and orchestration frameworks using AWS Glue ETL and AWS Step Functions; configure AWS Glue Data Catalog with crawlers for automated metadata management and discovery.
- Implement AWS Lake Formation for fine-grained data governance including table-level and column-level permissions, data filters, and resource links.
- Configure Amazon Athena for serverless SQL querying across the data lake with performance optimization; implement Amazon DynamoDB for sub-second customer profile lookups, with DAX where latency requirements demand it.
- Develop and deploy AWS Lambda functions using AWS Lambda Powertools for structured logging, handler routing, and observability; implement error handling patterns including exponential backoff, retries, dead-letter queues, and CloudWatch alarms.
- Write and maintain Terraform (or CloudFormation/CDK) modules to provision and deploy AWS data infrastructure as part of the CI/CD pipeline.
- Integrate CI/CD pipelines using GitHub Actions for automated deployment of Glue jobs, Lambda functions, and Step Functions workflows with lint checks and validation gates.
- Support Azure Data Lake migration: conduct discovery of ADLS assets, schemas, and transformation logic; provision AWS target environments; execute migration via AWS DataSync; perform row-count reconciliation, schema validation, and checksum comparison post-migration.
- Design and implement entity resolution pipelines to identify, deduplicate, and merge customer records into unified golden records using deterministic and fuzzy matching with lineage tracking and manual review pathways.
- Build and maintain data models to support Customer 360 views and executive analytics dashboards via Amazon QuickSight.
- Ensure data quality, validation, and integrity across all pipeline stages; support UAT for data-dependent features.
- Collaborate with Full Stack, DevOps/MLOps, and AI/ML team members working with Bedrock and SageMaker; contribute to architecture documentation, pipeline runbooks, and data governance documentation.
Requirements
- 5+ years of hands-on data engineering experience, including at least 2 years in AWS cloud environments.
- Strong proficiency in Python and SQL; hands-on PySpark or Scala coding experience for AWS Glue ETL — this is a coding role, not a configuration role.
- Hands-on experience with AWS Glue (jobs, crawlers, Data Catalog), AWS Step Functions, AWS Lambda, and Amazon S3 data lake architecture.
- Proficiency with AWS Lambda Powertools for structured logging, handler management, and observability in production serverless workloads.
- Working knowledge of Apache Iceberg table format including schema evolution, time travel, and partition management.
- Hands-on experience with Terraform, AWS CloudFormation, or AWS CDK for infrastructure as code integrated into CI/CD pipelines — candidates who have only consumed pre-made DevOps templates will not meet this requirement.
- Experience with AWS Lake Formation for fine-grained access control including table-level and column-level permissions, data filters, and resource links.
- Solid understanding of DynamoDB data modeling and key design patterns for sub-second lookups; familiarity with DAX for caching.
- Experience with Amazon Athena performance tuning: file formats, partitioning strategies, query optimization, and understanding of when Athena is and is not the right tool.
- Experience with GitHub Actions or comparable CI/CD tooling for automated deployment of data pipeline code.
- Strong understanding of data quality patterns: schema validation, checksum validation, business rule validation, quarantine workflows, and lineage tracking.
- Strong analytical, problem-solving, and communication skills; comfortable working in Agile/Scrum teams alongside AWS Professional Services.
Benefits
- Remote work