[Remote] Staff Data Engineer - Emerald

Remote · USA · Full-time

Note: This is a remote role open to candidates in the USA. H1 is dedicated to providing access to healthcare information and is seeking a Staff Data Engineer for its Emerald team. The role involves leading the architecture and scalability of H1's entity resolution platform while managing a small team and collaborating with a range of stakeholders to improve the platform's efficiency and accuracy.

Responsibilities

  • Lead the design, optimization, and scalability of distributed Spark/PySpark pipelines powering entity resolution and large-scale healthcare data processing
  • Own systems supporting automatching, identity mapping, grouping logic, deduplication, enrichment, and auto-approval workflows across healthcare provider and organization datasets
  • Build and maintain scalable processing frameworks for PubMed, clinical trial, ct.gov, conference, and other healthcare data sources
  • Drive infrastructure optimization initiatives focused on improving throughput, runtime, observability, and compute cost efficiency
  • Partner closely with AI/ML teams to integrate matching and resolution models into EMERALD and improve matching precision and recall
  • Lead technical initiatives from architecture and design through deployment, monitoring, and long-term production support
  • Serve as a technical leader and mentor across the team through code reviews, technical guidance, and engineering best practices
  • Collaborate directly with Product and business stakeholders to align technical solutions with operational and customer needs
  • Support production operations, incident response, troubleshooting, and ongoing platform reliability

Skills

  • 8+ years of experience building and maintaining large-scale distributed data systems and pipelines
  • Demonstrated technical leadership experience mentoring engineers and driving technical initiatives
  • Extensive experience with Apache Spark and AWS-based big data technologies including EMR, S3, and distributed compute environments
  • Strong coding experience in Python (PySpark), Scala, Java, or equivalent languages used for distributed processing systems
  • Experience optimizing large-scale Spark workloads for performance, scalability, and infrastructure cost efficiency
  • Experience with streaming and event-driven architectures using technologies such as Kafka or Spark Streaming
  • Experience with orchestration and lakehouse technologies such as Argo and Hudi or comparable platforms
  • Experience with containerization and infrastructure technologies such as Docker, Kubernetes, and Terraform
  • Experience working with relational or distributed databases such as PostgreSQL or Redshift
  • Proven ability to operate effectively within highly scalable, production-grade distributed systems
  • Deep expertise with distributed data processing frameworks such as Apache Spark and Hadoop, particularly in AWS environments
  • Strong proficiency in Python (PySpark), Scala, Java, or other modern programming languages used for large-scale distributed processing
  • Experience building scalable ETL/ELT frameworks across both batch and streaming architectures
  • Strong understanding of distributed file formats including Apache Parquet and Apache Avro
  • Experience with streaming technologies such as Kafka, Spark Streaming, or KSQL
  • Strong grasp of software engineering fundamentals including distributed systems, data structures, concurrency, and system design
  • Experience performing root cause analysis across large-scale distributed systems and data pipelines
  • Ability to write clean, maintainable, production-grade code
  • Experience improving performance, scalability, observability, and infrastructure efficiency in distributed systems
  • Strong communication and collaboration skills across both technical and non-technical stakeholders
  • Familiarity with modern development and infrastructure tooling including Git, CI/CD pipelines, Docker, Kubernetes, Terraform, Argo, Hudi, and JIRA
  • Experience with entity resolution, identity mapping, automatching, deduplication, or large-scale matching systems is strongly preferred
  • Experience working with healthcare, life sciences, Real World Evidence (RWE), or other large-scale healthcare datasets is strongly preferred

Benefits

  • Stock options
  • Full suite of health insurance options
  • Generous paid time off
  • Pre-planned company-wide wellness holidays
  • Retirement options
  • Health & charitable donation stipends
  • Impactful Business Resource Groups
  • Flexible work hours & the opportunity to work remotely
  • The opportunity to work with leading biotech and life sciences companies in an innovative industry with a mission to improve healthcare around the globe

Company Overview

  • H1 is on a mission to connect the world with the right doctors. It was founded in 2017, is headquartered in the USA, and has a workforce of 201-500 employees. Its website is https://www.h1.co.

Company H1B Sponsorship

  • H1 has a track record of offering H1B sponsorships, with 5 in 2025, 6 in 2024, 4 in 2023, 9 in 2022, 7 in 2021. Please note that this does not guarantee sponsorship for this specific role.