LLM Performance Researcher

Remote · USA · Full-time

Full-time • San Francisco

At Endeavor, we’re rebuilding ERP from first principles for $1B+ manufacturing and distribution companies. These companies run on PDFs, spreadsheets, and semi-structured chaos, and we’re building LLM-powered systems to parse, match, and reason through all of it reliably.

We’re looking for a researcher with deep experience in LLM performance on document tasks, especially extraction, entity linking, and record matching. You’ve likely published papers on it. You’ve probably run head-to-head evals on Claude, other commercial models, and open-source models. You’re fluent in both academic benchmarks and the weird, grimy failure modes that only show up in production.

Your work will directly improve the core performance of our agentic ERP. You’ll prototype new techniques, run structured evals, improve few-shot and tool-augmented performance, and help shape how LLMs interact with structured business systems.

What You’ll Do

Design and run experiments to improve extraction, normalization, and matching across real-world documents
Evaluate LLM performance on noisy, multi-format inputs like scanned PDFs, OCR output, and spreadsheets
Improve model accuracy and reliability in the face of rare formats, abbreviations, bad formatting, and domain-specific vocab
Build and own our eval infrastructure for matching, linking, extraction, and schema alignment tasks
Work with the Applied AI Researcher and Backend Engineers to ship improvements into production
Contribute to long-term strategy around fine-tuning, retrieval augmentation, tool use, or structured memory (if and when needed)

You Might Be a Fit If You

Have deep experience with document understanding and information extraction using LLMs
Have worked on schema alignment, record linking, or entity resolution at scale
Have published papers on LLM performance (e.g. extraction, evals, few-shot prompting, matching)
Understand both academic benchmarks and real-world weirdness
Know how to make evals meaningful, tight, and fast to iterate on
Want to work in a setting where research turns into production code fast
Have a PhD or equivalent research background in NLP, ML, or similar (but we care more about what you’ve done than what your title says)

Bonus Points

Experience with post-OCR workflows or noisy doc normalization
Deep intuition for failure modes in large-scale matching/linking systems
Obsession with eval quality and reproducibility
Comfort implementing papers and benchmarking models at scale
Past work in procurement, invoicing, logistics, or any doc-heavy vertical
