Senior Software Engineer – Retrieval-Augmented Generation (RAG) System
We are seeking an engineer to join a team building and supporting a healthcare-centered, production-scale RAG system that combines document retrieval with response generation to deliver accurate, context-aware answers. This engineer will be expected to design, implement, and operate end-to-end RAG pipelines, including LLM interaction, API creation, and the high-performance, secure delivery of knowledge-grounded capabilities. You will collaborate with data engineers, platform teams, and product partners to ship reliable, scalable, and observable systems.
Role and responsibilities
- Architect, implement, test, and operate end-to-end RAG workflows:
  - Ingest and normalize documents from diverse sources
  - Generate and manage embeddings; index and query vector databases
  - Retrieve relevant passages, apply reranking or fusion strategies, and feed prompts to LLMs (see the sketch after this list)
- Build scalable, low-latency services and APIs (Python preferred; other languages acceptable) and ensure production-grade reliability (monitoring, tracing, alerting)
- Integrate with vector databases and embedding pipelines and optimize for latency, throughput, and cost
- Design and implement ML Ops workflows: model/version management, experiments, feature stores, CI/CD for ML-enabled services, rollback plans
- Develop robust data pipelines and governance around ingestion, provenance, quality checks, and access controls
- Collaborate with data engineers to improve retrieval quality (embedding strategies, reranking, cross-encoder models, prompt engineering) and implement evaluation metrics (precision/recall, MRR, QA accuracy, user-centric metrics)
- Implement monitoring and observability for RAG components (latency, success rate, cache hit rate, retrieval quality, data drift)
- Ensure security, privacy, and compliance (authentication, authorization, data masking, PII handling, audit logging)
- Optimize for scalability and reliability in cloud environments (AWS/GCP/Azure) and containerized deployments (Docker, Kubernetes)
- Contribute to architecture decisions, drive technical debt reduction, and mentor junior engineers
- Collaborate with product, design, and data teams to translate requirements into robust software solutions
- Document APIs, runbooks, and architectural decisions; participate in code reviews and design reviews
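To give a flavor of the retrieve-then-generate workflow described in the responsibilities above, here is a minimal, illustrative sketch in Python. The embedding function, the in-memory index, and the sample healthcare passages are hypothetical stand-ins (a production system would use a real embedding model, a vector database, and an LLM call); nothing here prescribes specific tooling for the role.

```python
# Minimal retrieve-then-generate sketch. embed() is a hypothetical placeholder
# for a real embedding model, and the in-memory INDEX stands in for a vector DB.
from math import sqrt

def embed(text: str) -> list[float]:
    # Placeholder embedding: normalized bag-of-letters frequencies.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Dot product of unit vectors equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

DOCS = [
    "Prior authorization is required for MRI scans.",
    "Claims must be submitted within 90 days of service.",
    "Telehealth visits are covered for follow-up appointments.",
]
INDEX = [(doc, embed(doc)) for doc in DOCS]  # stand-in for a vector database

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank stored passages by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str, passages: list[str]) -> str:
    # Ground the LLM by restricting it to the retrieved context.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"

if __name__ == "__main__":
    question = "When do claims need to be filed?"
    prompt = build_prompt(question, retrieve(question))
    print(prompt)  # a real service would send this prompt to an LLM here
```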
Required qualifications
- 5+ years of professional software engineering experience designing and delivering production systems
- Strong programming skills (Python required; Node.js a plus)
- Deep understanding of retrieval-augmented generation and production-scale NLP systems, with practical experience building RAG-style pipelines
- Hands-on experience with ML workflow tooling and MLOps concepts (model serving, versioning, experiments, feature stores, reproducibility)
- Proficiency with cloud infrastructure and modern software practices (AWS/GCP/Azure; Docker; Kubernetes; CI/CD)
- Strong problem-solving skills, excellent communication, and ability to work with cross-functional teams
- Familiarity with data governance, privacy, and security best practices
Preferred qualifications
- Experience with agentic workflow tools (e.g., LangGraph) and familiarity with prompt engineering for LLMs
- Exposure to working with and evaluating different LLMs
- Knowledge of evaluation methodologies for retrieval and QA systems and the ability to set up A/B tests and dashboards (see the evaluation sketch at the end of this posting)
- Experience with data processing frameworks (SQL, Pandas, Spark) and working with large-scale data pipelines
- Background in performance optimization for low-latency AI services; familiarity with MLflow or similar ML lifecycle tooling is a plus
- Experience with monitoring and logging tools such as New Relic, K9s, or Portkey
- Experience minimizing token usage and optimizing LLM costs
- Comfortable designing and implementing security controls for data-intensive AI systems
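As an example of the evaluation work referenced above, here is a minimal Mean Reciprocal Rank (MRR) sketch in Python. The document ids and relevance judgments are made up for illustration; the calculation itself follows the standard MRR definition, and a real evaluation set would come from curated QA data or user feedback.

```python
# Minimal sketch of Mean Reciprocal Rank (MRR) for retrieval evaluation.
# The example queries and judgments below are hypothetical.

def mean_reciprocal_rank(results: list[list[str]], relevant: list[set[str]]) -> float:
    """results[i] is the ranked list of doc ids returned for query i;
    relevant[i] is the set of doc ids judged relevant for that query."""
    reciprocal_ranks = []
    for ranked, gold in zip(results, relevant):
        rr = 0.0
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in gold:
                rr = 1.0 / rank  # reciprocal rank of the first relevant hit
                break
        reciprocal_ranks.append(rr)
    return sum(reciprocal_ranks) / len(reciprocal_ranks) if reciprocal_ranks else 0.0

if __name__ == "__main__":
    retrieved = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]
    judged = [{"d1"}, {"d5"}]
    print(mean_reciprocal_rank(retrieved, judged))  # (1/2 + 0) / 2 = 0.25
```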