Projects — Srinivas Gampasani


Generative AI Production

01

Enterprise AI Knowledge Retrieval (RAG)

Designed and deployed a production-grade Retrieval-Augmented Generation system at Ascension Via Christi Health, enabling clinical teams to query 100K+ enterprise documents in natural language with real-time, citation-grounded answers.
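The retrieve-then-cite flow described above can be sketched in a few lines. This is a minimal illustration, not the deployed system: the bag-of-words `embed` function stands in for a real embedding model and vector index, and the document ids and texts in `DOCS` are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

def build_prompt(query: str, docs: dict[str, str], k: int = 2) -> str:
    """Assemble a prompt whose context passages carry citation markers."""
    hits = retrieve(query, docs, k)
    context = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id, _ in hits)
    return (
        "Answer using only the sources below and cite them by id.\n"
        f"{context}\nQuestion: {query}"
    )

# Hypothetical documents, for illustration only.
DOCS = {
    "policy-1": "visiting hours end at eight in the evening",
    "policy-2": "hand hygiene is required before patient contact",
    "policy-3": "parking permits are renewed every year",
}
```

Calling `build_prompt("when do visiting hours end", DOCS)` yields a prompt in which each retrieved passage is prefixed with its id, so the model's answer can point back to a source.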

"Reduced average time-to-answer for clinical staff from 8 minutes to under 40 seconds — enabling faster care decisions at the bedside."

Ascension Via Christi Health, Sep 2024 – Present, HIPAA Compliant

35% Accuracy boost

<1s Query latency

100K+ Docs indexed

LangChain, FAISS, Azure OpenAI, FastAPI, Docker, Pinecone, Python, Redis


Generative AI Agentic AI

02

Multi-Agent AI Orchestration Framework

Built a LangGraph-powered multi-agent orchestration system where specialized AI agents autonomously decompose complex clinical and enterprise tasks, delegate subtasks, use tools (web search, code execution, database queries), and produce verified outputs — mimicking a high-performing analyst team.
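The idea of agents handing work to one another along a dependency graph can be illustrated without any framework. The sketch below is an assumption-laden stand-in, not LangGraph's API: the four agent functions and the `DEPENDENCIES` map are hypothetical, and each "agent" is a plain function rather than an LLM call.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Each hypothetical agent reads the shared state and returns its contribution.
Agent = Callable[[dict], str]

def planner(state: dict) -> str:
    return f"plan for: {state['task']}"

def researcher(state: dict) -> str:
    return f"findings based on ({state['planner']})"

def coder(state: dict) -> str:
    return f"analysis based on ({state['planner']})"

def verifier(state: dict) -> str:
    return f"verified: {state['researcher']} + {state['coder']}"

AGENTS: dict[str, Agent] = {
    "planner": planner,
    "researcher": researcher,
    "coder": coder,
    "verifier": verifier,
}

# Map each agent to the agents whose output it depends on.
DEPENDENCIES = {
    "planner": set(),
    "researcher": {"planner"},
    "coder": {"planner"},
    "verifier": {"researcher", "coder"},
}

def run(task: str) -> dict:
    """Run every agent once, in an order that respects the dependency graph."""
    state = {"task": task}
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        state[name] = AGENTS[name](state)
    return state
```

Because the order comes from the graph rather than from hard-coded calls, adding an agent only requires registering it and declaring what it depends on.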

"Five specialized agents collaborating in real time reduced task resolution time by 65% on complex multi-step workflows compared with single-agent approaches."

5 Specialized Agents, LangGraph DAG, Tool Use + RAG

5+ AI agents

65% Faster resolution

∞ Task types

LangGraph, LangChain, GPT-4, FastAPI, Redis, Docker, Kubernetes


Generative AI Fine-Tuning

03

LLM Fine-Tuning Pipeline (LoRA / QLoRA)

Engineered a reusable end-to-end fine-tuning pipeline for domain adaptation of open-source LLMs (Llama 2, Mistral) using PEFT techniques. The pipeline handles data curation, quantization, LoRA adapter training, evaluation with ROUGE/BLEU, and automated publishing to the Hugging Face Hub.
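To show why LoRA adapters are cheap to train, here is a small framework-free sketch of the standard LoRA update, where the frozen weight `W` is augmented by a scaled low-rank product `B A`. The layer sizes, rank, and `alpha` below are illustrative assumptions, not the pipeline's actual configuration, and plain lists stand in for tensors.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two LoRA factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha: float, rank: int):
    """Compute W x + (alpha / rank) * B (A x) without materializing B @ A."""
    base = matvec(W, x)                  # frozen pretrained weights
    low_rank = matvec(B, matvec(A, x))   # trainable adapter path
    scale = alpha / rank
    return [b + scale * l for b, l in zip(base, low_rank)]

# For a hypothetical 4096 x 4096 projection with rank 8, the adapter trains
# 65,536 parameters instead of the 16,777,216 in the full matrix.
ADAPTER_PARAMS = lora_trainable_params(4096, 4096, rank=8)
```

With `B` initialized to zeros, as is conventional for LoRA, the adapted layer starts out identical to the pretrained one, so training begins from the base model's behavior.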

"Achieved 60% GPU memory reduction using 4-bit QLoRA quantization while maintaining 96% of full fine-tune benchmark scores — making domain LLMs accessible on a single A100."

QLoRA 4-bit Quantization, Custom Domain Datasets, ROUGE / BLEU Eval

60% GPU memory reduction

96% of full fine-tune score

4-bit Quantization

LoRA, QLoRA, PEFT, PyTorch, Hugging Face, Llama 2, Mistral, W&B


Generative AI Healthcare

04

Healthcare AI Diagnostic Assistant

Built a HIPAA-compliant conversational AI assistant for clinical teams at Ascension. Combines GPT-4 with a RAG layer over structured EHR data and clinical guidelines, enabling care teams to query patient history, surface relevant protocols, and generate draft clinical summaries in real time.

"Reduced documentation time for nursing staff by 28% during the pilot, which led to the assistant being flagged for enterprise-wide rollout across 3 hospital networks."

HIPAA Compliant, EHR Integration, Clinical Workflows

28% Less doc time

3 Hospital networks

99.9% Uptime SLA

GPT-4, LangChain, FAISS, Azure, FastAPI, PostgreSQL, HL7 FHIR


NLP Healthcare

05

Clinical NLP Document Intelligence

Engineered a HIPAA-compliant NLP pipeline that ingests unstructured clinical notes, discharge summaries, and radiology reports, then extracts structured entities (diagnoses, medications, procedures) and generates concise summaries. Deployed on Apache Spark for hospital-scale throughput.

"Processed 50,000+ patient documents in the first production run — reducing manual chart review effort by 30% across the care coordination team."

50K+ Records, NER + Summarization, Spark Distributed

30% Manual effort saved

50K+ Docs processed

F1 0.91 NER accuracy

BERT, Transformers, spaCy, Apache Spark, Python, Hugging Face, MLflow


NLP Vector Search

06

Hybrid Semantic Search Engine

Built a production semantic search platform combining dense bi-encoder embeddings (Sentence Transformers) with BM25 sparse retrieval using Reciprocal Rank Fusion. Deployed on Pinecone and Weaviate with an A/B testing framework that continuously improves ranking models using implicit user feedback signals.
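Reciprocal Rank Fusion, the technique named above for merging dense and BM25 results, scores each document by summing `1 / (k + rank)` over the ranked lists it appears in. The sketch below assumes the commonly used constant `k = 60` and uses hypothetical document ids; it is an illustration of the fusion step only, not the platform's code.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings into one list, best first.

    Each document's score is the sum of 1 / (k + rank) over every ranking
    that contains it, with ranks starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists from a dense retriever and a BM25 retriever.
dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because only ranks are used, the two retrievers' raw scores never need to be normalized against each other, which is the main practical appeal of this fusion method.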

"Improved search relevance MRR@10 from 0.54 to 0.81 — a 50% uplift — by combining dense + sparse signals with cross-encoder reranking."

Hybrid Dense + BM25, A/B Testing, <100ms Latency

50% MRR uplift

<100ms P99 latency

1M+ Vectors indexed

Sentence Transformers, Pinecone, Weaviate, FastAPI, BM25, Cross-Encoder


NLP Graph AI

07

Knowledge Graph AI System

Designed an AI-powered knowledge graph platform that ingests documents, extracts entities and relationships using BERT-based NER, builds a Neo4j graph, and exposes a GraphQL API enabling LLMs to reason over structured enterprise knowledge — bridging unstructured text and graph traversal.
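A multi-hop question over extracted relationships amounts to a path search through the graph. The following sketch uses an in-memory adjacency list and breadth-first search as a stand-in for a Neo4j traversal; the triples and entity names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical (subject, relation, object) triples, for illustration only.
TRIPLES = [
    ("DrugA", "treats", "ConditionX"),
    ("ConditionX", "managed_by", "CardiologyUnit"),
    ("CardiologyUnit", "located_in", "HospitalNorth"),
]

def build_graph(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph

def find_path(graph, start, goal, max_hops=3):
    """Breadth-first search; returns the edges on a shortest path, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_hops:
            continue  # do not expand beyond the hop budget
        for relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None
```

The returned edge list doubles as an explanation: each hop names the relation that links one entity to the next, which is what an LLM would need in order to cite its reasoning.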

"Enabled analysts to answer multi-hop questions across 200K+ entity relationships that previously required 3 separate database queries and manual joining."

Neo4j Graph DB, Entity Linking, GraphQL API

200K+ Entities mapped

F1 0.88 Relation extraction

3x Query speed

Neo4j, LlamaIndex, NER, BERT, GraphQL, Python, spaCy


Machine Learning Colgate-Palmolive

08

Demand Forecasting ML System

Built an enterprise demand forecasting platform for Colgate-Palmolive covering 5,000+ SKUs across 12 global markets. It uses an ensemble of XGBoost, LightGBM, and LSTM models with automated feature engineering (lag features, rolling statistics, calendar events) and Airflow-orchestrated weekly retraining cycles.
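Lag features, rolling statistics, and MAPE are simple enough to show directly. The sketch below is a plain-Python illustration under assumed settings (lags of 1 and 2 periods, a 3-period window, made-up sales numbers); it is not the platform's feature code.

```python
def lag_and_rolling_features(series, lags=(1, 2), window=3):
    """Build one feature row per time step that has enough history."""
    rows = []
    start = max(max(lags), window)
    for t in range(start, len(series)):
        row = {f"lag_{lag}": series[t - lag] for lag in lags}
        past = series[t - window:t]  # strictly before t, so no target leakage
        row[f"rolling_mean_{window}"] = sum(past) / window
        row["target"] = series[t]
        rows.append(row)
    return rows

def mape(actual, forecast):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical weekly unit sales for one SKU.
weekly_sales = [10, 12, 14, 16, 18]
features = lag_and_rolling_features(weekly_sales)
```

Each row pairs the target week with values known strictly before it, which is the property that keeps a retrained model from seeing the future during training.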

"20% improvement in forecast MAPE over the legacy statistical model — translating to ~$1.4M in reduced overstock and stockout costs annually."

Colgate-Palmolive, 5K+ SKUs, 12 Markets

20% MAPE reduction

$1.4M Annual savings

5K+ SKUs covered

XGBoost, LightGBM, TensorFlow, Airflow, MLflow, PySpark, Snowflake


Machine Learning Real-Time

09

Real-Time Anomaly Detection System

Engineered a streaming anomaly detection platform that processes Kafka event streams at 50K events/sec. An ensemble of Isolation Forest, autoencoder, and statistical control charts with adaptive thresholds triggers automated Slack/PagerDuty alerts and writes flagged events to a Delta Lake audit table.

"Caught a data pipeline failure within 4 seconds of onset; previously, undetected failures caused 6-hour data gaps that impacted downstream analytics."
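Of the three detectors mentioned, the control chart with an adaptive threshold is the easiest to illustrate. The sketch below tracks an exponentially weighted mean and variance and flags values that fall outside a band around the mean. The smoothing factor, 3-sigma band, and warm-up length are assumed defaults, not the production settings.

```python
import math

class AdaptiveControlChart:
    """Flags values far from an exponentially weighted running mean."""

    def __init__(self, alpha: float = 0.1, n_sigmas: float = 3.0, warmup: int = 5):
        self.alpha = alpha        # weight given to the newest observation
        self.n_sigmas = n_sigmas  # half-width of the control band
        self.warmup = warmup      # observations to see before flagging
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def update(self, value: float) -> bool:
        """Consume one observation and report whether it is anomalous."""
        self.count += 1
        if self.count == 1:
            self.mean = value
            return False
        deviation = value - self.mean
        threshold = self.n_sigmas * math.sqrt(self.var)
        is_anomaly = self.count > self.warmup and abs(deviation) > threshold
        if not is_anomaly:
            # Only normal points move the baseline, so an outage cannot mask itself.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return is_anomaly

# Hypothetical metric stream ending in a sudden spike.
stream = [10, 11, 9, 10, 12, 10, 9, 11, 10, 50]
chart = AdaptiveControlChart()
flags = [chart.update(v) for v in stream]
```

Because the band is recomputed from recent data, the threshold widens when the stream is noisy and tightens when it is calm, which is what "adaptive" means in this sketch.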

50K Events/sec, PagerDuty Alerts, Adaptive Thresholds

95%+ Precision

50K/s Event throughput

4s Detection lag

Kafka, Isolation Forest, PyTorch Autoencoder, Scikit-learn, FastAPI, Delta Lake, Docker


Machine Learning Computer Vision

10

Computer Vision OCR Pipeline

Developed an automated document digitization system combining Vision Transformers (ViT) for document layout understanding with Tesseract + PaddleOCR for character recognition. Handles handwritten forms, multi-column medical PDFs, and low-quality scans — extracting structured JSON output for downstream systems.

"Reduced manual data entry for clinical forms from 4 minutes per document to 8 seconds — enabling same-day processing of incoming patient paperwork."

Multi-format Docs, Handwriting Support, JSON Output API

98% OCR accuracy

30x Faster than manual

8s Per document

ViT, CLIP, OpenCV, Tesseract, PaddleOCR, PyTorch, FastAPI


MLOps Zero Downtime

11

End-to-End MLOps Platform

Architected a complete ML lifecycle management platform — from experiment tracking (MLflow) and model registry to canary deployments on Kubernetes, automated drift detection (evidently.ai), and a Grafana observability stack. Standardized the path from notebook to production across 3 engineering teams.
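One common way to quantify feature drift is the Population Stability Index (PSI), which compares how a feature is distributed in training data versus live data. The sketch below is a hand-rolled illustration, not the evidently.ai API; the bin count and the sample data are assumptions.

```python
import math

def population_stability_index(expected, actual, bins: int = 10, eps: float = 1e-6):
    """PSI between a baseline sample and a live sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical training-time feature values and a shifted live sample.
baseline = list(range(100))
shifted = [v + 50 for v in baseline]
psi_same = population_stability_index(baseline, baseline)
psi_shifted = population_stability_index(baseline, shifted)
```

A frequently cited rule of thumb treats PSI above roughly 0.25 as a significant shift, though any alerting threshold should be tuned to the feature and the model.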

"Reduced model deployment time from 3 days to 45 minutes — with 30% fewer post-deployment incidents thanks to automated drift alerting."

Kubernetes Canary, Drift Detection, 3 Teams Onboarded

45min Deploy time

30% Fewer incidents

100% Zero-downtime

MLflow, Kubernetes, Prometheus, Grafana, CI/CD, Jenkins, evidently.ai, Docker


MLOps AWS

12

Real-Time Streaming Prediction System

Built an event-driven ML inference platform on AWS where Kafka topics trigger real-time model scoring via SageMaker endpoints. Features horizontal autoscaling (response to traffic spikes in <90s), live Grafana dashboards, and a shadow-mode framework for safely validating new model versions before full cutover.
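Shadow-mode validation means the candidate model sees live traffic while only the current model's predictions are returned to callers. The sketch below shows that routing pattern in isolation; the `ShadowRouter` class and the lambda models are hypothetical and say nothing about how SageMaker endpoints are wired.

```python
from typing import Callable

Model = Callable[[dict], float]

class ShadowRouter:
    """Serves the primary model and records the shadow model's output for comparison."""

    def __init__(self, primary: Model, shadow: Model):
        self.primary = primary
        self.shadow = shadow
        self.comparisons: list[tuple[float, float]] = []

    def predict(self, features: dict) -> float:
        live = self.primary(features)
        try:
            candidate = self.shadow(features)
            self.comparisons.append((live, candidate))
        except Exception:
            # A failing shadow model must never affect the live response.
            pass
        return live

    def mean_abs_difference(self) -> float:
        """Average disagreement between live and shadow predictions so far."""
        if not self.comparisons:
            return 0.0
        total = sum(abs(live - cand) for live, cand in self.comparisons)
        return total / len(self.comparisons)

# Hypothetical models: the candidate is offset by 1.0 from the current one.
router = ShadowRouter(lambda f: f["x"] * 2.0, lambda f: f["x"] * 2.0 + 1.0)
served = router.predict({"x": 1.0})
```

The recorded comparisons give a disagreement metric that can be reviewed before cutover, while callers only ever receive the primary model's output.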

"The system handled a 3x traffic spike during a product launch (Black Friday equivalent) with zero degradation; inference latency stayed under 85ms at peak."

<85ms Inference, Auto-scaling <90s, AWS SageMaker

<85ms P99 latency

3x Traffic spike handled

99.95% Availability

Kafka, TensorFlow, AWS SageMaker, Databricks, Grafana, Terraform, Docker


MLOps Data Engineering

13

Scalable Clinical Data ETL Platform

Designed a fault-tolerant ETL platform ingesting data from 8 source systems (EHR, lab systems, billing, scheduling) into a unified Snowflake data warehouse. Built with dbt for transformation lineage, Apache Airflow for orchestration, and great_expectations for automated data quality gates — serving as the foundation for all downstream ML models.
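A data quality gate is a set of checks that must all pass before a batch moves downstream. The sketch below shows the pattern with two hand-written checks; it does not use the great_expectations API, and the column names and value range are hypothetical.

```python
from functools import partial

def check_not_null(rows, column):
    """Fail when any row is missing a value for `column`."""
    bad = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {"check": f"not_null({column})", "passed": not bad, "bad_rows": bad}

def check_between(rows, column, low, high):
    """Fail when a non-null value falls outside [low, high]."""
    bad = [
        i for i, row in enumerate(rows)
        if row.get(column) is not None and not low <= row[column] <= high
    ]
    return {"check": f"between({column})", "passed": not bad, "bad_rows": bad}

def quality_gate(rows, checks):
    """Run every check; raise so the orchestrator can halt downstream tasks."""
    results = [check(rows) for check in checks]
    failed = [r["check"] for r in results if not r["passed"]]
    if failed:
        raise ValueError(f"data quality gate failed: {failed}")
    return results

# Hypothetical checks for a batch of vitals records.
CHECKS = [
    partial(check_not_null, column="patient_id"),
    partial(check_between, column="heart_rate", low=20, high=250),
]
```

Raising an exception rather than returning a flag means an orchestrator task that calls `quality_gate` fails loudly, so dependent transformations do not run on a bad batch.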

"Reduced data pipeline latency from 6 hours to 22 minutes — enabling near-real-time clinical dashboards that inform daily bed management decisions."

8 Source Systems, Data Quality Gates, 22min Latency

94% Latency reduction

8 Source systems

99.8% Data quality score

Apache Spark, Airflow, dbt, Snowflake, Delta Lake, great_expectations, Python
