KnowUrMedicine
In a Mumbai household, 17-year-old Som Sengupta watched his family’s house help bring him an entire medicine box, saying, “Aap hi lelo, Som,” unable to parse the English labels to pick the right one. This moment, two years ago, sparked KnowUrMedicine, a web app designed to make medicine information accessible to India’s vernacular-speaking and less literate populations.
With 85% of Indians lacking English proficiency and 90% of medicine labels in English, the stakes are high: one study found that 45% of patients misinterpret dosage or frequency without native-language instructions, risking health hazards. Som’s tech-driven solution leverages AI, optical character recognition (OCR), and text-to-speech (TTS) to deliver critical medicine details in languages like Hindi and Marathi, all through a scalable web platform. This feature looks at the technical ingenuity behind Som’s altruistic innovation.
Som’s epiphany came from a real-world problem. “Not everyone has the privilege of reading English like I do,” he says. The struggle he saw wasn’t unique: millions in India face similar barriers, leading to medication errors. Determined to help, Som envisioned a tool that translates and vocalizes medicine details, prioritizing accessibility for the less literate.
Initially, he considered a physical device for chemist shops to scan and read labels aloud. “But it wasn’t scalable or cost-effective,” he realized after prototyping. A native mobile app was also nixed due to storage constraints on low-end smartphones common in India. Instead, he pivoted to a web app, capitalizing on near-universal browser access. “Everyone has a smartphone with a browser,” Som notes, ensuring KnowUrMedicine reaches the masses without installation barriers.
The app’s architecture is a testament to lean, effective engineering. The frontend, built with JavaScript and CSS, delivers a clean, intuitive interface optimized for mobile browsers. “I wanted to keep it simple: scan a QR code, select a language, and get the info,” Som explains. The backend uses a lightweight framework to handle processing efficiently. Som avoided complex frameworks, opting for a simpler solution built on ChatGPT 4o to streamline development. The app operates entirely in the cloud, avoiding local storage to maintain user privacy and scalability. “No user data is stored,” Som emphasizes, aligning with his altruistic vision.
The core functionality hinges on three technologies: OCR to extract text from medicine labels, AI for translation, and TTS for audio output. Users access the app via a QR code or link, choose Hindi or Marathi (with plans to add more languages), and upload a photo of the medicine packaging. The app processes the image, extracts details like dosage and side effects, and plays them aloud automatically. This pipeline, powered by ChatGPT’s API, required overcoming significant technical challenges.
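The three-stage flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Som’s actual code: the names `describe_medicine`, `MedicineResult`, `extract_text`, `translate`, and `synthesize` are hypothetical, and the three stages are passed in as plain functions so that any OCR, translation, or TTS service could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable

# Languages named in the article; the codes are an assumption for illustration.
SUPPORTED_LANGUAGES = {"hi": "Hindi", "mr": "Marathi"}


@dataclass
class MedicineResult:
    language: str
    text: str     # translated medicine details
    audio: bytes  # synthesized speech for playback


def describe_medicine(
    image: bytes,
    language: str,
    extract_text: Callable[[bytes], str],     # hypothetical OCR stage
    translate: Callable[[str, str], str],     # hypothetical translation stage
    synthesize: Callable[[str, str], bytes],  # hypothetical TTS stage
) -> MedicineResult:
    """Run OCR, then translation, then TTS on one uploaded photo."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")

    english = extract_text(image)
    if not english.strip():
        raise ValueError("no readable text found on the packaging")

    local_text = translate(english, language)
    audio = synthesize(local_text, language)

    # Nothing is written to disk or a database here; the result is
    # returned to the caller and then discarded.
    return MedicineResult(language=language, text=local_text, audio=audio)
```

Keeping the stages as injected functions means each one can be swapped or tested in isolation, which fits the lean approach the article describes.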
The Optical Character Recognition (OCR) system, driven by ChatGPT’s API, is the app’s backbone and Som’s proudest achievement. “Getting the OCR right felt like a breakthrough,” he says. He refined the prompts sent to ChatGPT so that it recognizes only medicine packaging and rejects irrelevant inputs like a talcum powder box. “The AI is prompted to accept only actual medicines,” Som explains, minimizing false positives, a critical feature in a healthcare app where errors can be costly.
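One common way to get this kind of gatekeeping from prompts alone is to ask the model for a structured reply and refuse anything that does not match. The sketch below assumes that approach; the prompt wording, the JSON field names, and the `parse_model_reply` helper are all hypothetical and are not taken from the app. The call to the model itself is left out, so only the reply handling is shown.

```python
import json

# Hypothetical system prompt; the app's real prompts are not published.
SYSTEM_PROMPT = (
    "You read photos of product packaging. "
    "If the photo shows a medicine, reply with JSON of the form "
    '{"is_medicine": true, "name": "...", "dosage": "...", "side_effects": "..."}. '
    'Otherwise reply with exactly {"is_medicine": false}. '
    "Reply with JSON only."
)


class NotAMedicine(Exception):
    """Raised when the model's reply does not confirm a medicine."""


def parse_model_reply(reply: str) -> dict:
    """Accept only replies that explicitly confirm a medicine."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise NotAMedicine("reply was not valid JSON") from exc

    # Fail closed: anything other than an explicit true is rejected.
    if not isinstance(data, dict) or data.get("is_medicine") is not True:
        raise NotAMedicine("image was not recognised as a medicine")

    return {key: str(data.get(key, "")) for key in ("name", "dosage", "side_effects")}
```

Failing closed matters here: a malformed or ambiguous reply is treated the same as a talcum powder box, so the app never reads out details it could not confirm.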
Early development saw issues with inconsistent text extraction, especially on poorly lit or crumpled labels. “Cracking that was tough,” he admits. Without using retrieval-augmented generation (RAG) or custom image datasets, Som relied on prompt engineering to filter out “garbage” inputs. “I didn’t feed it RAG images; it’s just the ChatGPT API with carefully crafted prompts,” he says. Iterative testing and refinement honed the system’s accuracy, validated through field tests with over 200 customers. Chemists noted it saved time and reduced confusion, an early sign of the OCR’s reliability.
The TTS module, which reads medicine details aloud, was another hurdle. Early versions sounded robotic, particularly in Hindi and Marathi. “The audio was hard to understand,” Som admits. Using a modified version of Google Voice, he iterated to achieve a conversational tone, essential for users with low literacy. “We didn’t want it to feel like a machine talking,” he says.
A key insight from field tests reshaped the user experience: an initial design required users to press a button for audio playback, but chemists reported customers found it cumbersome. “We saw people struggling to scroll and press,” Som recalls. He eliminated the button, making audio autoplay, a small but impactful tweak that enhanced accessibility. This user-centric approach underscores the app’s mission to serve everyone, regardless of tech savviness.
Translation is handled by an AI model, likely accessed via API, which converts English medicine details into Hindi and Marathi. Som avoided building a proprietary medical lexicon, opting for an off-the-shelf AI translator to keep development lean. “The AI handles the heavy lifting,” he says, ensuring accurate translations of technical terms like dosage instructions. Early mistranslations were a challenge, but refining the AI prompts improved precision. The app cross-references extracted data with online sources to verify details, ensuring reliability. This cloud-based approach keeps costs low and scalability high, critical for Som’s vision of nationwide adoption.
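The article does not say how the cross-referencing works. As a rough sketch of the idea, assuming the extracted details and a trusted reference record are both available as simple field-to-text mappings, a check could flag any field where the two disagree. The `normalize` and `find_mismatches` helpers and the field names in the usage note are hypothetical.

```python
def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def find_mismatches(extracted: dict[str, str], reference: dict[str, str]) -> list[str]:
    """Return the fields present in both records whose values disagree."""
    return sorted(
        key
        for key in extracted
        if key in reference and normalize(extracted[key]) != normalize(reference[key])
    )
```

Under these assumptions, a call such as `find_mismatches(extracted, reference)` returns the names of the disputed fields, for example `["dosage"]`, and the app could then withhold or caveat those fields rather than read them aloud.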
Development wasn’t without roadblocks. Beyond OCR and TTS issues, Som faced the learning curve of mastering FastAPI and AI integration. “The frameworks took time to understand,” he says, crediting his mentor, Sandeep Mistry, for guidance on OCR and pharma outreach. Field tests revealed user interface quirks, like the audio button issue, which Som addressed through rapid iteration. His collaboration with 20+ healthcare professionals and 3-4 chemists provided real-world feedback, ensuring the app met practical needs. “They said it helped customers and saved time,” he notes, a validation of his technical choices.
Som designed KnowUrMedicine for scale. The cloud-based architecture handles growing user loads efficiently, and the QR code model – envisioned on medicine bags or at chemist shops – ensures easy access. “We want it everywhere, like on pharmacy bags,” he says.
Som’s app is more than a technical feat; it’s a lifeline for India’s underserved. By focusing on information, not prescriptions, it sidesteps regulatory complexities while empowering users. The OCR and TTS breakthroughs, paired with a scalable web platform, make KnowUrMedicine a model of impactful tech. “I just want everyone to have access to medical information,” Som says. His father’s support in finding a mentor and his own perseverance turned a teenager’s insight into a tool with nationwide potential. As Som scales KnowUrMedicine, his story proves that empathy, paired with technical ingenuity, can transform lives, one scanned label at a time.