Hello!

I am a Research Assistant at the University of Galway, Ireland, affiliated with the Insight Research Ireland Centre for Data Analytics. Under the guidance of Prof. Paul Buitelaar in the Natural Language Processing Unit, my research focuses on Factual Story Visualization and Cultural Adaptation in Multilingual Text-to-Image models.


My research interests span Large Vision-Language Models (LVLMs), Large Language Models (LLMs), Natural Language Processing (NLP), and Computer Vision, with a focus on solving domain-specific challenges. I am passionate about leveraging these technologies to drive innovation in both industry and academia.

Outside of my professional pursuits, I enjoy playing cricket and chess, as well as traveling with friends. These activities not only provide relaxation but also inspire creativity and strategic thinking, which I bring into my research work.

To know more, refer to my resume or drop me an email!

News and Updates

  • Dec 2025: Our work on Culture Evaluation in Multilingual Story Visualization is now available on
    arXiv.
  • Oct 2025: Our work on Scientific VQA was published at the
    ACM Multimedia 2025 LAVA Workshop.
  • Sep 2025: Our work on Live Video Comment Generation was published in
    IEEE Transactions on Multimedia.
  • June 2025: A preprint of our recent work on
    Spiritual-LLM
    is available on arXiv.
  • June 2025: I will be attending the SIGIR 2025 conference in Padova, Italy, from 12 to 18 July 2025.
  • April 2025: Our paper on Example Selection for In-Context Learning got accepted at SIGIR 2025.
  • March 2025: Our paper on Improving Story Narration got accepted at ECIR Text2Story Workshop 2025.
  • Feb 2025: Our paper on Live Video Comment Generation got accepted at IEEE Transactions on Multimedia.
  • December 2024: Our paper on Scientific Visual Question Answering got accepted at AAAI Workshop 2025.
  • September 2024: Started as a Research Assistant at the Insight Research Ireland Centre for Data Analytics, Galway.
  • June 2024: Completed my postgraduate degree, M.Tech CSE with Specialization in AI, at IIIT Delhi.
  • February 2024: Our paper on Named Entity Recognition on Recipes got accepted at LREC-COLING 2024.
  • February 2024: Started a Research Internship at the National Institute of Informatics (NII), Japan.
  • January 2024: Our paper on Multimodal Physics Question Answering got accepted at PAKDD 2024.
  • September 2023: Selected for the prestigious ARTPARK PG Research Fellowship from IISc Bengaluru.
  • September 2022: Started my M.Tech CSE with Specialization in AI at IIIT Delhi.

Publications

A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization
Janak Kapuriya, Ali Hatami, Paul Buitelaar
arXiv Preprint 2025
paper

Enhancing Scientific Visual Question Answering via Vision-Caption aware Supervised Fine-Tuning
Janak Kapuriya, Anwar Dilawar Shaikh, Arnav Goel, Medha Hira, Apoorv Singh, Jay Saraf, Sanjana Sanjeev, Vaibhav Nauriyal, Avinash Anand, Zhengkui Wang, Rajiv Ratn Shah
LAVA @ ACM Multimedia 2025
paper

Semantic Frame Aggregation-based Transformer for Live Video Comment Generation
Anam Fatima, Yi Yu, Janak Kapuriya, Julien Lalanne, Jainendra Shukla
IEEE Transactions on Multimedia 2025
paper

Exploring the Role of Diversity in Example Selection for In-Context Learning
Janak Kapuriya, Manit Kaushik, Debasis Ganguly, Sumit Bhatia
SIGIR 2025 | Special Interest Group on Information Retrieval
paper

FlintstonesSV++: Improving Story Narration using Visual Scene Graph
Janak Kapuriya, Paul Buitelaar
Text2Story @ ECIR 2025 | European Conference on Information Retrieval
paper

Spiritual-LLM: Gita Inspired Mental Health Therapy In the Era of LLMs
Janak Kapuriya, Aman Singh, Jainendra Shukla, Rajiv Ratn Shah
arXiv Preprint 2025 | Under Review
paper

Optimizing Multimodal Large Language Models for Scientific VQA through Caption-Aware Supervised Training
Janak Kapuriya, Arnav Goel, Medha Hira, Apoorv Singh, Naman Lal, Jay Saraf, Sanjana Sanjeev, Vaibhav Nauriyal, Avinash Anand, Rajiv Ratn Shah
AI4Edu @ AAAI 2025 | Association for the Advancement of Artificial Intelligence (AAAI)
paper

MM-PhyQA: Multimodal Physics Question-Answering with Multi-image CoT Prompting
Avinash Anand, Janak Kapuriya, Apoorv Singh, Jay Saraf, Naman Lal, Astha Verma, Rushali Gupta, Rajiv Shah
PAKDD 2024 | Pacific-Asia Conference on Knowledge Discovery and Data Mining
paper

Deep Learning Based Named Entity Recognition Models for Recipes
Mansi Goel*, Ayush Agarwal*, Shubham Agrawal*, Janak Kapuriya*, Akhil Vamshi Konam*, Rishabh Gupta, Shrey Rastogi, Niharika Niharika, Ganesh Bagler | (*Equal Contribution)
LREC-COLING 2024 | Joint Int. Conference on Computational Linguistics, Language Resources and Evaluation
paper


Teaching

  • Winter 2024: Teaching Assistant for CSE508: Information Retrieval (IIIT-Delhi)
  • Monsoon 2022: Teaching Assistant for CSE201: Advanced Programming (IIIT-Delhi)

  Template: Ashish Sharma