Kamrul Hasan

PhD Candidate @ University of Rochester

I am a PhD candidate at the University of Rochester, USA. I received my MS degree in Computer Science from the University of Rochester in 2018 and my B.Sc. in Computer Science and Engineering from Bangladesh University of Engineering and Technology in 2014.

My research focuses on theoretical and empirical aspects of multimodal machine learning. It is concerned with jointly modeling multiple modalities (language, acoustic, and visual) to understand human language as it happens in face-to-face communication. My work spans the following areas: multimodal representation learning, and humor, sentiment, argument, and credibility understanding in video utterances. My supervisor is Ehsan Hoque.

Resume



News
  • [2021]   EMNLP: one paper got accepted - multimodal argument analysis.
  • [2021]   One paper got accepted in IEEE Transactions on Affective Computing.
  • [2020]   AAAI: one paper got accepted - multimodal humor understanding.
  • [2020]   ACL: one paper got accepted - Multimodal Finetuning of Advanced Transformer Models.
  • [2020]   Passed thesis proposal "Multimodal Representation Learning for Human Behavior Understanding".
  • [2019]   EMNLP: one paper got accepted - UR-FUNNY dataset.

    Experience

    Research Intern
    Comcast AI, Washington DC
    Jul - Sep 2020. Advisor: Mahmudul Hasan.
    Project: unsupervised text segmentation.

    Applied Scientist Intern
    Amazon, Sunnyvale, CA
    May - Aug 2019. Advisor: Yelin Kim.
    Project: multimodal emotion recognition from noisy data.

    Software Engineer
    Therap (BD) Ltd, Bangladesh
    Aug 2014 - May 2015.
    Project: Worked with Java and J2EE technologies. Designed web solutions based on the Spring framework and the Hibernate ORM tool.


    Publications

    Highlighted Papers

    Hitting your MARQ: Multimodal ARgument Quality Assessment in Long Debate Video
    Md Kamrul Hasan, James Spann, Masum Hasan, Md Saiful Islam, Kurtis Haut, Rada Mihalcea, and Ehsan Hoque
    EMNLP 2021
    [PDF]
    Presents the first comprehensive study on multimodal argument quality assessment.

    Humor Knowledge Enriched Transformer for Understanding Multimodal Humor
    Md Kamrul Hasan, Sangwu Lee, Wasifur Rahman, Amir Zadeh, Rada Mihalcea, Louis-Philippe Morency, and Ehsan Hoque
    AAAI 2021
    [PDF] [Code]
    Can machines recognize a humorous punchline given the context story and all three modalities?

    Integrating Multimodal Information in Large Pretrained Transformers
    Wasifur Rahman, Md Kamrul Hasan, Sangwu Lee, Amir Zadeh, Chengfeng Mao, Louis-Philippe Morency, and Ehsan Hoque.
    ACL 2020
    [PDF] [Code]
    Can large pre-trained language models integrate non-verbal features?

    UR-FUNNY: A Multimodal Language Dataset for Understanding Humor
    Md Kamrul Hasan, Wasifur Rahman, Amir Zadeh, Jianyuan Zhong, Md Iftekhar Tanveer, and Louis-Philippe Morency
    EMNLP 2019
    [PDF] [Code] [Data]
    Proposed the first multimodal dataset and a baseline model (C-MFN) for humor detection.

    Facial Expression Based Imagination Index and a Transfer Learning Approach to Detect Deception
    Md Kamrul Hasan, Wasifur Rahman, Luke Gerstner, Taylan Sen, Sangwu Lee, Kurtis Haut, and Ehsan Hoque
    ACII 2019
    [PDF] [Project]
    Can baseline facial expressions help detect deception during interview questions?

    Unsupervised Text Segmentation using Coherence aware BERT
    Md Kamrul Hasan, Md Mahmudul Hasan, and Faisal Ishtiaq
    Preprint
    Designed an unsupervised text segmentation algorithm that achieved state-of-the-art performance on multiple datasets.

    Other Papers
    1. Md Kamrul Hasan, Taylan Sen, Yiming Yang, Raiyan Abdul Baten, Kurtis Glenn Haut, Mohammed Ehsan Hoque. "LIWC into the Eyes: Using Facial Features to Contextualize Linguistic Analysis in Multimodal Communication". In 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 1-7. IEEE, 2019. [PDF] [Project]
    2. Taylan Sen, Md Kamrul Hasan, Minh Tran, Matt Levin, Yiming Yang, M. Ehsan Hoque. "Say CHEESE: the Common Habitual Expression Encoder for Smile Examination and its Application to Analyze Deceptive Communication". Automatic Face and Gesture Recognition Conference (FG), 2018. [PDF] [Project]
    3. Taylan Sen, Md Kamrul Hasan, Zach Teicher, Mohammed Ehsan Hoque. "Automated dyadic data recorder (ADDR) framework and analysis of facial cues in deceptive communication". Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, no. 4 (2018): 1-22. [PDF]
    4. Rasoul Shafipour, Raiyan Abdul Baten, Md Kamrul Hasan, Gourab Ghoshal, Gonzalo Mateos, Mohammed Ehsan Hoque. "Buildup of speaking skills in an online learning community: a network-analytic exploration". Palgrave Communications, 4(1), 1-10. [PDF]

    Research Projects

    Multimodal Representation Learning

    Collecting large-scale multimodal behavioral video datasets (e.g., for humor or sentiment prediction) is extremely expensive in both time and money, and there are no pre-trained models for the visual or acoustic modalities that are suitable for studying non-verbal cues of human behavior. This project aims to learn multimodal representations from millions of video utterances in a self-supervised manner. The pre-trained model will be helpful for downstream multimodal behavioral tasks with small datasets.
    Publications: Thesis (work in progress)
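
    As a concrete illustration, below is a toy sketch of one common self-supervised objective (InfoNCE-style contrastive alignment between two modalities of the same utterance). This is an assumed example for illustration only, not the method developed in the thesis.

    # Toy sketch: contrastive alignment between two modalities of the same
    # utterance (assumed illustration, not the thesis method).
    import torch
    import torch.nn.functional as F

    def contrastive_alignment_loss(text_emb, audio_emb, temperature=0.07):
        """text_emb, audio_emb: (batch, dim) embeddings of the same utterances."""
        text_emb = F.normalize(text_emb, dim=-1)
        audio_emb = F.normalize(audio_emb, dim=-1)
        logits = text_emb @ audio_emb.t() / temperature   # pairwise similarities
        targets = torch.arange(text_emb.size(0))          # matching pairs lie on the diagonal
        # symmetric cross-entropy: text-to-audio and audio-to-text
        return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

    # usage with random embeddings
    loss = contrastive_alignment_loss(torch.randn(16, 128), torch.randn(16, 128))
    print(loss.item())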

    Multimodal Humor Understanding

    Humor is produced in a multimodal manner, through the use of words (text), gestures (visual), and prosodic cues (acoustic). We introduced UR-FUNNY, the first video dataset for the humor detection task. It contains 8,257 humorous punchlines, each presented along with the prior sentences that build up its context, for a total duration of 90 hours. UR-FUNNY opens the door for the research community to study the multimodal cues involved in expressing humor.
    Publications: AAAI 2021, EMNLP 2019
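
    For intuition, here is a hedged sketch of how a single UR-FUNNY-style sample could be organized; the field names and feature dimensions are hypothetical placeholders, not the released data format.

    # Hypothetical layout of one humor-detection sample: context sentences plus a
    # punchline, each carrying aligned text, acoustic, and visual features.
    from dataclasses import dataclass
    from typing import List
    import numpy as np

    @dataclass
    class UtteranceFeatures:
        words: List[str]                 # transcript tokens
        acoustic: np.ndarray             # (num_frames, acoustic_dim) prosodic features
        visual: np.ndarray               # (num_frames, visual_dim) gesture/facial features

    @dataclass
    class HumorSample:
        context: List[UtteranceFeatures] # sentences building up to the punchline
        punchline: UtteranceFeatures
        is_humorous: bool                # binary humor label

    sample = HumorSample(
        context=[UtteranceFeatures(["so", "I", "walked", "in"], np.zeros((40, 81)), np.zeros((40, 75)))],
        punchline=UtteranceFeatures(["and", "it", "was", "my", "boss"], np.zeros((35, 81)), np.zeros((35, 75))),
        is_humorous=True,
    )
    print(len(sample.context), sample.is_humorous)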

    Multimodal Sentiment Analysis

    Predicting sentiment in video utterances using multimodal signals from the language, visual, and acoustic modalities. We design deep learning algorithms that capture both intra-modality and inter-modality interactions among these signals, and evaluate the models on two popular multimodal sentiment analysis datasets, CMU-MOSI and CMU-MOSEI.
    Publications: ACL 2020, AAAI 21 (under review)
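
    The sketch below illustrates the general idea under simple assumptions (it is not the model from the papers above): per-modality GRUs capture intra-modality dynamics, and a small feed-forward layer over the concatenated summaries models inter-modality interactions before regressing a sentiment score. The feature dimensions are hypothetical placeholders.

    # Minimal intra-/inter-modality fusion sketch (illustrative only).
    import torch
    import torch.nn as nn

    class SimpleMultimodalFusion(nn.Module):
        def __init__(self, text_dim=300, acoustic_dim=74, visual_dim=47, hidden=64):
            super().__init__()
            # intra-modality encoders
            self.text_rnn = nn.GRU(text_dim, hidden, batch_first=True)
            self.acoustic_rnn = nn.GRU(acoustic_dim, hidden, batch_first=True)
            self.visual_rnn = nn.GRU(visual_dim, hidden, batch_first=True)
            # inter-modality fusion and regression head (sentiment score)
            self.fusion = nn.Sequential(
                nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
            )

        def forward(self, text, acoustic, visual):
            # each input: (batch, seq_len, feature_dim); keep the final hidden state
            _, t = self.text_rnn(text)
            _, a = self.acoustic_rnn(acoustic)
            _, v = self.visual_rnn(visual)
            fused = torch.cat([t[-1], a[-1], v[-1]], dim=-1)
            return self.fusion(fused)

    # usage on random tensors shaped like a small utterance batch
    model = SimpleMultimodalFusion()
    score = model(torch.randn(8, 20, 300), torch.randn(8, 20, 74), torch.randn(8, 20, 47))
    print(score.shape)  # torch.Size([8, 1])

    The ACL 2020 paper instead integrates the non-verbal features directly into a large pre-trained transformer; this toy model only illustrates the intra-/inter-modality decomposition.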

    Multimodal Argument Analysis

    The current literature mostly considers textual content while assessing the quality of an argument, and is limited to datasets containing short sequences (18-48 words). In this project, we study argument quality assessment in a multimodal context, and experiment on DBATES, a publicly available dataset of long debate videos.
    Publications: EMNLP 2021, IEEE Transactions on Affective Computing

    Credibility Understanding

    Is there a chance that a computer can aid a human in detecting deceptive behavior? Are there other facial features that could indicate deception? Could linguistic characteristics be combined to aid in lie detection? How is all this data collected and analyzed in the first place? The following papers address these types of questions and raise even more interesting ones.
    Publications: ACII 2019.a, ACII 2019.b, FG 2018, UBICOMP 2018



    Contact

    Department of Computer Science
    2513 Wegmans Hall
    Box 270226
    University of Rochester
    Rochester, NY 14627
    Email: mhasan8@cs.rochester.edu