Ph.D. Candidate in Computer Science and Engineering
I am a Ph.D. candidate in Computer Science and Engineering at Sogang University, advised by Prof. Ji-Hwan Kim. My research focuses on End-to-End Automatic Speech Recognition (ASR), Speech Analytics & Assessment, Context-aware & Domain-specific ASR, and the integration of Large Language Models (LLMs) in Speech Technology. I am passionate about building robust, low-latency streaming ASR systems and developing automated speaking assessment frameworks.
A fully frozen Whisper encoder and Gemma LLM bridged by lightweight adapter-only modules (0.44% of parameters trainable). Achieves a 26.8% WER reduction on academic domains via inference-time domain prompting. PAKDD 2026 (Accepted, Oral).
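A minimal PyTorch sketch of the adapter-only bridging pattern: only a small projection module between the frozen speech encoder and the frozen LLM receives gradients. The dimensions, the strided-convolution downsampler, and the two-layer MLP are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SpeechAdapter(nn.Module):
    """Lightweight bridge that projects frozen speech-encoder states into the
    frozen LLM's embedding space; only this module receives gradients."""
    def __init__(self, enc_dim=1280, llm_dim=2048, stride=4):
        super().__init__()
        # Strided conv downsamples the acoustic frame rate before projection.
        self.down = nn.Conv1d(enc_dim, llm_dim, kernel_size=stride, stride=stride)
        self.mlp = nn.Sequential(
            nn.Linear(llm_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, enc_states):  # enc_states: (batch, frames, enc_dim)
        x = self.down(enc_states.transpose(1, 2)).transpose(1, 2)
        return self.mlp(x)          # soft prompts fed to the frozen LLM

# Both backbones stay frozen; only the adapter trains (hence the tiny
# trainable-parameter fraction). whisper_encoder / gemma_model are hypothetical:
# for p in whisper_encoder.parameters(): p.requires_grad = False
# for p in gemma_model.parameters():     p.requires_grad = False
adapter = SpeechAdapter()
print(sum(p.numel() for p in adapter.parameters()), "trainable parameters")
```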
An encoder-decoder architecture with variable-rate generation via cross-attention, pairing a frozen speech encoder with a LoRA-adapted LLM. Achieves 2.6%/5.2% WER on LibriSpeech and 4.7% on TED-LIUM-v2. EACL 2026 Findings (Accepted).
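To illustrate how cross-attention can close the granularity gap between acoustic frames and text tokens, here is a hedged PyTorch sketch in which learned queries attend over encoder states. The fixed query count and all dimensions are illustrative simplifications; SEAM's actual mechanism is variable-rate.

```python
import torch
import torch.nn as nn

class CrossAttentionBridge(nn.Module):
    """Learned queries cross-attend over encoder frames, decoupling the
    LLM's token rate from the acoustic frame rate."""
    def __init__(self, dim=1024, num_queries=64, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, enc):  # enc: (batch, frames, dim)
        q = self.queries.unsqueeze(0).expand(enc.size(0), -1, -1)
        out, _ = self.attn(q, enc, enc)  # (batch, num_queries, dim)
        return out

enc = torch.randn(2, 300, 1024)          # 2 utterances, 300 encoder frames each
bridged = CrossAttentionBridge()(enc)    # -> (2, 64, 1024)
```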
Developed a universal Korean ASR system using a hybrid FastConformer RNN-Transducer + CTC model with cache-aware streaming and context biasing for gaming terminology.
Developed streaming and non-streaming Korean ASR pipelines optimized for 8 kHz telephony data, using a FastConformer-CTC architecture with context-biasing modules (see the sketch below).
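A hedged NVIDIA NeMo sketch of the hybrid FastConformer setup used in the two projects above. The public English streaming checkpoint and the audio file name stand in for the internal Korean models and data.

```python
import nemo.collections.asr as nemo_asr

# Hybrid model: one FastConformer encoder with both RNN-Transducer and CTC heads.
model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(
    "stt_en_fastconformer_hybrid_large_streaming_multi"
)

# Decode with the cheaper CTC head instead of the default RNNT head.
model.change_decoding_strategy(decoder_type="ctc")

# Telephony audio is typically upsampled from 8 kHz to the model's 16 kHz rate.
print(model.transcribe(["call_recording_16k.wav"]))
```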
A multi-task learning framework using Wav2Vec to jointly model pronunciation, fluency, and content for L2 Korean assessment, integrated with a Conformer-CTC ASR model and an LLM for automated multi-aspect scoring.
Built an end-to-end evaluation pipeline for L2 Korean speakers by combining Conformer-CTC ASR outputs with BERT-based semantic scoring. Developed algorithms to quantify pronunciation accuracy, speech rate, and syntactic correctness.
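A toy Python sketch of multi-aspect scoring in the spirit of the two projects above: pronunciation via ASR error rate, fluency via speech rate, and content via embedding similarity. The jiwer and sentence-transformers libraries and the multilingual checkpoint are stand-ins, not the pipeline's actual BERT-based scorer.

```python
from jiwer import wer                                   # pip install jiwer
from sentence_transformers import SentenceTransformer, util

semantic_model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def score_response(ref_text, asr_text, syllable_count, duration_sec):
    """Toy multi-aspect scorer for one L2 speaking response."""
    pronunciation = 1.0 - min(wer(ref_text, asr_text), 1.0)   # higher = better
    speech_rate = syllable_count / duration_sec               # syllables/second
    emb = semantic_model.encode([ref_text, asr_text], convert_to_tensor=True)
    content = float(util.cos_sim(emb[0], emb[1]))             # semantic match
    return {"pronunciation": pronunciation,
            "speech_rate": speech_rate,
            "content": content}

print(score_response("오늘 날씨가 좋습니다", "오늘 날씨가 좋습니다",
                     syllable_count=9, duration_sec=2.4))
```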
An AI-based framework for dysarthria severity classification that provides multi-modal explanations to support diagnostic decision-making.
Led the audio analytics submodule within an automated video content rating framework. Fine-tuned Whisper ASR on domain-specific video corpora.
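A hedged Hugging Face sketch of domain-adapting Whisper on (audio, transcript) pairs; the whisper-small checkpoint, learning rate, and single-step loop are illustrative, not the project's actual training recipe.

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def train_step(audio_array, sampling_rate, transcript):
    """One gradient step on a single (audio, transcript) pair."""
    inputs = processor(audio_array, sampling_rate=sampling_rate,
                       return_tensors="pt")
    labels = processor.tokenizer(transcript, return_tensors="pt").input_ids
    loss = model(input_features=inputs.input_features, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```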
Modified Kaldi's sentence-level decoder to achieve a real-time factor below 1.0 for real-time video QA applications. Collected and curated domain-specific audio/text corpora to optimize acoustic and language models.
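A small helper showing how the sub-1.0 real-time factor (RTF) target is measured: decoding wall-clock time divided by audio duration. The decode_fn argument is a hypothetical stand-in for the decoder's entry point.

```python
import time
import soundfile as sf   # pip install soundfile

def real_time_factor(decode_fn, wav_path):
    """RTF = decoding time / audio duration; RTF < 1.0 means the decoder
    keeps up with real-time playback."""
    audio, sample_rate = sf.read(wav_path)
    duration_sec = len(audio) / sample_rate
    start = time.perf_counter()
    decode_fn(wav_path)                      # run the (hypothetical) decoder
    return (time.perf_counter() - start) / duration_sec
```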
Sogang University - Auditory Intelligence Lab
Sogang University
J. Oh, J. Nam, and J.-H. Kim, "HiTCA: Fusing Hierarchical Text and Contextual Audio for Accurate VCR," EURASIP Journal on Audio, Speech, and Music Processing, 2025. (SCIE, Under Review)
S. Ma, J. Oh, M. Kim, and J.-H. Kim, "Survey on Deep Learning-based Speech Technologies in Voice Chatbot Systems," KSII Transactions on Internet and Information Systems (TIIS), vol. 19, no. 5, pp. 1406–1440, 2025. (SCIE)
J. Oh, E. Cho, and J.-H. Kim, "Integration of WFST language model in pre-trained Korean E2E ASR model," KSII Transactions on Internet and Information Systems (TIIS), vol. 18, no. 6, pp. 1692–1705, 2024. (SCIE)
S. Seo, J. Oh, E. Cho, H. Park, G. Kim, and J.-H. Kim, "TP-MobNet: A Two-pass Mobile Network for Low-complexity Classification of Acoustic Scene," Computers, Materials & Continua, vol. 73, no. 2, 2022. (SCIE)
M. Lim, D. Lee, H. Park, Y. Kang, J. Oh, J.-S. Park, G.-J. Jang, and J.-H. Kim, "Convolutional neural network based audio event classification," KSII Transactions on Internet and Information Systems (TIIS), vol. 12, no. 6, pp. 2748–2760, 2018. (SCIE)
J. Oh and J.-H. Kim, "Adapter-Only Bridging of Frozen Speech Encoder and Frozen LLM for ASR," in Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2026.Accepted, Oral
J. Oh and J.-H. Kim, "SEAM: Bridging the Semantic-Temporal Granularity Gap for LLM-based Speech Recognition," in Findings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2026.Accepted
J. Oh, H. Park, and J.-H. Kim, "Speech Intelligibility Prediction of Dysarthria Using Deep Convolutional Networks," in Proc. APIC-IST 2023, pp. 236–237, 2023.
M. Kim, J. Oh, and J.-H. Kim, "Automated Dysarthria Severity Classification Using Diadochokinetic test and Speech Intelligibility Based on LightGBM," in Proc. APIC-IST 2023, pp. 12–13, 2023.
S. Seo, M. Lim, D. Lee, H. Park, J. Oh, D. J. Rim, and J.-H. Kim, "Environmental noise robustness for Korean fricatives using speech enhancement generative adversarial networks," in Proc. IEEE Int. Conf. Big Data and Smart Computing (BigComp), pp. 1–4, 2019.
S. Seo, D. J. Rim, M. Lim, D. Lee, H. Park, J. Oh, C. Kim, and J.-H. Kim, "Shortcut connections based deep speaker embeddings for end-to-end speaker verification system," in Proc. Interspeech, pp. 17, 2019.
이정필, 장재후, 김지현, 김민섭, 김성준, 김민서, 김하영, 오준석, 정원, 김장연, et al., "A Speech-based Dysarthria Diagnosis System with Explainability," Communications of KIISE (정보과학회지), vol. 42, no. 4, pp. 45–56, 2024. (KCI)
H. Park, Y. Kang, M. Lim, D. Lee, J. Oh, and J.-H. Kim, "LFMMI-based acoustic modeling by using external knowledge," The Journal of the Acoustical Society of Korea, vol. 38, no. 5, pp. 607–613, 2019. (KCI)
Track 2-1, Counseling Speech Recognition
Organizer: NIA (National Information Society Agency)
Corporate Challenge Track (Meeting Speech)
Organizer: NIA (National Information Society Agency)
Utilized a patterned-utterance speech dataset containing numbers
Organizer: KT alpha
KR 10-2699607 (B1) - Corpus Construction Service Provision Server and Method (Granted: Aug 2024)
NVIDIA Deep Learning Institute - Building Conversational AI Applications (2022)