Ph.D. Candidate in Computer Science and Engineering
I am a Ph.D. candidate in Computer Science and Engineering at Sogang University, advised by Prof. Ji-Hwan Kim. My research focuses on End-to-End Automatic Speech Recognition (ASR), Speech Analytics & Assessment, Context-aware & Domain-specific ASR, and the integration of Large Language Models (LLMs) in Speech Technology. I am passionate about building robust, low-latency streaming ASR systems and developing automated speaking assessment frameworks.
Speech & ML Frameworks
Fully frozen Whisper + Gemma with lightweight adapter-only bridging (0.44% trainable parameters). 26.8% WER reduction on academic domains via inference-time domain prompting. PAKDD 2026 Accepted (Oral).
Encoder-decoder with variable-rate generation via cross-attention. Frozen speech encoder + LLM LoRA. 2.6%/5.2% WER on LibriSpeech, 4.7% on cross-domain TED-LIUM-v2. EACL 2026 Findings Accepted.
End-to-end SVC pipeline: collected ~10h speaker data, extracted vocals via UVR5, trained so-vits-svc & whisper-vits-svc on RTX A5000. Performed voice conversion and singing voice conversion inference.
Developed universal Korean ASR using hybrid FastConformer RNNT+CTC. Implemented cache-aware streaming for low-latency inference and context biasing for gaming domain vocabulary.
Developed streaming/non-streaming Korean ASR pipelines optimized for 8 kHz telephony using FastConformer-CTC. Implemented dynamic context biasing for domain-shift word-level accuracy.
Wav2Vec multi-task framework for L2-Korean assessment that jointly models pronunciation, fluency, and content. Combined Conformer-CTC ASR with LLaMA for multi-aspect automated scoring.
AI framework for dysarthria severity classification with multi-modal explanations. Implemented speech-based explainable diagnostic modules analyzing acoustic and linguistic features.
Designed sound event detection models for automated video content rating. Fine-tuned Whisper ASR on domain-specific video corpora for robust transcription in diverse acoustic environments.
Built L2-Korean evaluation pipeline combining Conformer-CTC ASR with BERT-based semantic scoring. Developed algorithms to quantify pronunciation accuracy, speech rate, and syntactic correctness.
Modified Kaldi's sentence-level decoder for real-time video QA at a real-time factor below 1.0. Collected domain-specific audio/text corpora and retrained acoustic/language models to improve QA accuracy on complex narratives.
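Several of the assessment projects above quantify pronunciation accuracy and speech rate from ASR output; the standard recipe scores pronunciation as word-level edit distance against a reference transcript. A minimal pure-Python sketch of that idea (function names and the example values are illustrative, not the projects' actual code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,           # deletion
                dp[j - 1] + 1,       # insertion
                prev + (r != h),     # substitution (or match, cost 0)
            )
    return dp[-1]

def pronunciation_accuracy(ref_text, hyp_text):
    """1 - WER, clamped to [0, 1]: a crude word-level pronunciation proxy."""
    ref, hyp = ref_text.split(), hyp_text.split()
    wer = edit_distance(ref, hyp) / max(len(ref), 1)
    return max(0.0, 1.0 - wer)

def speech_rate(n_syllables, duration_sec):
    """Syllables per second over the spoken segment."""
    return n_syllables / duration_sec

# One word substituted out of three -> accuracy 1 - 1/3
acc = pronunciation_accuracy("나는 학교에 간다", "나는 학교에 갑니다")
rate = speech_rate(n_syllables=7, duration_sec=3.5)
print(round(acc, 2), rate)  # 0.67 2.0
```

In practice a full pipeline would add forced alignment for phone-level timing and weight substitutions by phonetic distance, but the WER-style word comparison above is the usual backbone of a first-pass score.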
Sogang University - Auditory Intelligence Lab
Sogang University
J. Oh, J. Nam, and J.-H. Kim, "HiTCA: Fusing Hierarchical Text and Contextual Audio for Accurate VCR," EURASIP Journal on Audio, Speech, and Music Processing, 2025. SCIE, Under Review
S. Ma, J. Oh, M. Kim, and J.-H. Kim, "Survey on Deep Learning-based Speech Technologies in Voice Chatbot Systems," KSII Transactions on Internet and Information Systems (TIIS), vol. 19, no. 5, pp. 1406–1440, 2025. SCIE
J. Oh, E. Cho, and J.-H. Kim, "Integration of WFST language model in pre-trained Korean E2E ASR model," KSII Transactions on Internet and Information Systems (TIIS), vol. 18, no. 6, pp. 1692–1705, 2024. SCIE
S. Seo, J. Oh, E. Cho, H. Park, G. Kim, and J.-H. Kim, "TP-MobNet: A Two-pass Mobile Network for Low-complexity Classification of Acoustic Scene," Computers, Materials & Continua, vol. 73, no. 2, 2022. SCIE
M. Lim, D. Lee, H. Park, Y. Kang, J. Oh, J.-S. Park, G.-J. Jang, and J.-H. Kim, "Convolutional neural network based audio event classification," KSII Transactions on Internet and Information Systems (TIIS), vol. 12, no. 6, pp. 2748–2760, 2018. SCIE
J. Oh and J.-H. Kim, "Adapter-Only Bridging of Frozen Speech Encoder and Frozen LLM for ASR," in Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2026. Accepted, Oral
J. Oh and J.-H. Kim, "SEAM: Bridging the Temporal-Semantic Granularity Gap for LLM-based Speech Recognition," in Findings of the Association for Computational Linguistics: EACL 2026, pp. 2135–2144, 2026.
J. Oh, H. Park, and J.-H. Kim, "Speech Intelligibility Prediction of Dysarthria Using Deep Convolutional Networks," in Proc. Asia Pacific International Conference on Information Science and Technology (APIC-IST), pp. 236–237, 2023.
M. Kim, J. Oh, and J.-H. Kim, "Automated Dysarthria Severity Classification Using Diadochokinetic Test and Speech Intelligibility Based on LightGBM," in Proc. Asia Pacific International Conference on Information Science and Technology (APIC-IST), pp. 12–13, 2023.
S. Seo, M. Lim, D. Lee, H. Park, J. Oh, D. J. Rim, and J.-H. Kim, "Environmental noise robustness for Korean fricatives using speech enhancement generative adversarial networks," in Proc. IEEE Int. Conf. Big Data and Smart Computing (BigComp), pp. 1–4, 2019.
S. Seo, D. J. Rim, M. Lim, D. Lee, H. Park, J. Oh, C. Kim, and J.-H. Kim, "Shortcut connections based deep speaker embeddings for end-to-end speaker verification system," in Proc. Interspeech, pp. 2928–2932, 2019.
Lee et al. (incl. J. Oh and J.-H. Kim), "Speech-Based Dysarthria Diagnosis and Explainable System," Journal of KIISE (정보과학회지), vol. 42, no. 4, pp. 45–56, 2024. KCI
H. Park, Y. Kang, M. Lim, D. Lee, J. Oh, and J.-H. Kim, "LFMMI-based acoustic modeling by using external knowledge," The Journal of the Acoustical Society of Korea, vol. 38, no. 5, pp. 607–613, 2019. KCI
Teaching Assistant Experience
Lab sessions covering audio processing, deep learning basics, language models, and FastSpeech2 TTS using PyTorch & Colab.
Hands-on ASR tutorial (invited lecture by Prof. Ji-Hwan Kim) covering audio handling, MLP, CTC, Whisper, NVIDIA NeMo finetuning, and WFST using PyTorch & Colab notebooks.
Lab sessions covering audio processing, PyTorch, RNN, CNN, Seq2Seq, and FastSpeech2/VocGAN TTS using PyTorch & Colab.
Lab sessions on dialogue systems and conversational AI interface design.
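The ASR labs above walk through CTC; the heart of greedy CTC decoding is just per-frame argmax, collapsing consecutive repeats, then dropping blanks. A self-contained sketch in the spirit of those tutorial notebooks (the tiny vocabulary and frame sequence are made up for illustration):

```python
BLANK = 0  # CTC blank symbol id

def ctc_greedy_decode(frame_ids, blank=BLANK):
    """Collapse consecutive repeated ids, then remove blank symbols."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# Pretend these are per-frame argmax ids over a toy vocab
vocab = {1: "c", 2: "a", 3: "t"}
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
decoded = "".join(vocab[i] for i in ctc_greedy_decode(frames))
print(decoded)  # cat
```

Note that the collapse step is why CTC needs the blank: a genuine double letter can only survive decoding if a blank frame separates the two emissions, e.g. `[2, 0, 2]` decodes to "aa" while `[2, 2]` collapses to a single "a".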
Track 2-1, Counseling Speech Recognition
Host: NIA (National Information Society Agency)
Technical Meeting Speech Dataset (meeting audio)
Host: NIA (National Information Society Agency)
Utilization of a patterned-utterance speech dataset containing digits
Host: KT alpha
KR 10-2699607 (B1) - Corpus Construction Service Provision Server and Method (Granted: Aug 2024)
NVIDIA Deep Learning Institute - Building Conversational AI Applications (2022)