Hello! 👋

I'm Junseok Oh (์˜ค์ค€์„) aka june-oh / RiceBerry

Ph.D. Candidate in Computer Science and Engineering

View CV / Resume

About Me

I am a Ph.D. candidate in Computer Science and Engineering at Sogang University, advised by Prof. Ji-Hwan Kim. My research focuses on End-to-End Automatic Speech Recognition (ASR), Speech Analytics & Assessment, Context-aware & Domain-specific ASR, and the integration of Large Language Models (LLMs) in Speech Technology. I am passionate about building robust, low-latency streaming ASR systems and developing automated speaking assessment frameworks.

Tech Stack

Programming & Tools

[Tech stack icon grid]

Speech & ML Frameworks

NVIDIA NeMo · Kaldi · KenLM · Hugging Face Transformers & PEFT · Whisper · Wav2Vec · FastConformer

Research Interests

ASR
Streaming ASR · Robust ASR · Context-biasing · Domain Adaptation
Speech + LLM
Speech LLM · LLM-based ASR · Multimodal AI
Speech Analytics
Speaking Assessment · Dysarthria Analysis · Audio Event Detection

Research Projects

01

Adapter-Only Speech-LLM Bridging (PhD Dissertation)

Oct 2025 - Present

Fully frozen Whisper and Gemma bridged by a lightweight adapter (only 0.44% of parameters trained). Achieved a 26.8% WER reduction on academic domains via inference-time domain prompting. Accepted at PAKDD 2026 (Oral).

PyTorch · Whisper · Gemma · Adapter · Domain Adaptation
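The adapter-only idea can be sketched in a few lines of PyTorch. This is a minimal illustration with small stand-in modules, not the dissertation's actual architecture: both pretrained models are frozen, and only a small bridging adapter receives gradients.

```python
import torch
import torch.nn as nn

# Minimal sketch of adapter-only bridging with stand-in modules (hypothetical,
# not the dissertation's architecture): the speech encoder and the LLM are
# fully frozen, and only a small adapter mapping encoder features into the
# LLM embedding space is trained.

class BridgingAdapter(nn.Module):
    """Down-project, non-linearity, up-project into the LLM embedding space."""
    def __init__(self, enc_dim: int, llm_dim: int, bottleneck: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(enc_dim, bottleneck),
            nn.GELU(),
            nn.Linear(bottleneck, llm_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def freeze(module: nn.Module) -> None:
    for p in module.parameters():
        p.requires_grad = False

# Small Transformer stacks stand in for pretrained Whisper / Gemma here.
speech_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True), num_layers=2)
llm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True), num_layers=2)
adapter = BridgingAdapter(enc_dim=256, llm_dim=512)

freeze(speech_encoder)
freeze(llm)

trainable = sum(p.numel() for p in adapter.parameters())
total = trainable + sum(p.numel() for m in (speech_encoder, llm) for p in m.parameters())
print(f"trainable fraction: {trainable / total:.4%}")
```

With the real Whisper/Gemma checkpoints the frozen parameter count is vastly larger, which is how the trainable share drops to fractions of a percent.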
02

SEAM: Temporal-Semantic Bridging for Speech-LLM

May 2025 - Jan 2026

Encoder-decoder bridge with variable-rate generation via cross-attention, pairing a frozen speech encoder with a LoRA-adapted LLM. Achieved 2.6%/5.2% WER on LibriSpeech and 4.7% on cross-domain TED-LIUM-v2. Accepted to Findings of EACL 2026.

PyTorch · Whisper · LLM · LoRA · ASR
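The frozen-encoder-plus-LoRA setup can be illustrated with a minimal LoRA linear layer. This is a generic sketch of the LoRA technique, not SEAM's actual code: the pretrained weight stays frozen and only the low-rank factors train.

```python
import torch
import torch.nn as nn

# Generic LoRA sketch (illustrative only): the base linear weight stays frozen
# and only the low-rank A/B factors are trained, which is how the LLM side of
# a bridge can adapt with a tiny parameter budget while the pretrained
# weights remain untouched.

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad = False      # freeze pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad = False    # freeze bias as well
        self.scale = alpha / r
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # B is zero-initialised, so the wrapped layer starts out exactly
        # equal to the frozen base layer and LoRA learns a delta on top.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 512))
print(out.shape)  # torch.Size([2, 512])
```

The zero-initialised B matrix is the standard LoRA trick: training starts from the pretrained model's exact behaviour, so early updates cannot destabilise it.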
03

Voice/Singing Conversion (SVC)

2025
Personal Project

Built an end-to-end SVC pipeline: collected ~10 h of speaker data, extracted vocals with UVR5, and trained so-vits-svc and whisper-vits-svc on an RTX A5000. Ran both voice-conversion and singing-voice-conversion inference.

so-vits-svc · Whisper · UVR5 · SVC · TTS
04

End-to-End Korean ASR

2024 - Apr 2025
Partner: Smilegate

Developed a universal Korean ASR system using a hybrid FastConformer RNNT+CTC architecture. Implemented cache-aware streaming for low-latency inference and context biasing for gaming-domain vocabulary.

NVIDIA NeMo · FastConformer · RNNT · CTC
05

Telephony (8kHz) End-to-End ASR

Apr 2024 - Dec 2024
Partner: LOTTE INNOVATE

Developed streaming and non-streaming Korean ASR pipelines optimized for 8 kHz telephony using FastConformer-CTC. Implemented dynamic context biasing to improve word-level accuracy under domain shift.

FastConformer · CTC · Streaming ASR
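Context biasing, as mentioned in the two projects above, is often realized as shallow-fusion-style rescoring. The toy sketch below (hypothetical bias list and scores, not the projects' actual code) shows the core idea: hypotheses containing domain terms get a log-score bonus, lifting rare in-domain words a general model tends to mis-recognize.

```python
# Hedged sketch of shallow-fusion-style context biasing: each hypothesis gets
# a fixed log-score bonus per matched bias-list entry, so acoustically
# plausible but out-of-domain alternatives are out-ranked.

def bias_rescore(hypotheses, bias_words, bonus=2.0):
    """hypotheses: list of (text, log_score); returns the re-ranked list."""
    rescored = []
    for text, score in hypotheses:
        tokens = text.lower().split()
        hits = sum(tokens.count(w.lower()) for w in bias_words)
        rescored.append((text, score + bonus * hits))
    return sorted(rescored, key=lambda h: h[1], reverse=True)

hyps = [
    ("start the raid tonight", -4.1),
    ("start the rate tonight", -3.9),  # acoustically similar, higher base score
]
best = bias_rescore(hyps, bias_words=["raid", "guild"])[0][0]
print(best)  # start the raid tonight
```

In production systems the same idea is usually applied inside beam search (per-token boosting) rather than as a post-hoc rerank, but the scoring principle is identical.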
06

Automated Korean Speaking Assessment (2024)

May 2024 - Dec 2024
Partner: Ministry of Culture, Sports and Tourism

Wav2Vec-based multi-task framework for L2-Korean assessment that jointly models pronunciation, fluency, and content. Combined Conformer-CTC ASR with LLaMA for multi-aspect automated scoring.

Wav2Vec · Conformer · LLaMA
07

Dialog-based Multi-modal Explainable AI for Dysarthria

Apr 2022 - Present
Partner: MSIT / IITP

AI framework for dysarthria severity classification with multi-modal explanations. Implemented speech-based explainable diagnostic modules that analyze acoustic and linguistic features.

Explainable AI · Multi-modal · Speech Analysis
08

Intelligent Video Content Rating

2022 - 2024
Partner: MSIT

Designed sound event detection models for automated video content rating. Fine-tuned Whisper ASR on domain-specific video corpora for robust transcription in diverse acoustic environments.

Whisper · Sound Event Detection · Fine-tuning
09

Automated Korean Speaking Assessment (2023)

May 2023 - Dec 2023
Partner: Ministry of Culture, Sports and Tourism

Built an L2-Korean evaluation pipeline combining Conformer-CTC ASR with BERT-based semantic scoring. Developed algorithms to quantify pronunciation accuracy, speech rate, and syntactic correctness.

Conformer · CTC · BERT
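The assessment dimensions above can be illustrated with two toy metrics. These are illustrative stand-ins, not the project's actual scoring algorithms: pronunciation accuracy approximated by text similarity between the reference prompt and the ASR transcript, and speech rate as syllables per second.

```python
from difflib import SequenceMatcher

# Illustrative stand-ins for speaking-assessment metrics (not the project's
# actual algorithms): character-level similarity as a pronunciation proxy,
# and a simple syllables-per-second speech rate.

def pronunciation_accuracy(reference: str, transcript: str) -> float:
    """0.0-1.0 similarity between the reference text and the ASR output."""
    return SequenceMatcher(None, reference, transcript).ratio()

def speech_rate(num_syllables: int, duration_sec: float) -> float:
    """Syllables uttered per second of speech."""
    return num_syllables / duration_sec

acc = pronunciation_accuracy("안녕하세요 반갑습니다", "안녕하세요 반갑습니다")
print(acc)                   # 1.0 for a perfect match
print(speech_rate(10, 2.5))  # 4.0 syllables/sec
```

Real pipelines score at the phone level against forced alignments rather than raw characters, but the shape of the computation (compare ASR output to a reference, normalise by duration) is the same.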
10

Video Story Understanding QA System

Sep 2017 - Dec 2019
Partner: MSIT

Modified Kaldi's sentence-level decoder for sub-1.0 RT real-time video QA. Collected domain-specific audio/text corpora and retrained acoustic and language models to improve QA accuracy on complex narratives.

Kaldi · Language Model · Real-time ASR
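"Sub-1.0 RT" refers to the real-time factor. A minimal sketch with illustrative numbers:

```python
# Real-time factor (RTF) behind "sub-1.0 RT": RTF = processing time / audio
# duration, so an RTF below 1.0 means the decoder transcribes faster than
# the audio plays back.

def real_time_factor(processing_sec: float, audio_sec: float) -> float:
    return processing_sec / audio_sec

rtf = real_time_factor(processing_sec=42.0, audio_sec=60.0)
print(f"RTF = {rtf:.2f}")  # RTF = 0.70 -> real-time capable
```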

Experience

Ph.D. Student

Sogang University - Auditory Intelligence Lab

Mar 2022 - Present
  • Advisor: Prof. Ji-Hwan Kim
  • Research on End-to-End ASR, Speech Analytics, and LLM integration in Speech Technology
  • Developed streaming ASR systems with FastConformer RNNT+CTC architecture
  • Published papers in EACL, EURASIP JASM, and TIIS journals

M.E. Student

Sogang University - Auditory Intelligence Lab

Sep 2017 - Aug 2019
  • Advisor: Prof. Ji-Hwan Kim
  • Thesis: Korean Real-Time Automatic Transcription System Using Weakly Labeled Corpus
  • Modified Kaldi decoder for sub-1.0 RT real-time video QA applications
  • Collected and curated domain-specific audio/text corpora for ASR optimization

Education

Ph.D. Candidate in Computer Science and Engineering

Sogang University

Mar 2022 - Present
  • Advisor: Prof. Ji-Hwan Kim
  • Research Focus: End-to-End ASR, Speech Analytics, LLM Integration
  • PAKDD 2026 Accepted, Oral Presentation
  • EACL 2026 Findings Accepted (SEAM)

Master of Engineering in Computer Science and Engineering

Sogang University

Sep 2017 - Aug 2019
  • Advisor: Prof. Ji-Hwan Kim
  • Thesis: Korean Real-Time Automatic Transcription System Using Weakly Labeled Corpus

Bachelor of Engineering in Computer Science and Engineering

Sogang University

Mar 2010 - Aug 2017

    Publications

    International Journals

    [1] J. Oh, J. Nam, and J.-H. Kim, "HiTCA: Fusing Hierarchical Text and Contextual Audio for Accurate VCR," EURASIP Journal on Audio, Speech, and Music Processing, 2025. (SCIE, under review)

    [2] S. Ma, J. Oh, M. Kim, and J.-H. Kim, "Survey on Deep Learning-based Speech Technologies in Voice Chatbot Systems," KSII Transactions on Internet and Information Systems (TIIS), vol. 19, no. 5, pp. 1406–1440, 2025. (SCIE)

    [3] J. Oh, E. Cho, and J.-H. Kim, "Integration of WFST language model in pre-trained Korean E2E ASR model," KSII Transactions on Internet and Information Systems (TIIS), vol. 18, no. 6, pp. 1692–1705, 2024. (SCIE)

    [4] S. Seo, J. Oh, E. Cho, H. Park, G. Kim, and J.-H. Kim, "TP-MobNet: A Two-pass Mobile Network for Low-complexity Classification of Acoustic Scene," Computers, Materials & Continua, vol. 73, no. 2, 2022. (SCIE)

    [5] M. Lim, D. Lee, H. Park, Y. Kang, J. Oh, J.-S. Park, G.-J. Jang, and J.-H. Kim, "Convolutional neural network based audio event classification," KSII Transactions on Internet and Information Systems (TIIS), vol. 12, no. 6, pp. 2748–2760, 2018. (SCIE)

    International Conferences

    [1] J. Oh and J.-H. Kim, "Adapter-Only Bridging of Frozen Speech Encoder and Frozen LLM for ASR," in Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2026. (Accepted, Oral)

    [2] J. Oh and J.-H. Kim, "SEAM: Bridging the Temporal-Semantic Granularity Gap for LLM-based Speech Recognition," in Findings of the Association for Computational Linguistics: EACL 2026, pp. 2135–2144, 2026.

    [3] J. Oh, H. Park, and J.-H. Kim, "Speech Intelligibility Prediction of Dysarthria Using Deep Convolutional Networks," in Proc. Asia Pacific International Conference on Information Science and Technology (APIC-IST), pp. 236–237, 2023.

    [4] M. Kim, J. Oh, and J.-H. Kim, "Automated Dysarthria Severity Classification Using Diadochokinetic Test and Speech Intelligibility Based on LightGBM," in Proc. Asia Pacific International Conference on Information Science and Technology (APIC-IST), pp. 12–13, 2023.

    [5] S. Seo, M. Lim, D. Lee, H. Park, J. Oh, D. J. Rim, and J.-H. Kim, "Environmental noise robustness for Korean fricatives using speech enhancement generative adversarial networks," in Proc. IEEE Int. Conf. Big Data and Smart Computing (BigComp), pp. 1–4, 2019.

    [6] S. Seo, D. J. Rim, M. Lim, D. Lee, H. Park, J. Oh, C. Kim, and J.-H. Kim, "Shortcut connections based deep speaker embeddings for end-to-end speaker verification system," in Proc. Interspeech, pp. 2928–2932, 2019.

    Domestic Journals

    [1] ์ด์ •ํ•„, ์žฅ์žฌํ›„, ๊น€์ง€ํ˜„, ๊น€๋ฏผ์„ญ, ๊น€์„ฑ์ค€, ๊น€๋ฏผ์„œ, ๊น€ํ•˜์˜, ์˜ค์ค€์„, ์ •์›, ๊น€์žฅ์—ฐ et al., "Speech-based Dysarthria Diagnosis and Explainable System," Communications of KIISE (์ •๋ณด๊ณผํ•™ํšŒ์ง€), vol. 42, no. 4, pp. 45–56, 2024. (KCI)

    [2] H. Park, Y. Kang, M. Lim, D. Lee, J. Oh, and J.-H. Kim, "LFMMI-based acoustic modeling by using external knowledge," The Journal of the Acoustical Society of Korea, vol. 38, no. 5, pp. 607–613, 2019. (KCI)

    Teaching

    Teaching Assistant Experience

    Teaching Assistant · CSE5109/CSEG109/AIEG109/AIE5109

    Audio Recognition, Synthesis & Transformation (Generative AI-based Audio Recognition, Synthesis & Conversion)

    Fall 2024
    Sogang University · Prof. Ji-Hwan Kim

    Lab sessions covering audio processing, deep learning basics, language models, and FastSpeech2 TTS using PyTorch & Colab.

    Lab Materials
    Teaching Assistant · Samsung AI Academy

    Deep Learning-based Automatic Speech Recognition

    Summer 2023
    Sogang University × Samsung Electronics · Prof. Ji-Hwan Kim

    Hands-on ASR tutorial (invited lecture by Prof. Ji-Hwan Kim) covering audio handling, MLP, CTC, Whisper, NVIDIA NeMo finetuning, and WFST using PyTorch & Colab notebooks.

    Lab Materials
    Teaching Assistant · CSE5109/CSEG109/AIEG109/AIE5109

    Audio Recognition, Synthesis & Transformation

    Fall 2023
    Sogang University · Prof. Ji-Hwan Kim

    Lab sessions covering audio processing, PyTorch, RNN, CNN, Seq2Seq, and FastSpeech2/VocGAN TTS using PyTorch & Colab.

    Lab Materials
    Teaching Assistant · CSE5311/CSEG311/GITA370

    Conversational User Interface (Introduction to Conversational User Interfaces)

    Spring/Fall 2022
    Sogang University · Prof. Ji-Hwan Kim

    Lab sessions on dialogue systems and conversational AI interface design.

    Achievements

    Awards

    ์žฅ๋ ค์ƒ2023

    ํ•œ๊ตญ์–ด AI ๊ฒฝ์ง„๋Œ€ํšŒ

    Track2-1, ์ƒ๋‹ด ์Œ์„ฑ์ธ์‹

    Team '์ƒ๋‹ด ONE': ์˜ค์ค€์„, ๊น€๋ฏผ์„œ, ๋‚จ์ฃผํ˜•

    ์ฃผ๊ด€: NIA (ํ•œ๊ตญ์ง€๋Šฅ์ •๋ณด์‚ฌํšŒ์ง„ํฅ์›)

    Grand Prize / Naver CEO Award (1st place), 2022

    Korean AI Competition (ํ•œ๊ตญ์–ด ์ธ๊ณต์ง€๋Šฅ ๊ฒฝ์ง„๋Œ€ํšŒ)

    Corporate Challenge Track (Meeting Speech)

    Team 'SGCSE': ์˜ค์ค€์„, ๊น€ํ•˜์˜

    Host: NIA (National Information Society Agency)

    Grand Prize (1st place), 2021

    Syllable Recognition Rate Measurement Algorithm Development Competition

    Using a speech dataset of patterned utterances containing digits

    Team '๊ฒ€์€์‚ฌ์ผ€๋™': ๋ฐ•ํ˜ธ์„ฑ, ์˜ค์ค€์„, ์กฐ์€์ˆ˜

    Host: KT alpha

    Patents

    KR 10-2699607 (B1) - Corpus Construction Service Provision Server and Method (Granted: Aug 2024)

    Certificates

    NVIDIA Deep Learning Institute - Building Conversational AI Applications (2022)