Hello! 👋

I'm Junseok Oh (오준석) aka june-oh / RiceBerry

Ph.D. Candidate in Computer Science and Engineering


About Me

I am a Ph.D. candidate in Computer Science and Engineering at Sogang University, advised by Prof. Ji-Hwan Kim. My research focuses on End-to-End Automatic Speech Recognition (ASR), Speech Analytics & Assessment, Context-aware & Domain-specific ASR, and the integration of Large Language Models (LLMs) in Speech Technology. I am passionate about building robust, low-latency streaming ASR systems and developing automated speaking assessment frameworks.

Tech Stack

Programming & Tools


Speech & ML Frameworks

NVIDIA NeMo · Kaldi · KenLM · Hugging Face Transformers & PEFT · Whisper · Wav2Vec · FastConformer

Research Interests

ASR: Streaming ASR · Robust ASR · Context Biasing · Domain Adaptation
Speech + LLM: Speech LLM · LLM-based ASR · Multimodal AI
Speech Analytics: Speaking Assessment · Dysarthria Analysis · Audio Event Detection

Research Projects

01

Adapter-Only Speech-LLM Bridging (PhD Dissertation)

Oct 2025 - Present
Partner: MSIT / IITP

Fully frozen Whisper + Gemma bridged by lightweight adapters (0.44% trainable params); 26.8% WER reduction on academic domains via inference-time domain prompting. PAKDD 2026 Accepted (Oral).

PyTorch · Whisper · Gemma · Adapter · Domain Adaptation
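As a rough illustration of the adapter-only idea, here is a minimal NumPy sketch: a small trainable bottleneck maps frozen speech-encoder features into the LLM embedding space, and only the adapter weights count as trainable. All dimensions and parameter counts below are hypothetical placeholders, not the dissertation's actual configuration.

```python
import numpy as np

# Hypothetical sizes: these do NOT come from the project above.
d_speech, d_llm, d_bottleneck = 1280, 2048, 256
frozen_params = 1_500_000_000  # stand-in for the frozen encoder + LLM

# A bottleneck adapter: down-project, nonlinearity, up-project.
W_down = np.random.randn(d_speech, d_bottleneck) * 0.02
W_up = np.random.randn(d_bottleneck, d_llm) * 0.02

def bridge(speech_feats: np.ndarray) -> np.ndarray:
    """Map frozen speech-encoder features into the LLM embedding space."""
    h = np.maximum(speech_feats @ W_down, 0.0)  # ReLU bottleneck
    return h @ W_up

trainable = W_down.size + W_up.size
frac = 100 * trainable / (trainable + frozen_params)
print(f"trainable share: {frac:.4f}%")  # only the adapter is trained
```

The appeal of this setup is that the trainable share stays well under 1% of total parameters, while the frozen models keep their pretrained behavior.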
02

SEAM: Temporal-Semantic Bridging for Speech-LLM

May 2025 - Jan 2026
Partner: MSIT / IITP

Encoder-decoder architecture with variable-rate generation via cross-attention. Frozen speech encoder + LLM LoRA. Achieves 2.6%/5.2% WER on LibriSpeech and 4.7% on TED-LIUM-v2. EACL 2026 Findings.

PyTorch · Whisper · LLM · LoRA · ASR
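The LoRA side of this setup can be sketched as follows: the pretrained weight stays frozen while only a low-rank update B·A is trained. Sizes, rank, and scaling here are illustrative assumptions, not SEAM's actual hyperparameters.

```python
import numpy as np

# Hypothetical sizes; the real LLM dimension and rank are not specified here.
d, r = 1024, 8

W = np.random.randn(d, d)          # frozen pretrained weight
A = np.random.randn(r, d) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))               # init to zero so the delta starts at 0

def lora_forward(x: np.ndarray, alpha: float = 16.0) -> np.ndarray:
    # Frozen path plus scaled low-rank update: x W^T + (alpha/r) x (BA)^T
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

extra = A.size + B.size            # trainable params added by LoRA
full = W.size                      # frozen params in this layer
print(f"LoRA adds {extra} params vs {full} frozen ({100*extra/full:.2f}%)")
```

Because B starts at zero, the adapted layer is initially identical to the frozen one, which is the standard LoRA initialization trick.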
03

End-to-End Korean ASR for Gaming

2024 - Apr 2025
Partner: Smilegate

Developed a universal Korean ASR system using a hybrid FastConformer RNN-Transducer + CTC model with cache-aware streaming and context biasing for gaming terminology.

NVIDIA NeMo · FastConformer · RNNT · CTC
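Cache-aware streaming can be illustrated with a toy chunker that carries a short left-context cache between chunks, so each step attends only to recent history instead of the full utterance. The chunk and context sizes below are made up, not the deployed system's settings.

```python
# Toy sketch of chunked streaming with a left-context cache.
# Chunk/context sizes are illustrative, not the system's real config.

def stream_chunks(frames, chunk=4, left_context=2):
    """Yield (cached_context, current_chunk) pairs over a frame sequence."""
    cache = []
    for start in range(0, len(frames), chunk):
        cur = frames[start:start + chunk]
        yield list(cache), cur
        # Keep only the most recent frames as cache for the next chunk.
        cache = (cache + cur)[-left_context:]

for ctx, cur in stream_chunks(list(range(10))):
    print(ctx, cur)
```

The point of the cache is constant per-step cost and bounded latency: the model never waits for the whole utterance before emitting output.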
04

Telephony ASR System

Apr 2024 - Dec 2024
Partner: LOTTE INNOVATE

Developed streaming and non-streaming Korean ASR pipelines optimized for 8 kHz telephony data, using a FastConformer-CTC architecture with context-biasing modules.

FastConformer · CTC · Streaming ASR
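Context biasing can be approximated, at its simplest, as rescoring: hypotheses containing phrases from a bias list receive a score bonus before the best one is picked. The phrases, scores, and boost value below are invented for illustration and are not the project's actual biasing mechanism.

```python
# Toy shallow-fusion-style context biasing: boost hypotheses that
# contain phrases from a bias list. Scores and phrases are made up.

def bias_rescore(hyps, bias_phrases, boost=2.0):
    """Add a bonus per matched bias phrase; return hyps sorted best-first."""
    rescored = []
    for text, score in hyps:
        bonus = sum(boost for p in bias_phrases if p in text)
        rescored.append((text, score + bonus))
    return sorted(rescored, key=lambda ts: ts[1], reverse=True)

hyps = [("call the guild leader", -5.0), ("call the gilled leader", -4.5)]
best = bias_rescore(hyps, ["guild"])[0][0]
print(best)  # the in-domain term wins after biasing
```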
05

Automated Korean Speaking Assessment (2024)

May 2024 - Dec 2024
Partner: Ministry of Culture, Sports and Tourism

A multi-task learning framework using Wav2Vec to jointly model pronunciation, fluency, and content for L2 Korean assessment, integrated with a Conformer-CTC ASR model and an LLM for automated multi-aspect scoring.

Wav2Vec · Conformer · LLM
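In its simplest form, the joint objective of such a multi-task setup is a weighted sum of per-task losses over the shared encoder. The task weights and loss values below are invented for illustration, not the project's actual numbers.

```python
# Toy weighted multi-task objective over pronunciation/fluency/content
# heads. Weights and loss values are illustrative placeholders.

def multitask_loss(losses: dict, weights: dict) -> float:
    """Weighted sum of per-task losses sharing one encoder."""
    return sum(weights[k] * losses[k] for k in losses)

losses = {"pronunciation": 0.8, "fluency": 0.5, "content": 1.2}
weights = {"pronunciation": 1.0, "fluency": 0.5, "content": 1.0}
print(multitask_loss(losses, weights))  # 0.8 + 0.25 + 1.2 = 2.25
```

The weights let the shared encoder trade off the aspects, e.g. down-weighting a noisy fluency label without dropping the task.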
06

Automated Korean Speaking Assessment (2023)

May 2023 - Dec 2023
Partner: Ministry of Culture, Sports and Tourism

Built an end-to-end evaluation pipeline for L2 Korean speakers by combining Conformer-CTC ASR outputs with BERT-based semantic scoring. Developed algorithms to quantify pronunciation accuracy, speech rate, and syntactic correctness.

Conformer · CTC · BERT
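One of the quantities mentioned above, speech rate, can be sketched as syllables per second over the audio duration. The Hangul-block counting rule and the sample values below are illustrative simplifications, not the project's actual metric.

```python
# Toy speech-rate metric: syllables per second from an ASR transcript.
# For Korean, counting Hangul syllable blocks approximates syllable count.

def speech_rate(transcript: str, duration_sec: float) -> float:
    """Syllables per second, counting characters in the Hangul block range."""
    syllables = sum(1 for ch in transcript if "가" <= ch <= "힣")
    return syllables / duration_sec

print(speech_rate("안녕하세요 반갑습니다", 2.0))  # 10 syllables / 2 s = 5.0
```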
07

Dialog-based Multi-modal Explainable AI

Apr 2022 - Present
Partner: MSIT / IITP

AI-based framework for dysarthria severity classification, providing multi-modal explanations to support diagnostic decision-making.

Explainable AI · Multi-modal · Speech Analysis
08

Intelligent Audio Content Rating

2022 - 2024
Partner: MSIT

Led the audio-analytics submodule within an automated video content rating framework; fine-tuned Whisper ASR on domain-specific video corpora.

Whisper · Sound Event Detection · Fine-tuning
09

Video Story Understanding-based QA System

Sep 2017 - Dec 2019
Partner: MSIT

Modified Kaldi's sentence-level decoder to achieve a real-time factor below 1.0 for real-time video QA applications. Collected and curated domain-specific audio/text corpora to optimize acoustic and language models.

Kaldi · Language Model · Real-time ASR
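The real-time (RT) factor referenced above is simply processing time divided by audio duration; sub-1.0 RT means the decoder keeps up with incoming audio. The numbers below are illustrative, not measurements from the project.

```python
# Real-time factor (RTF): processing time divided by audio duration.
# RTF < 1.0 means the decoder runs faster than the audio plays.

def real_time_factor(processing_sec: float, audio_sec: float) -> float:
    return processing_sec / audio_sec

print(real_time_factor(45.0, 60.0))  # 0.75 -> faster than real time
```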

Experience

Ph.D. Student

Sogang University - Auditory Intelligence Lab

Mar 2022 - Present
  • Advisor: Prof. Ji-Hwan Kim
  • Research on End-to-End ASR, Speech Analytics, and LLM integration in Speech Technology
  • Developed streaming ASR systems with FastConformer RNNT+CTC architecture
  • Published papers at EACL and in the EURASIP JASM and TIIS journals

M.E. Student

Sogang University - Auditory Intelligence Lab

Sep 2017 - Aug 2019
  • Advisor: Prof. Ji-Hwan Kim
  • Thesis: Korean Real-Time Automatic Transcription System Using Weakly Labeled Corpus
  • Modified the Kaldi decoder to reach a real-time factor below 1.0 for real-time video QA applications
  • Collected and curated domain-specific audio/text corpora for ASR optimization

Education

Ph.D. Candidate in Computer Science and Engineering

Sogang University

Mar 2022 - Present
  • Advisor: Prof. Ji-Hwan Kim
  • Research Focus: End-to-End ASR, Speech Analytics, LLM Integration
  • PAKDD 2026 Accepted, Oral Presentation
  • EACL 2026 Findings Accepted (SEAM)

Master of Engineering in Computer Science and Engineering

Sogang University

Sep 2017 - Aug 2019
  • Advisor: Prof. Ji-Hwan Kim
  • Thesis: Korean Real-Time Automatic Transcription System Using Weakly Labeled Corpus

Bachelor of Engineering in Computer Science and Engineering

Sogang University

Mar 2010 - Aug 2017

    Publications

    International Journals

    [1] J. Oh, J. Nam, and J.-H. Kim, "HiTCA: Fusing Hierarchical Text and Contextual Audio for Accurate VCR," EURASIP Journal on Audio, Speech, and Music Processing, 2025. (SCIE, under review)

    [2] S. Ma, J. Oh, M. Kim, and J.-H. Kim, "Survey on Deep Learning-based Speech Technologies in Voice Chatbot Systems," KSII Transactions on Internet and Information Systems (TIIS), vol. 19, no. 5, pp. 1406–1440, 2025. (SCIE)

    [3] J. Oh, E. Cho, and J.-H. Kim, "Integration of WFST language model in pre-trained Korean E2E ASR model," KSII Transactions on Internet and Information Systems (TIIS), vol. 18, no. 6, pp. 1692–1705, 2024. (SCIE)

    [4] S. Seo, J. Oh, E. Cho, H. Park, G. Kim, and J.-H. Kim, "TP-MobNet: A Two-pass Mobile Network for Low-complexity Classification of Acoustic Scene," Computers, Materials & Continua, vol. 73, no. 2, 2022. (SCIE)

    [5] M. Lim, D. Lee, H. Park, Y. Kang, J. Oh, J.-S. Park, G.-J. Jang, and J.-H. Kim, "Convolutional neural network based audio event classification," KSII Transactions on Internet and Information Systems (TIIS), vol. 12, no. 6, pp. 2748–2760, 2018. (SCIE)

    International Conferences

    [1] J. Oh and J.-H. Kim, "Adapter-Only Bridging of Frozen Speech Encoder and Frozen LLM for ASR," in Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2026. (Accepted, oral)

    [2] J. Oh and J.-H. Kim, "SEAM: Bridging the Semantic-Temporal Granularity Gap for LLM-based Speech Recognition," in Findings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2026. (Accepted)

    [3] J. Oh, H. Park, and J.-H. Kim, "Speech Intelligibility Prediction of Dysarthria Using Deep Convolutional Networks," in Proc. APIC-IST 2023, pp. 236–237, 2023.

    [4] M. Kim, J. Oh, and J.-H. Kim, "Automated Dysarthria Severity Classification Using Diadochokinetic Test and Speech Intelligibility Based on LightGBM," in Proc. APIC-IST 2023, pp. 12–13, 2023.

    [5] S. Seo, M. Lim, D. Lee, H. Park, J. Oh, D. J. Rim, and J.-H. Kim, "Environmental noise robustness for Korean fricatives using speech enhancement generative adversarial networks," in Proc. IEEE Int. Conf. Big Data and Smart Computing (BigComp), pp. 1–4, 2019.

    [6] S. Seo, D. J. Rim, M. Lim, D. Lee, H. Park, J. Oh, C. Kim, and J.-H. Kim, "Shortcut connections based deep speaker embeddings for end-to-end speaker verification system," in Proc. Interspeech, pp. 17, 2019.

    Domestic Journals

    [1] 이정필, 장재후, 김지현, 김민섭, 김성준, 김민서, 김하영, 오준석, 정원, 김장연 et al., "A Speech-based System for Dysarthria Diagnosis with Explainability (음성에 기반한 마비말장애 진단과 설명이 가능한 시스템)," Communications of KIISE (정보과학회지), vol. 42, no. 4, pp. 45–56, 2024. (KCI)

    [2] H. Park, Y. Kang, M. Lim, D. Lee, J. Oh, and J.-H. Kim, "LFMMI-based acoustic modeling by using external knowledge," The Journal of the Acoustical Society of Korea, vol. 38, no. 5, pp. 607–613, 2019. (KCI)

    Achievements

    Awards

    Encouragement Award (장려상), 2023

    Korean AI Competition (한국어 AI 경진대회)

    Track 2-1: Counseling speech recognition

    Team '상담 ONE': 오준석, 김민서, 남주형

    Organized by NIA (National Information Society Agency)

    Grand Prize (최우수상) / NAVER Representative (1st place), 2022

    Korean AI Competition (한국어 인공지능 경진대회)

    Corporate challenge track (meeting speech)

    Team 'SGCSE': 오준석, 김하영

    Organized by NIA (National Information Society Agency)

    Grand Prize (최우수상, 1st place), 2021

    Syllable Recognition Rate Measurement Algorithm Development Competition (음절인식률 측정 알고리즘 개발 대회)

    Using a patterned-utterance speech dataset containing digits

    Team '검은사케동': 박호성, 오준석, 조은수

    Organized by KT alpha

    Patents

    KR 10-2699607 (B1) - Corpus Construction Service Provision Server and Method (Granted: Aug 2024)

    Certificates

    NVIDIA Deep Learning Institute - Building Conversational AI Applications (2022)