Hello! 👋

I'm Junseok Oh (오준석) aka june-oh / RiceBerry

Ph.D. Candidate in Computer Science and Engineering


About Me

I am a Ph.D. candidate in Computer Science and Engineering at Sogang University, advised by Prof. Ji-Hwan Kim. My research focuses on End-to-End Automatic Speech Recognition (ASR), Speech Analytics & Assessment, Context-aware & Domain-specific ASR, and the integration of Large Language Models (LLMs) in Speech Technology. I am passionate about building robust, low-latency streaming ASR systems and developing automated speaking assessment frameworks.

Tech Stack

Programming & Tools


Speech & ML Frameworks

NVIDIA NeMo · Kaldi · KenLM · Hugging Face Transformers & PEFT · Whisper · Wav2Vec · FastConformer

Research Interests

ASR: Streaming ASR · Robust ASR · Context Biasing · Domain Adaptation
Speech + LLM: Speech LLM · LLM-based ASR · Multimodal AI
Speech Analytics: Speaking Assessment · Dysarthria Analysis · Audio Event Detection

Research Projects

01

Adapter-Only Speech-LLM Bridging (PhD Dissertation)

Oct 2025 - Present
Partner: MSIT / IITP

Fully frozen Whisper + Gemma bridged by lightweight adapters (0.44% trainable params); 26.8% WER reduction on academic domains via inference-time domain prompting. PAKDD 2026 Accepted (Oral).

PyTorch · Whisper · Gemma · Adapter · Domain Adaptation
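As a rough illustration of the adapter-only idea, here is a minimal NumPy sketch: a small trainable bottleneck maps frozen speech-encoder features into the LLM embedding space, and only the adapter weights count as trainable. All dimensions and parameter counts below are hypothetical placeholders, not the dissertation's actual configuration.

```python
import numpy as np

# Hypothetical sizes: these do NOT come from the project above.
d_speech, d_llm, d_bottleneck = 1280, 2048, 256
frozen_params = 1_500_000_000  # stand-in for the frozen encoder + LLM

# A bottleneck adapter: down-project, nonlinearity, up-project.
W_down = np.random.randn(d_speech, d_bottleneck) * 0.02
W_up = np.random.randn(d_bottleneck, d_llm) * 0.02

def bridge(speech_feats: np.ndarray) -> np.ndarray:
    """Map frozen speech-encoder features into the LLM embedding space."""
    h = np.maximum(speech_feats @ W_down, 0.0)  # ReLU bottleneck
    return h @ W_up

trainable = W_down.size + W_up.size
frac = 100 * trainable / (trainable + frozen_params)
print(f"trainable share: {frac:.4f}%")  # only the adapter is trained
```

The appeal of this setup is that the trainable share stays well under 1% of total parameters, while the frozen models keep their pretrained behavior.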
02

SEAM: Temporal-Semantic Bridging for Speech-LLM

May 2025 - Jan 2026
Partner: MSIT / IITP

Encoder-decoder architecture with variable-rate generation via cross-attention. Frozen speech encoder + LLM LoRA. Achieves 2.6%/5.2% WER on LibriSpeech and 4.7% on TED-LIUM-v2. EACL 2026 Findings.

PyTorch · Whisper · LLM · LoRA · ASR
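The LoRA side of this setup can be sketched as follows: the pretrained weight stays frozen while only a low-rank update B·A is trained. Sizes, rank, and scaling here are illustrative assumptions, not SEAM's actual hyperparameters.

```python
import numpy as np

# Hypothetical sizes; the real LLM dimension and rank are not specified here.
d, r = 1024, 8

W = np.random.randn(d, d)          # frozen pretrained weight
A = np.random.randn(r, d) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))               # init to zero so the delta starts at 0

def lora_forward(x: np.ndarray, alpha: float = 16.0) -> np.ndarray:
    # Frozen path plus scaled low-rank update: x W^T + (alpha/r) x (BA)^T
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

extra = A.size + B.size            # trainable params added by LoRA
full = W.size                      # frozen params in this layer
print(f"LoRA adds {extra} params vs {full} frozen ({100*extra/full:.2f}%)")
```

Because B starts at zero, the adapted layer is initially identical to the frozen one, which is the standard LoRA initialization trick.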
03

End-to-End Korean ASR for Gaming

2024 - Apr 2025
Partner: Smilegate

Developed a universal Korean ASR system using a hybrid FastConformer RNN-Transducer + CTC model with cache-aware streaming and context biasing for gaming terminology.

NVIDIA NeMo · FastConformer · RNNT · CTC
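Cache-aware streaming can be illustrated with a toy chunker that carries a short left-context cache between chunks, so each step attends only to recent history instead of the full utterance. The chunk and context sizes below are made up, not the deployed system's settings.

```python
# Toy sketch of chunked streaming with a left-context cache.
# Chunk/context sizes are illustrative, not the system's real config.

def stream_chunks(frames, chunk=4, left_context=2):
    """Yield (cached_context, current_chunk) pairs over a frame sequence."""
    cache = []
    for start in range(0, len(frames), chunk):
        cur = frames[start:start + chunk]
        yield list(cache), cur
        # Keep only the most recent frames as cache for the next chunk.
        cache = (cache + cur)[-left_context:]

for ctx, cur in stream_chunks(list(range(10))):
    print(ctx, cur)
```

The point of the cache is constant per-step cost and bounded latency: the model never waits for the whole utterance before emitting output.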
04

Telephony ASR System

Apr 2024 - Dec 2024
Partner: LOTTE INNOVATE

Developed streaming and non-streaming Korean ASR pipelines optimized for 8 kHz telephony data, using a FastConformer-CTC architecture with context-biasing modules.

FastConformer · CTC · Streaming ASR
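Context biasing can be approximated, at its simplest, as rescoring: hypotheses containing phrases from a bias list receive a score bonus before the best one is picked. The phrases, scores, and boost value below are invented for illustration and are not the project's actual biasing mechanism.

```python
# Toy shallow-fusion-style context biasing: boost hypotheses that
# contain phrases from a bias list. Scores and phrases are made up.

def bias_rescore(hyps, bias_phrases, boost=2.0):
    """Add a bonus per matched bias phrase; return hyps sorted best-first."""
    rescored = []
    for text, score in hyps:
        bonus = sum(boost for p in bias_phrases if p in text)
        rescored.append((text, score + bonus))
    return sorted(rescored, key=lambda ts: ts[1], reverse=True)

hyps = [("call the guild leader", -5.0), ("call the gilled leader", -4.5)]
best = bias_rescore(hyps, ["guild"])[0][0]
print(best)  # the in-domain term wins after biasing
```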
05

Automated Korean Speaking Assessment (2024)

May 2024 - Dec 2024
Partner: Ministry of Culture, Sports and Tourism

A multi-task learning framework using Wav2Vec to jointly model pronunciation, fluency, and content for L2 Korean assessment, integrated with a Conformer-CTC ASR model and an LLM for automated multi-aspect scoring.

Wav2Vec · Conformer · LLM
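In its simplest form, the joint objective of such a multi-task setup is a weighted sum of per-task losses over the shared encoder. The task weights and loss values below are invented for illustration, not the project's actual numbers.

```python
# Toy weighted multi-task objective over pronunciation/fluency/content
# heads. Weights and loss values are illustrative placeholders.

def multitask_loss(losses: dict, weights: dict) -> float:
    """Weighted sum of per-task losses sharing one encoder."""
    return sum(weights[k] * losses[k] for k in losses)

losses = {"pronunciation": 0.8, "fluency": 0.5, "content": 1.2}
weights = {"pronunciation": 1.0, "fluency": 0.5, "content": 1.0}
print(multitask_loss(losses, weights))  # 0.8 + 0.25 + 1.2 = 2.25
```

The weights let the shared encoder trade off the aspects, e.g. down-weighting a noisy fluency label without dropping the task.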
06

Automated Korean Speaking Assessment (2023)

May 2023 - Dec 2023
Partner: Ministry of Culture, Sports and Tourism

Built an end-to-end evaluation pipeline for L2 Korean speakers by combining Conformer-CTC ASR outputs with BERT-based semantic scoring. Developed algorithms to quantify pronunciation accuracy, speech rate, and syntactic correctness.

Conformer · CTC · BERT
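One of the quantities mentioned above, speech rate, can be sketched as syllables per second over the audio duration. The Hangul-block counting rule and the sample values below are illustrative simplifications, not the project's actual metric.

```python
# Toy speech-rate metric: syllables per second from an ASR transcript.
# For Korean, counting Hangul syllable blocks approximates syllable count.

def speech_rate(transcript: str, duration_sec: float) -> float:
    """Syllables per second, counting characters in the Hangul block range."""
    syllables = sum(1 for ch in transcript if "가" <= ch <= "힣")
    return syllables / duration_sec

print(speech_rate("안녕하세요 반갑습니다", 2.0))  # 10 syllables / 2 s = 5.0
```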
07

Dialog-based Multi-modal Explainable AI

Apr 2022 - Present
Partner: MSIT / IITP

AI-based framework for dysarthria severity classification, providing multi-modal explanations to support diagnostic decision-making.

Explainable AI · Multi-modal · Speech Analysis
08

Intelligent Audio Content Rating

2022 - 2024
Partner: MSIT

Led the audio-analytics submodule within an automated video content rating framework; fine-tuned Whisper ASR on domain-specific video corpora.

Whisper · Sound Event Detection · Fine-tuning
09

Video Story Understanding-based QA System

Sep 2017 - Dec 2019
Partner: MSIT

Modified Kaldi's sentence-level decoder to achieve a real-time factor below 1.0 for real-time video QA applications. Collected and curated domain-specific audio/text corpora to optimize acoustic and language models.

Kaldi · Language Model · Real-time ASR
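The real-time (RT) factor referenced above is simply processing time divided by audio duration; sub-1.0 RT means the decoder keeps up with incoming audio. The numbers below are illustrative, not measurements from the project.

```python
# Real-time factor (RTF): processing time divided by audio duration.
# RTF < 1.0 means the decoder runs faster than the audio plays.

def real_time_factor(processing_sec: float, audio_sec: float) -> float:
    return processing_sec / audio_sec

print(real_time_factor(45.0, 60.0))  # 0.75 -> faster than real time
```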

Experience

Ph.D. Student

Sogang University - Auditory Intelligence Lab

Mar 2022 - Present
  • Advisor: Prof. Ji-Hwan Kim
  • Research on End-to-End ASR, Speech Analytics, and LLM integration in Speech Technology
  • Developed streaming ASR systems with FastConformer RNNT+CTC architecture
  • Published papers at EACL and in the EURASIP JASM and TIIS journals

M.E. Student

Sogang University - Auditory Intelligence Lab

Sep 2017 - Aug 2019
  • Advisor: Prof. Ji-Hwan Kim
  • Thesis: Korean Real-Time Automatic Transcription System Using Weakly Labeled Corpus
  • Modified the Kaldi decoder to reach a real-time factor below 1.0 for real-time video QA applications
  • Collected and curated domain-specific audio/text corpora for ASR optimization

Education

Ph.D. Candidate in Computer Science and Engineering

Sogang University

Mar 2022 - Present
  • Advisor: Prof. Ji-Hwan Kim
  • Research Focus: End-to-End ASR, Speech Analytics, LLM Integration
  • PAKDD 2026 Accepted, Oral Presentation
  • EACL 2026 Findings Accepted (SEAM)

Master of Engineering in Computer Science and Engineering

Sogang University

Sep 2017 - Aug 2019
  • Advisor: Prof. Ji-Hwan Kim
  • Thesis: Korean Real-Time Automatic Transcription System Using Weakly Labeled Corpus

Bachelor of Engineering in Computer Science and Engineering

Sogang University

Mar 2010 - Aug 2017

    Publications

    International Journals

    [1] J. Oh, J. Nam, and J.-H. Kim, "HiTCA: Fusing Hierarchical Text and Contextual Audio for Accurate VCR," EURASIP Journal on Audio, Speech, and Music Processing, 2025. (SCIE, under review)

    [2] S. Ma, J. Oh, M. Kim, and J.-H. Kim, "Survey on Deep Learning-based Speech Technologies in Voice Chatbot Systems," KSII Transactions on Internet and Information Systems (TIIS), vol. 19, no. 5, pp. 1406–1440, 2025. (SCIE)

    [3] J. Oh, E. Cho, and J.-H. Kim, "Integration of WFST language model in pre-trained Korean E2E ASR model," KSII Transactions on Internet and Information Systems (TIIS), vol. 18, no. 6, pp. 1692–1705, 2024. (SCIE)

    [4] S. Seo, J. Oh, E. Cho, H. Park, G. Kim, and J.-H. Kim, "TP-MobNet: A Two-pass Mobile Network for Low-complexity Classification of Acoustic Scene," Computers, Materials & Continua, vol. 73, no. 2, 2022. (SCIE)

    [5] M. Lim, D. Lee, H. Park, Y. Kang, J. Oh, J.-S. Park, G.-J. Jang, and J.-H. Kim, "Convolutional neural network based audio event classification," KSII Transactions on Internet and Information Systems (TIIS), vol. 12, no. 6, pp. 2748–2760, 2018. (SCIE)

    International Conferences

    [1] J. Oh and J.-H. Kim, "Adapter-Only Bridging of Frozen Speech Encoder and Frozen LLM for ASR," in Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2026. (Accepted, oral)

    [2] J. Oh and J.-H. Kim, "SEAM: Bridging the Semantic-Temporal Granularity Gap for LLM-based Speech Recognition," in Findings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2026. (Accepted)

    [3] J. Oh, H. Park, and J.-H. Kim, "Speech Intelligibility Prediction of Dysarthria Using Deep Convolutional Networks," in Proc. APIC-IST 2023, pp. 236–237, 2023.

    [4] M. Kim, J. Oh, and J.-H. Kim, "Automated Dysarthria Severity Classification Using Diadochokinetic Test and Speech Intelligibility Based on LightGBM," in Proc. APIC-IST 2023, pp. 12–13, 2023.

    [5] S. Seo, M. Lim, D. Lee, H. Park, J. Oh, D. J. Rim, and J.-H. Kim, "Environmental noise robustness for Korean fricatives using speech enhancement generative adversarial networks," in Proc. IEEE Int. Conf. Big Data and Smart Computing (BigComp), pp. 1–4, 2019.

    [6] S. Seo, D. J. Rim, M. Lim, D. Lee, H. Park, J. Oh, C. Kim, and J.-H. Kim, "Shortcut connections based deep speaker embeddings for end-to-end speaker verification system," in Proc. Interspeech, pp. 17, 2019.

    Domestic Journals

    [1] 이정필, 장재후, 김지현, 김민섭, 김성준, 김민서, 김하영, 오준석, 정원, 김장연 et al., "A Speech-based System for Dysarthria Diagnosis with Explainability (음성에 기반한 마비말장애 진단과 설명이 가능한 시스템)," Communications of KIISE (정보과학회지), vol. 42, no. 4, pp. 45–56, 2024. (KCI)

    [2] H. Park, Y. Kang, M. Lim, D. Lee, J. Oh, and J.-H. Kim, "LFMMI-based acoustic modeling by using external knowledge," The Journal of the Acoustical Society of Korea, vol. 38, no. 5, pp. 607–613, 2019. (KCI)

    Achievements

    Awards

    Encouragement Award (장려상), 2023

    Korean AI Competition (한국어 AI 경진대회)

    Track 2-1: Counseling speech recognition

    Team '상담 ONE': 오준석, 김민서, 남주형

    Organized by NIA (National Information Society Agency)

    Grand Prize (최우수상) / NAVER Representative (1st place), 2022

    Korean AI Competition (한국어 인공지능 경진대회)

    Corporate challenge track (meeting speech)

    Team 'SGCSE': 오준석, 김하영

    Organized by NIA (National Information Society Agency)

    Grand Prize (최우수상, 1st place), 2021

    Syllable Recognition Rate Measurement Algorithm Development Competition (음절인식률 측정 알고리즘 개발 대회)

    Using a patterned-utterance speech dataset containing digits

    Team '검은사케동': 박호성, 오준석, 조은수

    Organized by KT alpha

    Patents

    KR 10-2699607 (B1) - Corpus Construction Service Provision Server and Method (Granted: Aug 2024)

    Certificates

    NVIDIA Deep Learning Institute - Building Conversational AI Applications (2022)