Ph.D. Candidate in Computer Science and Engineering
I am a Ph.D. candidate in Computer Science and Engineering at Sogang University, advised by Prof. Ji-Hwan Kim. My research focuses on End-to-End Automatic Speech Recognition (ASR), Speech Analytics & Assessment, Context-aware & Domain-specific ASR, and the integration of Large Language Models (LLMs) in Speech Technology. I am passionate about building robust, low-latency streaming ASR systems and developing automated speaking assessment frameworks.
Speech & ML Frameworks
Fully frozen Whisper + Gemma with lightweight adapter-only bridging (0.44% trainable parameters). 26.8% WER reduction on academic domains via inference-time domain prompting. PAKDD 2026 Accepted (Oral).
Encoder-decoder with variable-rate generation via cross-attention. Frozen speech encoder + LLM LoRA. 2.6%/5.2% WER on LibriSpeech, 4.7% on cross-domain TED-LIUM-v2. EACL 2026 Findings Accepted.
End-to-end SVC pipeline: collected ~10h speaker data, extracted vocals via UVR5, trained so-vits-svc & whisper-vits-svc on RTX A5000. Performed voice conversion and singing voice conversion inference.
Developed universal Korean ASR using hybrid FastConformer RNNT+CTC. Implemented cache-aware streaming for low-latency inference and context biasing for gaming domain vocabulary.
Developed streaming/non-streaming Korean ASR pipelines optimized for 8 kHz telephony using FastConformer-CTC. Implemented dynamic context biasing for domain-shift word-level accuracy.
Wav2Vec multi-task framework for L2-Korean assessment that jointly models pronunciation, fluency, and content. Combined Conformer-CTC ASR with LLaMA for multi-aspect automated scoring.
AI framework for dysarthria severity classification with multi-modal explanations. Implemented speech-based explainable diagnostic modules analyzing acoustic and linguistic features.
Designed sound event detection models for automated video content rating. Fine-tuned Whisper ASR on domain-specific video corpora for robust transcription in diverse acoustic environments.
Built L2-Korean evaluation pipeline combining Conformer-CTC ASR with BERT-based semantic scoring. Developed algorithms to quantify pronunciation accuracy, speech rate, and syntactic correctness.
Modified Kaldi's sentence-level decoder for real-time video QA at a real-time factor below 1.0. Collected domain-specific audio/text corpora and retrained acoustic/language models to improve QA accuracy on complex narratives.
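Several of the assessment projects above quantify pronunciation accuracy and speech rate from ASR output; the standard recipe scores pronunciation as word-level edit distance against a reference transcript. A minimal pure-Python sketch of that idea (function names and the example values are illustrative, not the projects' actual code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,           # deletion
                dp[j - 1] + 1,       # insertion
                prev + (r != h),     # substitution (or match, cost 0)
            )
    return dp[-1]

def pronunciation_accuracy(ref_text, hyp_text):
    """1 - WER, clamped to [0, 1]: a crude word-level pronunciation proxy."""
    ref, hyp = ref_text.split(), hyp_text.split()
    wer = edit_distance(ref, hyp) / max(len(ref), 1)
    return max(0.0, 1.0 - wer)

def speech_rate(n_syllables, duration_sec):
    """Syllables per second over the spoken segment."""
    return n_syllables / duration_sec

# One word substituted out of three -> accuracy 1 - 1/3
acc = pronunciation_accuracy("나는 학교에 간다", "나는 학교에 갑니다")
rate = speech_rate(n_syllables=7, duration_sec=3.5)
print(round(acc, 2), rate)  # 0.67 2.0
```

In practice a full pipeline would add forced alignment for phone-level timing and weight substitutions by phonetic distance, but the WER-style word comparison above is the usual backbone of a first-pass score.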
Sogang University - Auditory Intelligence Lab
Sogang University
J. Oh, J. Nam, and J.-H. Kim, "HiTCA: Fusing Hierarchical Text and Contextual Audio for Accurate VCR," EURASIP Journal on Audio, Speech, and Music Processing, 2025. SCIE, Under Review
S. Ma, J. Oh, M. Kim, and J.-H. Kim, "Survey on Deep Learning-based Speech Technologies in Voice Chatbot Systems," KSII Transactions on Internet and Information Systems (TIIS), vol. 19, no. 5, pp. 1406–1440, 2025. SCIE
J. Oh, E. Cho, and J.-H. Kim, "Integration of WFST language model in pre-trained Korean E2E ASR model," KSII Transactions on Internet and Information Systems (TIIS), vol. 18, no. 6, pp. 1692–1705, 2024. SCIE
S. Seo, J. Oh, E. Cho, H. Park, G. Kim, and J.-H. Kim, "TP-MobNet: A Two-pass Mobile Network for Low-complexity Classification of Acoustic Scene," Computers, Materials & Continua, vol. 73, no. 2, 2022. SCIE
M. Lim, D. Lee, H. Park, Y. Kang, J. Oh, J.-S. Park, G.-J. Jang, and J.-H. Kim, "Convolutional neural network based audio event classification," KSII Transactions on Internet and Information Systems (TIIS), vol. 12, no. 6, pp. 2748–2760, 2018. SCIE
J. Oh and J.-H. Kim, "Adapter-Only Bridging of Frozen Speech Encoder and Frozen LLM for ASR," in Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2026. Accepted, Oral
J. Oh and J.-H. Kim, "SEAM: Bridging the Temporal-Semantic Granularity Gap for LLM-based Speech Recognition," in Findings of the Association for Computational Linguistics: EACL 2026, pp. 2135–2144, 2026.
J. Oh, H. Park, and J.-H. Kim, "Speech Intelligibility Prediction of Dysarthria Using Deep Convolutional Networks," in Proc. Asia Pacific International Conference on Information Science and Technology (APIC-IST), pp. 236–237, 2023.
M. Kim, J. Oh, and J.-H. Kim, "Automated Dysarthria Severity Classification Using Diadochokinetic Test and Speech Intelligibility Based on LightGBM," in Proc. Asia Pacific International Conference on Information Science and Technology (APIC-IST), pp. 12–13, 2023.
S. Seo, M. Lim, D. Lee, H. Park, J. Oh, D. J. Rim, and J.-H. Kim, "Environmental noise robustness for Korean fricatives using speech enhancement generative adversarial networks," in Proc. IEEE Int. Conf. Big Data and Smart Computing (BigComp), pp. 1–4, 2019.
S. Seo, D. J. Rim, M. Lim, D. Lee, H. Park, J. Oh, C. Kim, and J.-H. Kim, "Shortcut connections based deep speaker embeddings for end-to-end speaker verification system," in Proc. Interspeech, pp. 2928–2932, 2019.
Lee et al. (incl. J. Oh and J.-H. Kim), "Speech-Based Dysarthria Diagnosis and Explainable System," Journal of KIISE (정보과학회지), vol. 42, no. 4, pp. 45–56, 2024. KCI
H. Park, Y. Kang, M. Lim, D. Lee, J. Oh, and J.-H. Kim, "LFMMI-based acoustic modeling by using external knowledge," The Journal of the Acoustical Society of Korea, vol. 38, no. 5, pp. 607–613, 2019. KCI
Teaching Assistant Experience
Lab sessions covering audio processing, deep learning basics, language models, and FastSpeech2 TTS using PyTorch & Colab.
Hands-on ASR tutorial (invited lecture by Prof. Ji-Hwan Kim) covering audio handling, MLP, CTC, Whisper, NVIDIA NeMo finetuning, and WFST using PyTorch & Colab notebooks.
Lab sessions covering audio processing, PyTorch, RNN, CNN, Seq2Seq, and FastSpeech2/VocGAN TTS using PyTorch & Colab.
Lab sessions on dialogue systems and conversational AI interface design.
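The ASR labs above walk through CTC; the heart of greedy CTC decoding is just per-frame argmax, collapsing consecutive repeats, then dropping blanks. A self-contained sketch in the spirit of those tutorial notebooks (the tiny vocabulary and frame sequence are made up for illustration):

```python
BLANK = 0  # CTC blank symbol id

def ctc_greedy_decode(frame_ids, blank=BLANK):
    """Collapse consecutive repeated ids, then remove blank symbols."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# Pretend these are per-frame argmax ids over a toy vocab
vocab = {1: "c", 2: "a", 3: "t"}
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
decoded = "".join(vocab[i] for i in ctc_greedy_decode(frames))
print(decoded)  # cat
```

Note that the collapse step is why CTC needs the blank: a genuine double letter can only survive decoding if a blank frame separates the two emissions, e.g. `[2, 0, 2]` decodes to "aa" while `[2, 2]` collapses to a single "a".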
Track 2-1, Counseling Speech Recognition
Host: NIA (National Information Society Agency)
Technical Meeting Speech Dataset (meeting audio)
Host: NIA (National Information Society Agency)
Utilization of a patterned-utterance speech dataset containing digits
Host: KT alpha
KR 10-2699607 (B1) - Corpus Construction Service Provision Server and Method (Granted: Aug 2024)
NVIDIA Deep Learning Institute - Building Conversational AI Applications (2022)