Ph.D. candidate (coursework completed), Department of Computer Science and Engineering, Sogang University
Advisor: Prof. Ji-Hwan Kim. Main research areas are end-to-end speech recognition (ASR), speech analysis and assessment, context-aware and domain-specialized ASR, and the integration of large language models (LLMs) with speech technology. Currently focused on building robust low-latency streaming ASR systems and automated speaking-assessment frameworks.
Speech & ML Frameworks
Trained only lightweight adapters, 0.44% of total parameters, with both the Whisper speech encoder and the Gemma LLM kept frozen. Achieved an average 26.8% WER reduction across four domains; natural-language domain prompting improved F1 on specialized vocabulary by +7.2%p. PAKDD 2026, accepted (oral).
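To illustrate the adapter-only training budget, here is a back-of-the-envelope sketch. All dimensions, layer counts, and model sizes below are hypothetical placeholders, not the configuration actually used in the work:

```python
# Sketch of the "adapter-only" parameter budget when bridging a frozen
# speech encoder to a frozen LLM. Every number below is an illustrative
# assumption, not the real model configuration.

def adapter_params(d_enc: int, d_llm: int, d_bottleneck: int, n_layers: int) -> int:
    """Parameters of n_layers bottleneck adapters (down-proj + up-proj, with biases)."""
    down = d_enc * d_bottleneck + d_bottleneck   # d_enc -> d_bottleneck
    up = d_bottleneck * d_llm + d_llm            # d_bottleneck -> d_llm
    return n_layers * (down + up)

def trainable_fraction(frozen_params: int, adapter: int) -> float:
    """Share of parameters that receive gradients when only adapters train."""
    return adapter / (frozen_params + adapter)

# Hypothetical: ~600M-param encoder + ~2B-param LLM, twelve 1280->256->2048 adapters.
frozen = 600_000_000 + 2_000_000_000
bridge = adapter_params(d_enc=1280, d_llm=2048, d_bottleneck=256, n_layers=12)
print(f"{trainable_fraction(frozen, bridge):.2%} of parameters are trainable")
```

With these toy sizes the trainable share lands well under one percent, which is the regime the 0.44% figure describes.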
Cross-attention-based variable-rate bridging module between a speech encoder and a decoder: frozen speech encoder plus LoRA-adapted LLM. Achieved 2.6%/5.2% WER on LibriSpeech and 4.7% WER on cross-domain TED-LIUM-v2. EACL 2026 Findings, accepted.
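For context on the temporal-semantic granularity gap this module addresses (a speech encoder emits tens of frames per second while an LLM consumes a few tokens per second), a minimal fixed-rate frame-stacking baseline can be sketched as follows. The toy dimensions are assumptions, and the cross-attention variable-rate compression itself is not reproduced here:

```python
# Fixed-rate baseline for shrinking a speech frame sequence toward token
# rate: concatenate adjacent frames before projecting into the LLM.
# Variable-rate (cross-attention) compression adapts the rate per segment;
# this sketch only shows the fixed-rate alternative.

def stack_frames(frames, factor=4):
    """Concatenate every `factor` consecutive frames into one vector,
    zero-padding the tail, reducing sequence length by `factor`."""
    dim = len(frames[0])
    pad = (-len(frames)) % factor
    padded = frames + [[0.0] * dim] * pad
    return [sum(padded[i:i + factor], []) for i in range(0, len(padded), factor)]

# Hypothetical toy input: 10 frames of dim 2 -> 3 stacked frames of dim 8.
frames = [[float(i), float(-i)] for i in range(10)]
out = stack_frames(frames, factor=4)
print(len(out), len(out[0]))
```

The sequence shortens by the stacking factor while the per-step dimensionality grows by the same factor, which is why a learned projection usually follows.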
End-to-end singing voice conversion (SVC) pipeline using so-vits-svc and whisper-vits-svc. Collected about 10 hours of training data, extracted vocals with UVR5, and trained on an RTX A5000. Ran both speech conversion and singing voice conversion inference.
Developed a general-purpose Korean ASR system based on Hybrid FastConformer RNNT+CTC. Implemented cache-aware streaming low-latency inference. Improved recognition accuracy by applying contextual biasing for game-domain vocabulary.
Developed streaming/non-streaming Korean ASR pipelines optimized for 8 kHz telephony data (FastConformer-CTC). Implemented a dynamic contextual-biasing module to cope with domain shift.
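As a simplified illustration of contextual biasing, the sketch below rescores an n-best list by boosting hypotheses that contain domain terms. Production systems typically bias inside beam search (e.g. trie- or WFST-based boosting) rather than post hoc; the terms, sentences, and scores here are all hypothetical:

```python
# Hypothesis-level contextual biasing: add a bonus to each n-best
# hypothesis for every domain term it contains, then re-rank.
# This is the simplest possible stand-in for in-decoder biasing.

def bias_rescore(nbest, bias_terms, boost=2.0):
    """nbest: list of (text, log-score) pairs. Returns them re-sorted
    best-first after adding `boost` per matched domain term."""
    def score(hyp):
        text, base_score = hyp
        hits = sum(term in text.split() for term in bias_terms)
        return base_score + boost * hits
    return sorted(nbest, key=score, reverse=True)

# Hypothetical scores: the generic model slightly prefers the wrong
# transcript; biasing toward the in-domain word flips the ranking.
nbest = [("we fund your order", -3.0), ("refund your order", -3.3)]
best = bias_rescore(nbest, bias_terms={"refund"})[0][0]
print(best)
```

With an empty bias list the original ranking is preserved, which is the property a dynamic biasing module relies on when the domain lexicon changes at runtime.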
Developed an L2-Korean speaking assessment framework that jointly models pronunciation, fluency, and content via Wav2Vec-based multi-task learning. Multi-aspect automatic scoring combining Conformer-CTC ASR with LLaMA.
Developed an AI framework for dysarthria severity classification. Implemented interpretable diagnostic modules based on acoustic/linguistic feature analysis. Mitigated the interpretation gap between AI outputs and user understanding in a conversational multimodal setting.
Responsible for speech recognition and audio analysis within a framework for rating children's classes. Designed a sound event detection (SED) model. Fine-tuned Whisper ASR on a domain-specific classroom corpus.
Built an L2-Korean assessment pipeline combining Conformer-CTC ASR output with BERT-based semantic similarity. Developed algorithms to quantify pronunciation accuracy, speaking rate, and syntactic correctness.
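Two of the measures such a pipeline needs, speaking rate and an edit-distance-based accuracy proxy between the ASR transcript and a reference, can be sketched as below. The example sentences are hypothetical, and the weighting such measures receive in an actual scoring algorithm is not reproduced:

```python
# Quantifying speaking rate (words per second) and a pronunciation-
# accuracy proxy (word error rate against a reference transcript).

def edit_distance(ref, hyp) -> int:
    """Levenshtein distance via one-row dynamic programming."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution / match
    return dp[-1]

def word_error_rate(ref: str, hyp: str) -> float:
    """Word-level edit distance normalized by reference length."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def speaking_rate(transcript: str, duration_sec: float) -> float:
    """Words per second over the spoken segment."""
    return len(transcript.split()) / duration_sec

# Hypothetical utterance: one substitution and one deletion in six words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
print(speaking_rate("the cat sat on the mat", 3.0))
```

Semantic adequacy (the BERT-similarity side of the pipeline) deliberately complements these surface measures, since WER alone penalizes valid paraphrases.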
Modified Kaldi's sentence-level decoder to achieve a real-time factor below 1.0 for real-time video QA. Collected and cleaned a domain-specific corpus from tagged videos; optimized the acoustic and language models.
Auditory Intelligence Lab, Sogang University
Sogang University
J. Oh, J. Nam, and J.-H. Kim, "HiTCA: Fusing Hierarchical Text and Contextual Audio for Accurate VCR," EURASIP Journal on Audio, Speech, and Music Processing, 2025. (SCIE, under review)
S. Ma, J. Oh, M. Kim, and J.-H. Kim, "Survey on Deep Learning-based Speech Technologies in Voice Chatbot Systems," KSII Transactions on Internet and Information Systems (TIIS), vol. 19, no. 5, pp. 1406–1440, 2025. (SCIE)
J. Oh, E. Cho, and J.-H. Kim, "Integration of WFST language model in pre-trained Korean E2E ASR model," KSII Transactions on Internet and Information Systems (TIIS), vol. 18, no. 6, pp. 1692–1705, 2024. (SCIE)
S. Seo, J. Oh, E. Cho, H. Park, G. Kim, and J.-H. Kim, "TP-MobNet: A Two-pass Mobile Network for Low-complexity Classification of Acoustic Scene," Computers, Materials & Continua, vol. 73, no. 2, 2022. (SCIE)
M. Lim, D. Lee, H. Park, Y. Kang, J. Oh, J.-S. Park, G.-J. Jang, and J.-H. Kim, "Convolutional neural network based audio event classification," KSII Transactions on Internet and Information Systems (TIIS), vol. 12, no. 6, pp. 2748–2760, 2018. (SCIE)
J. Oh and J.-H. Kim, "Adapter-Only Bridging of Frozen Speech Encoder and Frozen LLM for ASR," in Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2026. (Accepted, oral)
J. Oh and J.-H. Kim, "SEAM: Bridging the Temporal-Semantic Granularity Gap for LLM-based Speech Recognition," in Findings of the Association for Computational Linguistics: EACL 2026, pp. 2135–2144, 2026.
J. Oh, H. Park, and J.-H. Kim, "Speech Intelligibility Prediction of Dysarthria Using Deep Convolutional Networks," in Proc. Asia Pacific International Conference on Information Science and Technology (APIC-IST), pp. 236–237, 2023.
M. Kim, J. Oh, and J.-H. Kim, "Automated Dysarthria Severity Classification Using Diadochokinetic test and Speech Intelligibility Based on LightGBM," in Proc. Asia Pacific International Conference on Information Science and Technology (APIC-IST), pp. 12–13, 2023.
S. Seo, M. Lim, D. Lee, H. Park, J. Oh, D. J. Rim, and J.-H. Kim, "Environmental noise robustness for Korean fricatives using speech enhancement generative adversarial networks," in Proc. IEEE Int. Conf. Big Data and Smart Computing (BigComp), pp. 1–4, 2019.
S. Seo, D. J. Rim, M. Lim, D. Lee, H. Park, J. Oh, C. Kim, and J.-H. Kim, "Shortcut connections based deep speaker embeddings for end-to-end speaker verification system," in Proc. Interspeech, pp. 2928–2932, 2019.
์ด์ ํ, ์ฅ์ฌํ, ๊น์งํ, ๊น๋ฏผ์ญ, ๊น์ฑ์ค, ๊น๋ฏผ์, ๊นํ์, ์ค์ค์, ์ ์, ๊น์ฅ์ฐ et al., "Speech-Based Dysarthria Diagnosis and Explainable System," Communications of KIISE, vol. 42, no. 4, pp. 45–56, 2024. (KCI, in Korean)
H. Park, Y. Kang, M. Lim, D. Lee, J. Oh, and J.-H. Kim, "LFMMI-based acoustic modeling by using external knowledge," The Journal of the Acoustical Society of Korea, vol. 38, no. 5, pp. 607–613, 2019. (KCI)
Teaching Assistant Experience
Audio processing, deep learning fundamentals, language models, and FastSpeech2 TTS lab sessions
TA for a lecture course taught by Prof. Ji-Hwan Kim. Lab sessions on audio processing, MLPs, CTC, Whisper, NeMo fine-tuning, and WFSTs.
Audio processing, PyTorch, RNN/CNN/Seq2Seq, and FastSpeech2/VocGAN TTS lab sessions
Lab sessions on dialogue systems and conversational AI interface design
Track 2-1, Counseling Speech Recognition
Host: National Information Society Agency (NIA)
Technology evaluation track (speaker speech)
Host: National Information Society Agency (NIA)
Utilized a speech dataset of patterned utterances containing digits
Host: KT alpha
KR 10-2699607 (B1) - Server and Method for Providing a Corpus Construction Service (registered Aug. 2024)
NVIDIA Deep Learning Institute - Building Conversational AI Applications (2022)