Home · About · Projects · Contact
AVAILABLE — Summer 2026 · Usually replies in <24h · UTC+3

Projects Showcase

Featured / Research

Turkish Morphological Tokenizer

A context-aware analyzer using Finite State Transducers and Viterbi-based POS disambiguation. It models Turkish phonological rules algorithmically and resolves ambiguity across a 65k+ root lexicon.

View Repository · See also: the data pipeline
Finite State Transducer

Project Overview

This research project addresses the complexity of agglutinative morphology in Turkish. By combining Finite State Transducers (FST) with probabilistic disambiguation, we achieve high-accuracy segmentation without the massive compute requirements of LLMs.

  • Core Tech: FST for phonology, Hidden Markov Models for disambiguation.
  • Dataset: Custom 65k+ root lexicon derived from Kaikki, Zemberek, and TDK.
  • Impact: Provides a lightweight, interpretable alternative to neural tokenizers.

Why is this hard?

Turkish is an agglutinative language in which a single word can correspond to an entire English sentence (e.g., "Çekoslovakyalılaştıramadıklarımızdanmışsınızcasına", roughly "as if you were one of those we could not turn into a Czechoslovak").

  • Ambiguity: A single surface form can have dozens of valid parses.
  • Phonetic Harmony: Vowel harmony and consonant changes create a massive search space.
  • OOV Issue: Dictionary-based lookups fail on productive derivations.
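To make the phonetic-harmony point concrete, here is a minimal sketch of two-way vowel harmony: the plural suffix surfaces as -ler after front vowels and -lar after back vowels, decided by the word's last vowel. This toy function ignores loanword exceptions (e.g., "saat") and is illustrative only, not part of the project's FST.

```python
FRONT = set("eiöü")  # front vowels select -ler
BACK = set("aıou")   # back vowels select -lar

def pluralize(word: str) -> str:
    """Pick the plural allomorph by the frontness of the last vowel.
    Toy example of two-way vowel harmony; real morphology also handles
    consonant changes, loanword exceptions, and further suffix chains."""
    for ch in reversed(word.lower()):
        if ch in FRONT:
            return word + "ler"
        if ch in BACK:
            return word + "lar"
    raise ValueError("no vowel found")
```

Every additional suffix repeats a choice like this one, which is why the valid-surface-form space explodes combinatorially.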

Approach

We built a two-stage pipeline:

  • 1. FST Generation: Using OpenFst principles, we model phonological rules (vowel harmony, drops) as state transitions. This generates all possible parses for a word.
  • 2. Disambiguation: A Viterbi decoder scores path probabilities based on bigram POS statistics trained on the BOUN corpus.
  • 3. Rule Engine: A final rule-based layer handles edge cases like proper nouns and date formats.
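The disambiguation stage above can be sketched as a tiny Viterbi decoder over the FST's candidate parses. The transition log-probabilities and the parse alternatives below are made-up stand-ins for statistics trained on a corpus such as BOUN; only the decoding logic reflects the approach.

```python
# Toy bigram POS transition log-probabilities (hypothetical numbers).
TRANS = {
    ("<s>", "Noun"): -0.5, ("<s>", "Verb"): -1.5,
    ("Noun", "Verb"): -0.4, ("Noun", "Noun"): -1.2,
    ("Verb", "Noun"): -0.9, ("Verb", "Verb"): -2.0,
}

def viterbi(candidates):
    """candidates: one list per word position; each item is a list of
    (parse_string, pos_tag) alternatives produced by the FST stage.
    Returns the parse sequence with the highest bigram score."""
    paths = {"<s>": (0.0, [])}  # pos_tag -> (log_prob, parse_sequence)
    for options in candidates:
        new_paths = {}
        for parse, tag in options:
            # Best predecessor for this tag; unseen bigrams get a floor score.
            score, seq = max(
                (lp + TRANS.get((prev, tag), -5.0), seq)
                for prev, (lp, seq) in paths.items()
            )
            if tag not in new_paths or score > new_paths[tag][0]:
                new_paths[tag] = (score, seq + [parse])
        paths = new_paths
    return max(paths.values())[1]
```

The real decoder additionally tracks backpointers per tag and smooths unseen transitions, but the dynamic-programming core is the same.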

Key Metrics

  • Status: Research in progress.
  • Focus: Phonological rule coverage and disambiguation logic.

Ablation Studies

To validate components, we disabled parts of the pipeline:

  • Without Viterbi: Accuracy drops to ~70% (equivalent to picking randomly among the valid parses).
  • Without Phonological Rules: Coverage drops significantly on complex derivations.

Lessons Learned

  • Data Quality > Model Complexity: Cleaning the lexicon gave higher gains than tweaking the HMM.
  • Hybrid is Robust: Combining the exact FST layer with probabilistic scoring handles both seen and unseen words best.
  • Interpretability matters: Unlike BERT tokenizers, we can debug exactly why a word was split a certain way.
Project Timeline

  • Data Collection: Kaikki & Zemberek Parsing
  • FST Design: Phonological Rule Modeling
  • Disambiguation: HMM & Viterbi Implementation
  • Evaluation: Benchmarks & Demo
İpekGPT Dashboard

İpekGPT

LLM-based chatbot for İpek Yolu Entrepreneur Incubation Center. Features RAG architecture for accurate, context-aware responses.

<200ms Latency
90% Cost Reduction
Hybrid Retrieval
Try Live Demo · Source Code

Projects

FitTurkAI
Teknofest / AI Assistant

FitTurkAI

Personalized nutrition assistant combining RAG and fine-tuned CosmosGemma for task-focused dietetic advice.

TECH STACK
PYTHON COSMOSGEMMA 2B RAG FLASK
KEY FEATURES
  • Fine-tuned CosmosGemma 2B model for dietetic expertise.
  • RAG pipeline processing 40+ nutrition PDFs.
  • Personalized meal planning based on biometrics.
RAG ARCHITECTURE
PDFs → Chunks → Vectors → LLM
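The ingestion flow above can be sketched end to end. To stay self-contained this sketch uses a bag-of-words similarity as a stand-in for the real embedding model; function names and chunk sizes are illustrative, not the project's actual configuration.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 200, overlap: int = 40):
    """Split extracted PDF text into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words vector. A real pipeline would
    call a sentence-embedding model here."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks, k: int = 2):
    """Rank chunks by similarity; the top-k are prepended to the LLM prompt."""
    return sorted(chunks, key=lambda c: cosine(embed(query), embed(c)),
                  reverse=True)[:k]
```

Swapping `embed` for a real embedding model and the list scan for a vector store gives the production shape of the pipeline.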
SOURCE CODE
Personal OS
Productivity / System

Personal OS

LLM-based chatbot for İpek Yolu Entrepreneur Incubation Center. Features RAG architecture for accurate, context-aware responses.

TECH STACK
OPENAI API QDRANT FASTAPI REACT
KEY FEATURES
  • Hybrid retrieval (Vector + Keyword) for high accuracy.
  • Sub-200ms latency on production deployment.
  • Context-aware conversation history management.
HYBRID SEARCH FLOW
User → Keyword + Vector → Reranker → LLM
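One common way to merge the keyword and vector result lists before reranking is Reciprocal Rank Fusion; this sketch shows that technique as an illustration, not necessarily the fusion step the deployed system uses.

```python
def rrf(rankings, k: int = 60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc ids
    (best first). Documents ranked highly in either list float to the top;
    k dampens the influence of any single list."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A cross-encoder reranker can then rescore just the fused top results, which keeps latency low while preserving accuracy.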
Source Code
KEGOMODORO
Productivity / Open Source

KEGOMODORO

Customizable Pomodoro timer with Pixela integration for productivity tracking. Features custom themes, stopwatch mode, and full personalization.

TECH STACK
ELECTRON JAVASCRIPT PIXELA API CSS3
KEY FEATURES
  • Real-time Pixela graph integration for habit tracking.
  • Customizable themes and focus/break intervals.
  • Electron-based cross-platform architecture.
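The Pixela integration boils down to posting one pixel per day, with the quantity set to focused minutes, to Pixela's pixel endpoint. This is a minimal Python sketch of that call (the app itself is Electron/JavaScript); the username, graph id, and token below are placeholders.

```python
import json
import urllib.request

PIXELA_BASE = "https://pixe.la/v1/users"

def build_pixel_request(username: str, graph_id: str, token: str,
                        date: str, minutes: int):
    """Prepare Pixela's 'post a pixel' call: POST to
    /v1/users/<username>/graphs/<graphID> with an X-USER-TOKEN header.
    Dates use Pixela's yyyyMMdd format."""
    url = f"{PIXELA_BASE}/{username}/graphs/{graph_id}"
    body = json.dumps({"date": date, "quantity": str(minutes)}).encode()
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"X-USER-TOKEN": token, "Content-Type": "application/json"},
    )

def log_session(req):
    """Send the prepared request (network call; not executed here)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Posting a pixel after each completed Pomodoro keeps the heatmap in sync without any manual tracking.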
LIVE ACTIVITY
KEGOMODORO Live Activity


SOURCE CODE

İpekGPT Case Study

Problem

Entrepreneurs needed instant answers to incubation regulations, but the 500+ page docs were unsearchable. Traditional keyword search failed on semantic queries.

Approach

Implemented a RAG pipeline using Qdrant for vector storage and OpenAI for generation. Used a hybrid search (sparse + dense) to capture both exact terminology and semantic meaning.

Metrics

Reduced average query time to < 200ms. Answer accuracy improved by 40% compared to baseline keyword search.

Links

GitHub Repository

FitTurkAI Case Study

Problem

Generic LLMs give generic diet advice. Users needed personalized plans based on specific health data (BMI, allergies) and verified nutritional guidelines.

Approach

Fine-tuned CosmosGemma 2B on a curated dataset of dietetic Q&A. Integrated RAG to fetch context from approved nutrition textbooks before generation.

Architecture

[Data Ingestion] -> [Chunking] -> [Vector Store] -> [Fine-tuned LLM] -> [Personalized Plan]
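The last two stages of the pipeline above come together in prompt assembly: retrieved textbook passages plus the user's biometrics are packed into the prompt for the fine-tuned model. A minimal sketch, with hypothetical function and field names:

```python
def build_prompt(biometrics: dict, context_chunks: list[str], question: str) -> str:
    """Assemble the final LLM prompt: user biometrics, retrieved
    nutrition-textbook passages, then the user's question."""
    profile = ", ".join(f"{k}: {v}" for k, v in biometrics.items())
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        f"User profile: {profile}\n"
        f"Verified nutrition context:\n{context}\n"
        f"Question: {question}\n"
        "Answer using only the context above and the user's constraints."
    )
```

Grounding the model in retrieved passages this way is what keeps the advice tied to verified guidelines rather than the model's generic priors.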

KEGOMODORO Case Study

Problem

Most Pomodoro timers are rigid. I needed a tool that could track my actual focus hours to Pixela graphs automatically.

Approach

Built an Electron app for cross-platform support. Integrated Pixela API for heatmaps. Added a 'Stopwatch Mode' for flexible work sessions.

Results

Used personally for 500+ hours. Open-sourced with 13 GitHub stars.

Wanna see more?
Check out my GitHub account

© 2026 Kağan Arıbaş. All rights reserved.