• model architecture

Tokenizer / multimodal encoders
↓
Large Transformer backbone
├─ dense or sparse MoE
├─ long-context mechanisms
├─ multimodal input/output modules
↓
post-training / alignment / reasoning behavior

So the core architecture hasn't changed much from the basic Transformer.
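To convince myself of that, here is a minimal single-head scaled dot-product self-attention in NumPy — the core operation the backbone above still relies on. Shapes and weight names are illustrative, not any specific model's API.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) -> (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product attention: scores are (seq_len, seq_len).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
assert out.shape == (4, d)
```

Everything else in the diagram (MoE, long context, multimodal fusion) is stacked around this same primitive.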

• product/runtime architecture

User input
↓
model/router
↓
tool-use runtime
├─ web
├─ code interpreter
├─ file retrieval
├─ memory
├─ calendar/email/actions
↓
safety/policy filters
↓
final answer

There might be something to learn from this, but it's really the serving pipeline around the chatbot product, not the model itself.
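The tool-use runtime above can be sketched as a simple dispatch loop: the model emits a structured tool call, the runtime executes it and returns the result. The tool names and call format here are made up for illustration.

```python
# Toy tool registry; real runtimes would sandbox these.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "search": lambda q: f"(stub) top result for: {q}",
}

def run_tool_call(call: dict) -> str:
    """call = {"tool": name, "input": str} -> tool output string."""
    fn = TOOLS.get(call["tool"])
    if fn is None:
        return f"error: unknown tool {call['tool']!r}"
    return fn(call["input"])

print(run_tool_call({"tool": "calculator", "input": "2 + 3 * 4"}))  # prints 14
```

A real agent loop would feed the tool output back into the model's context and repeat until the model produces a final answer.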

• final diagram for a frontier AI model

User input
↓
Input processing
├─ tokenizer
├─ image/audio/video encoders
└─ context packing
↓
Model router
├─ cheap/fast model
├─ strong model
└─ reasoning model
↓
Frontier Transformer model
├─ dense or sparse MoE backbone
├─ long-context attention/memory tricks
├─ multimodal fusion
└─ instruction/reasoning post-training
↓
Inference-time reasoning
├─ hidden scratchpad / thinking tokens
├─ self-checking / deliberation
└─ variable compute budget
↓
Tool/action runtime
├─ search
├─ code
├─ retrieval
├─ files
├─ apps/APIs
└─ agent loops
↓
Safety + policy + formatting
↓
Final answer
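The "Model router" stage in the diagram can be sketched as a crude heuristic that picks a model tier per request. The tier names and thresholds here are invented; production routers are learned classifiers, not keyword checks.

```python
def route(prompt: str) -> str:
    """Pick a model tier from a crude heuristic on the request."""
    hard_words = ("prove", "debug", "step by step", "why")
    if any(w in prompt.lower() for w in hard_words):
        return "reasoning-model"   # expensive, slow, thinks longer
    if len(prompt) > 200:
        return "strong-model"      # big context / complex request
    return "cheap-fast-model"      # default for simple queries

assert route("What time is it?") == "cheap-fast-model"
assert route("Prove that sqrt(2) is irrational") == "reasoning-model"
```

The point is just that routing is a cost/quality trade-off made before the frontier model ever runs.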


I thought there might be more complex processes that I didn't know about, but there weren't. I'm now a little confused about what to study next. Building a real chatbot from scratch might be a good next step.