Long-Context Validation
Public summary of the results reported for GTU as baseline history size increases, including the ultra-long validation set.
Layered long-context buckets
The public layered set keeps task type roughly stable while baseline history size grows through the 10k, 50k, 100k, 300k, and 800k+ token buckets. Task types covered:
infrastructure recall
configuration value recall
environment separation
planning target recall
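The bucket layering can be illustrated with a short sketch. The bucket sizes come from the text above; treating each size as a lower bound, and the names `BUCKET_FLOORS`, `BUCKET_LABELS`, and `assign_bucket`, are assumptions made for illustration and are not a documented GTU rule.

```python
from bisect import bisect_right
from typing import Optional

# Bucket sizes from the public description. Interpreting them as lower
# bounds ("floors") is an assumption, not a documented GTU definition.
BUCKET_FLOORS = [10_000, 50_000, 100_000, 300_000, 800_000]
BUCKET_LABELS = ["10k", "50k", "100k", "300k", "800k+"]


def assign_bucket(baseline_tokens: int) -> Optional[str]:
    """Return the label of the largest bucket floor not exceeding the count.

    Returns None when the history is smaller than the smallest bucket.
    """
    idx = bisect_right(BUCKET_FLOORS, baseline_tokens)
    return BUCKET_LABELS[idx - 1] if idx > 0 else None
```

Under this reading, a 120,000-token history would fall in the 100k bucket and any history of 800,000 tokens or more in the 800k+ bucket. A nearest-bucket rule would be an equally plausible interpretation.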
Ultra-long validation
The ultra-long set is the narrowest and most extreme public surface. It tests whether GTU still compresses aggressively, without a visible collapse in factual retention, when the baseline history becomes operationally extreme.
Cases: 4
Baseline size: 0.8M+ tokens
Reported final prompt: about 1.5k-1.6k tokens
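For a sense of scale, the reported figures imply a compression ratio of roughly 500x or more. The sketch below only does that arithmetic. It assumes the baseline is exactly 800,000 tokens (the text says 0.8M+, so the true ratios would be at least this large), and `compression_ratio` is a hypothetical helper, not part of GTU.

```python
def compression_ratio(baseline_tokens: int, final_tokens: int) -> float:
    """Ratio of baseline history size to final prompt size."""
    if final_tokens <= 0:
        raise ValueError("final_tokens must be positive")
    return baseline_tokens / final_tokens


# Assumed baseline: the 0.8M floor from the reported figures.
BASELINE_TOKENS = 800_000

# Reported final prompt range: about 1.5k to 1.6k tokens.
ratio_low = compression_ratio(BASELINE_TOKENS, 1_600)   # 500.0
ratio_high = compression_ratio(BASELINE_TOKENS, 1_500)  # about 533.3

print(f"{ratio_low:.0f}x to {ratio_high:.0f}x")
```

These ratios describe size reduction only. They say nothing about factual retention, which the validation cases assess separately.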
Public takeaway
The long-context story on this site is simple: GTU is described not only through small benchmark cases but also against very large baseline histories, where factual usefulness must survive extreme context length.