tokongs
02/17/2025, 11:29 AM
vroldanbet
02/17/2025, 11:45 AM
tokongs
02/17/2025, 12:03 PM
Will minimize_latency always be at least as fast as an at_least_as_fresh with a token?
vroldanbet
02/17/2025, 12:23 PM
at_least_as_fresh will be faster because it can utilize the caches for longer, but not much longer. minimize_latency will give you the chance to reuse the caches for approx 6 seconds, while at_least_as_fresh will let you reuse the caches for 2x the quantization window, which is currently 10 seconds.
tokongs
02/17/2025, 12:33 PM
vroldanbet
02/17/2025, 12:37 PM
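
For illustration, the cache windows described in the 12:23 PM reply can be put side by side as below. This is only a minimal sketch and not SpiceDB code: the helper name `cache_reuse_window` and both constants are hypothetical, the 6-second figure is the approximate value quoted in the thread, and the 5-second window assumes that "2x the quantization window, which is currently 10 seconds" means 2 x 5 s = 10 s.

```python
# Approximate figures taken from the thread; names are hypothetical.
# Assumption: "2x the quantization window ... 10 seconds" means 2 * 5 s.
QUANTIZATION_WINDOW_S = 5.0
MINIMIZE_LATENCY_REUSE_S = 6.0  # "approx 6 seconds" per the thread


def cache_reuse_window(consistency: str) -> float:
    """Return the approximate cache reuse window, in seconds, for a mode."""
    if consistency == "minimize_latency":
        return MINIMIZE_LATENCY_REUSE_S
    if consistency == "at_least_as_fresh":
        # Caches stay reusable for twice the quantization window.
        return 2 * QUANTIZATION_WINDOW_S
    raise ValueError(f"unknown consistency mode: {consistency!r}")


if __name__ == "__main__":
    for mode in ("minimize_latency", "at_least_as_fresh"):
        print(f"{mode}: ~{cache_reuse_window(mode):.0f}s of cache reuse")
```

Under these assumptions the gap is only a few seconds, which matches the "longer, but not much longer" remark.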