tokongs
02/17/2025, 11:29 AM

vroldanbet
02/17/2025, 11:45 AM

tokongs
02/17/2025, 12:03 PM
minimize_latency always be at least as fast as an at_least_as_fresh with a token?

vroldanbet
02/17/2025, 12:23 PM
at_least_as_fresh will be faster because it can utilize the caches for longer, but not much longer. minimize_latency lets you reuse the caches for approximately 6 seconds, while at_least_as_fresh lets you reuse them for 2x the quantization window, which currently works out to 10 seconds.

tokongs
02/17/2025, 12:33 PM

vroldanbet
02/17/2025, 12:37 PM
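
The cache-reuse comparison from the 12:23 PM reply can be written out as a small arithmetic sketch. This is only an illustration of the numbers quoted in the thread, not SpiceDB code: the 5-second quantization window is an assumption inferred from "2x the quantization window" coming to 10 seconds, and `cache_reuse_window` is a hypothetical helper name.

```python
# Figures taken from the thread; names are hypothetical and for illustration only.
QUANTIZATION_WINDOW_S = 5.0      # assumption: inferred from 2x window = 10 seconds
MINIMIZE_LATENCY_REUSE_S = 6.0   # "approx 6 seconds", as quoted in the thread


def cache_reuse_window(mode: str) -> float:
    """Return the approximate number of seconds cached results stay reusable."""
    if mode == "minimize_latency":
        return MINIMIZE_LATENCY_REUSE_S
    if mode == "at_least_as_fresh":
        # The thread states this mode can reuse caches for 2x the quantization window.
        return 2 * QUANTIZATION_WINDOW_S
    raise ValueError(f"unknown consistency mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("minimize_latency", "at_least_as_fresh"):
        print(f"{mode}: ~{cache_reuse_window(mode):.0f} s of cache reuse")
```

Under these assumed figures the difference is only a few seconds, which matches the "but not much longer" remark in the reply.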