NTK Scaling
This extends the context of existing models at a minor perplexity cost; the usable context grows roughly linearly with the alpha value, and no fine-tuning is needed.
Here's a detailed explanation of the method.
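In short, NTK scaling stretches RoPE by raising its frequency base rather than linearly compressing position indices. Here's a minimal sketch of that base adjustment, assuming Llama-family defaults (base 10000, head dimension 128); actual loader internals may differ slightly:

```python
# Minimal sketch of the NTK-aware base adjustment (assumes Llama-family
# defaults; this mirrors how alpha_value is commonly applied, but check
# your loader's source for the exact formula it uses).
def ntk_scaled_base(alpha: float, base: float = 10000.0, head_dim: int = 128) -> float:
    """Raise the RoPE frequency base so the low-frequency dimensions rotate
    more slowly, letting attention reach further back without retraining."""
    return base * alpha ** (head_dim / (head_dim - 2))

print(ntk_scaled_base(2.0))  # ~20221: alpha_value=2 roughly doubles usable context
```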
Setup is simple using the ExLlama loader in Ooba (text-generation-webui):
- Set your max_seq_len to >2k (play around with this value, as there is a perplexity cliff).
- Set alpha_value to 2/4/8 (the higher the value, the more context you can squeeze in, but the dumber the model gets; see the sketch after this list).
- Make sure to also set "Truncate the prompt up to this length" on the Parameters page to match your max_seq_len.
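As a rough starting point for these settings, here is an illustrative sketch; the "usable context ≈ native context × alpha" rule of thumb is only an approximation, and the real cliff varies per model, as the graph below shows:

```python
# Illustrative mapping from alpha_value to the two numbers you enter in the
# webui. The linear rule of thumb is an approximation, not a guarantee.
NATIVE_CTX = 4096   # Llama 2's native context (2048 for Llama 1)
BASE = 10000.0      # default RoPE base for Llama-family models
HEAD_DIM = 128      # default head dimension for Llama-family models

for alpha in (2, 4, 8):
    # Same base adjustment as the sketch above.
    scaled_base = BASE * alpha ** (HEAD_DIM / (HEAD_DIM - 2))
    max_seq_len = NATIVE_CTX * alpha  # also use this for the truncate length
    print(f"alpha_value={alpha}: effective RoPE base ~{scaled_base:,.0f}, "
          f"try max_seq_len (and truncate length) around {max_seq_len}")
```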
The following graph gives you some idea of where the cliffs are, though the exact positions vary per model.
Finally, Llama 2 is natively 4k context, so you can easily get to ~8k with just an alpha_value of 2.
Notes from NTK + Llama 2 testing:
Credit to @kingbri and @.alicat for the testing data on static NTK.
You can get close to 20k context with Llama 2, but the model gets progressively dumber as you push further; it does not lose coherence, though.
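For a ballpark sanity check against the rule of thumb above (illustrative arithmetic only):

```python
# ~20k tokens on a 4k-native model is roughly a 5x stretch, which by the
# rough rule above suggests an alpha_value somewhere in the 4-8 range.
print(20000 / 4096)  # ~4.88
```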
See also the GitHub discussion.