About Dialogue Example

In this example, we will extend the single-prompt example and hold a dialogue with the LLM; in other words, we will make the LLM remember the context of earlier messages. This can be achieved using either of two methods:

  • Prompt Resupplying (slower but easier to understand).

  • Manual Caching (much faster but requires attention).

Brief on Prompt Resupply

In prompt resupplying, the user supplies the entire dialogue so far together with every new message they send.

This is also how the OpenAI API handles conversation history. The reason it is slower is that the entire prompt history is recomputed, and all KV entries are regenerated, every time a new message is sent.
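
For comparison, this is what resupplying looks like with an OpenAI-style message list: the history grows by one turn at a time, and the whole list is sent again on every request.

    # OpenAI-style chat history: the full list is resent on every request.
    history = [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
        # A new turn is appended, and the ENTIRE list above is sent again
        # together with it, so the model "remembers" earlier messages.
        {"role": "user", "content": "What is its population?"},
    ]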

The default behavior of the processor is to delete the KV cache every time a new input arrives, which is why you must supply the entire history on every call. The manual caching method avoids this and achieves much higher performance, but it requires some understanding and management.
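
Below is a minimal sketch of the prompt-resupplying loop. The processor object is assumed to be created as in the single-prompt example, and processor.generate() is a placeholder for whichever call that example used to run the model; the point is only the accumulation and resending of the full history.

    # Hypothetical sketch of prompt resupplying.
    # "processor" is assumed to be set up as in the single-prompt example;
    # processor.generate() is a placeholder, not the library's exact API.
    history = ""
    for user_message in ["Hello!", "What did I just say?"]:
        # Append the new user turn to the accumulated dialogue.
        history += f"User: {user_message}\nAssistant: "
        # The FULL history is resupplied on every call, so all KV entries
        # are recomputed from scratch each time.
        response = processor.generate(history)
        # Keep the model's reply so the next turn includes it as well.
        history += f"{response}\n"
        print(response)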

Brief on Manual Caching

In manual caching, the user tells the processor not to clear the cache on every call by enabling the manual cache switch with either the logit store or the KV lock mode, using the processor's set_manual_caching method.

When the logit store mode is set, the processor preserves the KV entries generated by the last prompt and response. This way, the conversation is maintained internally by the processor. However, the user must clear the KV cache themselves using the clear_kv_cache method.
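
A hedged sketch of the manual caching flow is given below. The method names set_manual_caching and clear_kv_cache come from the text above, but the mode argument and the processor.generate() call are assumptions for illustration; see Processor Object in Detail for the real signatures.

    # Hypothetical sketch of manual caching.
    # set_manual_caching / clear_kv_cache are named in the text above;
    # the mode argument and processor.generate() are assumptions.
    processor.set_manual_caching("logit_store")  # stop clearing the KV cache on each call

    # The processor now keeps the KV entries from the previous prompt and
    # response, so only the NEW message is sent on each turn.
    print(processor.generate("User: Hello!\nAssistant: "))
    print(processor.generate("User: What did I just say?\nAssistant: "))

    # The cache is no longer cleared automatically, so it must be
    # released manually when the conversation is over.
    processor.clear_kv_cache()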

We will implement both of these approaches individually.

Note

Make sure to refer to Processor Object in Detail for further explanation.