Inference Information Reference
This chapter covers the inference SDK in more depth.
Sections are usually structured as follows:
- A brief summary of, and remarks on, the topic, if the topic is general rather than specific to the inference SDK.
- A detailed explanation of the topic, if the topic is specific to the inference SDK.
- How the inference SDK relates to the topic: useful objects, where they are located, and brief usage examples.
- A short SDK usage summary.
- A header file synopsis.
Sections
- SDK Structure
- Parallel State Machine
- Inference Workflow in General
- Obtaining Hardware Information
- About GGUF Files
- Message Preparation
- Model Object in Detail
  - Naming
  - Identifying the Expensive Operations
  - Essential Callbacks
  - Essential Signals
  - General Operation Workflow
  - Synopsis
  - Base Model Object
  - TextToText Model Object
    - signal_lora_operation()
    - signal_state_lora_operation()
    - is_available()
    - is_embedding_model()
    - has_lora_adapter()
    - get_raw_model()
    - get_special_tokens()
    - get_special_tokens_string()
    - get_model_name()
    - get_architecture()
    - get_sys_start()
    - get_assistant_start()
    - get_usr_start()
    - get_sys_end()
    - get_assistant_end()
    - get_usr_end()
    - get_eot_token()
    - get_lf_token()
    - get_vocab_count()
    - get_size()
    - get_embedding_length()
    - get_head_count()
    - get_layer_count()
    - get_max_embedding_context()
    - is_token_eof_generation()
    - is_token_special()
    - is_token_control()
    - get_quantization_string()
    - get_total_context_size()
    - get_occupied_context_size()
    - initialize_model_ex()
    - initialize_model()
    - initialize_model_ex_sync()
    - initialize_model_sync()
    - destroy()
    - destroy_sync()
    - register_context_process()
    - register_context_process()
    - declare_lora_remove()
    - declare_lora_adapter()
    - start_lora_operation()
    - tokenize_input()
    - on_lora_operate()
    - update()
    - update_t()
- Processor Object in Detail
- Client Object in Detail
- About Sampling