Inference Library

Welcome to the MBASE inference library documentation!

Document Navigation

The documentation is structured into three chapters, each with a distinct purpose. Those chapters are as follows:

  • Quickstart: This chapter starts by creating a CMake project and linking the inference SDK. It then walks through simple example projects that show how to use the inference SDK.

  • Programs: This chapter explains the complete, executable programs that were developed using the inference library.

  • Information Reference: This chapter explains the concepts, inference, and the SDK in detail. It is your go-to reference if you struggle to understand parts of the code, the SDK usage, or the examples in the Quickstart chapter.

About

MBASE inference library is a high-level LLM inference library written on top of the llama.cpp library. It provides the tools and APIs developers need to integrate popular LLMs into their applications with minimal performance loss and development time.

The inference SDK allows developers to utilize LLMs and create their own solutions, tooling mechanisms, and more.

Features

  • TextToText LLM inference SDK.

  • Embedder model inference SDK.

  • GGUF file meta-data manipulation SDK.

  • Openai server program supporting both TextToText and Embedder endpoints, with system prompt caching that yields a significant performance boost.

  • Hosting multiple models in a single Openai server program.

  • Using llama.cpp as the inference backend, so that models supported by the llama.cpp library are supported by default.

  • Benchmark application for measuring the impact of LLM inference on your application.

  • Plus anything llama.cpp supports.

Supported Models

Since the MBASE SDK uses llama.cpp as its backend inference engine, the models supported by llama.cpp are supported by default, including major models such as Phi, Deepseek, Llama, and Qwen.

You can see the full list here.

Implementation Matrix

Type         SDK Support   Openai API Support   Engine
TextToText   Yes           Yes                  llama.cpp
Embedder     Yes           Yes                  llama.cpp

SDK Usage Examples

  • Single-Prompt: Simple prompt-and-answer example. At the end of the example, the user supplies a prompt in the terminal and the LLM gives a response.

  • Dialogue-Example: More complex, dialogue-based prompt-and-answer example. At the end of the example, the user is able to have a dialogue with the LLM in the terminal.

  • Embedding-Example: Vector embedding generator of the kind generally used by RAG programs and more. At the end of the example, the user supplies an input and vector embeddings are generated using an embedder LLM model.

Useful Programs

Openai Server

Detailed documentation: Openai Server

An Openai API compatible HTTP/HTTPS server for serving LLMs. This program provides a chat completion API for TextToText models and an embeddings API for embedder models.

Usage:

mbase_openai_server *[option [value]]
mbase_openai_server --hostname "127.0.0.1" -jsdesc description.json
mbase_openai_server --hostname "127.0.0.1" --port 8080 -jsdesc description.json
mbase_openai_server --hostname "127.0.0.1" --port 8080 --ssl-pub public_key_file --ssl-key private_key_file -jsdesc description.json
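
Once the server is running, any Openai-compatible client can talk to it. Below is a minimal sketch of a chat completion request using curl; the route is assumed to follow the standard Openai convention, and the model name is a placeholder for a model defined in your description.json:

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<model_name>", "messages": [{"role": "user", "content": "Hello!"}]}'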

Benchmark T2T

Detailed documentation: Benchmark T2T

A program written to measure the performance of the given T2T LLM and its impact on your main application logic.

Usage:

mbase_benchmark_t2t model_path *[option [value]]
mbase_benchmark_t2t model.gguf -uc 1 -fps 500 -jout .
mbase_benchmark_t2t model.gguf -uc 1 -fps 500 -jout . -mdout .

Embedding

Detailed documentation: Embedding Program

An example program for generating the embeddings of the given prompt or prompts.

Usage:

mbase_embedding_simple model_path *[option [value]]
mbase_embedding_simple model.gguf -gl 80 -p 'What is life?'
mbase_embedding_simple model.gguf -gl 80 -pf prompt1.txt -pf prompt2.txt

Retrieval

Detailed documentation: Retrieval Program

An example program for calculating the distance between the given query and multiple text files/documents and applying a retrieval operation.

Usage:

mbase_retrieval model_path *[option [value]]
mbase_retrieval model.gguf -q 'What is MBASE' -pf file1.txt -pf file2.txt -gl 80
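
Retrieval of this kind typically ranks documents by the similarity between their embeddings and the query embedding. As a minimal, self-contained illustration of the underlying math (plain C++, not the MBASE API), cosine similarity between two embedding vectors can be computed as follows:

#include <vector>
#include <cmath>
#include <iostream>

// Cosine similarity between two equally sized embedding vectors.
// Returns a value in [-1, 1]; higher means more similar.
double cosine_similarity(const std::vector<double>& a, const std::vector<double>& b)
{
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for(size_t i = 0; i < a.size(); ++i)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}

int main()
{
    // Toy 3-dimensional embeddings; real embedder models produce
    // vectors with hundreds or thousands of dimensions.
    std::vector<double> query = {0.1, 0.7, 0.2};
    std::vector<double> doc = {0.2, 0.6, 0.1};
    std::cout << cosine_similarity(query, doc) << std::endl;
    return 0;
}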

Simple Conversation

Detailed documentation: Simple Conversation Program

A simple executable program in which you have a dialogue with the LLM you provide. It is useful for examining the LLM's answers, since the system prompt and sampler values can be altered.

Usage:

mbase_simple_conversation model_path *[option [value]]
mbase_simple_conversation model.gguf
mbase_simple_conversation model.gguf -gl 80
mbase_simple_conversation model.gguf -gl 80 -sys 'You are a helpful assistant.'

Typo Fixer

Detailed documentation: Typo Fixer Program

This is an applied example use case of the MBASE library. The program reads a user-supplied text file and fixes the typos in it.

Usage:

mbase_typo_fixer model_path *[option [value]]
mbase_typo_fixer model.gguf -gl 80 -s typo.txt -o fixed.txt

Finding the SDK

Detailed documentation: SDK Structure

If you have installed the MBASE SDK, you can find the library using the CMake find_package function with a components specification.

To find the library using CMake, write the following:

find_package(mbase.libs REQUIRED COMPONENTS inference)

This will find the inference SDK. To set the C++ standard and link the library, write the following; linking against the mbase-inference target should also propagate its include directories, so a separate target_include_directories call is not needed:

target_compile_features(<your_target> PUBLIC cxx_std_17)
target_link_libraries(<your_target> PRIVATE mbase-inference)
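
Putting it together, a minimal CMakeLists.txt for a project consuming the inference SDK might look like the sketch below; the project and target names are placeholders:

cmake_minimum_required(VERSION 3.15)
project(my_inference_app LANGUAGES CXX)

find_package(mbase.libs REQUIRED COMPONENTS inference)

add_executable(my_inference_app main.cpp)
target_compile_features(my_inference_app PUBLIC cxx_std_17)
target_link_libraries(my_inference_app PRIVATE mbase-inference)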

GGUF, Displaying General Metadata

Detailed documentation: About GGUF Files

#include <mbase/inference/inf_gguf_metadata_configurator.h>
#include <iostream>

int main()
{
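    // Open the GGUF file; the path below is a placeholder for a real model path.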
    mbase::GgufMetaConfigurator metaConfigurator(L"<path_to_model>");

    if(!metaConfigurator.is_open())
    {
        std::cout << "Unable to open gguf file." << std::endl;
        return 1;
    }

    mbase::string modelArchitecture;
    mbase::string modelName;
    mbase::string modelAuthor;
    mbase::string modelVersion;
    mbase::string modelOrganization;
    mbase::string modelSizeLabel;
    mbase::string modelLicense;
    mbase::string modelLicenseName;
    mbase::string modelLicenseLink;
    mbase::string modelUuid;
    uint32_t modelFileType;

    // Read the standard general.* metadata keys into the variables declared above.
    metaConfigurator.get_key("general.architecture", modelArchitecture);
    metaConfigurator.get_key("general.name", modelName);
    metaConfigurator.get_key("general.author", modelAuthor);
    metaConfigurator.get_key("general.version", modelVersion);
    metaConfigurator.get_key("general.organization", modelOrganization);
    metaConfigurator.get_key("general.size_label", modelSizeLabel);
    metaConfigurator.get_key("general.license", modelLicense);
    metaConfigurator.get_key("general.license.name", modelLicenseName);
    metaConfigurator.get_key("general.license.link", modelLicenseLink);
    metaConfigurator.get_key("general.uuid", modelUuid);
    metaConfigurator.get_key("general.file_type", modelFileType);

    std::cout << "Architecture: " << modelArchitecture << std::endl;
    std::cout << "Name: " << modelName << std::endl;
    std::cout << "Author: " << modelAuthor << std::endl;
    std::cout << "Version: " << modelVersion << std::endl;
    std::cout << "Organization: " << modelOrganization << std::endl;
    std::cout << "Size label: " << modelSizeLabel << std::endl;
    std::cout << "License: " << modelLicense << std::endl;
    std::cout << "License name: " << modelLicenseName << std::endl;
    std::cout << "License link: " << modelLicenseLink << std::endl;
    std::cout << "Model UUID: " << modelUuid << std::endl;
    std::cout << "File type: " << modelFileType << std::endl;

    return 0;
}