Inference Library

Welcome to the MBASE inference library documentation!

Document Navigation

The document is structured into three chapters, each with a distinct purpose. Those chapters are as follows:

  • Quickstart: This chapter explains how to set up your environment, covering system requirements, library installation, and simple project examples that show how to use the inference SDK.

  • Programs: This chapter explains the programs that are developed using the inference library. The programs in this chapter are complete, ready-to-run executables.

  • Information Reference: This chapter explains the concepts, the inference process, and the SDK in detail. It is your go-to reference if you struggle to understand parts of the code, the SDK usage, or the examples in the Quickstart chapter.

About

The MBASE inference library is a high-level, non-blocking LLM inference library written on top of llama.cpp. It provides the tools and APIs developers need to integrate popular LLMs into their applications with minimal performance loss and development time.

The inference SDK allows developers to utilize LLMs and create their own solutions, tooling mechanisms, and more.

When the phrase “local LLM inference” is thrown around, it usually means hosting an OpenAI API compatible HTTP server and using the completions API locally. The MBASE inference library is expected to change this notion by providing LLM inference capability through its low-level objects and procedures, so that you can integrate and embed LLMs into high-performance applications such as games, server applications, and many more.

You still have the option of hosting an OpenAI compatible server using the mbase_openai_server program, or you can code a similar one yourself!

There is also a benchmarking tool for testing the performance of the LLM and, more specifically, the impact of the inference operation on your main program loop. For further details, refer to the benchmark documentation.

Features

  • Non-blocking TextToText LLM inference SDK.

  • Non-blocking Embedder model inference SDK.

  • GGUF file meta-data manipulation SDK.

  • OpenAI-compatible server program supporting both TextToText and Embedder endpoints, with system prompt caching for a significant performance boost.

  • Hosting multiple models in a single OpenAI-compatible server program.

  • llama.cpp as the inference backend, so models supported by the llama.cpp library are supported by default.

  • Benchmark application for measuring the impact of LLM inference on your application.

  • Plus anything llama.cpp supports.

Supported Models

Since the MBASE SDK uses llama.cpp as its backend inference engine, models supported by llama.cpp are supported by default, including major models such as Phi, Deepseek, Llama, and Qwen.

You can see the full list here.

Implementation Matrix

Type         SDK Support   OpenAI API Support   Engine
TextToText   Yes           Yes                  llama.cpp
Embedder     Yes           Yes                  llama.cpp

Supported Platforms

  • macOS

  • Linux

  • Windows

Download and Setting up

Download page: Download

SDK setup and compiling from source: Setting-up

Useful Programs

Openai Server

Detailed documentation: Openai Server

An OpenAI API compatible HTTP/HTTPS server for serving LLMs. This program provides the chat completions API for TextToText models and the embeddings API for embedder models.

Usage:

mbase_openai_server *[option [value]]
mbase_openai_server --hostname "127.0.0.1" -jsdesc description.json
mbase_openai_server --hostname "127.0.0.1" --port 8080 -jsdesc description.json
mbase_openai_server --hostname "127.0.0.1" --port 8080 --ssl-pub public_key_file --ssl-key private_key_file -jsdesc description.json
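
Once the server is running, it should accept standard OpenAI-style requests. The following is a minimal sketch assuming the standard OpenAI endpoint paths, the host and port from the examples above, and placeholder model names that must match the models defined in your description.json:

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "What is MBASE?"}]}'

curl http://127.0.0.1:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "my-embedder-model", "input": "What is life?"}'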

Benchmark T2T

Detailed documentation: Benchmark T2T

A program that measures the performance of the given T2T LLM and its impact on your main application logic.

Usage:

mbase_benchmark_t2t model_path *[option [value]]
mbase_benchmark_t2t model.gguf -uc 1 -fps 500 -jout .
mbase_benchmark_t2t model.gguf -uc 1 -fps 500 -jout . -mdout .

Embedding

Detailed documentation: Embedding Program

An example program for generating the embeddings of the given prompt or prompts.

Usage:

mbase_embedding_simple model_path *[option [value]]
mbase_embedding_simple model.gguf -gl 80 -p 'What is life?'
mbase_embedding_simple model.gguf -gl 80 -pf prompt1.txt -pf prompt2.txt

Retrieval

Detailed documentation: Retrieval Program

An example for calculating the distance between the given query and multiple text files/documents and applying a retrieval operation.

Usage:

mbase_retrieval model_path *[option [value]]
mbase_retrieval model.gguf -q 'What is MBASE' -pf file1.txt -pf file2.txt -gl 80

Simple Conversation

Detailed documentation: Simple Conversation Program

A simple executable program in which you have a dialogue with the LLM you provide. It is useful for examining the LLM's answers, since the system prompt and sampler values can be altered.

Usage:

mbase_simple_conversation model_path *[option [value]]
mbase_simple_conversation model.gguf
mbase_simple_conversation model.gguf -gl 80
mbase_simple_conversation model.gguf -gl 80 -sys 'You are a helpful assistant.'

Typo Fixer

Detailed documentation: Typo Fixer Program

This is an applied example use case of the MBASE library. The program reads a user-supplied text file and fixes the typos in it.

Usage:

mbase_typo_fixer model_path *[option [value]]
mbase_typo_fixer model.gguf -gl 80 -s typo.txt -o fixed.txt

SDK Usage Examples

  • Single-Prompt: Simple prompt-and-answer example. At the end of the example, the user supplies a prompt in the terminal and the LLM gives a response.

  • Dialogue-Example: A more complex, dialogue-based prompt-and-answer example. At the end of the example, the user is able to have a dialogue with the LLM through the terminal.

  • Embedding-Example: Vector embedding generator, of the kind generally used by RAG programs and more. At the end of the example, the user supplies an input and vector embeddings are generated using an embedder LLM model.

Finding the SDK

Detailed documentation: SDK Structure

If you have installed the MBASE SDK, you can find the library using the CMake find_package function with a components specification.

In order to find the library using CMake, write the following:

find_package(mbase.libs REQUIRED COMPONENTS inference)

This will find the inference SDK. In order to set the include directories and link the libraries, write the following:

target_compile_features(<your_target> PUBLIC cxx_std_17)
target_link_libraries(<your_target> PRIVATE mbase-inference)
target_include_directories(<your_target> PUBLIC mbase-inference)
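
Putting the pieces together, a minimal CMakeLists.txt for a project that uses the inference SDK might look like the sketch below; the project name my_inference_app, the source file main.cpp, and the minimum CMake version are placeholders:

cmake_minimum_required(VERSION 3.15)
project(my_inference_app LANGUAGES CXX)

# Locate the installed MBASE SDK and request the inference component
find_package(mbase.libs REQUIRED COMPONENTS inference)

add_executable(my_inference_app main.cpp)

# The inference SDK requires C++17 or newer
target_compile_features(my_inference_app PUBLIC cxx_std_17)
target_link_libraries(my_inference_app PRIVATE mbase-inference)
target_include_directories(my_inference_app PUBLIC mbase-inference)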

GGUF, Displaying General Metadata

Detailed documentation: About GGUF Files

#include <mbase/inference/inf_gguf_metadata_configurator.h>
#include <iostream>

int main()
{
    mbase::GgufMetaConfigurator metaConfigurator(L"<path_to_model>");

    if(!metaConfigurator.is_open())
    {
        std::cout << "Unable to open gguf file." << std::endl;
        return 1;
    }

    mbase::string modelArchitecture;
    mbase::string modelName;
    mbase::string modelAuthor;
    mbase::string modelVersion;
    mbase::string modelOrganization;
    mbase::string modelSizeLabel;
    mbase::string modelLicense;
    mbase::string modelLicenseName;
    mbase::string modelLicenseLink;
    mbase::string modelUuid;
    uint32_t modelFileType;

    metaConfigurator.get_key("general.architecture", modelArchitecture);
    metaConfigurator.get_key("general.name", modelName);
    metaConfigurator.get_key("general.author", modelAuthor);
    metaConfigurator.get_key("general.version", modelVersion);
    metaConfigurator.get_key("general.organization", modelOrganization);
    metaConfigurator.get_key("general.size_label", modelSizeLabel);
    metaConfigurator.get_key("general.license", modelLicense);
    metaConfigurator.get_key("general.license.name", modelLicenseName);
    metaConfigurator.get_key("general.license.link", modelLicenseLink);
    metaConfigurator.get_key("general.uuid", modelUuid);
    metaConfigurator.get_key("general.file_type", modelFileType);

    std::cout << "Architecture: " << modelArchitecture << std::endl;
    std::cout << "Name: " << modelName << std::endl;
    std::cout << "Author: " << modelAuthor << std::endl;
    std::cout << "Version: " << modelVersion << std::endl;
    std::cout << "Organization: " << modelOrganization << std::endl;
    std::cout << "Size label: " << modelSizeLabel << std::endl;
    std::cout << "License: " << modelLicense << std::endl;
    std::cout << "License name: " << modelLicenseName << std::endl;
    std::cout << "License link: " << modelLicenseLink << std::endl;
    std::cout << "Model UUID: " << modelUuid << std::endl;
    std::cout << "File type: " << modelFileType << std::endl;

    return 0;
}
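
Assuming the CMake setup from the Finding the SDK section above, linking against mbase-inference is enough to build and run this example.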

Project State and Goals

The MBASE SDK is still taking its first steps, and it is expected to fail in some scenarios because not all cases are tested and the product is not yet fully polished.

The project has been developed by me alone, and I was planning to open-source it in the near future. However, the complexity and the workload forced me to open-source the project much earlier, before it could be thoroughly tested and polished. In other words, I am once again asking for your contribution to this project.

The company is being established to accelerate the development of the MBASE SDK and to create both proprietary and open-source products built with it.

The goal of the MBASE SDK is to provide non-blocking LLM inference to both beginner and advanced users of the C++ library, and useful tools and programs to non-programmer users.