If you’ve used a cloud chat-completion API (OpenAI, Anthropic, etc.), most of LEAP’s shape will be familiar: async streaming, role-tagged messages, JSON-serializable history. The biggest difference: you load the model explicitly, locally, before generation, instead of pointing a client at a remote endpoint.
This page maps the OpenAI Python client’s flow onto the LEAP SDK across Swift, Kotlin (Android), and Kotlin (JVM / native). For OpenAI compatibility on the client side, also see OpenAI-Compatible Client.
Reference: an OpenAI streaming call
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Say 'double bubble bath' ten times fast."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices:
        # delta is an object, not a dict, so read the attribute directly
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)

print("\nGeneration done!")
1. Load the model (vs. construct a client)
Cloud APIs create a thin client that points at a remote endpoint. LEAP downloads the model the first time and loads it into a ModelRunner, which typically takes a few seconds depending on model size and device.
OpenAI (Python)
from openai import OpenAI

client = OpenAI()

Swift (iOS / macOS)
import Foundation
import LeapModelDownloader

// Store downloaded models under the app's caches directory
let caches = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first!.path
let modelsDir = (caches as NSString).appendingPathComponent("leap_models")
let downloader = ModelDownloader(config: LeapDownloaderConfig(saveDir: modelsDir))

// Downloads on first use, then loads the model into a ModelRunner
let runner = try await downloader.loadModel(
    modelName: "LFM2.5-1.2B-Instruct",
    quantizationType: "Q4_K_M"
)

Kotlin (Android)
val downloader = LeapModelDownloader(context)
val runner = downloader.loadModel(
    modelName = "LFM2.5-1.2B-Instruct",
    quantizationType = "Q4_K_M",
)

Kotlin (JVM / native)
val downloader = LeapDownloader(LeapDownloaderConfig(saveDir = cacheDir))
val runner = downloader.loadModel(
    modelName = "LFM2.5-1.2B-Instruct",
    quantizationType = "Q4_K_M",
)
The returned ModelRunner plays the same role as the cloud API’s client object, except it carries the model weights. Release it and you’ll have to load again before generating.
2. Request generation
The cloud API takes a messages array and returns a stream. LEAP attaches messages to a Conversation (so history is tracked automatically) and returns an async stream from generateResponse(...).
OpenAI (Python)
stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "..."}],
    stream=True,
)

Swift (iOS / macOS)
let conversation = runner.createConversation()
let stream = conversation.generateResponse(userTextMessage: "Say 'double bubble bath' ten times fast.")

Kotlin (all platforms)
val conversation = runner.createConversation()
val flow = conversation.generateResponse("Say 'double bubble bath' ten times fast.")
You don’t pass the model name on each call, because the Conversation is already bound to the runner that loaded it.
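For contrast, with a cloud chat-completion API the caller usually keeps the history: each request resends the full messages list, and each reply has to be appended by hand. The sketch below shows that bookkeeping, which the Conversation is described above as tracking for you. `fake_complete` and `send` are hypothetical names for illustration, and `fake_complete` stands in for the network call; neither is part of any SDK.

```python
import json

def fake_complete(messages):
    """Hypothetical stand-in for a chat-completion call; echoes the last user turn."""
    last_user = next(m for m in reversed(messages) if m["role"] == "user")
    return f"You said: {last_user['content']}"

def send(history, user_text, complete=fake_complete):
    """Append the user turn, request a reply, then append the assistant turn."""
    history.append({"role": "user", "content": user_text})
    reply = complete(history)
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
send(history, "Hello")
send(history, "What did I just say?")

# Role-tagged history is plain JSON, so it can be saved and restored.
restored = json.loads(json.dumps(history))
print(len(restored))
```

Two exchanges leave four role-tagged entries in the list, alternating user and assistant.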
3. Consume the stream
Cloud APIs deliver deltas; you concatenate them. LEAP delivers MessageResponse values; each variant maps to a UI update, audio frame, tool call, or completion marker.
OpenAI (Python)
for chunk in stream:
    if chunk.choices:
        # delta is an object, not a dict, so read the attribute directly
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)

Swift (iOS / macOS)
for try await response in stream {
    switch onEnum(of: response) {
    case .chunk(let chunk):
        print(chunk.text, terminator: "")
    case .complete(let completion):
        print("\nDone! Tokens: \(completion.stats?.totalTokens ?? 0)")
    case .reasoningChunk, .audioSample, .functionCalls:
        break
    case .error(let err):
        print("\nGeneration failed: \(err.message)")
    }
}

Kotlin (all platforms)
flow.onEach { response ->
    when (response) {
        is MessageResponse.Chunk -> print(response.text)
        is MessageResponse.Complete -> println("\nDone! Tokens: ${response.stats?.totalTokens}")
        is MessageResponse.ReasoningChunk -> {}
        is MessageResponse.FunctionCalls -> {}
        is MessageResponse.AudioSample -> {}
        is MessageResponse.Error -> println("\nGeneration failed: ${response.message}")
    }
}.collect()
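The "concatenate the deltas" step on the cloud side can be sketched without a network call. The example below builds stand-in chunk objects with the same attribute path as the OpenAI loop above (`choices[0].delta.content`) and joins the non-empty pieces into one reply. `make_chunk` and `collect_text` are hypothetical helpers introduced for this illustration only.

```python
from types import SimpleNamespace

def make_chunk(text):
    """Build a stand-in object shaped like a streaming chunk (hypothetical helper)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def collect_text(stream):
    """Concatenate the non-empty text deltas from a stream of chunks."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta:  # content can be None or empty on some chunks
            parts.append(delta)
    return "".join(parts)

fake_stream = [
    make_chunk("double "),
    make_chunk(None),
    SimpleNamespace(choices=[]),
    make_chunk("bubble bath"),
]
full_reply = collect_text(fake_stream)
print(full_reply)
```

Collecting the pieces in a list and joining once at the end avoids repeated string copies on long replies; appending to a single string works too for short ones.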
4. Async context
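LEAP’s generation APIs always run inside an async context. The OpenAI Python client offers the same model through AsyncOpenAI, although the reference call above uses the synchronous client. In each case the SDK’s call shape mirrors the language’s idiomatic concurrency primitives.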
Swift (iOS / macOS)
Wrap calls in a Task. SwiftUI’s .task modifier on a view is the most common entry point. @MainActor view models keep model state on the main thread; the for try await loop suspends the task until the next chunk arrives.
@MainActor
final class ChatViewModel: ObservableObject {
    @Published var currentResponse = ""
    private var runner: ModelRunner?
    private var conversation: Conversation?

    private let downloader: ModelDownloader = {
        let caches = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first!.path
        let modelsDir = (caches as NSString).appendingPathComponent("leap_models")
        return ModelDownloader(config: LeapDownloaderConfig(saveDir: modelsDir))
    }()

    func loadModel() async {
        // try? leaves runner nil if the download or load fails
        runner = try? await downloader.loadModel(
            modelName: "LFM2.5-1.2B-Instruct",
            quantizationType: "Q4_K_M"
        )
        conversation = runner?.createConversation()
    }

    func sendMessage(_ text: String) {
        guard let conversation else { return }
        Task {
            let message = ChatMessage(role: .user, content: [.text(text)])
            // An error thrown by the stream ends this task without being reported
            for try await response in conversation.generateResponse(message: message) {
                if case .chunk(let c) = onEnum(of: response) {
                    currentResponse += c.text
                }
            }
        }
    }
}

Kotlin (Android)
Use viewModelScope (or lifecycleScope for activity-bound work). The flow is collected on the coroutine; cancellation is cooperative.
class ChatViewModel(application: Application) : AndroidViewModel(application) {
    private val downloader = LeapModelDownloader(application)
    private var runner: ModelRunner? = null
    private var conversation: Conversation? = null

    private val _text = MutableStateFlow("")
    val text: StateFlow<String> = _text.asStateFlow()

    fun loadModel() = viewModelScope.launch {
        runner = downloader.loadModel(
            modelName = "LFM2.5-1.2B-Instruct",
            quantizationType = "Q4_K_M"
        )
        conversation = runner?.createConversation()
    }

    fun send(text: String) = viewModelScope.launch {
        // Does nothing if the model has not been loaded yet
        conversation?.generateResponse(text)?.onEach { resp ->
            if (resp is MessageResponse.Chunk) _text.value += resp.text
        }?.collect()
    }
}

Kotlin (JVM / native)
Use any coroutine scope: runBlocking for CLIs, a custom CoroutineScope for server-side code, or MainScope() for Compose for Desktop.
fun main() = runBlocking {
    val downloader = LeapDownloader(LeapDownloaderConfig(saveDir = cacheDir))
    val runner = downloader.loadModel(
        modelName = "LFM2.5-1.2B-Instruct",
        quantizationType = "Q4_K_M"
    )
    val conversation = runner.createConversation()
    conversation.generateResponse("Hello").collect { resp ->
        if (resp is MessageResponse.Chunk) print(resp.text)
    }
}
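For readers coming from Python, the same "suspend until the next chunk arrives" pattern can be sketched with asyncio. This is a minimal illustration rather than LEAP or OpenAI code: `fake_token_stream` is a hypothetical async generator standing in for a model stream, and `consume` plays the part of the for try await loop or the Flow collector.

```python
import asyncio

async def fake_token_stream(tokens, delay=0.0):
    """Hypothetical async generator that yields one token at a time."""
    for token in tokens:
        await asyncio.sleep(delay)  # hand control back to the event loop between tokens
        yield token

async def consume(stream):
    """Accumulate tokens as they arrive; the coroutine suspends between tokens."""
    text = ""
    async for token in stream:
        text += token
    return text

result = asyncio.run(consume(fake_token_stream(["Hel", "lo", "!"])))
print(result)
```

`asyncio.run` plays the role that runBlocking plays in the Kotlin CLI example: it bridges a synchronous entry point into the async world.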
What’s the same
| Concept | OpenAI | LEAP |
|---|---|---|
| Role-tagged messages | {"role": "user", "content": "..."} | ChatMessage(role: .user, textContent: "...") |
| Streaming responses | stream=True iterator | SkieSwiftFlow<MessageResponse> (Swift, iterable with for try await) / Flow<MessageResponse> (Kotlin) |
| Function calling | Tool definitions + tool_calls field | registerFunction(LeapFunction) + MessageResponse.FunctionCalls |
| Structured output | response_format = json_schema | Swift options.with(jsonSchema: T.jsonSchema()) / Kotlin setResponseFormatType<T>() |
| Token usage stats | usage object on completion | Complete.stats (promptTokens, completionTokens, tokenPerSecond) |
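To make the function-calling row concrete, here is a sketch of the OpenAI side of the loop: the model returns tool_calls entries whose arguments arrive as a JSON string, and the caller runs the matching local function and sends back a role-tagged tool message. `get_weather`, `TOOLS`, and `dispatch` are hypothetical names for this illustration. The payload carried by MessageResponse.FunctionCalls is not shown on this page and may be shaped differently.

```python
import json

def get_weather(city):
    """Hypothetical local function the model is allowed to call."""
    return {"city": city, "forecast": "sunny"}

# Registry mapping tool names to Python callables (an assumption of this sketch).
TOOLS = {"get_weather": get_weather}

def dispatch(tool_calls):
    """Run each requested call and return role-tagged tool result messages."""
    results = []
    for call in tool_calls:
        func = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(func(**args)),
        })
    return results

requested = [{
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
}]
tool_messages = dispatch(requested)
print(tool_messages[0]["content"])
```

An unknown tool name raises KeyError in this sketch; production code would normally catch that and report it back to the model instead.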
What’s different
- No remote endpoint. You ship the model with the app (or download it the first time it runs). Latency is bounded by device CPU/GPU, not network round-trips.
- Explicit lifecycle. Hold a ModelRunner reference; call unload() when done. Cloud clients never load anything explicitly.
- Multimodal inputs go in the content array, same as OpenAI. Image and audio parts use the same OpenAI image_url / input_audio wire format.
- Companion files for multimodal models. Vision and audio-capable models need an mmproj (vision) and/or audio decoder/tokenizer co-located on disk. Manifest-based loading handles this automatically; loadSimpleModel accepts explicit mmprojPath / audioDecoderPath / audioTokenizerPath.
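The multimodal point above refers to the OpenAI content-array wire format. As a sketch of what that JSON looks like, the example below builds one user message with a text part, an image_url part (base64 data URL), and an input_audio part. `image_part` and `audio_part` are hypothetical helper names, and the byte strings are placeholders rather than real media. How each LEAP binding constructs the equivalent parts is not covered here.

```python
import base64
import json

def image_part(image_bytes, mime="image/jpeg"):
    """Wrap raw image bytes as an image_url part using a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}

def audio_part(audio_bytes, fmt="wav"):
    """Wrap raw audio bytes as an input_audio part."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {"type": "input_audio", "input_audio": {"data": encoded, "format": fmt}}

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the picture and the clip."},
        image_part(b"\xff\xd8\xff"),  # placeholder bytes, not a real image
        audio_part(b"RIFF"),          # placeholder bytes, not a real recording
    ],
}
print(json.dumps(message, indent=2))
```

Because every part is a plain dictionary, the whole message stays JSON-serializable alongside text-only history.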
Next steps