Mimical-Sovereign · Master Architecture Document · v2.0

MIMICAL-SOVEREIGN

The automation of your own highest standard of discipline.
Version: 2.0 DRAFT · June 2026
Hardware: Framework Desktop × 2 · AMD Ryzen AI Max+ 395
Classification: Personal / Sovereign / On-Premises
Status: PRE-BUILD · HARDWARE IN HAND
Primary Mission: Digital Mimic · Personal Sovereignty
Core Principle: Data never leaves the perimeter
Learning Model: Continuous · Compounding · Anti-Sycophantic
Escalation Policy: Local first · Cloud last resort · Always logged
Open Licenses: DeepSeek V4 (MIT) · Gemma 4 (Apache 2.0)
01 Hardware Cluster — Physical Architecture
MIMICAL-1 — PRIMARY
Framework Desktop · AMD Ryzen AI Max+ 395
Roles: ORCHESTRATOR · WINDOWS VM HOST
CPU (Zen 5): 16 Cores / 32 Threads · KAIROS daemon · OS scheduling
iGPU (RDNA 3.5): Radeon 8060S · ~60 TFLOPS FP16 · DeepSeek V4
NPU (XDNA 2): 50 TOPS · Gemma 4 E4B classifier · Embeddings · Whisper
Unified RAM: 128 GB LPDDR5X-8000 · 256 GB/s (256-bit bus) · iGPU+CPU shared pool
NVMe M.2: 4 TB PCIe 4.0 · V4 overflow · On-disk KV cache
RTX eGPU: 16 GB GDDR7 · via OCuLink · VFIO → Windows VM
DPU: NVIDIA BlueField-2 BF2M515A (PCIe x4→x16 riser) · 200GbE QSFP56 · 8× ARM A72 · DOCA Linux
DPU Role: MCP DATA SPECIALIST · Postgres · Qdrant · NVMe-oF · Zero-trust enforcement
Windows 10 Pro VM (VFIO passthrough): PowerDirector video editing · RTX CUDA inference · MCP Windows executor · Full GPU bandwidth, zero host interference
Interconnect: Direct DAC · QSFP56 · 200 GbE RDMA · RoCE v2 · ~5–10 μs latency
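Each DPU sits in a PCIe x4 slot through an x4→x16 riser, so traffic between host memory and the wire is bounded by the slot as well as by the 200 GbE port. A quick Python sketch of that arithmetic, assuming the slot negotiates PCIe 4.0 (16 GT/s per lane, 128b/130b encoding) and ignoring protocol overhead:

```python
# Link-budget sketch: host-visible bandwidth of a DPU in a PCIe 4.0 x4 slot.
# Assumptions: the slot runs at Gen4 (16 GT/s per lane) and only line encoding
# is counted; TLP headers and RoCE framing reduce real goodput further.

PCIE_GEN4_GT_PER_LANE = 16.0      # giga-transfers per second per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b encoding used by PCIe 3.0 and later


def pcie_gbps(lanes: int, gt_per_lane: float = PCIE_GEN4_GT_PER_LANE) -> float:
    """Usable bit rate of a PCIe link in Gb/s after line encoding."""
    return lanes * gt_per_lane * ENCODING_EFFICIENCY


def host_cap_gbps(port_gbps: float, lanes: int) -> float:
    """Host-to-wire traffic is capped by the slower of the two hops."""
    return min(port_gbps, pcie_gbps(lanes))


if __name__ == "__main__":
    for lanes in (4, 16):
        link = pcie_gbps(lanes)
        print(f"x{lanes}: PCIe {link:.1f} Gb/s ({link / 8:.2f} GB/s), "
              f"200 GbE port capped at {host_cap_gbps(200.0, lanes):.1f} Gb/s")
```

Under these assumptions an x4 slot tops out near 63 Gb/s (about 7.9 GB/s) per node, which is the realistic host-to-host ceiling when Phase 5 validates RDMA throughput. Traffic that terminates on the DPU's own Arm cores does not cross the host slot.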
MIMICAL-2 — INFERENCE
Framework Desktop · AMD Ryzen AI Max+ 395
Roles: DAILY DRIVER · KAIROS HOST
CPU (Zen 5): 16 Cores / 32 Threads · LiteLLM Router · Qdrant DB
iGPU (RDNA 3.5): Radeon 8060S · Gemma 4 26B MoE · ALWAYS WARM
NPU (XDNA 2): 50 TOPS · Embedding generation · parallel prefetch
Unified RAM: 128 GB LPDDR5X-8000 · Gemma resident · V4-Pro shard
NVMe M.2 #1: 4 TB PCIe 4.0 · OS · V4-Pro weights · Qdrant index
NVMe M.2 #2: 4 TB PCIe 4.0 · Engram cold tier · Kopia backups
DPU: NVIDIA BlueField-2 BF2M515A (PCIe x4→x16 riser) · 200GbE QSFP56 · 8× ARM A72 · DOCA Linux
DPU Role: MCP API SPECIALIST · LiteLLM routing · OpenWebUI host · Cloud egress policy
Pure inference node: No Windows VM · Full 128 GB + both M.2 slots dedicated to inference 24/7. PXE-provisioned via Ansible AWX — zero-touch deployment.
Total Unified RAM: 256 GB · LPDDR5X-8000 combined
Total NVMe Storage: 12 TB · PCIe 4.0 · 3 × 4 TB
iGPU Compute: ~120 TFLOPS · 2 × Radeon 8060S FP16
NPU Compute: 100 TOPS · 2 × XDNA 2 INT8
RDMA Fabric: 200 GbE · RoCE v2 · ~5–10 μs
DPU ARM Cores: 16 Cores · 2 × BF2M515A (8× A72 each)
02 AI Inference Routing — Confidence-Gated Hierarchy
Step 1 · NPU Pre-Assess: Gemma 4 E4B · Intent classify · Domain tag · Class A / B / C dispatch · ~instant · 0 iGPU load
Step 2 · System 1: Gemma 4 26B MoE · mimical-2, always warm · 3.8B active params · 55–75 t/s · ≥ 0.85 conf → done
Step 3 · System 1.5: Gemma 4 31B Dense · mimical-1 or mimical-2 · All 31B active · 25–35 t/s · ≥ 0.85 conf → done
Step 4 · System 2: DeepSeek V4-Flash · mimical-1 primary · 13B active / 284B total · 8–18 t/s · 1M ctx · ≥ 0.85 conf → done
Step 5 · System 2 Max: DeepSeek V4-Pro · Both nodes, TP=2 · 49B active / 1.6T total · 3–6 t/s · 1M ctx · last local resort
Step 6 · Cloud: External API (Claude / Gemini / GPT) · via LiteLLM · DPU egress · only when every local tier scores < 0.85
Full response crystallized → Postgres
Confidence Gate: Each tier returns an answer and a confidence score (0.0–1.0). A score ≥ 0.85 is delivered immediately; anything below the threshold escalates the query to the next tier up the stack.
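The gate can be sketched as a loop over tiers ordered cheapest-first. This Python version assumes each tier is a callable returning an (answer, confidence) pair; how each tier computes its score is not specified here, so the tiers in the usage example are stubs and their names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

CONF_THRESHOLD = 0.85  # delivery gate from the routing design

# Hypothetical tier interface: prompt in, (answer, confidence in [0.0, 1.0]) out.
TierFn = Callable[[str], Tuple[str, float]]


@dataclass
class Tier:
    name: str
    run: TierFn


@dataclass
class Result:
    answer: str
    confidence: float
    tier: str
    trace: List[Tuple[str, float]]  # (tier, score) for every tier tried


def route(prompt: str, tiers: List[Tier], threshold: float = CONF_THRESHOLD) -> Result:
    """Try tiers cheapest-first and stop at the first score >= threshold.

    If no tier clears the gate, the last tier's answer is returned together
    with the full trace so the escalation can be logged.
    """
    if not tiers:
        raise ValueError("no tiers configured")
    trace: List[Tuple[str, float]] = []
    for tier in tiers:
        answer, conf = tier.run(prompt)
        trace.append((tier.name, conf))
        if conf >= threshold:
            break
    return Result(answer, conf, tier.name, trace)


if __name__ == "__main__":
    def stub(name: str, conf: float) -> Tier:
        return Tier(name, lambda prompt: (f"{name}: {prompt}", conf))

    tiers = [stub("system1", 0.62), stub("system1.5", 0.91), stub("system2", 0.97)]
    result = route("summarize this week's fleet alerts", tiers)
    print(result.tier, result.trace)  # system1.5 [('system1', 0.62), ('system1.5', 0.91)]
```

Returning the trace alongside the answer keeps the escalation path available for logging even when the first tier succeeds.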
Crystallization: Every external API answer — prompt, token count, model, raw response — is serialized as JSONB to Postgres. The same class of problem never requires an external call twice.
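A crystallization record and the lookup that precedes any cloud call can be sketched as below. The document does not define what counts as "the same class of problem", so the key here is a hypothetical SHA-256 of the whitespace- and case-normalized prompt, and an in-memory dict stands in for the Postgres table (in Postgres the payload would sit in a jsonb column with a unique index on the key). Names such as `problem_key`, `crystallize`, and `CrystalCache` are illustrative.

```python
import hashlib
import json
import re
from datetime import datetime, timezone


def problem_key(prompt: str) -> str:
    """Hypothetical 'problem class' key: SHA-256 of the normalized prompt.
    A production system might cluster embeddings instead."""
    normalized = re.sub(r"\s+", " ", prompt).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def crystallize(prompt: str, model: str, raw_response: dict, token_count: int) -> dict:
    """Build the row that would be written to a JSONB column in Postgres."""
    return {
        "problem_key": problem_key(prompt),
        "payload": json.dumps({
            "prompt": prompt,
            "model": model,
            "token_count": token_count,
            "raw_response": raw_response,
            "captured_at": datetime.now(timezone.utc).isoformat(),
        }),
    }


class CrystalCache:
    """In-memory stand-in for the Postgres table; consulted before any cloud call."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def get(self, prompt: str):
        row = self._rows.get(problem_key(prompt))
        return None if row is None else json.loads(row["payload"])

    def put(self, row: dict) -> None:
        self._rows[row["problem_key"]] = row


if __name__ == "__main__":
    cache = CrystalCache()
    cache.put(crystallize("What is  RoCE v2?", "external-model", {"text": "..."}, 42))
    hit = cache.get("what is roce v2?")  # normalized keys match, so no cloud call
    print(hit["model"], hit["token_count"])  # external-model 42
```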
Sovereignty Guarantee: Cloud escalation is physically gated through the mimical-2 DPU egress policy. No query reaches external APIs without passing through the BlueField-2 firewall.
03 AI Model Roster — Resident Inference Stack
System 1 · Daily Driver
Gemma 4 26B MoE (A3.8B)
~14 GB resident in the GTT pool (roughly 4-bit; FP8 weights would need ~26 GB, BF16 ~52 GB) · 3.8B active params per token
55–75 t/s · always warm · never evicted from GTT pool
KAIROS observation loops, tool dispatch, routing decisions, conversational chat, consensus scoring. Handles ~80% of all daily requests locally.
System 2 · Deep Reasoner
DeepSeek V4-Flash
MXFP4 · ~142 GB total weights · 13B active per token
8–18 t/s · on-disk KV cache · 1M token context window
Hard problems, long-horizon agentic sessions, Engram-aware tiered memory loader. CSA/HCA attention architecture.
System 2 Max · Frontier Local
DeepSeek V4-Pro (TP=2)
MXFP4 · ~800 GB total · 49B active · distributed both nodes
3–6 t/s · RDMA tensor parallel · 1M token context
Maximum local reasoning. Weights split across mimical-1 and mimical-2 via 200GbE RDMA fabric. Last resort before cloud escalation.
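Token rates on these nodes are bounded by memory bandwidth more than by compute: every generated token must read the active weights at least once. A Python sizing sketch computes each model's weight footprint and a single-stream decode ceiling for one node's 256 GB/s. The bits-per-weight values are assumptions noted in the comments; the document does not state the Gemma quantization.

```python
# Sizing sketch: weight footprint and a bandwidth ceiling on decode speed.
# tokens/s <= bandwidth / bytes of active weights read per token.

BANDWIDTH_GB_S = 8000e6 * 32 / 1e9  # LPDDR5X-8000 on a 256-bit (32-byte) bus = 256 GB/s


def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Footprint in GB of params_b billion weights at the given bits per weight."""
    return params_b * bits_per_weight / 8


def decode_ceiling_tps(active_b: float, bits_per_weight: float,
                       bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Upper bound on tokens/s; ignores KV-cache reads, so real numbers are lower."""
    return bandwidth_gb_s / weights_gb(active_b, bits_per_weight)


# (name, total params in B, active params in B, assumed bits per weight)
# 4.5 bits matches llama.cpp Q4_0 (4-bit weights plus an FP16 scale per 32 weights);
# 4.25 bits is MXFP4 (4-bit elements plus one 8-bit scale per 32-element block).
ROSTER = [
    ("Gemma 4 26B MoE", 26, 3.8, 4.5),
    ("Gemma 4 31B Dense", 31, 31, 4.5),
    ("DeepSeek V4-Flash", 284, 13, 4.25),
]

if __name__ == "__main__":
    for name, total_b, active_b, bits in ROSTER:
        print(f"{name:18s} ~{weights_gb(total_b, bits):6.1f} GB weights, "
              f"<= {decode_ceiling_tps(active_b, bits):5.1f} t/s")
```

Under these assumptions the 26B MoE matches the ~14 GB figure and its ceiling (about 120 t/s) sits above the 55–75 t/s target, while the 31B Dense tier tops out near 15 t/s, below the 25–35 t/s listed for System 1.5. The roster's ~142 GB and ~800 GB figures correspond to a flat 4 bits per weight; if every tensor is MXFP4, the shared scales bring them to roughly 151 GB and 850 GB.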
04 Memory Allocation — mimical-1 (128 GB)
Region (Size): Purpose & Notes
amdgpu GTT pool (120 GB): amdgpu.gttsize=131072 in GRUB. Holds V4-Flash hot experts + KV cache.
CPU / OS / KAIROS (8 GB): Ubuntu 26.04 LTS kernel, Go daemon, Postgres, system headroom.
NVMe Overflow (~37 GB): Cold MoE experts + on-disk KV cache for 1M token contexts via mmap/io_uring.
Critical Kernel Parameters (GRUB): iommu=pt amdgpu.gttsize=131072 ttm.pages_limit=32505856, applied before any other configuration step. amdgpu.gttsize and ttm.pages_limit size the GTT pool the iGPU draws from; iommu=pt puts the IOMMU in passthrough mode for VFIO passthrough stability.
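Both memory parameters are unit counts rather than byte sizes: amdgpu.gttsize is in MiB and ttm.pages_limit is in 4 KiB pages on x86-64. A small Python helper for converting between these values and a target pool size (function names are illustrative):

```python
# Unit conversions for the GTT-related kernel parameters.
# amdgpu.gttsize is given in MiB; ttm.pages_limit counts 4 KiB pages (x86-64).

GIB = 1024 ** 3
PAGE_BYTES = 4096


def gttsize_mib(pool_gib: int) -> int:
    """amdgpu.gttsize value for a pool of pool_gib GiB."""
    return pool_gib * 1024


def ttm_pages_limit(pool_gib: int) -> int:
    """ttm.pages_limit value for a pool of pool_gib GiB."""
    return pool_gib * GIB // PAGE_BYTES


def describe(gttsize: int, pages_limit: int) -> str:
    """Translate existing parameter values back into GiB."""
    return (f"amdgpu.gttsize={gttsize} -> {gttsize / 1024:.1f} GiB; "
            f"ttm.pages_limit={pages_limit} -> {pages_limit * PAGE_BYTES / GIB:.1f} GiB")


if __name__ == "__main__":
    print(describe(131072, 32505856))              # 128.0 GiB and 124.0 GiB
    print(gttsize_mib(120), ttm_pages_limit(120))  # 122880 31457280
```

As written, the two parameters describe different ceilings (128 GiB and 124 GiB), and both exceed the 120 GB pool in the allocation table. If the 120 GB budget with 8 GB reserved for the OS is the intent, the matching values are gttsize=122880 and ttm.pages_limit=31457280.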
05 Memory Allocation — mimical-2 (128 GB)
Region (Size): Purpose & Notes
amdgpu GTT pool (120 GB): Gemma 4 26B MoE (permanently resident) + 31B Dense on demand.
Gemma 26B MoE (~14 GB): Always resident. Primary KAIROS engine. Never evicted.
V4-Pro Shard (~400 GB): mimical-2's half of the V4-Pro weights. Larger than the node's 128 GB of RAM, so it lives on NVMe #1 and is streamed via llama.cpp RPC over RDMA.
06 OS & Software Stack Architecture
OS Layer: Ubuntu 26.04 LTS (Both Hosts) · Kernel 7.0 (in-tree) · ROCm 7.2.2 (gfx1151) · DOCA 2.x (DPUs) · Windows 10 Pro (mimical-1 VM)
Inference: llama.cpp (ROCm build) · llama.cpp RPC (RDMA TP=2) · vLLM ≥ 0.7.0 · Ollama (Windows VM) · Unsloth + FSDP (LoRA fine-tune)
Agentic: KAIROS Go Daemon · MCP Servers (DPU-isolated) · LiteLLM Router · Agent Zero (Docker) · Ansible AWX
Data: Postgres (behavioral_events, methodology_rules) · Qdrant (RAG vector DB) · Tiller Financial Ingest · MaaS / petaCMS Telemetry · OBD-II Vehicle Telemetry · BookStack (auto-updated runbooks)
07 KAIROS — Proactive Observer Daemon
Always-on compiled Go application. Watches without being asked. Speaks when it has something worth saying. Named for the Greek concept of the right moment. It observes, logs, and proposes — but never executes without explicit human approval. The anti-sycophancy gate is immutable at the code level.
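The approval invariant is easiest to keep when there is exactly one code path to a side effect and that path checks for a recorded human decision. A minimal sketch of that propose/approve/execute gate in Python; the daemon itself is Go, and `Proposal`, `execute`, and `ApprovalRequired` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


class ApprovalRequired(Exception):
    """Raised when execution is attempted without an explicit human approval."""


@dataclass
class Proposal:
    summary: str
    action: Callable[[], str]  # the side effect KAIROS wants to perform
    decision: Decision = Decision.PENDING
    approved_by: Optional[str] = None

    def approve(self, human: str) -> None:
        self.decision = Decision.APPROVED
        self.approved_by = human

    def deny(self) -> None:
        self.decision = Decision.DENIED
        self.approved_by = None


def execute(p: Proposal) -> str:
    """The only path to a side effect: refuses anything not explicitly approved."""
    if p.decision is not Decision.APPROVED or not p.approved_by:
        raise ApprovalRequired(p.summary)
    return p.action()


if __name__ == "__main__":
    p = Proposal("Queue AWX playbook: restart backup agent on node-07", lambda: "queued")
    try:
        execute(p)
    except ApprovalRequired as blocked:
        print("blocked:", blocked)
    p.approve("owner")
    print(execute(p))  # queued
```

Observers only ever construct `Proposal` objects; the approve/deny step belongs to the human-facing interface, which is also where Observer 2's approve/deny execution gate would hook in.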
Observer 1 · Financial Watch
Tiller daily transaction ingestion, burn-rate anomaly detection, debt acceleration modeling, NV Energy demand arbitrage, and bank alert email parsing. Morning briefing generated automatically.
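The document does not name the detector behind burn-rate anomaly detection. One simple, swapped-in choice is a trailing-window z-score on daily spend totals, sketched here in Python; the 30-day window and 3-standard-deviation threshold are assumptions.

```python
from statistics import mean, stdev
from typing import List


def burn_rate_anomalies(daily_spend: List[float], window: int = 30, z: float = 3.0) -> List[int]:
    """Return indices of days whose spend exceeds the trailing-window mean by
    more than `z` sample standard deviations. Only upward spikes are flagged."""
    flagged = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A perfectly flat history (sigma == 0) gives no scale to compare against.
        if sigma > 0 and (daily_spend[i] - mu) / sigma > z:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    spend = [100.0, 110.0] * 15 + [500.0]  # 30 ordinary days, then a spike
    print(burn_rate_anomalies(spend))  # [30]
```

A flagged index would feed the morning briefing as an observation, not trigger any action on its own.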
Observer 2 · MaaS Fleet Telemetry
petaCMS bare-metal fleet streaming telemetry for cloud hosting customers. Node health trend analysis, Kopia/Velero backup state verification, Ansible AWX playbook queuing. Approve/deny execution gate.
Observer 3 · Calendar & Communications
Email triage via Go net/html sanitization, conflict detection, meeting prep briefs, call screening via MacroDroid + NPU conditional forwarding. Zero disruption to existing contacts.
Observer 4 · Vehicle & Environment
OBD-II Bluetooth telemetry ingestion, fault code monitoring, maintenance window prediction, solar array output correlation with EV charging schedule optimization.
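Fault code monitoring starts from the raw two-byte diagnostic trouble codes an OBD-II adapter reports. A Python sketch of the standard bit layout (two bits for the system letter, two for the first digit, then three hex digits); fetching the codes over Bluetooth is out of scope here.

```python
def decode_dtc(a: int, b: int) -> str:
    """Turn the two raw bytes of an OBD-II trouble code into its text form."""
    system = "PCBU"[(a >> 6) & 0b11]  # bits 7-6: Powertrain, Chassis, Body, network (U)
    first = (a >> 4) & 0b11          # bits 5-4: first digit, 0-3
    second = a & 0x0F                # bits 3-0: second digit, hex
    return f"{system}{first}{second:X}{b:02X}"  # byte B: last two hex digits


if __name__ == "__main__":
    for raw in [(0x01, 0x33), (0xC1, 0x00)]:
        print(decode_dtc(*raw))  # P0133, then U0100
```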
08 Build Sequence — Phased Engineering Plan
WEEK 1
Phase 1 · Physical Build
Bench test both mainboards. Confirm POST, 128GB recognition per node, Noctua thermal validation. Hardware in hand — begin immediately.
WEEK 1
Phase 2 · OS & iGPU
Install Ubuntu 26.04 LTS. Apply GRUB parameters: iommu=pt amdgpu.gttsize=131072 ttm.pages_limit=32505856. Validate ROCm 7.2.2 with rocm-smi on gfx1151.
WEEK 2
Phase 3 · VFIO + Windows VM
Bind RTX via vfio-pci. Boot Windows 10 Pro VM. Validate PowerDirector GPU access. Confirm zero host memory bandwidth contention during simultaneous LLM inference.
WEEK 3
Phase 4 · Inference Init
Deploy Gemma 4 26B MoE on mimical-2 via llama.cpp ROCm build. Benchmark tokens/sec. Validate always-warm residency in GTT pool.
WEEKS 4–5
Phase 5 · RDMA Cluster
Connect DPUs via DAC cable. Validate 200GbE RDMA throughput. Deploy llama.cpp RPC. Test TP=2 distribution of DeepSeek V4-Flash across both nodes.
WEEK 6
Phase 6 · KAIROS Foundation
Deploy LiteLLM router. Stand up KAIROS Go daemon. Expose core MCP tools on DPUs. Initialize Postgres behavioral_events schema.
WEEK 7
Phase 7 · Routing & Escalation
Three-class dispatch logic. Confidence gating at 0.85 threshold. Cloud API key configuration. Full escalation logging to Postgres JSONB.
WEEK 8
Phase 8 · MaaS & Telemetry
Connect Ansible AWX to KAIROS MCP server. Implement Tiller financial ingest pipeline. NV Energy solar telemetry. OBD-II dongle integration. BookStack auto-documentation active.
09 Mimical-Sovereign — Founding Principles
Personal, Not General
Not a model that knows everything about everyone. A model that knows everything about you. Your history, your fleet topology, your decision methodology, your financial reality.
Compounding Intelligence
The Postgres knowledge graph never resets. Every model generation inherits all accumulated knowledge. When a new model drops, swap the weights — the sovereign continues without interruption.
Data Never Leaves
The BlueField-2 DPUs enforce the network boundary in physical silicon. Financial telemetry, health data, and communications never touch hyperscaler infrastructure. Zero-trust at the hardware level.
Model Agnosticism
Every component treats the LLM as configuration, not code. The agentic loop never knows or cares what model answers. New generation released — config change and weight download. Zero downtime.
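Treating the model as configuration can be sketched as a tier table loaded from JSON. The agent code addresses tiers by role, so moving to a new model generation is an edit to the config alone. The model identifiers, field names, and `agent_step` below are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TierConfig:
    model: str        # model identifier as the router knows it
    host: str         # node that serves it
    threshold: float  # confidence gate for this tier


def load_tiers(raw: str) -> Dict[str, TierConfig]:
    """Parse tier definitions from JSON; the agent loop only ever sees tier names."""
    return {name: TierConfig(**spec) for name, spec in json.loads(raw).items()}


CONFIG_V1 = """{
  "system1": {"model": "gemma-4-26b-moe", "host": "mimical-2", "threshold": 0.85},
  "system2": {"model": "deepseek-v4-flash", "host": "mimical-1", "threshold": 0.85}
}"""


def agent_step(tiers: Dict[str, TierConfig], tier_name: str, prompt: str) -> str:
    """Stand-in for the agentic loop: it asks for a role, never for a model."""
    t = tiers[tier_name]
    return f"[{t.model}@{t.host}] {prompt}"


if __name__ == "__main__":
    v1 = load_tiers(CONFIG_V1)
    v2 = load_tiers(CONFIG_V1.replace("gemma-4-26b-moe", "next-gen-model"))
    print(agent_step(v1, "system1", "hi"))  # [gemma-4-26b-moe@mimical-2] hi
    print(agent_step(v2, "system1", "hi"))  # [next-gen-model@mimical-2] hi
```

The two calls differ only in the loaded config; `agent_step` is unchanged, which is the property the principle describes.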