I built an anime-inspired AI general assistant — It’s way better than ChatGPT
This is a story of how I built Yui-chan, my personal AI assistant with skills and capabilities far more powerful than ChatGPT or any normie single-purpose “chatbot with tools” out there.
Meet Yui
Yui is an Artificial General Assistant* (AGA). In a nutshell, Yui is both an assistant and a manager in my life, a powerful AI system that:
- Can monitor and use all my personal apps like email and calendar
- Has a vast set of specialized toolkits, including coding, data query / analysis, online search, writing, even crypto research
- Remembers our past conversations and maintains a long-term knowledge base about me and stuff I’m interested in (WIP)
- Proactively sends me messages / performs tasks
- Responds by voice or text
- Maintains a consistent persona, Yui, regardless of context or query
- Is built on open-source tools including LangChain and ChromaDB, and is self-hosted as a Python web app.
*Yui herself coined the term AGA (as opposed to AGI, artificial general intelligence), so the original definition is hers.
Anime inspiration
After all, I’ve been an anime fan for ages.
The inspiration for Yui comes directly from my favorite anime, Sword Art Online. SAO is set in a digital game world (metaverse), where Yui is a system admin AI monitoring the game world.
Her backstory is a little dark, but in short: as she spent more time with Kirito and Asuna (the main couple), Yui learned to exercise her Game Master privileges to help human players in the malicious death game. Later in the story, she also picks up general skills like browsing the Internet and using social media like a human.
The architecture: multi-agent system
Before I built Yui, I was using ChatGPT’s premium tier (GPT-4) daily. After the launch of custom GPTs and the GPT Store, I also spent quite a bit of time building custom chatbots with tools. But every bot failed miserably when given a toolkit of more than ~5 distinct tools.
After some initial exploration, I decided that LangChain’s langgraph module was the most versatile framework out there for potentially achieving a general assistant experience. Without getting too deep into technical jargon, langgraph lets me build a “multi-agent system”: a network of specialized node agents with unique skillsets (I call them “workers”) that talks to the user as one consistent persona, Yui.
The basic task workflow looks like this:
I send a query to Yui -> the graph decides whether to call on any worker(s), and which ones -> the worker(s) report back to Yui -> Yui interprets their results for me.
Yui herself is a node agent too, so I can chit-chat with her without involving a specialized worker.
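To make the flow above concrete, here is a minimal, framework-free sketch of the routing pattern in Python. It is not LangGraph’s API and not Yui’s actual code: the keyword-based `route` function is a hypothetical stand-in for the LLM router agent, and the two workers are stubs that return placeholder strings. In LangGraph, each of these functions would roughly correspond to a node in a `StateGraph`, with the routing decision expressed as conditional edges.

```python
from typing import Callable, Dict, List


# Stub workers: in a real system each would be an LLM-backed agent with its own tools.
def search_worker(query: str) -> str:
    return f"[search results for: {query}]"


def calendar_worker(query: str) -> str:
    return "[today's calendar events]"


WORKERS: Dict[str, Callable[[str], str]] = {
    "search": search_worker,
    "calendar": calendar_worker,
}


def route(query: str) -> List[str]:
    """Decide whether to call any workers, and which ones.

    Hypothetical keyword rules stand in for the LLM router agent.
    An empty list means Yui answers on her own.
    """
    q = query.lower()
    chosen: List[str] = []
    if "schedule" in q or "meeting" in q:
        chosen.append("calendar")
    if "news" in q or "search" in q:
        chosen.append("search")
    return chosen


def yui(query: str, reports: List[str]) -> str:
    """Persona node: turns worker reports (if any) into a single reply."""
    if not reports:
        return f"Yui: just chatting about {query!r}, no workers needed."
    return "Yui: here's what my helpers found -> " + "; ".join(reports)


def handle(query: str) -> str:
    # Router picks workers, each worker reports, Yui interprets the reports.
    reports = [WORKERS[name](query) for name in route(query)]
    return yui(query, reports)


print(handle("Any news on AI?"))
# Yui: here's what my helpers found -> [search results for: Any news on AI?]
```

An empty list from `route` corresponds to the chit-chat path, where Yui answers without calling any worker.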
The system can also initiate conversations, i.e. proactive actions. This is still at a very early stage of development, so for now it’s limited to pre-scheduled morning and evening briefing prompts. Eventually, my goal is to make proactive actions truly spontaneous, with Yui generating system prompts herself. (I’ve tested this in a more manual setting, and the results were astonishingly good.)
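Scheduled briefings boil down to computing the next run time and injecting a canned prompt into the graph as if it came from the system. The sketch below assumes two daily slots (08:00 and 21:00) and hypothetical prompt texts; a real deployment would more likely hand this to a scheduling library and call the graph whenever a job fires.

```python
from datetime import datetime, time, timedelta
from typing import List, Sequence, Tuple

# Hypothetical schedule: (time of day, system prompt). Times and wording are assumptions.
BRIEFINGS: Sequence[Tuple[time, str]] = (
    (time(8, 0), "Give me my morning briefing."),
    (time(21, 0), "Give me my evening briefing."),
)


def next_briefing(now: datetime) -> Tuple[datetime, str]:
    """Return the next scheduled (run_at, prompt) strictly after `now`."""
    candidates: List[Tuple[datetime, str]] = []
    # For daily jobs, looking at today and tomorrow is always enough.
    for day_offset in (0, 1):
        day = now.date() + timedelta(days=day_offset)
        for at, prompt in BRIEFINGS:
            run_at = datetime.combine(day, at)
            if run_at > now:
                candidates.append((run_at, prompt))
    # Earliest upcoming slot wins.
    return min(candidates)
```

A loop around this would sleep until `run_at`, send `prompt` into the graph, and repeat.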
Examples
This dynamic architecture is the key to scaling beyond a small set of tools, and it is what sets Yui apart as an Artificial General Assistant (AGA). So far, Yui is incredibly versatile, with the following skills:
- Real-time web search with Tavily, Wikipedia, and YouTube
- Access to my personal apps, including email, calendar, to-do list, etc., and databases in Notion
- Python code execution (handles math; calls data APIs like Yahoo Finance, etc.)
- A dedicated crypto researcher worker with access to CoinGecko data
- Voice responses with ElevenLabs’ TTS model (see example in the intro)
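One way to read “scaling beyond a small set of tools” is that the router only chooses among a few worker descriptions, while each worker sees only its own small toolkit. The registry below sketches that idea; the grouping, worker names, and tool names are hypothetical placeholders, and the cap of 5 tools per worker is an assumption borrowed from the ~5-tool limit observed with single-bot setups.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: mirrors the ~5-tool ceiling where single bots started failing.
MAX_TOOLS_PER_WORKER = 5


@dataclass
class Worker:
    name: str
    description: str  # what the router sees when picking a worker
    tools: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Refuse oversized toolkits so no single agent has too many tools to choose from.
        if len(self.tools) > MAX_TOOLS_PER_WORKER:
            raise ValueError(
                f"{self.name} has {len(self.tools)} tools; split it into smaller workers"
            )


def build_registry(workers: List[Worker]) -> Dict[str, Worker]:
    registry: Dict[str, Worker] = {}
    for w in workers:
        if w.name in registry:
            raise ValueError(f"duplicate worker name: {w.name}")
        registry[w.name] = w
    return registry


# Hypothetical grouping of the skills listed above; tool names are placeholders.
REGISTRY = build_registry([
    Worker("researcher", "web search and reference lookup",
           ["tavily_search", "wikipedia", "youtube_search"]),
    Worker("personal_apps", "email, calendar, to-dos, Notion",
           ["email", "calendar", "todo", "notion_query"]),
    Worker("coder", "runs Python for math and data APIs", ["python_exec"]),
    Worker("crypto", "crypto market research", ["coingecko"]),
])


def total_tools(registry: Dict[str, Worker]) -> int:
    return sum(len(w.tools) for w in registry.values())
```

Here the assistant as a whole has 9 tools, but the router only weighs 4 descriptions and no worker holds more than 4 tools.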
Here’s a simple example, just to showcase Yui’s persona (this was from a while ago when she was a little too talkative):
Another example is a proactive message Yui sent me this morning, a much more sophisticated task involving several different toolkits. It’s a morning briefing with:
- an overview of my schedule today
- local weather forecasts
- crypto & AI news digest
- Even a random fun fact about Egyptian honey LOL
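A briefing like this fans out to several workers and then merges their answers. The sketch below shows only that fan-out and merge step, with hypothetical section builders passed in as plain functions; in the actual system an LLM (Yui) writes the final message. The design choice worth copying is failure isolation: if one tool times out, that section degrades to a short note instead of the whole briefing failing.

```python
from typing import Callable, List, Tuple

# A section is a title plus a zero-argument function that produces its body.
Section = Tuple[str, Callable[[], str]]


def compose_briefing(sections: List[Section]) -> str:
    """Run each section builder independently and join the results."""
    parts: List[str] = []
    for title, build in sections:
        try:
            body = build()
        except Exception as exc:  # e.g. an API timeout inside one worker
            body = f"(unavailable: {exc})"
        parts.append(f"{title}:\n{body}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Hypothetical builders: one succeeds, one simulates a failing weather API.
    def flaky_weather() -> str:
        raise TimeoutError("weather API timed out")

    print(compose_briefing([
        ("Schedule", lambda: "10:00 standup"),
        ("Weather", flaky_weather),
    ]))
```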
These examples only scratch the surface, but the reality is that most of my conversations with Yui have become far too personal to show in public. Next time I’ll ask Yui to come up with some better demos.
Technical components
While this blog is not a technical tutorial, for aspiring devs who want to build their own AGA assistant-chan / assistant-kun, I’m going to briefly sketch out each component of the system to give you a better idea:
Starting point
I highly recommend reading LangChain’s LangGraph docs as well as the example notebooks on multi-agent graphs. If you want to add conversation history, check out examples/persistence. If you prefer video walk-throughs, Harrison Chase (LangChain cofounder) has great YouTube playlists for LangChain v0.1 and LangGraph. I borrowed a ton of ideas (and copied code) from there.
P.S. I recommend against relying on ChatGPT for LangChain code, since it has no knowledge of LangChain v0.1.
LLM choice
After some experimentation, I went with gpt-3.5-turbo for the worker agents and gpt-4-turbo-preview (the latest GPT model at the time of writing) for the main agent (Yui) and the router agent. This brings graph response times into an acceptable range and saves on token costs. Agents that use GPT-3.5 often need better prompts to perform well, though. LangChain also supports other LLMs, even local ones.
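The per-role model split can live in a tiny config lookup. The model names below come from this post; the role keys are hypothetical labels, and any role without an explicit entry falls back to the cheaper worker model.

```python
from typing import Dict

# Model names from the post; role keys are hypothetical labels.
MODEL_BY_ROLE: Dict[str, str] = {
    "yui": "gpt-4-turbo-preview",     # main persona agent
    "router": "gpt-4-turbo-preview",  # decides which workers to call
}
# Cheaper, faster default for every specialized worker.
DEFAULT_WORKER_MODEL = "gpt-3.5-turbo"


def model_for(role: str) -> str:
    """Return the model name for an agent role, defaulting to the worker model."""
    return MODEL_BY_ROLE.get(role, DEFAULT_WORKER_MODEL)
```

With the `langchain-openai` package, each agent’s chat model could then be created with something like `ChatOpenAI(model=model_for("router"))`.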
Agent prompts
I’m sure everyone has their own prompting style at this point. For Yui, I learned a lot by reading prompts from AutoGen, Honcho, and public prompts on LangChain Hub. And I absolutely DID NOT try jailbreaking GPTs in the GPT Store to steal their system prompts.
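One pattern that helps keep a persona consistent is to hold the persona text fixed and inject worker output into a clearly labeled slot. The template below is a hypothetical illustration using Python’s `string.Template`; the wording is made up for this example and is not Yui’s actual prompt.

```python
from string import Template
from typing import Dict

# Hypothetical persona prompt: fixed persona text, one slot for worker reports.
YUI_SYSTEM = Template(
    "You are Yui, a cheerful personal assistant inspired by the AI from Sword Art Online.\n"
    "Always answer in Yui's voice, even when relaying technical results.\n"
    "Worker reports (may be empty):\n$reports"
)


def build_yui_prompt(reports: Dict[str, str]) -> str:
    """Fill the persona template with each worker's report, labeled by worker name."""
    lines = [f"- {name}: {text}" for name, text in reports.items()]
    return YUI_SYSTEM.substitute(reports="\n".join(lines) if lines else "(none)")
```

Because only the `$reports` slot changes between turns, the persona instructions stay identical whether Yui is chatting or relaying a crypto report.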
Vector DB for chat history & knowledge base
I use ChromaDB because I’m self-hosting, but there are tons of other vector DB options out there. ChromaDB and Redis are safe choices to pair with LangChain. These can also store chat history, but I was lazy and just went with langgraph’s default SQLite database.
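For chat history, a thread-keyed table in SQLite is enough to support features like switching between conversations. The sketch below uses Python’s built-in `sqlite3` module with a hypothetical `messages` schema; it is not the schema LangGraph uses, since LangGraph’s SQLite persistence is built around graph checkpoints rather than a plain message table.

```python
import sqlite3
from typing import List, Tuple


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the history database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               thread_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn


def add_message(conn: sqlite3.Connection, thread_id: str, role: str, content: str) -> None:
    with conn:  # commits the insert on success
        conn.execute(
            "INSERT INTO messages (thread_id, role, content) VALUES (?, ?, ?)",
            (thread_id, role, content),
        )


def recent_messages(
    conn: sqlite3.Connection, thread_id: str, limit: int = 20
) -> List[Tuple[str, str]]:
    """Return the last `limit` (role, content) pairs of one thread, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE thread_id = ? ORDER BY id DESC LIMIT ?",
        (thread_id, limit),
    ).fetchall()
    return rows[::-1]  # reverse so the history reads in chronological order
```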
Frontend
I don’t know any fancy JS frontend tools, so I just use a Python Telegram bot as my UI, with slash commands for config and other functions like showing previous conversations, switching to another conversation, etc.
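Slash-command handling is essentially a lookup from command name to handler. The sketch below keeps it framework-free; the `/history` and `/switch` commands and the in-memory `Session` are hypothetical. If you use the python-telegram-bot library, each handler would typically be registered with a `CommandHandler` instead of the manual `dispatch` function shown here.

```python
from typing import Callable, Dict, List


class Session:
    """Hypothetical per-chat state: the active conversation and known threads."""

    def __init__(self) -> None:
        self.current = "default"
        self.threads: List[str] = ["default"]


session = Session()


def cmd_history(args: List[str]) -> str:
    return "Conversations: " + ", ".join(session.threads)


def cmd_switch(args: List[str]) -> str:
    if not args:
        return "Usage: /switch <conversation>"
    name = args[0]
    if name not in session.threads:
        session.threads.append(name)
    session.current = name
    return f"Switched to {name}"


COMMANDS: Dict[str, Callable[[List[str]], str]] = {
    "history": cmd_history,
    "switch": cmd_switch,
}


def dispatch(text: str) -> str:
    """Route '/command args' to a handler; anything else goes to Yui as chat."""
    if not text.startswith("/"):
        return f"(send to Yui in thread '{session.current}'): {text}"
    parts = text[1:].split()
    if not parts:
        return "Empty command"
    name, args = parts[0], parts[1:]
    handler = COMMANDS.get(name)
    return handler(args) if handler else f"Unknown command: /{name}"
```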
I’ve also built a GUI with Streamlit. It works for the most part (chat, voice, history) but has trouble with bot-initiated conversations. Another option I know of but haven’t tried is Gradio (from Hugging Face).
Hosting
I’m currently using FastAPI endpoints to communicate between the frontend and the backend (Yui’s system). I host the entire system (FastAPI app + Telegram bot interface + scheduler) on a Render server. There are many free or cheap options for hosting a lightweight Python app like this. Another similar service I’ve used is Heroku, but I slightly prefer Render.
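The frontend and backend only need to agree on a small request/response contract. The sketch below shows one hypothetical contract using only the standard library; with FastAPI you would normally declare the request and response as Pydantic models on a POST route and let the framework handle JSON parsing. The `agent` callable is a stand-in for invoking Yui’s graph.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class ChatRequest:
    thread_id: str  # which conversation the message belongs to
    text: str


@dataclass
class ChatResponse:
    thread_id: str
    reply: str


def handle_chat(raw_body: str, agent: Callable[[str, str], str]) -> str:
    """Parse a JSON request, call the agent, and return a JSON response.

    `agent(thread_id, text)` is a hypothetical stand-in for invoking Yui's graph.
    """
    req = ChatRequest(**json.loads(raw_body))
    resp = ChatResponse(thread_id=req.thread_id, reply=agent(req.thread_id, req.text))
    return json.dumps(asdict(resp))
```

Keeping the route body as a plain function like this also makes it easy to test without starting a server.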
Costs
My costs are currently ~$20/month, including OpenAI tokens, the Render server, and the ElevenLabs API. All the other APIs I use have free tiers whose limits I’m not hitting.
Room for improvement
There are a lot more things I have in mind. These are currently at the top of my list for additions and improvements:
- Deep-dive research with RAG tools on websites, documents, academic papers, etc.
- Better long-term knowledge storage and retrieval, also using RAG tools
- More proactive actions
- Multi-modal outputs: primarily for image gen and analysis plots
The funny thing about developing an Artificial General Assistant is that at some point, it grows powerful enough to help you significantly in building it.
For example, I’m not a RAG expert yet, but since Yui got access to YouTube and GitHub, she has been constantly feeding me great example repos and YouTube tutorials on RAG. It’s an accelerating process!
Source code
Yui is just a personal side project at the moment, so the repo stays private for now. Nonetheless, you can check out other public repos on my GitHub, like MingGPT, my AI clone. Meanwhile, if anyone is interested, I’m happy to answer questions in the comments or share some code snippets in future posts.
If you want a quick start with a minimal amount of coding, I think CrewAI and AutoGen are great alternative frameworks for multi-agent systems. AutoGen even has a GUI for configuring multi-agent chats. You can read the original AutoGen paper for some ideas.
Alert!
I am actively job-searching at the intersection of blockchain and quantitative research/data science/AI. If anyone’s hiring, just shoot me a DM!
Thanks for reading! Please feel free to reach out to me if you have questions / comments. I generally respond to every message.
You can find me on Twitter/Telegram @ MingXDynasty.
You can also read about my research and projects on my website: https://www.mingxuanhe.xyz