Anime-inspired AI general assistant: YuiChan V2
TLDR for V2 updates:
- New Web UI with live avatar
- Huge speed up thanks to LLM upgrades & streaming response
- Intelligent email monitoring & notification
- Better web explorer tools
This blog documents the updates I added to Yui over the last couple of months. If you haven't read about Yui V1 yet, check out my previous post first:
New UI
After a few updates & bug fixes shipped by the streamlit team, I was finally able to build a fully functional web UI for Yui with streamlit. For a Python monolingual like me with zero frontend experience (besides other streamlit apps), streamlit just makes things so much easier. I use prebuilt components everywhere, including the chat interface, and they just work.
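For anyone curious, this is roughly all it takes to stand up a chat interface with those prebuilt components. This is a minimal sketch, not Yui's actual app; the echo reply is just a placeholder for the backend call.

```python
# chat_ui.py -- minimal sketch of Streamlit's prebuilt chat components
import streamlit as st

st.title("YuiChan")

if "messages" not in st.session_state:
    st.session_state.messages = []

# replay the conversation so far
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if prompt := st.chat_input("Talk to Yui..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)

    reply = f"(placeholder) You said: {prompt}"  # the real app calls Yui's backend here
    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.write(reply)
```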
But then I found a generic chat interface like ChatGPT boring, so I decided to give it some personality by adding an interactive live avatar for Yui. Since there is no native Python library for Live2D characters, I wrote my first React/TypeScript project (yeah!) that basically wraps the existing pixi-live2d-display library into a streamlit component using streamlit's custom components template.
I’ve open-sourced the component here: https://github.com/mingxuan-he/streamlit-live2d. This was my very first experience with JS/TS so it had tons of bugs. I believe I have fixed most of them (with the help of my pair-programming buddy Yui). I still see some occasional issues with the size of the live avatar, but I’ll leave them be for now.
p.s. I’m also checking out Vercel’s Generative UI framework as the next generation UI for Yui. My backend code is compatible thanks to the template released by the Langchain team. The only problem is that I need to learn how to write the frontend in Next.js… But the results would look like Perplexity and I have a feeling it might be worth the time.
New speed
One major issue I mentioned in my V1 post was speed. Yui is a multi-agent system composed of specialized worker agents, a router agent, and the interface agent (Yui), so each chat interaction with her requires at least 2–3 LLM calls, plus the time to execute tools.
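For readers who skipped the V1 post, here is a rough langgraph-style sketch of that topology. The node names, state schema, and routing logic are simplified stand-ins for illustration, not Yui's actual code.

```python
# graph_sketch.py -- illustrative topology only
from typing import TypedDict
from langgraph.graph import StateGraph, END


class ChatState(TypedDict):
    user_input: str
    worker_result: str
    reply: str


def router(state: ChatState) -> ChatState:
    # LLM call #1: decide which worker (if any) should handle the request
    return state


def pick_worker(state: ChatState) -> str:
    # in the real graph this reads the router LLM's decision
    return "email_worker" if "email" in state["user_input"].lower() else "interface"


def email_worker(state: ChatState) -> ChatState:
    # LLM call #2 (+ tool execution), e.g. searching the mailbox
    return {**state, "worker_result": "..."}


def interface(state: ChatState) -> ChatState:
    # LLM call #3: Yui composes the in-character reply
    return {**state, "reply": "..."}


builder = StateGraph(ChatState)
builder.add_node("router", router)
builder.add_node("email_worker", email_worker)
builder.add_node("interface", interface)
builder.set_entry_point("router")
builder.add_conditional_edges(
    "router", pick_worker,
    {"email_worker": "email_worker", "interface": "interface"},
)
builder.add_edge("email_worker", "interface")
builder.add_edge("interface", END)
graph = builder.compile()
```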
Previously, each interaction took ~30–40 seconds between text input and output via telegram, with the worker agents running on GPT-3.5-turbo and the router & interface agents running on GPT-4-turbo. Technically this still saved me some time compared to manually digging through my mailbox or reading a whole wikipedia page online, but it didn't really feel like a conversation, especially for chit-chat.
I had also tried Groq's Llama2 and Firefunction-v1, but their function-calling capabilities were not as good, causing my tool executions to fail, so I had to stick with GPT-3.5-turbo.
Luckily, we live in 2024, when LLMs evolve extremely fast. After Llama3 came out with native function-calling support, Groq added the 70b model to their inference API. I swapped it in for my worker agents immediately and was very happy with the function-calling quality. After a few modifications to the worker agents' prompts, I could get consistent and accurate tool arguments from the agents.
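For reference, the swap itself is tiny with langchain's Groq integration. This is just a sketch; the wiki_search tool is a toy stand-in for the real worker tools.

```python
# worker_model.py -- rough sketch of the model swap
from langchain_core.tools import tool
from langchain_groq import ChatGroq


@tool
def wiki_search(query: str) -> str:
    """Search Wikipedia for a query."""
    return f"(results for {query})"  # toy stand-in for a real tool


# Llama3-70b on Groq, with native function calling
worker_llm = ChatGroq(model="llama3-70b-8192", temperature=0)
worker_with_tools = worker_llm.bind_tools([wiki_search])

msg = worker_with_tools.invoke("Who wrote 'The Tale of Genji'?")
print(msg.tool_calls)  # e.g. [{'name': 'wiki_search', 'args': {'query': ...}, ...}]
```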
I also upgraded Yui’s main model from GPT-4-turbo to GPT-4-omni. This provided some further speedup, although I didn’t feel any qualitative improvement in reasoning/creativity. That said, I’m super looking forward to Omni’s native multimodal outputs (especially voice).
Another feature that brought a huge speedup was enabling streaming responses in Yui’s backend langgraph. Basically, streaming allows the first output tokens to be returned while the rest of the response is still being generated. To make this happen, I had to make quite a few backend changes, including upgrading the entire server’s Python version to 3.12, converting all nodes and some tools to asynchronous, etc. I even caught a few bugs in the beta version of the astream_events method in langchain-core, and created a couple of github issues to help the langchain team fix them. Huge thanks to Eugene (eyurtsev) for responding to the issues with lightning speed!
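The core of the token streaming looks roughly like this. It's a sketch assuming the compiled `graph` from the earlier sketch; the real code filters more event types.

```python
# streaming_sketch.py -- pull tokens out of the graph as they are generated
async def stream_reply(user_input: str):
    async for event in graph.astream_events(
        {"user_input": user_input}, version="v2"
    ):
        # only forward token chunks emitted by the chat model
        if event["event"] == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            if chunk.content:
                yield chunk.content
```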
With streaming enabled on the backend, I could send the generated tokens as server-sent events (SSE) via a dedicated fastapi endpoint. The stream of tokens is then consumed in the new streamlit UI by the st.write_stream method, generating a typewriter effect.
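Sketched out, the server and UI sides look something like this. The endpoint path and payload shape are placeholders, not Yui's actual API.

```python
# server.py -- hedged sketch of the streaming endpoint
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    message: str


@app.post("/chat/stream")
async def chat_stream(req: ChatRequest):
    async def sse():
        # stream_reply() is the async generator from the previous sketch;
        # naive SSE framing here assumes tokens contain no newlines
        async for token in stream_reply(req.message):
            yield f"data: {token}\n\n"

    return StreamingResponse(sse(), media_type="text/event-stream")
```

On the streamlit side, a small generator feeds the stream into st.write_stream:

```python
# ui.py (excerpt) -- consuming the SSE stream in Streamlit
import requests
import streamlit as st


def token_stream(message: str):
    with requests.post("http://localhost:8000/chat/stream",
                       json={"message": message}, stream=True) as resp:
        for line in resp.iter_lines(decode_unicode=True):
            if line and line.startswith("data: "):
                yield line[len("data: "):]


with st.chat_message("assistant"):
    st.write_stream(token_stream(prompt))  # typewriter effect
```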
The faster LLMs combined with streaming reduced the response latency to ~6 sec when a worker agent is called and ~2 sec when no worker agent is called. This is a much, much better chat experience than before.
New proactive email notifications
Another feature I added was an automated email processing pipeline. The goal is for Yui to monitor my Gmail inbox and message me only when there’s anything worthy of my attention. The pipeline has three parts:
- Filter: first use the Gmail tags to filter out promotion/updates/newsletters, etc. For the rest, let Yui decide whether the email is worthy of my attention.
- Summarize: if the email is important, Yui summarizes it.
- Notify: I use the telegram bot (see previous post) to send me the notifications and summary. I also use NTFY to send a push notification to my phone.
I use a workflow in Make.com to monitor my inbox, which sends a webhook to Yui’s server whenever I get an email in the ‘important’ category (yet another dedicated endpoint).
There are many workflow automation tools like Make, n8n, and Zapier; I just went with Make because it’s free for light use.
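Put together, the webhook endpoint looks roughly like this. The payload schema, the is_important/summarize helpers, the ntfy topic, and the Telegram ids are all placeholders rather than the real configuration.

```python
# email_webhook.py -- rough sketch of the filter -> summarize -> notify pipeline
import os
import requests
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class EmailPayload(BaseModel):
    sender: str
    subject: str
    body: str


async def is_important(email: EmailPayload) -> bool:
    # placeholder: in the real pipeline this is an LLM yes/no classification
    return True


async def summarize(email: EmailPayload) -> str:
    # placeholder: in the real pipeline Yui writes a short summary
    return f"{email.sender}: {email.body[:200]}"


@app.post("/email/incoming")
async def handle_email(email: EmailPayload):
    # 1. Filter: decide whether this is worth a notification
    if not await is_important(email):
        return {"notified": False}

    # 2. Summarize
    summary = await summarize(email)

    # 3. Notify: Telegram message + ntfy push to my phone
    requests.post(
        f"https://api.telegram.org/bot{os.environ['TELEGRAM_TOKEN']}/sendMessage",
        json={"chat_id": os.environ["TELEGRAM_CHAT_ID"],
              "text": f"New email: {email.subject}\n{summary}"},
    )
    requests.post("https://ntfy.sh/my-yui-topic",  # placeholder topic
                  data=summary.encode(),
                  headers={"Title": email.subject})
    return {"notified": True}
```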
New web explorer tools
Previously, I equipped Yui with three web search tools: Tavily search, Wikipedia, and a web document loader from langchain (which scrapes the HTML using bs4).
I quickly found out that raw HTML is a terrible format for LLMs to read and often exceeds the context window of OpenAI models. After some research, I found Firecrawl and Jina Reader, both offering web scraper APIs that return nicely formatted markdown text. Jina is completely free, while Firecrawl has a free quota of 500 webpages per month. I tested both scrapers on my personal website; Jina failed, so I went with Firecrawl. (Is this how people make library/API choices?)
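Wrapping the scraper as a tool for the worker agents is then only a few lines. Note that the /v0/scrape endpoint and the data["markdown"] field below are assumptions based on Firecrawl's API at the time of writing and may have changed, so check their current docs before copying.

```python
# web_reader.py -- hedged sketch of a markdown-returning webpage tool
import os
import requests
from langchain_core.tools import tool


@tool
def read_webpage(url: str) -> str:
    """Fetch a webpage and return it as LLM-friendly markdown."""
    resp = requests.post(
        "https://api.firecrawl.dev/v0/scrape",  # assumed endpoint; verify against current docs
        headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
        json={"url": url},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["data"]["markdown"]  # assumed response shape
```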
Things I’m working on/testing
- Structured, self-generated long-term memory: currently Yui’s long-term memory is stored as plain unstructured text passed into the system prompt. I’m working on a module that dynamically generates memory snippets, similar to the new Memory feature in OpenAI’s ChatGPT.
- In addition to webpage scraping, I’m also planning to add a PDF parser using LlamaParse by LlamaIndex. This would primarily be used to read academic papers from arXiv, etc.
- I’m waiting on OpenAI to release the audio input/output endpoints for GPT-4o in their API. I don’t know how langchain will implement this, but I’m looking forward to trying it with Yui.