An agent that no longer just assists
Until now, Lain could browse the web, manipulate files, execute code. But whenever it needed to click a button in a Windows application or fill a form with no API, the agent hit a wall.
With version 0.23, that barrier is gone. Lain can now directly use the Windows graphical interface: clicks, keyboard input, scrolling, drag-and-drop, keyboard shortcuts. It sees what it is doing through real screenshots, with multi-monitor support.
The computer becomes an environment for action, not just a terminal.
🖱 Autonomous PC control
This is the most significant change in this release.
In practice, Lain can now:
- Automate tools with no API — business software, legacy applications, proprietary interfaces
- Navigate complex interfaces — menus, modal windows, tabs
- Chain actions just like a human sitting at a screen
To make decisions, the agent captures real screenshots and interprets them. Multi-monitor support enables working with realistic setups.
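The capture, interpret, act cycle described above can be pictured as a simple loop. This is a minimal sketch, not Lain's actual implementation: the `Action` type, the `run_loop` function and its `capture`, `decide` and `act` callbacks are all hypothetical names chosen for illustration, and the action budget is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    kind: str          # e.g. "click", "type", "scroll" (illustrative names)
    target: str = ""   # free-form description of what the action applies to


def run_loop(
    capture: Callable[[], bytes],
    decide: Callable[[bytes], Optional[Action]],
    act: Callable[[Action], None],
    max_actions: int = 50,  # assumed safety budget, not a documented Lain setting
) -> list[Action]:
    """Repeat capture -> decide -> act until decide() returns None or the budget is spent."""
    performed: list[Action] = []
    for _ in range(max_actions):
        screenshot = capture()        # observe the current state of the screen
        action = decide(screenshot)   # the model picks the next action, or None when done
        if action is None:
            break
        act(action)                   # perform the click, keystroke, scroll...
        performed.append(action)
    return performed
```

Passing the three steps in as callbacks keeps the loop independent of any particular screenshot library or model, which also makes it easy to exercise with fakes.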
📂 File upload in the browser
During web browsing tasks, Lain can now upload files to web forms.
It sounds simple, but it's an essential building block. Many real-world processes involve web portals with documents to submit. Without this capability, the agent was stuck at the critical step. Now it can complete end-to-end action chains.
🧠 New LLM configuration system
Model management has been completely redesigned:
- Built-in model catalog — no need to look up model identifiers
- Per-role configuration — planning, execution, summarization... each step can use the most suitable model
- Automatic migration from the old format
A single model isn't enough to do everything efficiently. Role-based separation optimizes reasoning quality, execution speed and API costs.
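To make the idea concrete, here is a hedged sketch of per-role configuration with a migration step. Lain's real configuration format is not shown in this post, so the `{"model": ...}` legacy shape, the `"models"` key and the `migrate_config` and `model_for` helpers are assumptions; only the three role names come from the list above.

```python
ROLES = ("planning", "execution", "summarization")


def migrate_config(config: dict) -> dict:
    """Convert an assumed legacy {"model": "..."} config into a per-role one."""
    if "models" in config:
        return config  # already in the per-role format, nothing to do
    legacy_model = config["model"]
    other_settings = {k: v for k, v in config.items() if k != "model"}
    # Every role starts on the model the user had before; roles can then diverge.
    return {**other_settings, "models": {role: legacy_model for role in ROLES}}


def model_for(config: dict, role: str) -> str:
    """Return the model identifier configured for a given role."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    return migrate_config(config)["models"][role]
```

After migration, a user could point `planning` at a stronger model and leave `execution` and `summarization` on a faster, cheaper one.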
GPT-5.4 support
This release gains an extra dimension with GPT-5.4 support, which significantly improves:
- Planning for complex tasks
- Interface understanding through screenshots
- Action reliability in multi-step chains
Plans become more coherent, and navigation sequences hold up better when the interface varies.
⚡ Progressive tool loading
Infrequently used tools are no longer loaded at startup, but on demand. The result: less unnecessary context, better performance and a more responsive agent.
Reducing contextual noise directly improves decision stability — especially with a powerful model like GPT-5.4.
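On-demand loading of this kind is often built as a lazy registry: tools are declared up front but only constructed the first time they are requested. The sketch below illustrates that general pattern; `LazyToolRegistry` and its methods are hypothetical and say nothing about how Lain implements it internally.

```python
from typing import Callable, Dict, List

Tool = Callable[..., object]


class LazyToolRegistry:
    """Registers tool factories at startup and builds each tool on first use."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Tool]] = {}
        self._loaded: Dict[str, Tool] = {}

    def register(self, name: str, factory: Callable[[], Tool]) -> None:
        # Cheap: only the factory is stored, the tool itself is not built yet.
        self._factories[name] = factory

    def get(self, name: str) -> Tool:
        # Build the tool the first time it is requested, then reuse it.
        if name not in self._loaded:
            self._loaded[name] = self._factories[name]()
        return self._loaded[name]

    def loaded_names(self) -> List[str]:
        # Only these tools would need to be described in the model's context.
        return sorted(self._loaded)
```

With this shape, the list of loaded tools stays short until a task actually calls for something unusual.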
More transparency
The interface has been improved to make execution more readable:
- Visualization of plan steps during execution
- Display of the agent's last thought
- Button to see the full detail of tools used
Understanding what the agent does — and why — is essential for building trust in an autonomous system.
Stronger planning
The planning system has been hardened:
- Limit raised to 20 steps
- Stricter instructions to avoid vague actions
- Better recovery after user choices
- Manually activated tools stay unlocked between sessions
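As an illustration of what a hardened plan check might look like, here is a small sketch. Only the 20-step limit comes from this release. The `check_plan` function and the `VAGUE_WORDS` heuristic are assumptions made up for the example, since the post describes stricter instructions to the model rather than a specific validator.

```python
MAX_PLAN_STEPS = 20  # limit stated in this release
VAGUE_WORDS = {"somehow", "etc", "stuff", "things"}  # illustrative heuristic only


def check_plan(steps: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the plan passes."""
    problems = []
    if len(steps) > MAX_PLAN_STEPS:
        problems.append(f"plan has {len(steps)} steps, limit is {MAX_PLAN_STEPS}")
    for i, step in enumerate(steps, start=1):
        # Normalize words so "etc." and "Stuff" match the vague-word set.
        words = {w.strip(".,").lower() for w in step.split()}
        if not words:
            problems.append(f"step {i} is empty")
        elif words & VAGUE_WORDS:
            problems.append(f"step {i} is vague: {step!r}")
    return problems
```

A plan that comes back with problems could be sent back to the planning model with the list attached, instead of being executed as is.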
The direction
Taken together, these changes paint a clear trajectory. Lain is evolving toward an agent that understands goals, plans, acts in the real environment and adapts to interactions.
An autonomous software collaborator, not a chatbot.
Version 0.23 lays the foundations for running an agent in a real digital workspace — and the results with GPT-5.4 show that the model is starting to match the ambition.
Lain is an autonomous AI agent built by Ekioo.