An agent that no longer just assists

Until now, Lain could browse the web, manipulate files, and execute code. But whenever it needed to click a button in a Windows application or fill in a form with no API, the agent hit a wall.

With version 0.23, that barrier is gone. Lain can now directly use the Windows graphical interface — clicks, keyboard input, scrolling, drag-and-drop, keyboard shortcuts. It sees what it's doing through real screenshots, with multi-monitor support.

The computer becomes an environment for action, not just a terminal.

🖱 Autonomous PC control

This is the most significant change in this release.

In practice, Lain can now:

  • Automate tools with no API — business software, legacy applications, proprietary interfaces
  • Navigate complex interfaces — menus, modal windows, tabs
  • Chain actions just like a human sitting at a screen

To make decisions, the agent captures real screenshots and interprets them. Multi-monitor support enables working with realistic setups.
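The screenshot-then-act loop can be sketched as a simple dispatch cycle. Everything below is illustrative: the `GuiAction` schema, `GuiDriver`, and the callback names are hypothetical stand-ins, not Lain's actual internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical action schema for GUI control; Lain's real schema is not public.
@dataclass
class GuiAction:
    kind: str                  # "click", "type", "scroll", "drag", "hotkey"
    args: dict = field(default_factory=dict)

class GuiDriver:
    """Dispatches actions to registered backend handlers (e.g. OS input APIs)."""
    def __init__(self) -> None:
        self.handlers: dict[str, Callable[[dict], None]] = {}
        self.log: list[GuiAction] = []   # executed actions, kept for transparency

    def register(self, kind: str, handler: Callable[[dict], None]) -> None:
        self.handlers[kind] = handler

    def execute(self, action: GuiAction) -> None:
        self.handlers[action.kind](action.args)
        self.log.append(action)

def control_loop(
    driver: GuiDriver,
    capture: Callable[[], bytes],                      # takes a screenshot
    decide: Callable[[bytes], Optional[GuiAction]],    # model picks next action
    max_steps: int = 10,
) -> None:
    """Capture a screenshot, let the model choose an action, execute, repeat."""
    for _ in range(max_steps):
        action = decide(capture())   # model interprets the current screen
        if action is None:           # model signals the task is complete
            break
        driver.execute(action)
```

In a real setup, `capture` would grab pixels from the chosen monitor and `decide` would call the vision-capable model; here they are placeholders for the pattern.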

📂 File upload in the browser

During web browsing tasks, Lain can now upload files to web forms.

It sounds simple, but it's an essential building block. Many real-world processes involve web portals with documents to submit. Without this capability, the agent got stuck at that critical step. Now it can complete end-to-end action chains.

🧠 New LLM configuration system

Model management has been completely redesigned:

  • Built-in model catalog — no need to look up model identifiers
  • Per-role configuration — planning, execution, summarization... each step can use the most suitable model
  • Automatic migration from the old format

A single model isn't enough to do everything efficiently. Role-based separation optimizes reasoning quality, execution speed and API costs.
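A role-to-model mapping with a fallback is enough to capture the idea. This is a minimal sketch: the role names follow the list above, but the model identifiers and the `model_for` helper are illustrative, not Lain's actual configuration format.

```python
# Hypothetical role-based model configuration; identifiers are illustrative.
ROLE_MODELS = {
    "planning":      "gpt-5.4",        # strongest reasoning for building plans
    "execution":     "fast-model",     # cheap, low-latency for tool calls
    "summarization": "small-model",    # compresses history between steps
}

DEFAULT_MODEL = "gpt-5.4"

def model_for(role: str) -> str:
    """Resolve the model for an agent role, falling back to the default."""
    return ROLE_MODELS.get(role, DEFAULT_MODEL)
```

The fallback matters for migration: any role missing from an old-format configuration still resolves to a working model.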

GPT-5.4 support

Support for GPT-5.4 adds an extra dimension to this release, significantly improving:

  • Planning for complex tasks
  • Interface understanding through screenshots
  • Action reliability in multi-step chains

Plans become more coherent, and navigation sequences more robust to interface variations.

⚡ Progressive tool loading

Infrequently used tools are no longer loaded at startup, but on demand. Less unnecessary context, better performance, more responsive agent.

Reducing contextual noise directly improves decision stability — especially with a powerful model like GPT-5.4.

More transparency

The interface has been improved to make execution more readable:

  • Visualization of plan steps during execution
  • Display of the agent's last thought
  • Button to see the full detail of tools used

Understanding what the agent does — and why — is essential for building trust in an autonomous system.

Stronger planning

The planning system has been hardened:

  • Limit raised to 20 steps
  • Stricter instructions to avoid vague actions
  • Better recovery after user choices
  • Manually activated tools stay unlocked between sessions
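The first two hardening measures can be pictured as a plan validator. This is a hypothetical sketch: the 20-step cap comes from the list above, but the vague-verb check and all function names are illustrative (Lain's real checks may live at the prompt level).

```python
MAX_PLAN_STEPS = 20  # the raised limit from this release

# Illustrative deny-list of verbs too vague to execute reliably.
VAGUE_VERBS = {"handle", "manage", "process"}

def validate_plan(steps: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the plan passes."""
    errors: list[str] = []
    if len(steps) > MAX_PLAN_STEPS:
        errors.append(f"plan has {len(steps)} steps (max {MAX_PLAN_STEPS})")
    for i, step in enumerate(steps, 1):
        words = step.split()
        first_word = words[0].lower() if words else ""
        if first_word in VAGUE_VERBS:
            errors.append(f"step {i} starts with vague verb '{first_word}'")
    return errors
```

Rejecting a plan up front and asking the model to regenerate it is cheaper than discovering mid-execution that a step like "Handle the form" cannot be acted on.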

The direction

Taken together, these changes paint a clear trajectory. Lain is evolving toward an agent that understands goals, plans, acts in the real environment and adapts to interactions.

An autonomous software collaborator, not a chatbot.

Version 0.23 lays the foundations for running an agent in a real digital workspace — and the results with GPT-5.4 show that the model is starting to match the ambition.


Lain is an autonomous AI agent built by Ekioo. Learn more →