tsune Help

vibe optimization but it's heuristic

Hi there. Recently I tried to optimize an HPC application with vibe coding for an HPC competition.

While testing some prompts, I found some interesting characteristics, and also found a way to mitigate them.

First, as many people say, LLMs are not good at heuristic tasks, such as improving solutions to NP-hard or sequential problems. From that point of view, AHC (AtCoder Heuristic Contest) and KoH (King of the Hill) challenges in CTFs show this.

Conversely, LLMs excel at tasks that involve a goal or target value to be maximized. This has been shown in jeopardy-style CTFs and on Kaggle.

btw, I recently read a blog post that really resonated with me:

https://www.haibinlaiblog.top/index.php/pre-phd-thinking/

The author is my friend; we've known each other since SuperComputing Asia 2026. He and I, both at the forefront of CS competitions, have very similar views.

In this post, I'll focus on agent performance on long, heuristic tasks.

Also, please note that this blog is my own. What I describe here is my own speculation based on my narrow knowledge, and it may include misunderstandings.

what I felt while the agent was working

nightmare of compacting

Have you heard of the Ebbinghaus Forgetting Curve? It's a curve that illustrates the decline in human memory retention over time. While the Ebbinghaus Forgetting Curve itself is quite self-explanatory, it serves as a suitable example for visualizing human memory.
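For readers who want to see the shape, here is a minimal sketch using the commonly cited exponential approximation R = exp(-t / S), where R is retention, t is elapsed time, and S is a memory "strength". The function name and the sample values of S are assumptions I made up for illustration; they are not fitted to Ebbinghaus's data.

```python
import math

def retention(t_hours: float, strength: float) -> float:
    """Exponential approximation of the forgetting curve: R = exp(-t / S)."""
    if strength <= 0:
        raise ValueError("strength must be positive")
    return math.exp(-t_hours / strength)

# Hypothetical strengths: a larger S means slower forgetting.
for s in (5.0, 50.0):
    curve = [round(retention(t, s), 3) for t in (0, 1, 24)]
    print(f"S={s}: retention at 0h, 1h, 24h = {curve}")
```

Rehearsing a memory can be thought of as raising S, which flattens the curve.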

Anyway, let's compare the memory systems of agents and humans.

The term “memory” used here does not refer to the memory features provided by OpenAI or Anthropic, where information injected through system prompts persists across multiple contexts. It simply refers to the historical data that the agent references while operating. This is called “context,” and within this context, the agent calculates token relationships and predicts the output.

From "Attention Is All You Need": https://arxiv.org/pdf/1706.03762

Anyone familiar with the Transformer architecture should understand this much. In theory, the context can be extended indefinitely. However, think back to the well-known diagrams of the Transformer architecture. As the context grows, the key-value cache referenced during the decoding phase grows linearly with it.

This is why LLMs have a restriction on context length.
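To make the linear growth concrete, here is a back-of-the-envelope sketch of the KV cache size. The model dimensions are hypothetical placeholders, not those of any specific model, and the formula assumes plain attention that stores one key and one value vector per token, per layer, per KV head.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence.

    The leading factor 2 accounts for storing both K and V.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical model: 32 layers, 8 KV heads, head_dim 128, 16-bit values.
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2)
for n in (8_192, 131_072, 1_048_576):
    gib = kv_cache_bytes(n, **cfg) / 2**30
    print(f"{n:>9} tokens -> {gib:.1f} GiB")
```

With these made-up dimensions every token costs 128 KiB of cache, so doubling the context doubles the memory.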

Assume: if LLMs have unlimited context

Let's take another look at the current Transformer model. If we had unlimited context length, how would the "memory" be determined when generating the n-th token?

To start with the basics, the transformer architecture has positional embeddings.

Please refer to the following figures as a reminder of what positional embeddings are.

Screenshot_20260603_014213.png

Here's the visualized input embedding. (Generated by the script from IBM's post: https://www.ibm.com/think/topics/positional-encoding)

Screenshot_20260603_014416.png

Does anyone see what I'm getting at? In a nutshell, if LLMs had unlimited context length, positional embeddings would make tokens behave in a manner similar to the Ebbinghaus Forgetting Curve. However, since the periods of the current positional embedding diverge very quickly, we would need to investigate better hyperparameters or periodic functions.

Therefore, we can describe the current positional embedding equation as "something like an Ebbinghaus Forgetting Curve optimized for limited context".
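If you want to poke at this yourself, here is a minimal sketch of the sinusoidal positional encoding from "Attention Is All You Need", in pure Python. The helper names and d_model = 64 are my own choices for illustration (d_model is assumed to be even). The dot product of two encodings depends only on the distance between the two positions, which is the property the forgetting-curve analogy leans on.

```python
import math

def positional_encoding(pos: int, d_model: int = 64) -> list:
    """Sinusoidal encoding of a single position.

    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    pe = []
    for i in range(d_model // 2):
        angle = pos / (10000 ** (2 * i / d_model))
        pe.extend([math.sin(angle), math.cos(angle)])
    return pe

def similarity(p: int, q: int, d_model: int = 64) -> float:
    """Dot product between the encodings of positions p and q."""
    a = positional_encoding(p, d_model)
    b = positional_encoding(q, d_model)
    return sum(x * y for x, y in zip(a, b))

# Compare position 0 with positions further and further away.
for offset in (0, 1, 4, 16, 64, 256):
    print(offset, round(similarity(0, offset), 2))
```

The similarity is largest at offset 0 (it equals d_model / 2 there). Changing the base 10000 is one way to experiment with the "period" hyperparameter mentioned above.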

Back to reality

Well, let's put this fantasy aside for now. However, if a new architecture were to emerge that replaces the Transformer, one that doesn't rely on memory-bound processing and instead directly represents the relationships between tokens, then the hypothesis I mentioned earlier might become a realistic possibility.

Returning to reality: due to context length limitations, longer tasks require multiple contexts. Thus, current agents such as Claude Code and Codex issue a "compact" request to perform context compaction.

This is truly a nightmare.

When "compact" is executed, the LLM running it compresses the context more than necessary. As a result, context around the task is constantly being updated. And as everyone has experienced, "compact" has become disruptive.

Furthermore, "compact" eats into the 5h/weekly limit excessively. I realized this during a vulnerability audit: launching as many sub-agents as the main agent wants, and keeping the main agent away from auditing or anything else that consumes context, used less of the 5h/weekly limit than working mainly in the main agent with several compactions. (ofc the sub-agents used the same model as the main agent)

solution1. build sequential tasks for agents

This is quite easy to come up with: force the agent to follow task rules. The task rules are saved in a separate context that won't be compacted.

The agent will work sequentially, which mitigates the compacting nightmare.

For example, in the context of HPC application optimization:

  1. profile the target and investigate bottlenecks

  2. investigate how to resolve the issue

  3. patch and rebuild

  4. benchmark and review the result

  5. if it improved something, write your findings into a Markdown file
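The loop above can be sketched as a tiny state file that lives outside the agent's context, so a compaction cannot erase which step comes next. This is only a sketch under my own assumptions: the JSON layout and the helper names (load_state, current_instruction, advance) are hypothetical and are not part of Claude Code or Codex.

```python
import json
from pathlib import Path
from typing import Optional

STEPS = [
    "profile the target and investigate bottlenecks",
    "investigate how to resolve the issue",
    "patch and rebuild",
    "benchmark and review the result",
    "if it improved something, write the findings into a Markdown file",
]

def load_state(path: Path) -> dict:
    """Read the persisted state, or start from step 0 if no file exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"step": 0, "iteration": 1, "findings": []}

def current_instruction(state: dict) -> str:
    """The single instruction to hand to the agent right now."""
    n = state["step"]
    return f"[iteration {state['iteration']}, step {n + 1}/{len(STEPS)}] {STEPS[n]}"

def advance(state: dict, path: Path, finding: Optional[str] = None) -> dict:
    """Record an optional finding, move to the next step, and persist."""
    if finding:
        state["findings"].append(finding)
    state["step"] += 1
    if state["step"] == len(STEPS):  # finished one loop: go back to profiling
        state["step"] = 0
        state["iteration"] += 1
    path.write_text(json.dumps(state, indent=2))
    return state
```

After every compaction, the agent (or a wrapper script) calls load_state and current_instruction again instead of relying on whatever survived in the summary.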

solution2. things that might be researchable

How can we make it more closely resemble human memory? Anyone familiar with the Ebbinghaus Forgetting Curve knows that it not only illustrates memory decay but also suggests how information becomes consolidated as long-term memory.

How can we reflect this in embedding vectors? At the very least, improvements to positional encoding are essential. We must also address the issue of the linearly expanding KV cache. Of course, a KV cache is not merely a sequence of vectors; it would be difficult to map it to a different dimension, or to simply take the dot product or a linear combination of the vectors.

The conditions that a solution must satisfy are clear: it must represent the relationships between tokens within a limited memory, preserve the order of each word, and incorporate vectors (or an attention layer) that allow past tokens to influence the whole. Realizing this dream-like architecture is obviously difficult.

As an example of an attempt to solve part of this problem, GPT (at least the original GPT and GPT-2) treats positional embeddings as a trainable matrix. This can be interpreted as an attempt to achieve better memory representations through learning. Strictly speaking, that matrix is learned by gradient descent during ordinary training rather than by reinforcement learning, but if learning positional embeddings is a heuristic approach to a problem for which no solution can be obtained through computation, it could be the optimal method.

Facts I faced during vibe-optimization

So what matters in practice? During vibe-optimization, I ran into the following:

  • They ignore restrictions stated in the initial prompt, e.g., they ignored the "do not change accuracy" restriction and presented the result to me as "perfect optimization with no accuracy destruct".

  • They are overly confident in their findings, e.g., they treated some findings as game-changers even when they improved execution time by only 0.04 sec.

This was very annoying. Furthermore, they become fucking slackers when it comes to verifying their findings. This slacking off frustrated me, all the more because they know what they should do to resolve it.
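One cheap guard against this is to make the agent justify every "improvement" against measurement noise. Here is a minimal sketch, not a proper statistical test: the function name, the 2x-noise rule, the 2% threshold, and all timings below are assumptions I made up for illustration.

```python
import statistics

def is_meaningful(baseline: list, candidate: list,
                  min_rel_gain: float = 0.02) -> bool:
    """True if the mean time saved clears both the run-to-run noise
    and a minimum relative gain. Inputs are wall times in seconds."""
    base_mean = statistics.mean(baseline)
    cand_mean = statistics.mean(candidate)
    gain = base_mean - cand_mean
    noise = max(statistics.stdev(baseline), statistics.stdev(candidate))
    return gain > 2 * noise and gain / base_mean >= min_rel_gain

# Hypothetical timings from four runs each.
baseline = [12.41, 12.38, 12.45, 12.40]
tiny_win = [12.37, 12.36, 12.39, 12.38]  # saves roughly 0.04 sec
real_win = [9.90, 10.00, 9.95, 10.05]    # saves more than 2 sec

print(is_meaningful(baseline, tiny_win))
print(is_meaningful(baseline, real_win))
```

The same idea applies to the accuracy restriction: compare the output against a reference run with an explicit tolerance instead of trusting the agent's summary.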

they tend to fuss over small things

This likely depends on the specific tasks they were trained on using RL. However, no matter how much we instruct them to focus on global optimization, they remain fixated on algorithmic improvements to specific computational kernels. Furthermore, even when given strong instructions via prompts, the agents do not attempt to profile the code; if they do not see an improvement in execution time from algorithmic improvements to the computational kernels, they just keep trying to improve that specific part. Many readers are familiar with Amdahl's Law and should understand that if the right optimization targets aren't chosen, optimization efforts will yield only minor improvements. Midway through the optimization process, I paused the flow, manually created a profiling file using AMDuProf, and then passed that cached data to the agent. However, the agent ignored it and continued to focus on minor algorithmic improvements to the kernel lmao.
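As a reminder of why the target matters, here is Amdahl's Law as a small sketch. The profile fractions are hypothetical numbers, not from my actual run.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of the runtime gets `local_speedup` times faster.

    Amdahl's Law: S = 1 / ((1 - p) + p / s)
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Hypothetical profile: a small kernel takes 5% of runtime, a hotspot takes 60%.
print(f"10x on the 5% kernel:  {amdahl_speedup(0.05, 10.0):.3f}x overall")
print(f" 2x on the 60% hotspot: {amdahl_speedup(0.60, 2.0):.3f}x overall")
```

Even a 10x win on a kernel that accounts for 5% of the runtime gives less than 1.05x overall, while a modest 2x on the real hotspot gives about 1.43x. That is exactly why the profile should choose the target, not the agent's favorite kernel.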

Last modified: 07 June 2026