[AI Minor News Flash] Reviving N64 Classics with LLMs! Behind the Scenes of Achieving 75% Decompilation of ‘Snowboard Kids 2’
📰 News Overview
- Progress in Decompilation: With the help of an LLM (Claude), the share of decompiled code in the N64 title ‘Snowboard Kids 2’ has skyrocketed from an initial 25% to around 75%!
- Guided by Similar Functions: Rather than tackling functions in simple sequential order, the approach prioritizes not-yet-decompiled functions that resemble already-decompiled ones, which significantly improved the AI’s success rate.
- Integration with Specialized Tools: Claude was equipped with “skills” for interpreting graphics commands (F3DEX2) and for fine-tuning code until it matches (decomp-permuter), which made the workflow much smoother.
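To make the "similar functions first" idea concrete, here is a minimal sketch of how such a queue could be ordered. This is not the project's actual code: the function names, the opcode lists, and the use of Python's `difflib.SequenceMatcher` as the similarity metric are all assumptions chosen for illustration, and the sketch assumes at least one function has already been decompiled.

```python
from difflib import SequenceMatcher


def similarity(a: list[str], b: list[str]) -> float:
    """Stand-in metric: ratio in [0, 1] of how alike two opcode sequences are."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def prioritize(
    unresolved: dict[str, list[str]],
    solved: dict[str, list[str]],
) -> list[tuple[str, str, float]]:
    """Order unresolved functions by their best match among solved ones.

    Returns (unresolved_name, closest_solved_name, score) tuples.
    Assumes `solved` is non-empty.
    """
    ranked = []
    for name, ops in unresolved.items():
        # Find the closest already-decompiled function to hand over as a reference.
        best_ref, best_score = max(
            ((ref, similarity(ops, ref_ops)) for ref, ref_ops in solved.items()),
            key=lambda pair: pair[1],
        )
        ranked.append((name, best_ref, best_score))
    # Highest similarity first: these are the likeliest quick wins.
    return sorted(ranked, key=lambda item: item[2], reverse=True)


# Hypothetical data: opcode sequences for one solved and two unresolved functions.
solved = {"func_a": ["addiu", "sw", "jal", "lw", "jr"]}
unresolved = {
    "func_b": ["mtc1", "mul.s", "swc1", "jr"],
    "func_c": ["addiu", "sw", "jal", "lw", "lw", "jr"],
}
queue = prioritize(unresolved, solved)
for name, ref, score in queue:
    print(f"{name}: closest solved function is {ref} (score {score:.2f})")
```

In this toy data, `func_c` lands at the front of the queue because it shares almost every opcode with the solved `func_a`, so that solved function can be passed along as the "past correct example."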
💡 Key Takeaways
- Utilizing Vectors: Function similarity is assessed with techniques such as text embeddings of assembly instructions and edit distances (Levenshtein distance) computed with tools like “Coddog”.
- Limitations of LLMs: Code that generates display lists dynamically, performs complex bit manipulation, or is heavily shaped by compiler optimizations still requires manual intervention, which can slow progress.
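As a rough illustration of the edit-distance idea, the sketch below computes a Levenshtein distance over opcode sequences. It is a textbook dynamic-programming version, not Coddog's actual implementation, and the assembly snippets and the choice to compare mnemonics only (ignoring registers and immediates) are assumptions made for this example.

```python
def opcodes(asm: str) -> list[str]:
    """Keep only the mnemonic of each line so operand differences are ignored."""
    return [line.split()[0] for line in asm.strip().splitlines() if line.strip()]


def levenshtein(a: list[str], b: list[str]) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical MIPS snippets: asm_b has one extra load before the call.
asm_a = """
addiu $sp, $sp, -0x18
sw    $ra, 0x14($sp)
jal   func_80001000
nop
lw    $ra, 0x14($sp)
jr    $ra
addiu $sp, $sp, 0x18
"""
asm_b = """
addiu $sp, $sp, -0x18
sw    $ra, 0x14($sp)
lw    $a0, 0x0($a1)
jal   func_80002000
nop
lw    $ra, 0x14($sp)
jr    $ra
addiu $sp, $sp, 0x18
"""

ops_a, ops_b = opcodes(asm_a), opcodes(asm_b)
distance = levenshtein(ops_a, ops_b)
# Normalize so that 1.0 means identical opcode sequences.
score = 1 - distance / max(len(ops_a), len(ops_b))
print(distance, round(score, 3))
```

Comparing mnemonics only is one possible design choice: it makes two functions that differ just in register allocation or call targets look identical, which is usually what you want when hunting for a structurally similar reference.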
🦈 Shark’s Eye (Curator’s Perspective)
The RAG-like approach of providing LLMs with “past correct examples” has proven to be incredibly effective in reverse engineering! The implementation that quantifies function similarity and reorders the queue so that the functions Claude finds easiest to tackle come first is particularly intriguing. The shift from years of manual “code pattern recognition” by humans to embedding vector searches is a groundbreaking step in the history of retro game preservation!
🚀 What’s Next?
While this method has achieved a solid 75%, tackling the remaining “long tail” of complex sections will likely require even more specialized tools and advanced inference models. However, the automation of similarity searches should accelerate the decompilation of other N64 titles like never before!
💬 A Word from Sharky
It’s like hunting in the ocean of code, searching for similar fish (functions) to feed Claude. Truly a shark-like approach to coding! Just a bit more to go until completion, and I’m rooting for the project! 🦈🔥
📚 Terminology
- Decompilation: The process of converting machine code (binary) back into a human-readable programming language (like C).
- Embedding Vectors: A technique that represents text or code as a multi-dimensional list of numbers, enabling the calculation of “semantic closeness.”
- Display List: A list of graphic commands sent to the N64’s graphics processing chip. Since these are dynamically generated, analyzing them is particularly challenging.
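To show what “calculating closeness” between vectors can look like, here is a small sketch using cosine similarity. The `toy_embedding` function is only a stand-in: it counts opcodes against a fixed, hypothetical vocabulary, whereas a real setup would get its vectors from a learned embedding model. The cosine formula itself is standard.

```python
import math
from collections import Counter


def toy_embedding(ops: list[str], vocab: list[str]) -> list[float]:
    """Stand-in for a real embedding model: one opcode count per vocabulary entry."""
    counts = Counter(ops)
    return [float(counts[token]) for token in vocab]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0  # treat an all-zero vector as "no similarity"


# Hypothetical vocabulary and functions, for illustration only.
vocab = ["addiu", "sw", "lw", "jal", "jr", "mul.s"]
vec_a = toy_embedding(["addiu", "sw", "jal", "lw", "jr"], vocab)
vec_b = toy_embedding(["addiu", "sw", "jal", "lw", "lw", "jr"], vocab)
vec_c = toy_embedding(["mul.s", "jr"], vocab)

print(round(cosine_similarity(vec_a, vec_b), 3))  # structurally close pair
print(round(cosine_similarity(vec_a, vec_c), 3))  # structurally distant pair
```

The closer the score is to 1.0, the more alike the two functions look under this representation, so the first pair scores well above the second.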