[AI Minor News]

Conversing with 19th Century Knowledge? Meet the Victorian AI "Mr. Chatterbox"!



※This article contains affiliate advertising.


📰 News Overview

  • 19th Century Exclusive Training Data: Trained exclusively on 28,035 public domain books from the British Library published between 1837 and 1899.
  • Completely Clean Dataset: Contains no information from after 1899, with vocabulary and ideas rooted in 19th-century literature.
  • Small Parameter Count: Roughly 340 million parameters, comparable to GPT-2 Medium, in a lightweight model file of about 2.05 GB.
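To put the parameter count in perspective, here is a back-of-the-envelope sketch using the commonly cited Chinchilla rule of thumb of roughly 20 training tokens per parameter. The 20x ratio is a general scaling-law heuristic, not a figure reported for Mr. Chatterbox, and the function name `chinchilla_optimal_tokens` is hypothetical:

```python
# Rough Chinchilla-style estimate: ~20 training tokens per parameter.
# ASSUMPTION: the 20x ratio is a widely quoted rule of thumb, not a number
# published for this particular model.
TOKENS_PER_PARAM = 20


def chinchilla_optimal_tokens(n_params: int, ratio: int = TOKENS_PER_PARAM) -> int:
    """Return the approximate compute-optimal number of training tokens."""
    return n_params * ratio


params = 340_000_000  # approximate parameter count reported for Mr. Chatterbox
tokens = chinchilla_optimal_tokens(params)
print(f"{tokens:,} tokens (~{tokens / 1e9:.1f} billion)")
```

Comparing an estimate like this with the actual token count of the 28,035-book corpus is one way to judge whether a small model is under- or over-trained for its size.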

💡 Key Points

🦈 Shark’s Eye (Curator’s Perspective)

This project serves as a witty and challenging response to the “data rights issues” plaguing the current AI landscape!

Training a model on the British Library archive with “nanochat” is an impressively concrete implementation. Because it has no knowledge of anything after 1899, it won’t understand you if you bring up smartphones; its vocabulary is stuck in the era of ladies and gentlemen, which is rock and roll in its own right! Also worth noting: Simon Willison used Claude Code to build, in very little time, a plugin that runs this model locally, a telling sign of how fast modern AI development moves!

🚀 What’s Next?

The potential for an “ethically pristine model” using only public domain data has been showcased. In the future, we might see the emergence of a “time-travel dialogue AI” that perfectly recreates specific historical contexts by integrating even more extensive historical archives.

💬 A Shark’s Take

Becoming a gentleman of the 19th century by tossing out modern knowledge? This shark feels like donning a top hat! Let’s enjoy some elegant conversation! 🦈🎩

📚 Terminology

  • Public Domain: Works whose copyrights have expired or been waived, allowing anyone to use or modify them freely.

  • Chinchilla Scaling Law: An empirical guideline for the compute-optimal number of training tokens relative to a model’s parameter count (roughly 20 tokens per parameter), widely used as a benchmark for efficient training.

  • Markov Chain: A probabilistic model where the probability of the next event depends only on the current state, often used for simple text generation.

  • Source: Mr. Chatterbox is a Victorian-era ethically trained model
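As a contrast with a neural model like Mr. Chatterbox, the Markov chain from the terminology list can be sketched as a minimal word-level text generator. The toy corpus and the names `build_chain` and `generate` are hypothetical, chosen only for illustration:

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words observed immediately after it."""
    words = text.split()
    chain: dict[str, list[str]] = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return dict(chain)


def generate(chain: dict[str, list[str]], start: str, length: int,
             rng: random.Random) -> str:
    """Walk the chain: each next word depends only on the current word."""
    word = start
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: no word ever followed this one
            break
        # Duplicates in the list make frequent transitions more likely.
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)


corpus = ("the gentleman took his hat and the lady took her parasol "
          "and the gentleman bowed")
chain = build_chain(corpus)
print(generate(chain, "the", 8, random.Random(1837)))
```

Because each step looks only at the current word, the output drifts quickly; that single-word memory is the key difference from a transformer model, which conditions on a much longer context.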

【Disclaimer】
EN: This article was structured by AI and is verified and managed by the operator. Accuracy is not guaranteed, and we assume no responsibility for external content.
🦈