Library of Words

What is the Library of Words?
Why did you make it?
How did you make it?
How do I use the website?
How can I contribute?
Who are you?

The Library of Words is a digital collection of pages filled with every possible combination of up to 320 words of the English language. The library starts with a page containing the single word "a", the first in the dictionary, and finishes with a page containing the last word, "zyzzyvas", repeated 320 times. The dictionary used in the library contains 443437 English words. This means that every book, thought, love story, piece of news, tragedy, war, biography, scientific discovery or truth about the universe which has ever been written with those English words, or is yet to be written, is already present in this library.

The concept of the Library of Words is based on the short story La biblioteca de Babel (The Library of Babel) by Argentinian author and librarian Jorge Luis Borges. In the story, the library consists of adjacent hexagonal rooms with shelves on four walls, containing books filled with every possible combination of 25 characters (22 letters plus period, comma and space). In 2015, Jonathan Basile created a digital version of the library (libraryofbabel.info), using 29 characters (26 letters plus period, comma and space), a base-29 conversion system and a pseudo-random number generator to link a hypothetical book location with the text in the book. In the Library of Words I revisited Borges' idea and used a system similar to Basile's algorithm to produce its pages.

Among the figures that appear in Borges' story La biblioteca de Babel are the "Purifiers": librarians who endlessly walk around the library with cult-like zeal, destroying books that contain gibberish. Given the sheer size of the library, there is an unimaginable number of books with nonsensical text in them, which makes the task of "purifying" the library impossible. I was intrigued by this idea and started to wonder whether there was an easier way to create a subset of the library which would contain only intelligible text. That is how I came up with the idea of the Library of Words.

You can see the library as a place to do research, find inspiration while writing, or contemplate the weird idea that every piece of writing of up to 320 words is contained in it. You can look around or search for specific text. It is hard to explore the Library of Babel and bump into an interesting page. Hopefully, this should be easier in the Library of Words. Given the size of the vocabulary used in this library, you will come across a large number of new and rare words. You will notice that the text is mostly composed of these words and has very few conjunctions, which makes it hard to read and often nonsensical. This is due to the Zipfian nature of human language: a handful of very common words account for most of ordinary text, while here every word in the dictionary is equally likely. I doubt that a further "purification" is in fact possible. This library is not an attempt to emulate the English language, but it does give some interesting insights into its characteristics. If you are interested in natural language generators, have a look at my other project here, or look up Markov chain text generators.

The size of the library amounts to roughly 10^1776 pages. That is an improvement over the size of the Library of Babel (about 10^4677 books), but it is still a number so incredibly large that it goes beyond human comprehension. Storing the library would require 10^1761 exabytes of space. In comparison, the whole of humankind is currently able to store 295 exabytes of information. It gets better: there are roughly 10^80 atoms in the known, observable universe. How is the library even possible, then?
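The order of magnitude can be estimated with a few lines of Python. This is only a sketch under stated assumptions: it counts one page for every sequence of 1 to 320 words, and the names count_pages, DICTIONARY_SIZE and WORDS_PER_PAGE are mine rather than taken from the library's code. The exponent is very sensitive to the dictionary size. With the 443437 words quoted above, the sketch prints an exponent of about 1807, while a list of roughly 355,000 words gives about 1776, so treat the output as an estimate tied to the assumed inputs.

```python
import math

DICTIONARY_SIZE = 443437  # dictionary size quoted in the text
WORDS_PER_PAGE = 320      # maximum number of words on a page


def count_pages(dictionary_size: int, max_words: int) -> int:
    """Count every sequence of 1..max_words words (assumed page model)."""
    return sum(dictionary_size ** k for k in range(1, max_words + 1))


pages = count_pages(DICTIONARY_SIZE, WORDS_PER_PAGE)

# math.log10 accepts arbitrarily large Python integers.
exponent = math.log10(pages)
print(f"roughly 10^{exponent:.0f} pages")
```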

Each page of the library is created on the spot by an algorithm which links a unique location string to the text of the page. The two are connected by mathematics and are therefore pre-determined. This is not to be confused with randomness: the library is neither random nor randomly generated on the spot. The process is deterministic: every page has a unique location, the text is immutable, and the exact same page can always be retrieved from its location string. The connection between the string and the page is a base conversion. While Basile's library uses a base-29 conversion, this library reads the base-62 location string (made of letters and digits) as a number and rewrites that number in base 443437, where each digit stands for a word. The code is written in Python. Unlike the digital Library of Babel, I decided not to randomize the output of the algorithm. This allows for an alphabetically ordered library, which I believe gives a better feel for the library's size while exploring it: the changes are easier to track when moving page by page (or 10^100 pages at a time, using the navigation bar). The only element of randomness allowed in the algorithm is in the word search. Searching for fewer than 320 words at a time means looking through a huge number of different pages containing those words. To make exploring easier, the search picks one of those pages at random.
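The base conversion can be sketched in Python as follows. This is a minimal illustration and not the library's actual code. The five-word WORDS list stands in for the real dictionary, the order of the base-62 ALPHABET is an assumption, and all function names are hypothetical. To avoid losing leading "a" words (which would otherwise act like leading zeros), the sketch uses bijective numeration, with word digits running from 1 to N. The real implementation may solve this differently.

```python
import string

# Toy stand-in for the real dictionary (assumption: sorted, no duplicates).
WORDS = ["a", "aa", "cat", "dog", "zyzzyvas"]
WORD_INDEX = {word: i for i, word in enumerate(WORDS)}

# Assumed base-62 alphabet: digits, lowercase letters, uppercase letters.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def words_to_int(words):
    """Read a page as a number whose digits (1..N) are word positions."""
    n = 0
    for word in words:
        n = n * len(WORDS) + WORD_INDEX[word] + 1
    return n


def int_to_words(n):
    """Inverse of words_to_int for n >= 1."""
    words = []
    while n > 0:
        n, remainder = divmod(n - 1, len(WORDS))
        words.append(WORDS[remainder])
    return words[::-1]


def int_to_location(n):
    """Write a non-negative integer as a base-62 string."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def location_to_int(location):
    """Read a base-62 string back into an integer."""
    n = 0
    for char in location:
        n = n * len(ALPHABET) + ALPHABET.index(char)
    return n


page = ["a", "cat", "a", "zyzzyvas"]
location = int_to_location(words_to_int(page))
# The same location string always decodes to the same page.
assert int_to_words(location_to_int(location)) == page
```

Under this scheme, consecutive integers give pages in a fixed order: shorter pages first, then alphabetical within each length. The first page is the single word "a" and the last is the final dictionary word repeated up to the page limit. Moving a fixed number of pages forward or back then amounts to adding or subtracting that number before converting back to a location string.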

There are three main ways of using the website.

Random: clicking on the logo on the main page opens a random page of the library. You can then explore the nearby pages (first page, 100^100 pages backwards, previous page, random page, next page, 100^100 pages forward, last page) using the navigation bar at the top. A unique link is also provided at the bottom of the page, with sharing buttons to spread your findings.

Search: you can look up a sentence made of any combination of the 443437 English words present in the dictionary. Symbols will be stripped from the input and words missing from the dictionary will be ignored. There are two options for the search: "full page" or "string only". The "full page" option will return a randomly picked 320-word page with your sentence in it. The "string only" option will return the location of your exact match. You can look up anything you want in the library: the story of your life, the description of your work, and even this very sentence. Remember that everything is in the library. The truth, but also its opposite.

Browse: if you know the location string of a piece of text, you can enter it here and the page will be retrieved for you. The "full page" and "string only" options are also present in the browsing mode. The "full page" option will retrieve a randomly picked page containing the text decoded from the location string, while the "string only" option retrieves the exact match. An interesting way to explore the library is to type an ordinary sentence as if it were a location string and analyze the output.
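The difference between the "string only" and "full page" options can be sketched as follows. This is only a guess at the behavior described above, not the site's code. The short WORDS list stands in for the real dictionary, and clean_query and full_page are hypothetical names. The sketch assumes that a "full page" result is built by padding the query with randomly chosen dictionary words at a random offset.

```python
import random
import re

# Toy stand-in for the real dictionary (assumption).
WORDS = ["a", "cat", "dog", "the", "zyzzyvas"]
WORD_SET = set(WORDS)
PAGE_LENGTH = 320  # words per full page, as stated in the text


def clean_query(text):
    """Strip symbols, lowercase, and drop words missing from the dictionary."""
    letters_only = re.sub(r"[^a-z\s]", "", text.lower())
    return [word for word in letters_only.split() if word in WORD_SET]


def full_page(query_words, rng=None):
    """Place the query at a random offset in a page of random words."""
    rng = rng or random.Random()
    query_words = query_words[:PAGE_LENGTH]
    free = PAGE_LENGTH - len(query_words)
    before = rng.randint(0, free)  # randint includes both endpoints
    head = [rng.choice(WORDS) for _ in range(before)]
    tail = [rng.choice(WORDS) for _ in range(free - before)]
    return head + query_words + tail


query = clean_query("The cat, the DOG & a xylophone!")
print(query)                  # "string only": the exact cleaned match
print(len(full_page(query)))  # "full page": always PAGE_LENGTH words
```

In both modes the resulting list of words would then be turned into a location string by the base conversion. Only the "full page" mode involves a random choice, so repeating the same search can land on a different page each time.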

You can contribute to the maintenance of the library in many different ways: have a look at the code (or fork it!) on GitHub, join the discussion on the Facebook page, send comments, suggestions and bug reports to wordlibrarian, share your findings, and donate at this bitcoin address:

1K15jhzCToYLbsbu4y6G1dH43fPUmYSGSF

I am Giulio Pepe, a doctoral student in Physics. I enjoy programming in my spare time and have been deeply interested in the interdisciplinary nature of this project, which spans mathematics, informatics, art and language.