For the Handmade Network Wheel Reinvention Jam this year, I'm looking to investigate alternative approaches to URLs, links, and using a browser to navigate between files on the internet. This blog post is a short introduction to my thoughts on the topic, written with minimal editing and little up-front research into the topic or its history. As I go through the jam, I'd like to strike a healthy balance of research, building experiments, and blog writing to present my thoughts, so treat this as an exploration rather than an authoritative piece on the subject.
Link to the Jam: https://handmade.network/jam/wheel-reinvention-2024
Link to my Project: https://handmade.network/p/608/orca-links-modern-urls/
The URL Problem
Here's my description of the URL problem as I understand it. I haven't done any research into the topic; this is just the story I tell myself in my head, and I'm sure the reality and history are a lot more complex.
Let's imagine we are at the beginning of the internet. There are something like 10 computers connected to each other. If I'm using one computer and I want a file on another, I need to somehow specify which of the other 9 computers I want to access to get the file. There are essentially two problems that need to be solved:
I need to know which computer the file is on. Maybe I used that computer recently, so I want to see and remember its name so that I can tell my computer at home later which one I'm talking about. Much more commonly (in the modern world, at least), I was told the file existed by some external source. For example, maybe a friend told me there's a cool game on his computer that I should try out. In this case my friend needs to somehow communicate the information about his computer to me so I can access it later.
Once I know which computer I'm talking about, I need to put some information into my computer and run it through some system that connects me to the right computer and downloads the right file. Essentially, using the information I have in my head, combined with the information and systems on my computer, I need to reliably (and hopefully without too much effort on my part) get connected to the correct computer.
One way to solve this problem is to give every computer in the world a unique identifier. In this hypothetical past, when there were only 10 computers, we could just label the computers 1, 2, 3, and so on, and use that number to refer to each one uniquely. Maybe my friend owns computer 3: he tells me his computer is 3, and on my computer, which is 4, I say I want to connect to computer 3. My computer then asks every computer connected to it for its number and finds the one that reports itself as computer 3. This is pretty simple, and it would actually work great if it weren't for a few fundamental problems:
There are far more than 10 computers in the world: way too many for a human brain to remember each unique number, and too many for my computer to ask every connected computer for its number while it tries to find a particular one.
Humans are not as good at remembering "random" series of numbers as they are at remembering names and words in their own language. I can remember 10 random English words more easily than 10 random numbers between 0 and 255, and that isn't even a fair comparison, because there are far more than 256 English words to choose from. Some names are also more valuable than others, because they are easier to remember or represent more popular concepts. For example, more people may want their computer to be #1 rather than #2425, because #1 is easier to remember and people like being #1 in a competitive sense.
People speak many different languages, and it's easier to remember words and names in the language(s) you know. This gets especially complex if my friend and I speak different languages: his computer might have a name in his language that is hard for me to remember. We need some system that lets me use a translation of his computer's name into one I can understand and remember.
Say my friend buys a new computer between telling me about the game and my trying to download it. The new computer might have a different number than his old one, but it's still his computer, and he likely wants me to get the game from the new one rather than the old one. It would be nice if he could tell the world "I got a new computer. Please use computer 9 now, not computer 3" without having to contact me specifically.
Files and information have value, so someone might want to pretend to be my friend's computer. How can I be sure I got my friend's file, and not a file from someone pretending to be my friend? I need some guarantee that the information my friend gave me is enough to confidently route me to his computer, not someone else's.
The file on my friend's computer might be in some folder. Once I'm connected to his computer, how do I find the file I'm looking for? This is really a sub-problem: my friend could tell me which folder the file is in at the same time as he tells me his computer's ID, and finding the right file is probably easier to solve once we've solved connecting to the correct computer.
There might be a lot of people who want this file. Maybe my "friend" is actually Taylor Swift, and the "game" is actually a new album she just finished. Millions of people want to download this album, but a single computer is not powerful enough to serve all of them at the same time, so for a variety of hardware reasons we can't have everyone downloading from one physical computer. We need a system where the load of transferring the file is distributed among a collection of computers that all have the file, even though everyone finds the file using the same information Taylor gave them.
Related to the previous point, each individual computer may break or go offline for one reason or another. We can't let one computer's failure make the entire system inaccessible. We need a dynamic system where people get routed to a different computer when the one they were trying to connect to is unavailable.
People live all over the world, and downloading a file from France while I live in Canada is not always fast. It would be nice if my friend could manage a computer that isn't physically located at his house, so that he can share files quickly with people who don't live close to him.
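Several of these problems (a friend switching computers, many people downloading the same file, computers failing, and distance) point toward the same shape: a name that refers to a record listing several computers, rather than to one computer. Here's a minimal sketch of that idea, assuming a simple in-memory registry. `Host`, `NameRecord`, `replace_hosts`, `resolve`, and the region labels are all hypothetical illustrations, not part of DNS or any existing system, and the sketch does nothing about the spoofing problem.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Host:
    address: str  # hypothetical address, e.g. "computer 9"
    region: str   # rough physical location, e.g. "canada"


@dataclass
class NameRecord:
    """Hypothetical registry entry: one human-friendly name, many computers."""
    name: str
    hosts: List[Host] = field(default_factory=list)

    def replace_hosts(self, hosts: List[Host]) -> None:
        # The owner updates the record once ("use computer 9 now, not
        # computer 3") and every later lookup sees the new computers.
        self.hosts = list(hosts)


def resolve(
    record: NameRecord,
    client_region: str,
    is_up: Callable[[Host], bool],
    rng: Optional[random.Random] = None,
) -> Optional[Host]:
    """Pick one reachable host for the name, or None if none are reachable."""
    # Skip computers that are broken or offline.
    alive = [h for h in record.hosts if is_up(h)]
    if not alive:
        return None
    # Prefer computers near the client; fall back to any reachable one.
    nearby = [h for h in alive if h.region == client_region]
    pool = nearby or alive
    # Choosing randomly among equally good hosts spreads the download load.
    chooser = rng if rng is not None else random
    return chooser.choice(pool)


record = NameRecord("friends-game", [Host("computer 3", "france")])
record.replace_hosts([Host("computer 9", "france"), Host("computer 12", "canada")])
offline = {"computer 12"}
chosen = resolve(record, "canada", lambda h: h.address not in offline)
print(chosen.address)  # computer 9: the Canadian host is offline, so fall back
```

The useful property here is the indirection: the name stays the same while the list behind it changes. A real design would also need caching, security, and some way of learning which hosts are actually up, which this sketch leaves to the caller.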
So we have a bunch of hurdles to overcome. Our system for finding and connecting to computers needs to solve all these problems (or at least attempt to solve as many as possible). The system we currently have solves many of them, but arguably not all. Part of the reason is that it was designed at a time when the scale of today's internet was unimaginable, and some fundamental decisions were made that probably aren't the best choices given what we know now.
So let's start thinking about a new solution to this problem, beginning with a list of basic properties we want it to have:
We want the naming scheme we use for each computer to be easy for both humans and computers to work with. This likely means text, since basically every computer people use has some kind of keyboard that lets them quickly type characters in their native language.
Given that our naming scheme is textual, we want it to be as versatile as possible. It should work for as many languages as possible without adding too much complexity. So for now, let's say we support a large portion of Unicode, but still choose some small-ish subset of it. Remember, our goal is to balance versatility with other concerns, so allowing every single Unicode character is likely to cause us grief in terms of complexity later on (for example, if we supported all of Unicode's invisible characters, many completely distinct links would look identical to a user).
As an individual internet user with a computer, I want to be able to choose a name for my computer. Maybe I don't always get the exact name I want, but I should have a decent amount of control and influence over how my computer gets named. As a large business with a well-known name, I want to be able to choose the name for the host of computers that serve files to my customers. I still might not always get the exact name I want, but I will spend a lot of money and/or effort trying to get a name that matches my business name. I also want to make sure nobody else can pretend to be my business and distribute files under my name. The system needs to serve both of these customers, so we need to balance their concerns. The current system relies largely on a single source of truth that people pay money to, and in return they get a name uniquely assigned to them. If two people want the same name, they may negotiate a sale of the name between themselves. Ultimately, whoever wants the name and has the most money will usually win. In our new system, we'd like disputes where multiple parties want the same name to be resolved a little more gracefully, preferably with less reliance on deep pockets or a single source of truth. For now, let's say we at least want the decision of who gets which name to be a little more collaborative between all the parties involved, and we'd like the outcome of these disputes to roughly match who wants the name more (especially since that might change over time).
As someone looking up a particular name, I want to be sure that a computer I'm connecting to for the first time is the one I expect to find. This may mean a longer connection process, like typing in a longer name and then verifying that the name I typed was correct using some sort of feedback and security mechanisms. But once I'm sure I connected to the correct computer, I'd like to use a shorter process to reopen that connection: maybe a shorter name, maybe a name in my own language, maybe even a name I gave it myself that has nothing to do with its globally registered name. I also want to be sure I don't accidentally visit a new computer when I meant to revisit one I had found previously. Ultimately, the user and their browser software need to collaborate to establish a naming pattern they both understand, which may be separate from the naming pattern used for the actual connection behind the scenes. But this collaboration can't be so complicated that it confuses new users or prevents users from communicating about a particular computer (website) they both visit. So let's assume our naming scheme will be paired with a standard way of saving information about names on the client's machine, building up trust over time for the computers the user connects to often.
While the single authority is a downside of the current system, I don't think getting away from it entirely is reasonable. We could do something like a blockchain, where the "authority" is just consensus between machines, but that has its own problems. For now, let's assume we have a central authority of some kind, but make the burden on that authority as light as possible. It shouldn't be required to settle disputes directly between people who want the same name. It should be relied upon to host a secure database and provide reliable access to everyone, with a system that lets disputes get reconciled internally and over time.
The service that hosts the naming information should also support more than just the textual name and the IP address(es) of the computer(s) it's associated with. It should be able to store and serve information about the owner, the date of registration, icons that represent the site, site descriptions, and maybe even analytics about how many people have visited the site. All of this information should be editable by the owner of a name and easy to request for users looking up the IP address associated with a name. This isn't the backbone of security, so verifying the information isn't the highest priority, but it is the user-friendly way to investigate a site and decide whether it's the one you are looking for. The system should be set up so that users routinely rely on and expect this information, which encourages good-faith owners to provide useful and accurate info. Bad actors and spoofing should be prevented or handled by other, more technical security measures.
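The balance described for names (broad Unicode support, but only a small-ish subset, and no invisible characters) could start with a simple filter. Here's a minimal sketch, assuming one possible policy: normalize the name to Unicode NFC, then accept only letters, combining marks, and decimal digits in any script, plus a hyphen. The name `validate_name` and the exact set of allowed categories are hypothetical choices for illustration, not an existing standard.

```python
import unicodedata

# Hypothetical policy: the Unicode general categories a name may use.
ALLOWED_CATEGORIES = {
    "Lu", "Ll", "Lt", "Lm", "Lo",  # letters, in any script
    "Mn", "Mc",                    # combining marks many scripts depend on
    "Nd",                          # decimal digits
}
ALLOWED_EXTRA = {"-"}  # a separator that is visible on every keyboard


def validate_name(raw: str) -> str:
    """Return the NFC-normalized name, or raise ValueError if it is not allowed."""
    # NFC makes canonically equivalent sequences (e.g. "e" + combining acute
    # vs. a precomposed "é") identical, so they map to the same name.
    name = unicodedata.normalize("NFC", raw)
    if not name:
        raise ValueError("empty name")
    for ch in name:
        if ch in ALLOWED_EXTRA:
            continue
        category = unicodedata.category(ch)
        # Rejects format characters (Cf, e.g. U+200B ZERO WIDTH SPACE),
        # controls (Cc), spaces (Zs), symbols, punctuation, and so on.
        if category not in ALLOWED_CATEGORIES:
            raise ValueError(f"disallowed character U+{ord(ch):04X} ({category})")
    return name


print(validate_name("cafe\u0301-42"))  # café-42
try:
    validate_name("exa\u200bmple")
except ValueError as err:
    print(err)  # disallowed character U+200B (Cf)
```

This only handles invisible characters. It does nothing about look-alikes (Latin "a" and Cyrillic "а" are both ordinary letters), which is one reason real internationalized domain names rely on the much more involved IDNA rules plus registry and browser policies.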
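The idea of verifying a long name once and then reopening the site with a short, personal name resembles the existing petname and trust-on-first-use ideas (SSH's known_hosts file pins a server's key in a similar way). Here's a minimal sketch of a client-side store along those lines. `NameBook`, `PinnedSite`, `pin`, `open_by_nickname`, and the fingerprint strings are hypothetical; how the fingerprint is verified on the first visit, and how it's fetched on later visits, are assumptions left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class PinnedSite:
    global_name: str  # the full, globally registered name
    fingerprint: str  # key fingerprint verified on the first visit


class NameBook:
    """Hypothetical per-user store mapping personal nicknames to pinned sites."""

    def __init__(self) -> None:
        self._sites: Dict[str, PinnedSite] = {}

    def pin(self, nickname: str, global_name: str, fingerprint: str) -> None:
        # Called once, after the user has checked the long name and the
        # fingerprint through whatever verification step the browser offers.
        if nickname in self._sites:
            raise ValueError(f"nickname {nickname!r} is already in use")
        self._sites[nickname] = PinnedSite(global_name, fingerprint)

    def open_by_nickname(
        self, nickname: str, fetch_fingerprint: Callable[[str], str]
    ) -> str:
        """Return the global name to connect to, or raise if anything is off."""
        site = self._sites.get(nickname)
        if site is None:
            # Never fall back to a global lookup: a typo in a nickname should
            # not quietly take the user to some new computer.
            raise KeyError(f"no site saved as {nickname!r}")
        presented = fetch_fingerprint(site.global_name)
        if presented != site.fingerprint:
            raise PermissionError(
                f"{site.global_name} presented a different key than the pinned one"
            )
        return site.global_name


book = NameBook()
book.pin("games", "friends-game-server", "fp-1234")
print(book.open_by_nickname("games", lambda name: "fp-1234"))  # friends-game-server
```

Because nicknames live only on one machine, two people talking about a site would still share its global name rather than their personal nicknames.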
With all these things in mind, I feel like I'm starting to see the shape of a system that might work. Again, all of this is a thought experiment done before any actual research or implementation attempts, so much of my understanding may change, and these proposed properties may turn out to be entirely wrong. But hopefully this helps set the stage for trying out a potential solution. I'd be happy to hear any feedback about these ideas. I'm sure I'm far from the most knowledgeable person on this topic, so I'm probably wrong on a number of counts here.