My side project failed. How it worked and why it didn’t succeed

A few months ago, I released a site called Feldot, a novel website-discovery application that, unfortunately, failed to grow. Since it failed, I’ve decided to write about its inception, its design choices, and why I think it didn’t succeed.

Eight months ago, I created a toy called Randomsite (see my earlier blog post). I used a port scanner to find random web servers on the internet and stored them in an SQLite database. I learned the very basics of Django, a web application framework, just enough to build something functional: when people clicked the link, they were redirected to a random web server.

After I posted the site on Hacker News and dang re-upped it (thanks, btw), the post took off, and thousands of people and bots checked out my site. I remember watching the tail of the nginx access logs as connections flew by, a sense of mild euphoria flowing through my veins. I had been creating tools and toys since I was 14, but this was the first time people actually used something I made, and that made the entire experience 100x more enjoyable. It ignited something in me.

Not only did it get seen, but I also got feedback on what needed to improve. Originally, the software added servers that returned error messages to the database, and there were enough of them that people complained. Acting on that feedback, I removed many of those sites and immediately saw a huge spike in errors from my own site. I broke something! I scrambled frantically, fixed a few bugs in my code (one involved a “random” selection from the database that assumed there were no gaps in the id field), and drastically improved the experience thanks to user feedback. The entire experience was incredible, and I wanted to do it again.
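The id-gap bug is a classic: picking a random number between 1 and the largest id only works if no rows were ever deleted. Below is a minimal sketch using Python’s built-in sqlite3 module, assuming a hypothetical table servers(id INTEGER PRIMARY KEY, address TEXT); it is not the original Randomsite code. It shows the buggy pattern next to two gap-safe alternatives.

```python
import random
import sqlite3

# Assumed (hypothetical) schema: servers(id INTEGER PRIMARY KEY, address TEXT)


def naive_random_server(conn: sqlite3.Connection):
    """Buggy version: assumes ids run from 1 to MAX(id) with no gaps.

    After rows are deleted, the chosen id may not exist and this returns None.
    """
    (max_id,) = conn.execute("SELECT MAX(id) FROM servers").fetchone()
    if max_id is None:
        return None
    row = conn.execute(
        "SELECT address FROM servers WHERE id = ?", (random.randint(1, max_id),)
    ).fetchone()
    return row[0] if row else None


def random_server(conn: sqlite3.Connection):
    """Gap-safe and uniform: let SQLite pick among the rows that actually exist.

    ORDER BY RANDOM() scans the whole table, which is fine for modest sizes.
    """
    row = conn.execute(
        "SELECT address FROM servers ORDER BY RANDOM() LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def random_server_fast(conn: sqlite3.Connection):
    """Gap-tolerant and index-friendly: take the first id at or above a random value.

    Rows that sit just after a gap are chosen more often, so this is not uniform.
    """
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM servers").fetchone()
    if lo is None:
        return None
    row = conn.execute(
        "SELECT address FROM servers WHERE id >= ? ORDER BY id LIMIT 1",
        (random.randint(lo, hi),),
    ).fetchone()
    return row[0]
```

The trade-off is uniformity versus speed: ORDER BY RANDOM() is exact but touches every row, while the id-range lookup uses the primary-key index but is biased by how the gaps are distributed.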

I thought for a long time about how to make novel website discovery interesting, social, unique, and better than what I had already made. I decided I wanted a reddit-like site, but instead of posts linking to URLs like reddit.com/r/funny, they would link only to domain names like reddit.com. The catch is that discovering new sites is hard, so I knew I had to pair the site with a tool that made new sites easy to find. I got to work.

A big issue with the old site was that users connected directly to an IP address, so their requests never included a hostname. Many web servers rely on that hostname, sent in the HTTP Host header, to decide which site to serve. Nginx, the web server I used, supports name-based virtual hosting, and other web servers have something similar. This configuration allows many websites to be hosted on a single IP address, saving cost and making better use of computer resources. Because my toy never sent a hostname, it missed every website hosted this way, which is most of them. To find those websites, I needed access to the zone files, which list the registered domain names under each top-level domain. I had to jump through some hoops and cut through red tape, but after a few weeks I had obtained, through ICANN, the zone files for the top-level domains most popular in the US, including all .com names. The fun work could begin.
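In HTTP/1.1 the site name travels in the Host header, and a server doing name-based virtual hosting uses it to choose which site to return. The sketch below is a toy model of that idea, not nginx’s actual server-selection logic: `build_get_request` and `pick_virtual_host` are hypothetical helpers, and the fallback to a default site stands in for what a server does when the Host value (here, a bare IP) matches nothing.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request for `host`.

    When a client only knows an IP address, the Host header can only carry
    that IP, so no named site can be matched.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def pick_virtual_host(request: bytes, sites: dict, default: str) -> str:
    """Toy name-based virtual hosting: choose a site by the Host header value."""
    head = request.decode("ascii").split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            hostname = value.strip().split(":", 1)[0].lower()  # drop any :port
            return sites.get(hostname, default)
    return default
```

Requesting example.org gets the example.org site, while requesting a bare IP falls through to whatever the default is, which is why scanning IPs alone misses most virtually hosted sites.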

I randomized and filtered the zone file data so that only the domain names were placed into an SQLite database. I then wrote a series of Python and shell scripts that pulled the next row from the database and checked whether a web server responded at that domain. If it responded, didn’t return an error, didn’t have a bunch of numbers in its name, and passed several other filters, I saved its URL, IP address, and the first 100 bytes of its HTML to another database.
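Here is a rough sketch of that pipeline in Python, under several assumptions: the zone file lines use the common “owner TTL class type rdata” layout (e.g. `example.com. 172800 in ns ns1.example.net.`), domains are taken from NS records, and the “bunch of numbers” rule is a hypothetical digit-count cutoff (`MAX_DIGITS`). The helper names are illustrative, not the scripts used for Feldot.

```python
import http.client
import random
import socket
import urllib.request

MAX_DIGITS = 3  # hypothetical cutoff for "a bunch of numbers" in a domain name


def domains_from_zone(lines):
    """Collect unique second-level domain names from NS records in zone-file lines."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):  # skip blanks and comments
            continue
        fields = line.split()
        if len(fields) < 5 or fields[3].lower() != "ns":
            continue
        name = fields[0].rstrip(".").lower()
        if name.count(".") == 1:  # e.g. "example.com"; skips the bare TLD apex
            seen.add(name)
    return seen


def looks_numeric(domain, max_digits=MAX_DIGITS):
    """Hypothetical filter: flag names containing more than max_digits digits."""
    return sum(ch.isdigit() for ch in domain) > max_digits


def candidates(lines, seed=None):
    """Shuffle the unique domains, then drop the obviously numeric ones."""
    domains = sorted(domains_from_zone(lines))  # sort first so a seed is reproducible
    random.Random(seed).shuffle(domains)
    return [d for d in domains if not looks_numeric(d)]


def probe(domain, timeout=5.0):
    """Fetch http://<domain>/ and return (url, ip, first 100 bytes), or None.

    urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses,
    so error pages are skipped along with DNS and connection failures.
    """
    url = f"http://{domain}/"
    try:
        ip = socket.gethostbyname(domain)
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return url, ip, resp.read(100)
    except (OSError, ValueError, http.client.HTTPException):
        return None
```

Each domain from `candidates()` would be passed to `probe()`, and non-None results stored; the network step is kept separate so the parsing and filtering can be tested offline.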

After several thousand sites were saved, I reviewed them and noticed a problem. There were a lot of spam sites that needed to be filtered out, but exact-match-and-delete scripts didn’t work because many of the spam sites differed slightly from one another. Long story short, I used Python’s difflib module to fuzzily compare each site against known spam sites and deleted it if the similarity crossed a certain threshold. Because this comparison was computationally expensive, I parallelized it with the multiprocessing module so that it wouldn’t take weeks to complete.
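A minimal sketch of fuzzy spam matching with difflib and multiprocessing follows. The 0.9 threshold is a hypothetical value (the real one isn’t stated), and `is_spam`/`flag_spam` are illustrative names. It uses `SequenceMatcher`’s cheap upper bounds (`real_quick_ratio`, `quick_ratio`) to skip most expensive `ratio()` calls, and fans snippets out over a `multiprocessing.Pool` when asked.

```python
import difflib
from functools import partial
from multiprocessing import Pool

THRESHOLD = 0.9  # hypothetical cutoff; the post does not say which value was used


def is_spam(snippet, spam_examples, threshold=THRESHOLD):
    """True if `snippet` is at least `threshold` similar to any known spam snippet."""
    # SequenceMatcher caches details about its second sequence, so set it once.
    matcher = difflib.SequenceMatcher(None, "", snippet)
    for example in spam_examples:
        matcher.set_seq1(example)
        # real_quick_ratio() and quick_ratio() are cheap upper bounds on ratio(),
        # so most non-matches are rejected before the expensive comparison.
        if (
            matcher.real_quick_ratio() >= threshold
            and matcher.quick_ratio() >= threshold
            and matcher.ratio() >= threshold
        ):
            return True
    return False


def flag_spam(snippets, spam_examples, threshold=THRESHOLD, processes=None):
    """Return one boolean per snippet.

    processes=None runs serially; an integer spreads the snippets over a
    multiprocessing.Pool with that many worker processes.
    """
    check = partial(is_spam, spam_examples=spam_examples, threshold=threshold)
    if not processes:
        return [check(s) for s in snippets]
    with Pool(processes) as pool:
        return pool.map(check, snippets, chunksize=256)
```

When using `processes=`, call `flag_spam` from under an `if __name__ == "__main__":` guard, since platforms that start workers with spawn re-import the main module.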

Eventually I ended up with a list of 100,000 interesting sites with a good enough signal-to-noise ratio to get the site started. Those 100,000 sites were loaded into the site’s PostgreSQL database, and the explore section of the site read through them sequentially. That just left the reddit-like front page.

I won’t go into too much detail on the front page. Every so often, the 10,000 most recent posts are ranked using a function of time, up/downvotes, and moderator input, and the resulting order is cached in Redis. Loading the front page queries Redis with an offset based on which page you are on. There are plenty of posts on how reddit was built; feel free to check those out.
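The exact ranking function isn’t given here, so the sketch below substitutes reddit’s open-source “hot” formula (log-scaled net votes plus a time bonus of one point per 45,000 seconds) and adds a hypothetical `mod_boost` field for moderator input. Plain Python lists stand in for the cache; in Redis, a sorted set written with ZADD and read with ZREVRANGE would be one plausible equivalent, but that is an assumption, not a description of Feldot. `PAGE_SIZE` is also hypothetical.

```python
import math
from dataclasses import dataclass

EPOCH_OFFSET = 1134028003  # fixed reference time used by reddit's hot formula
PAGE_SIZE = 25  # hypothetical page size


@dataclass
class Post:
    id: int
    ups: int
    downs: int
    created: float      # Unix timestamp in seconds
    mod_boost: int = 0  # hypothetical moderator adjustment, counted as votes


def hot(post: Post) -> float:
    """Reddit-style hot score: log10 of net votes plus 1 point per 45,000 s of newness."""
    score = post.ups - post.downs + post.mod_boost
    order = math.log10(max(abs(score), 1))
    sign = (score > 0) - (score < 0)
    seconds = post.created - EPOCH_OFFSET
    return round(sign * order + seconds / 45000, 7)


def rank(posts):
    """Return post ids hottest first; this ordering is what would be cached."""
    return [p.id for p in sorted(posts, key=hot, reverse=True)]


def page_of(ranked_ids, page, page_size=PAGE_SIZE):
    """Slice the cached ranking for a 1-based page number."""
    start = (page - 1) * page_size
    return ranked_ids[start:start + page_size]
```

Because votes are log-scaled, a post needs ten times the net votes to match a post 45,000 seconds (12.5 hours) newer, which keeps the front page turning over.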

The site was posted, and it took off about as well as a rock takes off into space. Looking at the site now, that isn’t surprising. While building it, I focused entirely on UX: making the site simple and easy to use on mobile, and extremely fast to respond. Looking back, I think more focus on appearance and UI would have done some good.

This site depends heavily on network effects: there need to be enough people posting interesting content for the site to sustain itself. Ultimately, it never got there. I take solace in the fact that, while the site didn’t take off, I learned enough in the process to finally write a long, rambling blog post of my own.

Random Web Server App

When I was younger and had just learned to use nmap, it was a hobby of mine to scan the internet and browse random web servers. At the time, I used very aggressive scans (nmap -A -iR 500 --top-ports 30 --open) and found some incredibly interesting servers (it is amazing how many computers on the internet are blatantly compromised). At first it was a completely manual process: I would start the scan, scroll through pages of IP addresses and port information, and copy and paste interesting IPs and ports into Firefox. Eventually I automated the process with a shell script and some Python that scanned and filtered IP addresses, noted interesting ports, and automatically opened a browser to each IP.
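The original automation script is gone, so here is a hedged reconstruction of the idea, assuming the scan also writes nmap’s grepable output (adding `-oG -` prints it to stdout). In that format, each host line holds tab-separated fields and each port entry looks like `80/open/tcp//http///`. `WEB_PORTS` is a hypothetical choice of “interesting” ports.

```python
WEB_PORTS = {80, 443, 8000, 8080, 8443}  # hypothetical set of ports worth opening


def open_ports(grepable_text):
    """Map each host IP to its open TCP ports, parsed from nmap -oG output."""
    results = {}
    for line in grepable_text.splitlines():
        if not line.startswith("Host:") or "Ports:" not in line:
            continue  # skip comments and "Status: Up" lines
        fields = line.split("\t")
        ip = fields[0].split()[1]  # "Host: 192.0.2.10 ()" -> "192.0.2.10"
        for field in fields[1:]:
            if not field.startswith("Ports:"):
                continue
            for entry in field[len("Ports:"):].split(","):
                # port/state/protocol/owner/service/rpc_info/version/
                parts = entry.strip().split("/")
                if len(parts) >= 3 and parts[1] == "open" and parts[2] == "tcp":
                    results.setdefault(ip, []).append(int(parts[0]))
    return results


def browser_urls(ports_by_host, wanted=WEB_PORTS):
    """Turn open web ports into URLs to visit, in host order then port order."""
    urls = []
    for ip, ports in ports_by_host.items():
        for port in sorted(ports):
            if port in wanted:
                scheme = "https" if port in (443, 8443) else "http"
                urls.append(f"{scheme}://{ip}:{port}/")
    return urls
```

Each resulting URL could then be handed to the standard library’s `webbrowser.open()` to reproduce the “automatically opened a browser” step.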

Somewhere across a couple dozen OS installs, that script was lost. To recreate the experience, I made randomsite.lhackworth.com. It’s some spaghetti code slapped together with nmap, Django, uWSGI, and Nginx that does something similar to what I used to do, albeit less noisily. It scans IPv4 addresses for an open port 80 and sticks promising IP addresses into a database. Then, when you go to randomsite.lhackworth.com/go/, it redirects you to a random IPv4 web server.

It doesn’t scan as aggressively as my old scripts did; it only looks for servers on port 80. Unfortunately, since port 80 is the standard port for web servers, many of these sites are much less interesting than the things I found back in the day. At the request of a few users on Hacker News, I implemented a filter to prune out sites that returned certain HTTP errors, such as 400, 404, and 500, which greatly improved the signal-to-noise ratio of the redirects. It was fun to cobble together. The site is now unmaintained, and the pruning software has been disabled. If you want specifics on how I did it, email me at contact@lhackworth.com.
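A status-code pruning pass might look like the sketch below. It is not the disabled Randomsite filter: `PRUNE_STATUSES` is a hypothetical set built around the 400, 404, and 500 codes mentioned above, and the status lookup is injectable so the filtering logic can be exercised without touching the network.

```python
import http.client
import urllib.error
import urllib.request

# Hypothetical prune list: the post names 400, 404, and 500 "and others".
PRUNE_STATUSES = {400, 401, 403, 404, 500, 502, 503}


def fetch_status(address, timeout=5.0):
    """Return the final HTTP status for http://<address>/, or None if unreachable.

    urlopen follows redirects, so this is the status of the last response.
    """
    try:
        with urllib.request.urlopen(f"http://{address}/", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # must precede OSError: it is a subclass
        return err.code
    except (OSError, ValueError, http.client.HTTPException):
        return None


def prune(addresses, get_status=fetch_status, bad=PRUNE_STATUSES):
    """Keep only addresses that answered with a status not in `bad`."""
    kept = []
    for address in addresses:
        status = get_status(address)
        if status is not None and status not in bad:
            kept.append(address)
    return kept
```

Passing a dictionary’s `.get` as `get_status` is an easy way to test the rule offline, as in the assertions below.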

On Frugality

I am an incredibly frugal man. I drive a 2005 Toyota Corolla with over 180,000 miles on it. I have my wife cut my hair instead of paying to get it cut. At home, I regularly wear shirts that I had in high school. To most people, this sounds incredibly odd, but I do this because frugality brings a freedom that needless spending doesn’t.

That said, I am not cheap. If a purchase warrants it, I will gladly spend 3x as much as I would on a cheaper alternative. When I make a purchase, I almost always consider at least four things: cost over time, opportunity cost, risk, and return on investment.

Cost over time is captured by the Sam Vimes “Boots” theory of socioeconomic unfairness, from Terry Pratchett’s Men at Arms, which describes how someone who buys a good-quality pair of leather boots often pays less over time than someone who buys cheap shoes that break down in a few months and must be replaced again and again. I cringe at buying brand-name T-shirts, fancy utensils, or decorations, yet I will happily pay several hundred dollars for leather shoes, spend large sums on vehicle maintenance instead of ignoring issues, and buy gym equipment to use at home instead of paying for a monthly subscription that adds up over time.

Opportunity cost is an important consideration in my purchases as well. I am interested in all sorts of topics and cycle through obsessions ranging from microcontrollers to psychiatric medications to the evolution of eels. I could easily drop hundreds or thousands of dollars into these obsessions and turn them into full-fledged hobbies, but I choose not to; instead, I try to focus on productive or potentially productive ventures.

Return on investment and risk go hand in hand. Going to a top university in an expensive city would almost certainly provide a fantastic return on investment, but it would carry considerable risk: the risk of not completing university, of failing to gain the social capital required to shine, and of being unable to escape the financial snowball of large student loans. Balancing potential return against risk is a skill I am still developing, and one I will keep trying to refine.

Building frugality into my life this way has dramatically reduced my anxiety. I used to be constantly stressed about money, worrying about how I would cope if my scooter broke down, if I cut a limb, or if I got sick. That kind of stress is toxic, and it affected me massively, regularly causing panic attacks and sleepless nights. My stress has fallen as my savings have grown: the more money I’ve saved, the less stressed I’ve become.

It will probably be easy for readers to see what I say and think, “Pssh, it’s easy for you to say, you’re rich!” That is true to an extent; while our household income is still below the national median, we earn more than many families living on minimum wage. However, it’s important to know that we didn’t start this way. When my wife and I began saving our emergency fund, we both worked minimum-wage jobs while going to college. We obviously couldn’t save much, but it was enough to significantly reduce our stress and prepare for the inevitable expenses that hit everyone.

If you are reading this and don’t have one, please consider building an emergency fund. It will change your life for the better.