Like most people, we’ve been pretty busy with the things in life that need to get done. We’ve also worked really hard on Geocities preservation, because the site closes down tomorrow. We’re unhappy with how little we managed to preserve, even as we’re happy with what we did save. Double-edged sword, that.
We looked through Geocities manually and found information on about 5,000 stories archived there. We screencapped and created articles about more than 500 sites. We added definitions from around 50 Geocities pages. In the final days, we created an extension that let people look at a page and fill out a form to update the wiki article about that site.
And in the past five days, we really kicked it into overdrive. We extracted information about 9,000 fansites listed on DMOZ. We screencapped about 5,000 of those pages, which contain related metadata. We screencapped another 500 or so pages based on Google search results. We downloaded about 1,000 text files related to fandom. We saved about 10,000 Google search results that mentioned fandom-related terms on pages hosted on Geocities. Some of this information is just garbage. Early SEO efforts seeded random keywords at the bottom of pages, and those pages still turn up in searches, especially 500 results deep. Some of the screencaps are undoubtedly 505 errors. Others, especially ones based on Google searches, are probably not fandom related. Lots and lots of potential garbage is mixed in with potentially useful information.
The problem now is: What do we do with this data? The screencaps, the Google search results, the DMOZ information? How do we sort through it, cull it, and put it on the wiki? Do we just mass upload everything and sort the potential garbage out later? Or do we slowly try to work through it now?
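One possible starting point, offered only as a rough sketch: a first pass could drop duplicate URLs and split what remains into a "likely fandom" pile and a "needs review" pile. Everything here is an assumption rather than something we have built: the record layout (dicts with `url`, `title`, and `snippet` keys), the `FANDOM_TERMS` list, and the idea that a term in the page title is a better signal than one buried in page text.

```python
from urllib.parse import urlsplit

# Hypothetical term list; a real one would come from the wiki's own categories.
FANDOM_TERMS = {"fanfic", "fan fiction", "fandom", "fansite"}


def normalize(url):
    """Build a comparison key: lowercased host plus path, trailing slash removed.

    The scheme, query string, and fragment are dropped. The path keeps its
    case, since paths on a web server can be case-sensitive.
    """
    parts = urlsplit(url.strip())
    return parts.netloc.lower() + parts.path.rstrip("/")


def triage(records):
    """Deduplicate records by URL and split them into 'likely' and 'review'.

    Each record is assumed to be a dict with 'url', 'title', and 'snippet'.
    """
    seen = set()
    piles = {"likely": [], "review": []}
    for rec in records:
        key = normalize(rec["url"])
        if key in seen:
            continue  # same page saved more than once
        seen.add(key)
        title = rec.get("title", "").lower()
        # A term in the title counts as a strong signal; a term that appears
        # only in the snippet may be keyword seeding, so it goes to review.
        pile = "likely" if any(term in title for term in FANDOM_TERMS) else "review"
        piles[pile].append(rec)
    return piles
```

A pass like this would not decide anything on its own. It would only shrink the pile that a person has to look at before anything goes on the wiki.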
We’re looking for ideas on how to handle this data. We’re also looking for help implementing those ideas. Any help you can give us after Geocities closes is most welcome.
