In SEO, there is (and most likely always will be) an ongoing debate over the ethics of paid inclusion in web directories. A company looking into optimization may have a number of fair reservations about this standard practice in the search engine optimization space, but there are generally fair replies to them as well. For instance, in many cases, payment is not for inclusion in the directory; it is for express review. Every submitted link is examined by a human and checked for legitimacy. More importantly, the site is evaluated for its relevance to the category to which it was submitted.
This human review is valuable and worth compensation in and of itself. In the field of artificial intelligence, the human task of arranging data in a machine-readable manner and ensuring semantic coherence to reduce noise for a learning model is known as knowledge engineering. Knowledge engineering has been an important part of the science of AI, and is closely associated with what has been called “good, old-fashioned AI” (GOFAI). GOFAI involves using very deterministic models over data that has been well organized by humans with an intuitive sense of the fuzzy borders of semantic meaning. Modern AI approaches still owe much to the knowledge engineering of the 1980s and 1990s: even very data-intensive probabilistic approaches to machine learning require structured data to understand the nature of the problem, and these resources are readily reusable. In the production and machine learning of semantic ontologies, WordNet is a freely available resource linking semantic data together, as is FrameNet, which organizes meaning in an alternative manner. Paid resources exist in the private sector through groups that hire large numbers of computer scientists and computational linguists, such as Austin’s own Cycorp.
Search engines are often referred to as a type of AI. Google has weak question-answering capabilities and is capable of some semantic clustering. In a world where semantic meaning is extremely relative and no two people share the same world view, the availability of a large number of “paid” directories is valuable for the growth of artificially intelligent applications in the future. The diversity seen in both semantic corpora and site directories supplies a multitude of viewpoints and allows for useful machine learning approaches such as ensembling, where a number of models trained on different data sources pool their predictions and arrive at a joint conclusion, generally with a much lower rate of error than any single model could achieve alone.
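To make the ensembling point concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how any search engine or directory works. The category labels and the function names `majority_vote` and `majority_accuracy` are hypothetical, and the accuracy calculation assumes a simple right-or-wrong decision in which each source errs independently of the others, which real directories will only approximate.

```python
from collections import Counter
from math import comb


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def majority_accuracy(p, n):
    """Probability that a strict majority of n voters is correct.

    Assumes n is odd, the decision is binary (right or wrong), and each
    voter is correct with probability p independently of the others.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Hypothetical category assignments for one site from three directories.
votes = ["Software", "Software", "Business"]
print(majority_vote(votes))

# Five independent sources, each right 70% of the time, pooled by vote.
print(round(majority_accuracy(0.7, 5), 3))
```

Under those assumptions, five sources that are each right 70% of the time reach the correct majority decision roughly 84% of the time. The gain shrinks as the sources' errors become correlated, which is one reason a diversity of independently edited directories matters to the argument.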
The Open Directory Project is a free directory with fairly scientific goals: to store any site submitted to it in a machine-readable, semantically rich format, with human review for legitimacy and relevance to its submitted category. It even touts itself as The Internet Brain, a somewhat prescient term given its promising future in AI and how it can be used in “the cloud”. This project is maintained by a large network of unpaid volunteers. That’s a great and altruistic dedication of time by individuals who care about categorizing knowledge well, but there are many ways to view meaning and relevance to various semantic categories.
As citizens of the Internet, we should also be promoting alternative views of the taxonomy of the pages therein; there are no hard and fast rules for meaning as there are for, say, animals or elements. Unfortunately, cataloguing meaning through site directories is a time-intensive task. Owners of “paid directories” are actually charging for the immediacy of what would otherwise be volunteer work as knowledge engineers. Yes, there is clear value in search engine rankings from being listed on these sites. However, a much more important future value of these directories lies in their role in building an understanding of how sites relate to the various categories they’re submitted to across the thousands of directories out there. Directories that offer express inclusion for a fee are still a practical application of knowledge engineering, with a promising future in cloud-based artificial intelligence. They should not be denigrated by either search engine companies or small businesses, as both should be able to see a scientific and personal benefit in participating in as many of them as possible.