Teaching a Computer to Know When a Domain Name Suits a .org Extension

I’m on a quest to build a brilliant domain valuation model.

How humans can beat machines at valuing domains when the computer has dozens of data points and thousands of historical sales available to it is beyond me, but currently it’s true.

The two main problems with domain valuation are getting good data to base models on and incorporating the “human” element of domain name valuation into the models.

It’s never going to be cost efficient to have humans manually review the 200,000 domains that drop every day, so our current solution is to gather a bunch of data, have humans search for different data points they like and then manually review hundreds of domains each day.

This is what happens on DropDay, FreshDrop, etc.

Except I’m on this quest, right, to build an awesome and incomparably better domain valuation model, which I will build into the DropMining product. It will let users search for only the most valuable dropping domains and then apply human evaluation to whittle that list down even further, saving them literally hours of scanning drop lists every day.

So how do you build human behaviour into mathematical models? Again you have two choices:

  1. You can get some uber, uber smart computer scientists to develop natural language processing algorithms, couple them with machine learning and artificial intelligence, and you’d probably still get a pretty poor model.
  2. Or you can use the right data.

One of the most predictive data points in the dozens of domain valuation models I have built on retail sales, wholesale sales and big portfolios is the number of other extensions registered on that domain.

This is a perfect example of using data that measures human behaviour.

Sure, there is some inherent value in knowing that when you buy a domain there are a number of potential buyers straight off the bat.

But the primary reason this data point is so predictive is that a .com domain that is also registered in .tv, .co, .net, .org, .info and .me is likely to be a good one, because other humans decided it was good enough to buy in those extensions.
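As a rough illustration of this data point, here is a minimal sketch in Python. It assumes you already have a set of registered domain names from some source (a zone file, a WHOIS lookup, whatever you use); the function name and the extension list are made up for the example and simply mirror the extensions mentioned above.

```python
# Extensions taken from the examples in the text; adjust to taste.
EXTENSIONS = ("tv", "co", "net", "org", "info", "me")


def count_other_extensions(com_domain, registered):
    """Count how many sibling extensions of a .com name are registered.

    `registered` is assumed to be a set of lowercase domain names that
    are known to be taken. How you build that set is up to you.
    """
    label, _, tld = com_domain.lower().rpartition(".")
    if tld != "com" or not label:
        raise ValueError("expected a .com domain, got %r" % com_domain)
    return sum(1 for ext in EXTENSIONS if f"{label}.{ext}" in registered)


# Hypothetical data, purely for illustration.
registered = {"example.net", "example.org", "example.tv", "other.org"}
print(count_other_extensions("Example.com", registered))  # prints 3
```

The higher the count, the more people have independently decided the name was worth paying for.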

Unfortunately there are many other human decisions that haven’t yet been incorporated in domain valuation models: is the domain a domain hack, how brandable is it, does it suit the extension, is there a preferable but similar version of the same domain, etc.

None have an obvious solution and as a result we have given up hope of ever having an automatic domain valuation model that works.

Only, nobody has thought outside the box.

Eureka

I was lying in bed pondering this when a solution to one of these problems struck me.

You see, I had noticed in a study of DomainMarket.com’s entire domain portfolio that when both the .com and the .org version of the exact same name were priced and the name was charity related, there was very little difference in their prices, but when the name wasn’t charity related the .com was priced much higher than the .org.

This seems like an obvious observation: .org domains that suit the extension are worth more than ones that don’t suit the extension.

But this is, after all, a human decision. Whether or not a domain suits the extension depends on knowing what charity is, what words are associated with it, etc. So representing this in a model without all that complicated advanced computer science stuff seemed unlikely unless I could find a data point that captured this human behaviour.

Eureka – I bet that when you search Google for the keywords that make up a domain name the number of .orgs in the top 100 results is a good predictor of whether that keyword and thus that domain is charity related.

I leapt out of bed and started coding.

I had previously conducted a groundbreaking correlation study of 10,000+ Google search results, attempting to figure out what causes sites to rank well in Google. From this I had learnt that 10.13% of domains in Google search results are .orgs.

So I knew that I should expect around 10 .orgs in every hundred results, and that variations from this mean would suggest that a domain was either more likely or less likely to be charity related.

I whipped up a quick script and tested my theory, searching for a couple of obvious .org candidates like “donate” and “donate to Africa”.
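A minimal sketch of the counting step in Python. It assumes the top results have already been collected as a list of full URLs (scheme included); how you fetch them is left out on purpose, and the function names and example URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def tld_of(url):
    """Return the last label of the URL's hostname, e.g. 'org'."""
    host = urlsplit(url).hostname or ""
    return host.rstrip(".").rpartition(".")[2].lower()


def count_org_results(result_urls, limit=100):
    """Count how many of the first `limit` result URLs sit on a .org."""
    return sum(1 for url in result_urls[:limit] if tld_of(url) == "org")


# Hypothetical result list, purely for illustration.
results = [
    "https://www.example.org/donate",
    "http://example.com/donate",
    "https://giving.example.org/",
    "https://example.net/org",  # 'org' in the path does not count
]
print(count_org_results(results))  # prints 2
```

Only the hostname is inspected, so a URL that merely contains "org" in its path is not counted.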

71 of the top 100 results for “donate” were .orgs and 51 of the top 100 for “donate to Africa” were .orgs, roughly seven and five times the 10.13% baseline, strongly supporting my hypothesis that the number of .orgs in the search results is an indicator of whether a domain is suited to the extension or not.

Now I needed to test it on real data, where the decision as to whether the domain was charity related or not was less obvious.

Testing

First I went to my data from DomainMarket.com. I had already observed that they purchased .org domains primarily when they were charity related, so it was a skewed data pool in terms of running regressions or any other meaningful analysis.

Nonetheless I wanted to see if my micro application of the theory held true at least at a correlational level.

I ran my script over 440 .org domains in the portfolio and, lo and behold, the average number of .org results in the top 100 for the DomainMarket .org domains was 20.64, roughly double the normal average of 10.13.
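The comparison against the baseline is simple arithmetic, sketched below in Python. The 10.13 baseline is the figure quoted above; the per-domain counts in the example are invented placeholders, not the DomainMarket data, and the function name is hypothetical.

```python
from statistics import mean

# Average number of .orgs per 100 results, as quoted from the earlier study.
BASELINE_ORG_PER_100 = 10.13


def average_and_lift(org_counts):
    """Return (average .org count, ratio of that average to the baseline).

    `org_counts` maps each domain to the number of .org results found
    in the top 100 for its keywords.
    """
    avg = mean(org_counts.values())
    return avg, avg / BASELINE_ORG_PER_100


# Invented counts, purely for illustration.
counts = {"first-example.org": 34, "second-example.org": 6, "third-example.org": 20}
avg, lift = average_and_lift(counts)
print(f"average {avg:.2f} .orgs per 100 results, {lift:.2f}x the baseline")
```

Plugging in the portfolio average of 20.64 gives 20.64 / 10.13, or roughly 2.04 times the baseline.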

A cursory glance at the high and low scores was revealing. Topping the list were the likes of PublicMedia.org, InternationalRelief.org, DonateOnline.org and PeaceInstitute.org, all, I’m sure you would agree, charity related domains.

Whereas the low and no-scorers were the likes of Hairlines.org, Bronzers.org, IsleOfCapri.org and PhoneCards.org.

Applications

Given that the DomainMarket data was skewed towards charity related domains, and given the anecdotal evidence presented above, the number of .org results in Google looks like an excellent measure of what is traditionally a human decision: whether or not a domain is charity related.

This has major implications for improving the efficiency of domain investors. For example, a firm such as DomainMarket.com, intent only on purchasing charity related .orgs unless they are of ridiculously high value, could first screen .orgs through this test. Computer power would whittle down the potential acquisition targets before the humans are unleashed on what would now be a much smaller and more attractive list, saving time and money.
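Such a screen could be as simple as the Python sketch below. The cut-off of 20 .org results is a placeholder picked for the example, not a tested threshold, and the function name and counts are hypothetical.

```python
def screen_candidates(org_counts, min_org_results=20):
    """Keep domains with at least `min_org_results` .orgs in their top 100.

    Returns (domain, count) pairs sorted from highest to lowest count,
    so the human reviewer sees the most promising names first.
    """
    kept = [(d, c) for d, c in org_counts.items() if c >= min_org_results]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Invented counts, purely for illustration.
candidates = {"alpha.org": 45, "beta.org": 3, "gamma.org": 20, "delta.org": 12}
for domain, count in screen_candidates(candidates):
    print(domain, count)
```

The threshold is the knob to tune: set it too high and you miss borderline names, too low and the humans are back to reviewing everything.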

In addition, this data can improve our valuation models by replacing the traditional fixed multipliers (“.org domains are worth X% of .coms”), which are a way oversimplified version of how the market really works.

Everyone knows a domain suited to its TLD is worth more than a domain that isn’t suited to its TLD; this is just a means of implementing that human decision in the valuation model.
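One way a variable multiplier might look, sketched in Python. Everything here is an assumption made for illustration: the linear shape, the floor of 0.2 and the saturation point of 50 are placeholders, not fitted values. Only the 10.13 baseline comes from the figures above, and the ceiling of 1.0 reflects the earlier observation that charity related .orgs were priced close to their .coms.

```python
def org_multiplier(org_count, floor=0.2, baseline=10.13, saturation=50.0):
    """Map a .org result count (out of 100) to a .org/.com price multiplier.

    At or below `baseline` the multiplier is `floor`; at or above
    `saturation` it is 1.0; in between it rises linearly.
    All parameter defaults except `baseline` are hypothetical.
    """
    if saturation <= baseline:
        raise ValueError("saturation must be greater than baseline")
    fraction = (org_count - baseline) / (saturation - baseline)
    fraction = min(1.0, max(0.0, fraction))  # clamp to [0, 1]
    return floor + (1.0 - floor) * fraction


for count in (0, 20, 35, 71):
    print(count, round(org_multiplier(count), 2))
```

In a real model the shape and parameters would be fitted to sales data rather than chosen by hand.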

Of course this data point could also be used in determining how local a domain is. For example, a search for “Vancouver” is likely to return a lot more .ca results than a search for “Dublin”, which can then be used to aid the valuation of ccTLDs.

But the one lesson I hope comes out of this neat discovery is that the people who think domain valuation is too “human” to automate should think outside the box and ask themselves what other hidden data points can replace the human decisions we currently make.
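The same counting idea generalises to any extension. A minimal Python sketch, again assuming the results are already available as a list of full URLs; the function name and example URLs are made up.

```python
from collections import Counter
from urllib.parse import urlsplit


def tld_counts(result_urls, limit=100):
    """Tally the final hostname label (com, org, ca, ...) of each result."""
    counts = Counter()
    for url in result_urls[:limit]:
        host = urlsplit(url).hostname or ""
        tld = host.rstrip(".").rpartition(".")[2]
        if tld:
            counts[tld] += 1
    return counts


# Hypothetical result list, purely for illustration.
results = [
    "https://city.example.ca/",
    "https://tourism.example.ca/visit",
    "https://www.example.com/vancouver",
]
counts = tld_counts(results)
print(counts["ca"], counts["com"])  # prints 2 1
```

Only the last label is counted, so a second-level pattern such as .co.uk would be tallied under "uk", which is what you want when the question is which country the results lean towards.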