UK Domain Categorisation (UKDC)
UK Domain Categorisation (UKDC) is a free service that helps .UK registrars who are also Nominet members gain useful insights into their registrants’ domains by categorising them into industry sectors. It is subject to the general provisions of the .UK Registrar Agreement.
The appetite for information about domains and websites continues to increase, so we have built a model that aims to classify any business-owned website by industry sector. This service, supported by our Domain Analyser tool, helps Nominet members identify the industry sectors that make up their domains under management (DUMs), as well as those that are underrepresented. There are 24 industry sectors within the UKDC service.
Members who have signed up to the service will be provided with a CSV file of their domains that have been categorised. Additionally, a summary document will provide statistics comparing their data with the whole registry, as well as a glossary of the terms used in both documents. The files will be available over SFTP.
Members can sign up for the service at any time. If the UKDC service is of interest to you, please visit Online Services to begin the registration process.
Frequently Asked Questions
- How much does UK Domain Categorisation (UKDC) cost?
UKDC is free to all Nominet members – it is a member benefit.
- Why is Nominet introducing this?
We hope this UKDC service, supported by our Domain Analyser tool, will provide Nominet members with useful insights into their registrants’ domains by categorising them into industry sectors.
- Is it relevant to all members?
UKDC is available to members with active domains on their TAG(s).
- What is the Domain Analyser tool?
Domain Analyser is the tool Nominet uses to collect data, on a regular basis, on whether .UK domain names resolve, where they are hosted, whether they are used for email and whether a website is in place.
- What data will be available?
Two data files will be available for UKDC:
- A CSV file of all your active domains at the time of analysis. Each row gives the domain, its associated TAG and the category or categories to which the domain belongs. If a domain falls into multiple categories, they are listed separated by semicolons. Every domain also has a timestamp indicating when the website was last visited by the Domain Analyser tool, as well as the age of the domain. More recent CSV files also include flags for SSL certificates and MX records, the created and expiry dates of the current instance of the domain (so you can filter recent expiries when matching with your data), and the date the domain was first registered. The first line of the CSV file is a header containing the column names.
- A summary document containing aggregated results of your data by category against the results for the whole registry for comparison. The previous month’s aggregated results will also be shown for a quick trend analysis. This file will contain the model version number as well as accuracy scores for the model, along with the descriptions of all the terms used in both files.
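As an illustration of working with the CSV file described above, here is a minimal sketch that counts your domains per category, splitting the semicolon-delimited category field. The column names (`domain`, `tag`, `categories`) and the sample rows are assumptions made up for this example; check the header line of your own file for the real names.

```python
import csv
import io
from collections import Counter

# Hypothetical column name: the real one is given in the header line of your CSV.
CATEGORY_COL = "categories"


def count_categories(csv_file):
    """Count domains per category, splitting semicolon-delimited category lists."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        for category in row[CATEGORY_COL].split(";"):
            category = category.strip()
            if category:
                counts[category] += 1
    return counts


# Made-up sample data standing in for a downloaded file.
# Category names that contain commas are assumed to be quoted in the CSV.
sample = io.StringIO(
    'domain,tag,categories\n'
    'example-cafe.uk,EXAMPLE,Food Products & Services\n'
    'example-farm.uk,EXAMPLE,"Food Products & Services;Agriculture, Forestry and Gardening"\n'
    'example-lettings.uk,EXAMPLE,Real Estate\n'
)

counts = count_categories(sample)
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

Because a domain can belong to more than one category, the per-category counts can add up to more than the number of domains in the file.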
- Which categories can a domain fall into?
In the first instance a domain can fall into one of the following:
- No Content – so can be one of the following:
- no nameservers
- no ip
- no webserver
- webserver informational response
- webserver no content
- webserver bad redirect
- webserver client error
- webserver server error
- webserver abnormal response
- Parked – a parked domain
- Unable to Categorise – these are websites that we are unable to categorise into an industry sector, for one of the following reasons:
- insufficient content
- personal site
- Not Visited – for one of the following reasons:
- the Domain Analyser tool has not yet visited this domain
- the Domain Analyser tool was blocked by the robots.txt policy of the website
- Categorised – the domain has been categorised into one or more of the industry categories below:
- Aerospace and Defence
- Agriculture, Forestry and Gardening
- Arts, Entertainment & Leisure
- Beauty & Perfume
- Education & Training
- Employment, Recruitment & HR
- Energy & Utility Suppliers
- Financial Services & Insurance
- Food Products & Services
- Furniture & Appliances
- Information Technology & Telecommunications
- Legal, Public Order & Security
- Mining & Drilling
- Political, Social & Religious
- Project Management, Marketing & Administration
- Publishing, Printing & Photography
- Real Estate
- Scientific & Engineering
- Textiles, Nonwovens & Fashion
- Tourism, Holiday & Accommodation
- Transportation, Logistics & Storage
As of October 2017, three of these categories have been further split into subcategories. The subcategories are given below.
Construction Subcategories
- Builders and Architects
- Cleaning, Caretaking and Landscaping Activities
- Decorators and Carpenters
- Electrical Installation
- Glazing and Conservatories
- Plumbing, Heating and Air-Conditioning Services
- Roofing Activities
- Security Systems
- Other Construction Activities
Food Products & Services Subcategories
- Cafes and Restaurants
- Catering Services
- Food Production and Retail
- Pubs and Bars
- Other Food Products and Services Activities
Tourism incl. Holiday Accommodation Subcategories
- Self-Catered Accommodation
- Travel Agencies
- Tourist Sites and Tourist Activities
- Bed and Breakfasts
- Camp Sites and Caravan Parks
- Other Tourism and Holiday Accommodation Activities
- How did you choose these categories?
We chose these categories by starting with the UK Standard Industrial Classification (SIC), which is very similar to the European NACE code. However, these are based on tax codes and don’t necessarily correspond to useful categories for the domain industry.
We therefore produced a list of 24 categories that can be mapped back to the high-level UK SIC groups and that correspond to subject areas of businesses (e.g. food, automotive, health).
Together with colleagues from European registries and registrars, we have been developing these into a universal set of categories for domain classification called the Domain Industry Taxonomy (DIT) categories. More information will be made available at https://stats.centr.org/rrdg#dit.
- How are the domains categorised?
Nominet undertook a project to manually classify a statistically significant number of domains by visiting the websites associated with them. These accurately categorised domains, along with their web pages, were then used to create a data set on which to train a model. The model can then run over all domains and categorise those which have a website visited by the Domain Analyser tool.
- How often will the files be updated?
The files will be updated once a month. The process to create the files will run on or around the 21st of each month, and the files will be available to download later that day. The filenames will be in the format:
This will allow you to differentiate between monthly files. We will keep a rolling six-month backlog of your data files available for you to download.
- How up to date is the data?
The analysis is run monthly and each file is timestamped with that date. Each row in the CSV is timestamped with when the Domain Analyser tool last visited that domain.
- How do I get hold of the data?
Members should sign up to Online Services to access the UKDC data. If you have already signed up for the Zone File Access service, then after signing up to the UKDC service you will be able to use the same SFTP credentials.
If you do not have an account for the Zone File Access service, an account will be created for you to access the SFTP site, which is hosted by a third party.
Once you have signed in, the two files can be downloaded from:
- How accurate is the data?
The Summary Information file includes several measures of estimated accuracy for each categorisation, including ‘recall’, ‘precision’ and ‘F-Score’.
- Recall is the proportion of domains belonging to that category that were correctly identified as belonging to that category by the model (i.e. how many relevant domains are identified);
- Precision is the proportion of domains identified by the model as belonging to that category which actually should belong in that category (i.e. how many identified domains are relevant);
- F-Score is an ‘overall accuracy’ score which takes into account both recall and precision.
For subcategories the accuracy measures are conditional on first being categorised into the appropriate top level category; a domain can only be classified into a Construction subcategory if it is first classified as Construction.
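To make the three measures described above concrete, here is a minimal sketch that computes them for a single category from counts of correct and incorrect classifications. It assumes the F-Score is the common F1 variant (the harmonic mean of precision and recall); which variant is used in the Summary Information file is not stated here. The function name and the example counts are made up for illustration.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) for one category.

    true_positives:  domains correctly identified as belonging to the category
    false_positives: domains identified as the category that do not belong to it
    false_negatives: domains belonging to the category that the model missed
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is assumed here: the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts: 80 domains correctly identified, 20 wrongly included, 40 missed.
precision, recall, f1 = precision_recall_f1(80, 20, 40)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A category with high precision but low recall is one where the identified domains are mostly relevant but many relevant domains are missed; the F-Score balances the two.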
- When will I start to receive my data?
The first data set will be available from 21st March 2017. After that date, account activation for new registrations will usually be within three working days. Each member will receive confirmation via email when their account is active.
- What do you need me to do, and how do I share my feedback?
We would like registrars to sign up for the free service if they think it will be useful to them, so we can start the process of activating their account.
We would welcome your feedback on any additional requirements that may be useful to you and other registrars, as this will help inform how we might improve the service in the future. Please share your thoughts via email or, if you have any questions, speak to one of our team on +44.1865332233.