This is a post forming the Central London Property analysis I am doing using data from various listing and government websites.
In today’s post, I will be looking at the number of properties listed for sale and sold over the last few years in Central London. As before, I define Central London as postcodes EC1, EC2, EC3, EC4 and WC1, WC2.
I would expect that COVID-19 and work from home culture must have lead to many more listings this year. It would be interesting to see if these were followed by sales. I will do an analysis of rentals and lets separately.
The data for listing and sales comes from an online property listing service. Listings are under a property id which is great for confirming repeat sales. However, sometimes there are extra listings within a listings, for example - if there has been a price reduction or a re listing. So, I spent 80% of the time cleaning and making sense of this data 😭
The full code for this blog post can be found on my GitHub.
Intro
I have 243k listings from 40k properties in Central London.
Lets look at all the different types of listings we have, denoted by listing status.
A listing status are ’tags’ on the listing page. They tell us if a property has been sold/let. Alternatively we also get status updates from the property page. For e.g Listedforsale
means that a property was historically listed and Sold
means a property was sold according to land registry.
Status | Freq | Meaning |
---|---|---|
sold | 148581 | landregistry.gov.uk data until 1st June 2020 |
missing | 81077 | no status found, usually some old listings and also brand new listings |
listedforsale | 4055 | historic listings on website |
sold stc | 3228 | tag on the listing on the website, not from official source. |
let | 1651 | a property has been rented out succesfully |
let agreed | 1160 | same as above |
under offer | 1113 | an offer has been made on the property but still for sale |
new | 1081 | a ’new’ listing, not really important for us |
listing expired | 922 | listing was removed from the website |
just added | 218 | same as new |
sold subject to contract | 188 | same as sold stc |
new build | 188 | new build property |
investment | 116 | tag on the listing |
new home | 7 | tag on the listing |
A listing can be relisted (!), so we have a info_type
attribute for each listing. This is parsed from a listing page when we have the history of the listing itself.
Type | Freq | Meaning |
---|---|---|
previous_sale_prices | 121823 | previous prices of the property from other listings |
main_listing | 109989 | actual listing |
first listed | 5739 | first listing of the listing |
price reduction | 3925 | price reductions on the listing |
last sold | 1708 | same as previous_sale_prices |
price increase | 641 | price increases |
Surprised to see price increases for a listing! Although they are 4x less common than a price reduction.
Data Cleaning: A property has multiple listings in our table which mean the same thing, leading to duplicates. We could also have multiple entries in our table for the same “listing”. Eg price changes, status changes. To analyse sales, we want to have a unique row per “real” status. We are only really interested in finding out how many properties are sold or rented. However, given the different number of statuses it is not trivial to get “unique” listings. Below is my attempt in code.
Cleaning data
To clean our data, I create a clean_status
column which ignores some statues and groups other statuses. I also remove listings with no prices and dates. The code should explain what I am doing.
R seems to be becoming less favourable than python with the advent of ML. However dplyr is one of my favourite R packages for data science. The syntax feels very natural and it has powerful features for data wrangling. Pandas/Python has a similar API but I find it a bit clunky. Additionally, the
%>%
(pipe) operator in R sends the output from left to right, which makes chaining functions very readable. I would highly recommend giving it a try!
|
|
Summary of our data (188k rows and 15k properties)
saledate_date status_clean price_num info_type
Min. :1995-01-01 Sold :153316 Min. : 173 price_reduction : 3925
1st Qu.:2002-08-01 For Sale: 15466 1st Qu.: 184000 price_increase : 641
Median :2010-04-08 To Rent : 10740 Median : 340000 first_listed : 5739
Mean :2009-03-26 Rent NA : 5374 Mean : 455937 last_sold : 1708
3rd Qu.:2015-06-01 Rented : 2683 3rd Qu.: 585000 main_listing : 54870
Max. :2020-09-05 (Other) : 906 Max. :73106139 previous_sale_prices:121617
NA's : 15
As you can see I managed to find status updates for all but 15 listings. I also have prices for them. I find that our listing data ranges from 2000s - today.
Property histories
The data I have gathered is very rich on the property level. See below all the transactions I know of a certain property.
|
|
listing_id | property_id | info_type | status_clean | price_num | saledate_date |
---|---|---|---|---|---|
NA | 7971826 | previous_sale_prices | Sold | 150000 | 1996-09-01 |
NA | 7971826 | previous_sale_prices | Sold | 200000 | 1997-02-01 |
NA | 7971826 | previous_sale_prices | Sold | 247000 | 1999-11-01 |
NA | 7971826 | previous_sale_prices | Sold | 495000 | 2002-02-01 |
28210585 | 7971826 | main_listing | For Sale | 899950 | 2013-03-01 |
NA | 7971826 | previous_sale_prices | Sold | 899950 | 2013-08-01 |
49364002 | 7971826 | main_listing | For Sale | 1300000 | 2018-09-01 |
53235585 | 7971826 | main_listing | Rent NA | 2773 | 2019-11-04 |
54907582 | 7971826 | main_listing | Rent NA | 3467 | 2020-05-13 |
55730087 | 7971826 | first_listed | To Rent | 4160 | 2020-08-03 |
55730087 | 7971826 | main_listing | Rent Expired | 2947 | 2020-09-02 |
You can see here that the property has been sold 4 times between 1996 and 2002.
- It came on for sale in March 2013 and was then listed as sold in August 2013 for £900k.
- It was then listed for sale in September 2018 for £1.3m (a 6.9% increase per annum!). Since there is no sold listing after the date, we can assume that the property did not sell.
- Instead the owners decided to put it out for rent for £2,773 per month which is just a 3.7% yield on their initial purchase price of £900k (and a 2.5% yield on the)
- Finally, it came on for rent at £4.2k and went down to £2.9k, with the listing no longer available.
The above is a short history of 11 transactions because we have chosen a unique status per year. Actually, there are 31 transactions in total!
The above example is an outlier, the mean number of transactions in our dataset is 3 and the median is 2. We have 3.7k/15k properties with more than 3 transactions.
I am looking forward to analysing this data more in future posts! 🤓
Properties sold per year
To calculate the number of properties for sale, I first deduplicate the data by keeping a single ‘For Sale’ or ‘Sold’ record per property per year.
I am experimenting with
svg
plots since ggplot2 in R produces them natively. They are 85% smaller in size compared topng
and scale perfectly with window size, retaining the detail as you zoom in.
The peak in properties sold was 2013, with 824 properties sold. 2019 saw the lowest number of properties sold in central london since 2005 - a 14 year stretch! The number of properties sold has collapsed by almost 50% since 2005 and 75% since the peak.
The number of properties listed for sale has remained steady in the last 10 years. This implies the collapse in properties sold seems to be led by weaker demand than supply. This year has seen the 3rd highest # of properties offered for sale which is not surprising given the stamp duty discount and COVID causing people to leave for the countryside.
Properties rented per year
Analysing rentals is a bit more difficult. Firstly, there is no central registry which holds a record of a successful let (unlike sales). Secondly,estate agents remove rental listings more often without updating the status. There could be several reasons, including that the property has been successfully rented.
Because of the above reasons, we do not have good historical data on lets. The chart below has two categories.
Rented
- when a successful let is confirmed on the listing.Rent NA
- when the listing has been taken off but no indication has been made if let or not. It is highly probable here that these are as good as rented.
This leads to the chart below:
The number of successful lets in 2020 is surprising. Total confirmed lettings Rented
are above the 2019 level, with 4 months still to go. This does not fit the COVID narrative - everyone fleeing for the countryside.
An important caveat here is that we do not know of listings that were removed prior to 1 August 2020. Hence, the total lettings before 2020 could be understated. Based on listings in August, this effect could understate the lettings in the previous years by 50 - 75%. I will be monitoring this further going forward.