In the last post we saw that advertised rental prices in Zone 1 London were down by up to 25% compared to 2019. However, that analysis did not adjust for the mix of properties advertised. It could be that most of the properties advertised this year were of lower value than those advertised in 2019.
In this post, we use a model-based approach to adjust for the change in property mix and estimate the average change in prices.
Data
Our dataset contains properties in Zone 1 that have been listed more than once, in at least two different years. We have 35,550 listings from 7,195 properties. 2020 has more listings than any other year (see below).
Number of listings per year
2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002
6683 3407 2794 2522 2448 2213 2199 1849 3475 3359 3165  891  199   78   89  155   13   10    1
Within 2020, 910 of the 6,683 listings (14%) were from before the UK-wide lockdown in March.
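The selection rule for this dataset (keep only properties listed in more than one year) can be sketched in base R as follows. This is a minimal illustration on a made-up table: the toy rows are invented, the column names are borrowed from the model formula later in the post, and "at least two distinct years" is one reading of the rule rather than the exact filter used.

```r
# Toy listings table (invented rows, not the real data).
listings <- data.frame(
  property_key = c("a", "a", "b", "b", "c", "d", "d", "d"),
  sale_year    = c(2019, 2020, 2020, 2020, 2018, 2012, 2019, 2020)
)

# Number of distinct years in which each property was listed.
years_per_property <- tapply(
  listings$sale_year, listings$property_key,
  function(y) length(unique(y))
)

# Keep properties seen in at least two different years (assumed rule).
keep <- names(years_per_property)[years_per_property >= 2]
repeat_listings <- listings[listings$property_key %in% keep, ]

# Listings per year among the retained properties.
listings_per_year <- table(repeat_listings$sale_year)
```

Here property "b" is dropped even though it has two listings, because both fall in the same year.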
Prices asked and postcodes
     price       postcode_area    postcode_district   postcode_region
 Min.   : 511    W2     : 4408    W2     : 4408       SW     :11325
 1st Qu.:1733    W1     : 4321    SE1    : 2767       W      :10902
 Median :2383    SW1    : 3402    SW7    : 2640       SE     : 3045
 Mean   :2860    SE1    : 2767    NW1    : 2437       NW     : 2701
 3rd Qu.:3358    SW7    : 2640    SW3    : 2409       E      : 2691
 Max.   :9997    NW1    : 2437    E1     : 1918       EC     : 1975
                 (Other):15575    (Other):18971       (Other): 2911
Model
I will build a model to predict the (log) price from the property identifier, postcode area, postcode district, postcode region, sale year, and sale month. I add interactions between sale year and each postcode level to extract the change between 2019 and 2020. The features are highly correlated (each property sits in exactly one postcode, so the property and postcode indicators overlap), which makes ordinary least squares estimates unstable, so I use ridge regression instead.
Model formula
log(price_num) ~ property_key + postcode_district + postcode_area +
sale_year + sale_month + postcode_region + postcode_region:sale_year +
postcode_district:sale_year + postcode_area:sale_year - 1
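For reference, ridge regression fits a formula like the one above by penalised least squares. The objective below is written in the form glmnet documents for a Gaussian response with alpha = 0. The symbols are introduced here for illustration and do not come from the original analysis: y_i is the log price of listing i, x_i is its dummy-encoded feature row, and lambda is the amount of regularisation.

```latex
(\hat{\beta}_0, \hat{\beta})
  = \arg\min_{\beta_0,\,\beta}\;
    \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - \beta_0 - x_i^{\top}\beta \bigr)^2
    + \frac{\lambda}{2}\, \lVert \beta \rVert_2^2
```

The penalty shrinks correlated coefficients towards each other instead of letting them take large offsetting values, which is why it copes with the overlapping property and postcode indicators.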
The model can be fitted on the dataset using the glmnet package in R. I use 4-fold cross-validation to pick the amount of regularisation needed to prevent overfitting. The cross-validated RMSE was 0.30 on the log-price scale.
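A minimal sketch of this fitting step, run on simulated data rather than the real listings. The toy data frame, its values, and the reduced formula (one postcode level only) are assumptions for illustration. The only parts taken from the description above are the ridge penalty (alpha = 0 in glmnet) and the 4-fold cross-validation.

```r
library(glmnet)  # cv.glmnet: penalised regression with cross-validation
library(Matrix)  # sparse.model.matrix: sparse dummy-variable design matrix

set.seed(42)

# Simulated properties (hypothetical): each belongs to exactly one district.
n_prop <- 80
properties <- data.frame(
  property_key      = sprintf("p%03d", seq_len(n_prop)),
  postcode_district = sample(c("W2", "SE1", "SW7", "E1"), n_prop, replace = TRUE),
  base_log_price    = rnorm(n_prop, mean = log(2400), sd = 0.4)
)

# Simulated listings: properties re-listed in random years and months.
n   <- 400
idx <- sample(n_prop, n, replace = TRUE)
listings <- data.frame(
  property_key      = factor(properties$property_key[idx]),
  postcode_district = factor(properties$postcode_district[idx]),
  sale_year         = factor(sample(2018:2020, n, replace = TRUE)),
  sale_month        = factor(sample(1:12, n, replace = TRUE))
)

# Build in a drop of 0.08 in log price for 2020, plus noise.
listings$price_num <- exp(
  properties$base_log_price[idx] -
    0.08 * (listings$sale_year == "2020") +
    rnorm(n, sd = 0.1)
)

# Dummy-encode the factors; "- 1" drops the intercept column, as in the formula.
x <- sparse.model.matrix(
  ~ property_key + postcode_district + sale_year + sale_month +
    postcode_district:sale_year - 1,
  data = listings
)
y <- log(listings$price_num)

# alpha = 0 selects the ridge penalty; lambda is chosen by 4-fold CV.
cv_fit <- cv.glmnet(x, y, alpha = 0, nfolds = 4)

best_lambda <- cv_fit$lambda.min
cv_rmse     <- sqrt(min(cv_fit$cvm))  # CV error is MSE for a Gaussian response
```

A sparse design matrix is used because one dummy column per property makes the matrix very wide and mostly zeros.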
Map view
We can feed dummy data for 2019 and 2020 into the model and predict the prices of all properties in each postcode for the two years. The difference between the 2019 and 2020 predictions gives us an estimate of the COVID and working-from-home (WFH) effect.
The visualisation of the change in prices per ward (postcode boundaries) in Zone 1 of London can be seen below. The darker the colour, the lower the prices in 2020 compared to 2019.
The overall decrease in prices of 0-12% according to the model seems less harsh than the 10-25% decrease found in our previous analysis. This is expected, since the model adjusts for the change in the mix of properties in 2020.
Otherwise, we see that prices in the centre have decreased much more than in outer areas (as expected). Separately, prices in West Central (WC) postcodes have decreased more than in East Central (EC) postcodes. South London also seems to have been less affected. The Shoreditch/Hackney areas have not been affected at all!
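Because the model predicts log prices, a difference between two predictions is a log ratio and has to be exponentiated to read as a percentage change. The sketch below shows that conversion in base R. The predicted values are invented for illustration and are not outputs of the fitted model.

```r
# Hypothetical predicted log prices per postcode (made-up numbers).
log_pred_2019 <- c(W2 = log(2500), SE1 = log(2200), E1 = log(2000))
log_pred_2020 <- c(W2 = log(2250), SE1 = log(2090), E1 = log(2000))

# exp(difference of logs) is the 2020/2019 price ratio; subtract 1 for the change.
pct_change <- 100 * (exp(log_pred_2020 - log_pred_2019) - 1)

round(pct_change, 1)
# W2: -10, SE1: -5, E1: 0
```

For small changes the raw log difference is close to the percentage change, but the gap widens as the drop gets larger, so the explicit conversion is the safer default.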
Further work
Next time, I plan to compute the change in prices for more zones and also add uncertainty bounds using the brms package in R.