NFTs Quantitatively

Intro

I have been buying NFTs for over a year now, purely for passion. My favourite NFT projects include The Currency by Damien Hirst, Skulptuur by Piter Pasma and quite a few generative art projects from Art Blocks and GM.studio.

Skulptuur by Piter Pasma

The Currency by Damien Hirst

Running Moon by Lucia He

Since I started collecting, I have been wondering: can we predict which NFTs are going to go up?

So this article will try to surface some trends from historical data. All the data for this analysis comes from Nansen Query.

Data

NFTs don’t have time series data by default (as every sale is a different token), so we use the sales floor instead. The sales floor tracks the lowest trading price of any token within the NFT project. We are interested in predicting the future trend of this time series. See an example below:

Specifically, let's predict the weekly return of the sales floor for each NFT using factors that are publicly available on the blockchain:

previous returns
number of owners
average balance over all the wallets
number of wallets with just a single token
.. and so on..
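
As a rough sketch of how a floor series and its weekly return could be derived from raw sales, assume a table with one row per sale. The column names (`project`, `week`, `price`) and the weekly bucketing are illustrative assumptions, not the schema of Nansen Query.

```r
# Toy sales data for one hypothetical project; column names are assumptions.
sales <- data.frame(
  project = "toy_project",
  week    = c(1, 1, 1, 2, 2, 3, 3, 3),
  price   = c(1.2, 1.0, 1.5, 1.1, 1.3, 0.9, 1.4, 1.0)
)

# Sales floor: lowest trade price per project and week.
floor_px <- aggregate(price ~ project + week, data = sales, FUN = min)
floor_px <- floor_px[order(floor_px$project, floor_px$week), ]

# Weekly return as the ratio of consecutive floors, computed within each project.
floor_px$weekly_return <- ave(
  floor_px$price, floor_px$project,
  FUN = function(p) c(NA, p[-1] / p[-length(p)])
)

# Log of the ratio; the first week has no previous floor, so it stays NA.
floor_px$log_return <- log(floor_px$weekly_return)
floor_px
```

Expressing the return as a ratio (1.10 for +10%) keeps it positive, which is what makes the later log transform well defined.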

NFT Market Index

The chart below shows the weekly return averaged across 1,140 NFTs (equal weighted). With a range from -25% to +50%, even the market as a whole is very volatile and perhaps impossible to predict.

If we predict just the future floor price, the analysis will be confounded by market volatility.

We need to remove the impact of the market from NFT returns. As we are trying to find NFTs that are comparatively better than others, instead of predicting the trend of the floor price I will predict the excess return, which is the return over the market.

Mathematically speaking, the $WeeklyExcessLogReturn$ for week $j$ of NFT $i$ can be described as follows:

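$$ WeeklyExcessLogReturn_{ij} = \log(WeeklyReturn_{ij}) - \frac{1}{n} \sum_{k=1}^{n} \log(WeeklyReturn_{kj}) $$

Here $n$ is the number of NFTs in the basket in week $j$, so the second term is the equal-weighted market log return for that week.

Note: Since returns are roughly log-normally distributed, we take logs to get data that looks approximately normal.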

Like magic, we find that our excess log returns are approximately normally distributed with a mean of 0 and SD of 0.23.
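
The mean of 0 follows from the definition: subtracting each week's equal-weighted average makes that week's excess returns sum to zero. A minimal sketch on toy numbers (the data frame and its column names are made up for illustration):

```r
# Toy panel: gross weekly returns (this week's floor / last week's floor)
# for three hypothetical NFTs over two weeks.
returns <- data.frame(
  nft           = rep(c("A", "B", "C"), times = 2),
  week          = rep(1:2, each = 3),
  weekly_return = c(1.10, 0.90, 1.00, 1.50, 1.20, 0.80)
)

returns$log_return <- log(returns$weekly_return)

# Equal-weighted market log return for each week.
returns$market_log_return <- ave(returns$log_return, returns$week, FUN = mean)

# Excess log return: the NFT's log return minus the market's.
returns$excess_log_return <- returns$log_return - returns$market_log_return

# By construction the excess returns average to zero within every week.
tapply(returns$excess_log_return, returns$week, mean)
```

The SD of 0.23, on the other hand, is an empirical property of the data and cannot be derived this way.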

Scams, Rug Pulls & Liquidity

To reduce the chance of including rug pulls, scams and low-quality projects in the dataset, I apply selection criteria. An NFT and its weekly returns only start appearing in my dataset when:

Number of unique owners > 100
Floor price > 0.2
At least 30 days since the first secondary market sale
An average of at least 1 sale a day

To avoid selection bias, once an NFT is included in the dataset it is never removed.
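
The criteria and the "never removed" rule could be sketched as follows. The column names are hypothetical, and reading "30 days" and "1 sale a day" as "at least" is an assumption about the intended thresholds.

```r
# Toy daily snapshot for one hypothetical project; column names are assumptions.
snap <- data.frame(
  project               = "toy_project",
  day                   = 1:6,
  num_owners            = c(80, 120, 150, 90, 200, 210),
  floor_price           = c(0.10, 0.25, 0.30, 0.15, 0.40, 0.50),
  days_since_first_sale = c(28, 29, 30, 31, 32, 33),
  avg_daily_sales       = c(0.5, 1.2, 1.5, 0.8, 2.0, 2.5)
)

# All four criteria must hold on the same day.
snap$passes <- snap$num_owners > 100 &
  snap$floor_price > 0.2 &
  snap$days_since_first_sale >= 30 &
  snap$avg_daily_sales >= 1

# Once a project has passed, it stays in the basket even if it later fails a check.
snap$included <- ave(as.integer(snap$passes), snap$project, FUN = cummax) == 1

snap[, c("day", "passes", "included")]
```

On day 4 the toy project drops below 100 owners, yet it stays included. Dropping it at that point would silently remove the projects that performed worst.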

This gives me a total of 1140 projects to date. See below the number of NFTs in the basket by date.

Model

A very simple statistical model looks like this:

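$$ WeeklyExcessLogReturn_{i,j+1} = \sum_{k=1}^{K} \beta_k x_{ijk} + \epsilon_{ij} $$

Where $x_{ijk}$ is the value of factor $k$ for NFT $i$ in week $j$, $\beta_k$ is the coefficient that measures how strongly that factor affects returns, and $\epsilon_{ij}$ is the error, which is assumed to be normally distributed with a mean of 0. The factors are measured at the end of week $j$ and the return is the one for the following week.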

The factors that I will use to predict the excess log return in the next 7 days are:

Weekly change in number of owners
Weekly change in average balance
Weekly change in number of wallets holding a single token from the collection
The excess log return in the last 7 days
The number of days since mint at the end of the week
The number of owners at the end of the week
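
To make the alignment concrete, here is a minimal sketch of building one "weekly change" factor and the forward-looking target. The column names are hypothetical, and expressing the change as a ratio is an assumption, chosen so that taking a log of it is well defined.

```r
# Toy weekly snapshot for one hypothetical project; column names are assumptions.
wk <- data.frame(
  project           = "toy_project",
  week              = 1:5,
  num_owners        = c(100, 110, 121, 115, 130),
  excess_log_return = c(0.05, -0.02, 0.10, 0.00, -0.04)
)

# Value from the previous week (NA for the first week).
lag1 <- function(x) c(NA, x[-length(x)])
# Value from the following week (NA for the last week).
lead1 <- function(x) c(x[-1], NA)

# Weekly change as a ratio (1.10 means +10%), computed within each project.
wk$num_owners_chg_7d <- wk$num_owners / ave(wk$num_owners, wk$project, FUN = lag1)

# Target: next week's excess return, aligned with this week's factors.
wk$excess_fwd <- ave(wk$excess_log_return, wk$project, FUN = lead1)

wk
```

Shifting within each project matters: a plain shift over the whole table would leak one project's last week into the next project's first row.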

Exploration

We have a total of 33,076 weekly returns from 1,140 NFT projects between April 2021 and December 2022.

On average there are 29 weekly return data points for a project. The distribution looks like this:

Fair Analysis

As we have more data points from some projects and fewer from others, an analysis based on the raw data would be biased towards the projects with the longest history. Hence, we sample (with replacement) 15 data points per project. This gives us a fairer dataset for analysis.
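
A minimal sketch of this per-project sampling in base R, on simulated data (the helper `sample_per_project` and the column names are made up for illustration):

```r
set.seed(42)  # only to make the sketch reproducible

# Toy data: three hypothetical projects with very different history lengths.
weekly <- data.frame(
  project           = rep(c("A", "B", "C"), times = c(60, 20, 5)),
  excess_log_return = rnorm(85, mean = 0, sd = 0.23)
)

# Draw n rows per project, with replacement, so short-lived projects
# still contribute n rows (some of them repeated).
sample_per_project <- function(df, n = 15) {
  rows_by_project <- split(seq_len(nrow(df)), df$project)
  picked <- lapply(rows_by_project, function(rows) {
    rows[sample.int(length(rows), size = n, replace = TRUE)]
  })
  df[unlist(picked, use.names = FALSE), ]
}

balanced <- sample_per_project(weekly)
table(balanced$project)
```

Project C has only 5 weeks of history, so its 15 sampled rows necessarily contain repeats. That is the price of giving every project equal weight.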

Price autocorrelation

There is a slight positive correlation between the excess returns this week and those of the previous week.

The trend is clearer without the dots

This suggests that momentum trading on NFTs could be a profitable strategy.
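
One way to put a number on this relationship is a lag-1 correlation test. The sketch below uses a simulated series with a small amount of momentum built in; the 0.1 persistence is an arbitrary choice for the toy data, not an estimate from the real dataset.

```r
set.seed(1)

# Toy series for one hypothetical NFT with mild momentum built in.
n <- 200
excess <- numeric(n)
excess[1] <- rnorm(1, sd = 0.23)
for (w in 2:n) {
  excess[w] <- 0.1 * excess[w - 1] + rnorm(1, sd = 0.23)
}

# Pair each week with the week before it.
lagged <- data.frame(
  prev_week = excess[-n],
  this_week = excess[-1]
)

# Pearson correlation with a confidence interval and p-value.
result <- cor.test(lagged$prev_week, lagged$this_week)
result$estimate
result$conf.int
```

With many projects in one table, the lag would need to be taken within each project so that the last week of one project is never paired with the first week of another.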

Other factors

The charts below plot a line of best fit between the (future) weekly excess returns and each metric. The grey bands are the 90% confidence interval of the mean. If the grey band overlaps the black line at $y=0$, the correlation is treated as insignificant.
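
For readers who want to reproduce this kind of band without a plotting library, here is a hedged sketch using `lm` and `predict` on simulated data. The metric, its range and the slope of 0.02 are invented for the example.

```r
set.seed(7)

# Toy data: a hypothetical metric and next week's excess return.
d <- data.frame(metric = runif(300, min = 0, max = 5))
d$fwd_excess <- 0.02 * d$metric + rnorm(300, sd = 0.23)

fit <- lm(fwd_excess ~ metric, data = d)

# 90% confidence band for the fitted mean over a grid of metric values.
grid <- data.frame(metric = seq(0, 5, length.out = 50))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.90)
head(cbind(grid, band))

# A more direct check: does the 90% interval for the slope exclude 0?
confint(fit, "metric", level = 0.90)
```

Reading significance off the band is a visual shortcut. The interval for the slope itself answers the question more directly, because a band can straddle zero over part of its range even when the slope is clearly non-zero.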

It is interesting to see that metrics to do with high concentration, e.g. increasing average balance and decreasing number of owners, seem to be positively correlated with future returns. I thought it would be the opposite! Perhaps, by the time the collection has a higher number of owners / lower average balance, it is already too late? The ‘whales’ that were accumulating are no longer pushing the price up?

I am also surprised to see that the older the project (days since mint), the higher its weekly excess returns tend to be! This is probably a case of selection bias, but it does seem beneficial to look at ‘mature’ projects only.

All metrics except number of owners and days since mint seem to be insignificant. The change in average balance looks like a strong correlation, but the grey area just touches the black line.

Significance Testing

The charts above each show a single-variable model. Let's see what happens when we combine the factors in one linear model. This will also test for significance more robustly.

As we randomly sample 15 data points per project, I ran the linear model 100 times, each time drawing a fresh random 15 data points per project, to give a robust estimate (see bootstrapping).

The formula of the linear model is

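# Linear model for the forward (next 7 days) market-adjusted return.
# The trailing `- 1` removes the intercept, so the fit is forced through the origin.
model <- lm(
    formula = fp_change_fwd_adj ~ log(num_owners_chg_7d) +
                                log(avg_balance_chg_7d) +
                                log(single_owners_chg_7d) +
                                fp_change_prev_adj +
                                log(project_age) +
                                log(num_owners) - 1,
    data = data
)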
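
The repeated sampling and fitting could be sketched as follows. This is a toy version with two made-up factors on simulated data, not the exact procedure behind the results; in particular, how the p-values were combined across the 100 runs is not stated, so the sketch only summarises the coefficients.

```r
set.seed(123)

# Toy panel: 20 hypothetical projects, 30 weeks each, two made-up factors.
n_projects <- 20
n_weeks <- 30
panel <- data.frame(
  project     = rep(seq_len(n_projects), each = n_weeks),
  prev_excess = rnorm(n_projects * n_weeks, sd = 0.23),
  log_owners  = rnorm(n_projects * n_weeks, mean = 7, sd = 1)
)
panel$fwd_excess <- 0.09 * panel$prev_excess - 0.01 * panel$log_owners +
  rnorm(nrow(panel), sd = 0.23)

# One replicate: resample 15 rows per project, refit, return the coefficients.
one_fit <- function(df, n = 15) {
  rows <- unlist(lapply(split(seq_len(nrow(df)), df$project), function(r) {
    r[sample.int(length(r), size = n, replace = TRUE)]
  }), use.names = FALSE)
  coef(lm(fwd_excess ~ prev_excess + log_owners - 1, data = df[rows, ]))
}

# 100 replicates: one column per replicate, one row per coefficient.
estimates <- replicate(100, one_fit(panel))

# Average estimate and its spread across replicates.
summary_table <- data.frame(
  avg_estimate = rowMeans(estimates),
  estimate_sd  = apply(estimates, 1, sd)
)
summary_table
```

The spread of a coefficient across replicates gives a feel for how sensitive it is to which weeks happen to be drawn for each project.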

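The results are below. The table is ordered from the most negative to the most positive coefficient, and the significant metrics (p.value of 0.000) are log(num_owners), log(project_age) and excess_returns_prev_7d. The row excess_returns_prev_7d corresponds to the fp_change_prev_adj term in the formula.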

metric	Avg. Estimate	Estimate SD	p.value
log(num_owners_chg_7d)	-0.1416	0.0852	0.141
log(num_owners)	-0.0128	0.0014	0.000
log(project_age)	0.0188	0.0019	0.000
log(avg_balance_chg_7d)	0.0647	0.0689	0.267
excess_returns_prev_7d	0.0906	0.0100	0.000
log(single_owners_chg_7d)	0.1094	0.0772	0.193

Change in the number of owners has the largest coefficient in absolute terms. As we saw before, it is negatively correlated with future returns. However, the estimate is noisy: the SD is high relative to the estimate, and with a p.value of 0.141 it is not significant.

The significant metrics are number of owners, which is negatively correlated with weekly returns, and project age (days since mint), which is positively correlated. As expected, historical excess return is positively correlated with future returns and the relationship is significant.

Surprisingly, the average balance change is not significant in this model, perhaps because it is correlated with other variables.
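
One plausible source of that correlation is arithmetic: average balance is tokens held divided by number of owners, so for a collection with a fixed supply the two weekly changes mirror each other. The sketch below demonstrates this on simulated numbers. The fixed supply of 10,000 is an assumption, and real collections with burns or ongoing mints would show a looser relationship.

```r
set.seed(9)

# Toy data: a fixed-supply collection where the owner count drifts week to week.
supply <- 10000
owners <- round(2000 * cumprod(c(1, exp(rnorm(60, sd = 0.03)))))
avg_balance <- supply / owners

# Weekly changes on the log scale, matching the log() terms in the model formula.
log_owners_chg  <- diff(log(owners))
log_balance_chg <- diff(log(avg_balance))

# With a fixed supply the two regressors move in exact opposition.
cor(log_owners_chg, log_balance_chg)
```

When two regressors carry nearly the same information, a linear model cannot attribute the effect cleanly to either one, which inflates the spread of both estimates.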

TL;DR

Based on historic data, if one is going to ape into an NFT, look out for:

🟩 - significant 🟨 - less significant

metric	direction
😀 # of owners	🔻 🟩 low number of owners
⌛ Project age	⬆ 🟩 mature project > 90 days since mint
✅ Excess return last week	⬆ 🟩 NFT already going up relative to market
🔀 Change in owners	🔻 🟨 decreasing
💼 Avg balance of owners	⬆ 🟨 increasing
1️⃣ Single-NFT owners	⬆ 🟨 increasing

Next Steps

There are a few other metrics we can add to our model:

average ‘wealth’ of owner (and changes)
more historical data, 1 month change, 3 month change etc
% of collection listed (and its changes)

We might also benefit from a more sophisticated model, one that can take into account interactions and non-linearities.
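
Staying within `lm`, interactions and simple curvature can already be expressed in the formula. A hedged sketch on simulated data, where the variable names and effect sizes are invented:

```r
set.seed(11)

# Toy data: momentum whose strength depends on (log) project age.
d <- data.frame(
  prev_excess = rnorm(500, sd = 0.23),
  log_age     = runif(500, min = 3, max = 7)
)
d$fwd_excess <- 0.1 * d$prev_excess +
  0.05 * d$prev_excess * d$log_age +
  rnorm(500, sd = 0.23)

# `a * b` expands to a + b + a:b; I(x^2) adds a quadratic term.
fit <- lm(fwd_excess ~ prev_excess * log_age + I(log_age^2), data = d)
names(coef(fit))
```

The `prev_excess:log_age` coefficient would capture whether momentum is stronger for older projects, which a purely additive model cannot express.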

We could also use a more robust method to remove wash trades and scam collections.

How about a full blown ML solution, where we don’t care about interpretation but just plug in the ‘data’ and let the model predict the best candidates for high weekly excess returns?
