# An introduction to the Dynamic Optimal Shrinkage Portfolio package

This package is a collection of the methods developed in (Bodnar, Parolya, and Thorsén 2021). It constructs sequences of estimated Global Mimimum Variance (GMV) portfolios. The concept of a portfolio is most often used in finance as a way to invest in financial assets (e.g. stocks), which are the applications we will have in mind. There are other applications of these types of estimators, such as signal processing. The simplest example of a portfolio is the equally weighted (EW) portfolio which says that you should invest equal amount in every assets. If you have $$p$$ assets, then each asset gets $$1/p$$ of your money.

# How to use the package

To install the package you need to run devtools::install_github("Statistics-In-Portfolio-Theory/DOSPortfolio"), which demands that you have installed devtools. To load the package, run

library(DOSPortfolio)

The main interface to the package is DOSPortfolio::DOSPortfolio which acts as a wrapper for DOSPortfolio::wGMVOverlapping and DOSPortfolio::wGMVNonOverlapping. The big perk of using the interface is that you get input validation and more informative errors. You could use the functions directly, which would most likely not throw an error (with an exception of specific degenerate cases), though the results might not make sense. Lets simulate some data, in our simple example we assume the (log) returns are given by a t-distribution with 5 degrees of freedom.

n <- 25*2
p <- 15
data <- 5/3 * matrix(rt(n*p, df=5), ncol=p, nrow=n)

Notice that it is a matrix on long format. Each column is assumed to be the time series of an asset. We assume that the returns are ordered in time. The observation on the first row was observed before the observations on the second and so forth. To construct the DOS portfolio we need to specify the number of reallocation points and when these happen. Since data is a matrix on long format, we work with the row index. The DOS portfolio is then constructed as

reallocation_points <- c(25, 42)
# use the first subsample to estimate the relative loss
(portfolios <- DOSPortfolio(data, reallocation_points))
#> $weights #> [,1] [,2] [,3] [,4] [,5] [,6] #> [1,] 0.06928281 0.06307516 0.07207183 0.06329702 0.07025557 0.06456685 #> [2,] 0.07012946 0.06191285 0.07382107 0.06220651 0.07141704 0.06388729 #> [,7] [,8] [,9] [,10] [,11] [,12] #> [1,] 0.07094686 0.06703658 0.06952505 0.06601155 0.06454546 0.05814309 #> [2,] 0.07233203 0.06715630 0.07045009 0.06579953 0.06385898 0.05538465 #> [,13] [,14] [,15] #> [1,] 0.06867475 0.06727933 0.06528809 #> [2,] 0.06932462 0.06747761 0.06484195 #> #>$shrinkage_type
#>  "non-overlapping"
#>
#> attr(,"class")
#>  "DOSPortfolio"

The first parameter we configure is the number of reallocation points and when they happen. These are specified in the reallocation_points vector and specify when you recompute the weights. In this case we re-weight our portfolio on the 25th observation and 42th observation. Notice that there we do not include the last 10 observations. There are only 2 reallocation points and these do not include the last observed value. If you wanted to include a third reallocation point, just add the number of rows (or 50 in this case) to the reallocation_points vector. However, as we will see later on, this will not work in our example though there is a simple workaround.

Each row of portfolios$weights is a shrunk GMV portfolio. For each reallocation point the portfolio is computed using a convex combination using the previously estimated portfolio and on the first transition, the target portfolio. This is the “transition” which is made optimally and is specifically tailored to work in higher dimensions, when the number of assets is very large and possible close to the number of observations. The string seen in portfolios$shrinkage_type is what type of shrinkage estimator we used to make the transitions. There are two options, "non-overlapping" and "overlapping". These can handle quite different scenarios which we can illustrate by extending the reallocation_points vector to

reallocation_points <- c(25, 42, 50)
# This will not work
try(portfolios <- DOSPortfolio(data, reallocation_points))
#> Error : Non-overlapping estimator can not handle concentration ratios above one.
#>             Consider excluding one (or more) break point(s) or provide more data.

However, this will work

(portfolios <- DOSPortfolio(data, reallocation_points, shrinkage_type = "overlapping"))
#> $weights #> [,1] [,2] [,3] [,4] [,5] [,6] #> [1,] 0.06928281 0.06307516 0.07207183 0.06329702 0.07025557 0.06456685 #> [2,] 0.07465137 0.05570507 0.08316368 0.05638221 0.07762032 0.06025783 #> [3,] 0.07651752 0.05314318 0.08701928 0.05397857 0.08018036 0.05875999 #> [,7] [,8] [,9] [,10] [,11] [,12] #> [1,] 0.07094686 0.06703658 0.06952505 0.06601155 0.06454546 0.05814309 #> [2,] 0.07973017 0.06779568 0.07539070 0.06466719 0.06019256 0.04065199 #> [3,] 0.08278331 0.06805955 0.07742963 0.06419988 0.05867947 0.03457197 #> [,13] [,14] [,15] #> [1,] 0.06867475 0.06727933 0.06528809 #> [2,] 0.07279552 0.06853658 0.06245913 #> [3,] 0.07422793 0.06897360 0.06147576 #> #>$shrinkage_type
#>  "overlapping"
#>
#> attr(,"class")
#>  "DOSPortfolio"

The issue concerns what is called the “concentration ratio” which we denote $$c_i=p/n_i$$, where $$n_i$$ is the size of each subsample. In our case these are 25, 17, 8. When the methods are derived for the “non-overlapping” shrinkage estimator it is assumed that $$c_i \in (0,1)$$. This is due to the assumption that the sample covariance matrix needs to be non-singular. For the “non-overlapping” estimator we do not need to make such a rigid assumption, we only need that the first value $$c_1 \in (0,1)$$ and the rest can be $$p>n_j$$ for $$j=2,...,T$$. One methods moves in windows of size $$n_i$$ and the other extends the data to include all observations. We use $$N_I=\sum_i^I n_i$$ data to estimate each portfolio weight.

The last two parameters that are available to configure is the target portfolio, which we will denote $$\mathbf{b}$$, and an intial seed/estimate for the relative loss parameter. The target portfolio can be any portfolio whose values sums to one. The default value is the EW portfolio previously mentioned but it can be any other portfolio of your choice.

reallocation_points <- c(25, 42)
new_target <- runif(p, -1, 1)
new_target<- new_target/sum(new_target)
(portfolios <- DOSPortfolio(data, reallocation_points, target_portfolio = new_target))
#> $weights #> [,1] [,2] [,3] [,4] [,5] [,6] [,7] #> [1,] 0.08190014 0.01485708 0.1260101 0.01418530 0.09871228 0.04108800 0.1151250 #> [2,] 0.08435591 0.02628987 0.1177301 0.02676723 0.09673967 0.04551436 0.1078659 #> [,8] [,9] [,10] [,11] [,12] [,13] #> [1,] 0.07385225 0.09630610 0.07742194 0.03881509 -0.011751581 0.08697632 #> [2,] 0.07178860 0.09276249 0.06982581 0.04425078 -0.005959531 0.08473256 #> [,14] [,15] #> [1,] 0.07816182 0.06834013 #> [2,] 0.07493839 0.06239796 #> #>$shrinkage_type
#>  "non-overlapping"
#>
#> attr(,"class")
#>  "DOSPortfolio"

The initial estimate for the relative loss parameter demands some more detail. The relative_loss refers to how the following quantity $r_0 = \mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1} \mathbf{b}'\boldsymbol{\Sigma}\mathbf{b} -1$ which describes how the variance of our target portfolio $$\mathbf{b}'\boldsymbol{\Sigma}\mathbf{b}$$ relates to the variance of the GMV portfolio $$1/\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$$. A value of $$0$$ would indicate that we know the variance of the portfolio and it is the target portfolio. The argument is equal to NULL per default which implies that we use a simple estimate of the quantity. We use the sample covariance matrix $$\mathbf{S}$$ from the first subsample to estimate $$\boldsymbol{\Sigma}$$ and its consistent inverse estimate $$(1-c)\mathbf{S}^{-1}$$. The factor $$1-c$$ comes from the fact that the inverse sample covariance matrix is biased when $$p$$ is allowed to grow together with $$n$$, that is when $$p\rightarrow \infty$$ s.t. $$n>p$$. We can use any other estimate, though we need to make sure that the estimate is consistent, e.g. the estimate $$\hat{r}_0$$ converges to $$r_0$$ in probability.

## A note on the assumptions made in DOSPortfolio

The statistical model we make use of to derive the methods is $\mathbf{Y}_{n_i} = \mathbf{1}_{n_i}\boldsymbol{\mu}' + \mathbf{X}_{n_i} \boldsymbol{\Sigma}^{1/2}$ where $$i=1,2,...,T$$ is the number of reallocation points in the model. A reallocation point is just when we will reweight the portfolio, that is transition from one existing portfolio to another. Note that we can only observe $$\mathbf{Y}_i$$ but when constructing GMV portfolios we are interested in estimating $$\boldsymbol{\mu}$$ and $$\boldsymbol{\Sigma}$$. We make the following assumptions

1. The elements of $$\mathbf{X}_i$$ has finite fourth moment.
2. $$t_i$$ and $$T$$ are known (nonrandom).
3. For a target portfolio $$\mathbf{b}$$ the quantity $$\mathbf{b}' \boldsymbol{\Sigma}\mathbf{b}$$ is bounded when $$p \rightarrow \infty$$.

The first assumption is standard to make in finance but enforce the end-user to verify that its true.
The second assumption implies that we do not tell you when you should reweight a portfolio, we simply say that they are given. Although this assumption has little practical relevance, it is an assumption that is more philosophical and hard to test. The third assumption is also somewhat philosophical. The target portfolio is chosen by the end-user though we need to have that the quantity $$\mathbf{b}' \boldsymbol{\Sigma}\mathbf{b}$$ to be finite for our methods to work. This makes some assumptions on the relation between the true covariance matrix $$\boldsymbol{\Sigma}$$ and how it relates to $$\mathbf{b}$$. An important assumption that is hard to verify!

The sequences of portfolios are constructed to make as smooth transitions as possible from one portfolio to another. The transitions are made to be “optimal” when you have many assets $$p$$ in comparison to observations $$n$$, which is usually characterized by the concentration ratio $$c=p/n \in (0,1)$$. When $$c$$ is close to 1 we have very little data in comparison to how many parameters we need to estimate. Note that we do not cover $$c>1$$ for non-overlapping estimators, though our aim is to do so in the future.