
A Coding Implementation of Portfolio Optimization with skfolio for Building, Testing, Tuning, and Comparing Modern Investment Strategies

In this tutorial, we explore skfolio, a scikit-learn-compatible portfolio optimization library that helps us build, analyze, and evaluate different investment strategies in a structured Python workflow. We begin by loading S&P 500 price data, converting it into returns, and creating a time-based train-test split suitable for financial evaluation. From there, we build simple baseline portfolios, test mean-variance optimization, compare alternative risk measures, apply risk-parity methods, and use hierarchical clustering techniques such as HRP and Nested Clusters Optimization. We also move into more advanced portfolio construction ideas, including robust covariance estimators, Black-Litterman views, factor models, pre-selection pipelines, walk-forward validation, and hyperparameter tuning with GridSearchCV.

import subprocess, sys
def _pip_install(pkg):
   subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", pkg])


try:
    import skfolio  # noqa: F401
except ImportError:
    _pip_install("skfolio")
    import skfolio  # noqa: F401


print(f"skfolio version: {skfolio.__version__}")


import warnings
warnings.filterwarnings("ignore")


import numpy as np
import pandas as pd
import plotly.io as pio


try:
    pio.renderers.default = "colab"
except Exception:
    pio.renderers.default = "notebook"


from sklearn import set_config
from sklearn.model_selection import (
   GridSearchCV, KFold, train_test_split,
)
from sklearn.pipeline import Pipeline


from skfolio import RatioMeasure, RiskMeasure, Population
from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.model_selection import WalkForward, cross_val_predict


from skfolio.optimization import (
   EqualWeighted, InverseVolatility, Random,
   MeanRisk, ObjectiveFunction,
   RiskBudgeting,
   HierarchicalRiskParity,
   NestedClustersOptimization,
)
from skfolio.moments import (
   EmpiricalCovariance, LedoitWolf, DenoiseCovariance, GerberCovariance,
   EmpiricalMu, EWMu, ShrunkMu,
)
from skfolio.prior import EmpiricalPrior, BlackLitterman, FactorModel
from skfolio.pre_selection import SelectKExtremes


set_config(transform_output="pandas")


prices = load_sp500_dataset()
print("Prices shape:", prices.shape)
print(prices.tail(3))


X = prices_to_returns(prices)
print("\nReturns shape:", X.shape)


X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)
print(f"Train: {X_train.index.min().date()} → {X_train.index.max().date()}  ({len(X_train)} days)")
print(f"Test : {X_test.index.min().date()} → {X_test.index.max().date()}  ({len(X_test)} days)")

We install and import all of the required libraries, including skfolio, scikit-learn, pandas, NumPy, and Plotly. We load the S&P 500 dataset, convert asset prices into returns, and prepare the data for portfolio optimization. We split the returns into training and test sets in chronological order to avoid look-ahead bias.
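As a sanity check on the conversion step, here is a minimal pure-pandas sketch (with hypothetical tickers AAA and BBB, not skfolio's implementation) of the linear returns that `prices_to_returns` computes by default, r_t = p_t / p_{t-1} - 1, plus a chronological split like the one above:

```python
import pandas as pd

# Toy prices for two hypothetical assets over four business days.
prices_demo = pd.DataFrame(
    {"AAA": [100.0, 102.0, 101.0, 104.03], "BBB": [50.0, 50.5, 49.0, 49.98]},
    index=pd.date_range("2024-01-01", periods=4, freq="B"),
)

# Linear (simple) returns: r_t = p_t / p_{t-1} - 1; first row is dropped.
returns_demo = prices_demo.pct_change().dropna()

# Chronological split: the last third of rows becomes the test set (no shuffling).
cut = int(len(returns_demo) * (1 - 0.33))
train, test = returns_demo.iloc[:cut], returns_demo.iloc[cut:]
print(returns_demo.round(4))
print("train rows:", len(train), "| test rows:", len(test))
```

Shuffling here would leak future information into the training window, which is why `train_test_split` is called with `shuffle=False` in the tutorial.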

benchmarks = {
   "1/N (EqualWeighted)": EqualWeighted(),
   "Inverse-Volatility":  InverseVolatility(),
   "Random (Dirichlet)":  Random(),
}


baseline_population = Population([])
for name, mdl in benchmarks.items():
    mdl.fit(X_train)
    ptf = mdl.predict(X_test)
    ptf.name = name
    baseline_population.append(ptf)
    print(f"{name:25s}  Sharpe={ptf.annualized_sharpe_ratio:.3f}  "
          f"AnnRet={ptf.annualized_mean:.3%}  AnnVol={ptf.annualized_standard_deviation:.3%}")


min_var = MeanRisk(risk_measure=RiskMeasure.VARIANCE)
min_var.fit(X_train)
print("\nMin-Variance weights (top 5):")
print(pd.Series(min_var.weights_, index=X_train.columns).sort_values(ascending=False).head())


ptf_min_var = min_var.predict(X_test)
ptf_min_var.name = "Min Variance"


max_sharpe = MeanRisk(
   objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
   risk_measure=RiskMeasure.VARIANCE,
)
max_sharpe.fit(X_train)
ptf_max_sharpe = max_sharpe.predict(X_test)
ptf_max_sharpe.name = "Max Sharpe"


ef = MeanRisk(
   risk_measure=RiskMeasure.VARIANCE,
   efficient_frontier_size=20,
    portfolio_params=dict(name="EF"),
)
ef.fit(X_train)
ef_population_test = ef.predict(X_test)
print(f"\nEfficient frontier produced {len(ef_population_test)} portfolios.")


fig = ef_population_test.plot_measures(
   x=RiskMeasure.ANNUALIZED_VARIANCE,
   y=RatioMeasure.ANNUALIZED_SHARPE_RATIO,
)
fig.show()

We create simple benchmark portfolios using equal weighting, inverse volatility, and random allocation. We then build mean-variance portfolios, including minimum-variance and maximum Sharpe-ratio strategies. We also generate an efficient frontier and visualize the trade-off between portfolio risk and performance.
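To make the frontier's trade-off concrete, the arithmetic behind each frontier point can be done by hand: expected return is w·mu and variance is wᵀΣw. A small sketch with assumed (synthetic) daily moments, not the values fitted above:

```python
import numpy as np

mu = np.array([0.0005, 0.0003])          # assumed daily expected returns
Sigma = np.array([[0.00010, 0.00002],
                  [0.00002, 0.00005]])   # assumed daily covariance matrix
w = np.array([0.4, 0.6])                 # one candidate weight vector

# Annualize with ~252 trading days: return scales by 252, vol by sqrt(252).
ann_ret = 252 * w @ mu
ann_vol = np.sqrt(252 * w @ Sigma @ w)
sharpe = ann_ret / ann_vol               # risk-free rate assumed zero
print(f"AnnRet={ann_ret:.2%}  AnnVol={ann_vol:.2%}  Sharpe={sharpe:.3f}")
```

Sweeping `w` over feasible allocations and keeping the minimum-variance portfolio at each return level traces exactly the frontier that `efficient_frontier_size=20` samples.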

risk_measures = {
   "Min CVaR-95":         RiskMeasure.CVAR,
   "Min Semi-Variance":   RiskMeasure.SEMI_VARIANCE,
   "Min CDaR":            RiskMeasure.CDAR,
   "Min Max Drawdown":    RiskMeasure.MAX_DRAWDOWN,
}


risk_pop = Population([ptf_min_var, ptf_max_sharpe])
for name, rm in risk_measures.items():
    m = MeanRisk(risk_measure=rm)
    m.fit(X_train)
    p = m.predict(X_test)
    p.name = name
    risk_pop.append(p)


print("\nRisk-measure comparison on test set:")
_summary = risk_pop.summary()
_wanted = ["Annualized Sharpe Ratio", "Annualized Sortino Ratio",
          "CVaR at 95%", "Maximum Drawdown", "Max Drawdown"]
_have = [r for r in _wanted if r in _summary.index]
print(_summary.loc[_have].T)


rb_var  = RiskBudgeting(risk_measure=RiskMeasure.VARIANCE)
rb_cvar = RiskBudgeting(risk_measure=RiskMeasure.CVAR)
rb_var.fit(X_train);   rb_cvar.fit(X_train)
ptf_rb_var  = rb_var.predict(X_test);   ptf_rb_var.name  = "Risk Parity (Var)"
ptf_rb_cvar = rb_cvar.predict(X_test);  ptf_rb_cvar.name = "Risk Parity (CVaR)"


hrp = HierarchicalRiskParity(risk_measure=RiskMeasure.VARIANCE)
hrp.fit(X_train)
ptf_hrp = hrp.predict(X_test); ptf_hrp.name = "HRP"


nco = NestedClustersOptimization(
   inner_estimator=MeanRisk(risk_measure=RiskMeasure.CVAR),
   outer_estimator=RiskBudgeting(risk_measure=RiskMeasure.VARIANCE),
   cv=KFold(n_splits=5),
   n_jobs=-1,
)
nco.fit(X_train)
ptf_nco = nco.predict(X_test); ptf_nco.name = "Nested Clusters"


hrp.hierarchical_clustering_estimator_.plot_dendrogram().show()

We compare different risk measures, including CVaR, semi-variance, CDaR, and maximum drawdown. We build risk-budgeting portfolios to distribute risk contributions more evenly across assets. We also apply hierarchical methods, such as HRP and Nested Clusters Optimization, to capture asset relationships through clustering.
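The quantity that `RiskBudgeting` equalizes can be written out directly: asset i's risk contribution is RC_i = w_i (Σw)_i / σ_p, and the contributions sum to the portfolio volatility. A sketch with an assumed two-asset covariance matrix (a known special case: with only two assets, inverse-volatility weights are exact risk parity for any correlation):

```python
import numpy as np

Sigma = np.array([[0.04, 0.006],
                  [0.006, 0.01]])        # assumed annualized covariance

def risk_contributions(w, Sigma):
    """Each asset's contribution to portfolio volatility; sums to sigma_p."""
    sigma_p = np.sqrt(w @ Sigma @ w)
    return w * (Sigma @ w) / sigma_p

# Inverse-volatility weights: w_i proportional to 1 / sigma_i.
vols = np.sqrt(np.diag(Sigma))
w = (1 / vols) / (1 / vols).sum()
rc = risk_contributions(w, Sigma)
print("weights:", w.round(3), "| risk contributions:", rc.round(4))
```

With more than two assets, or with CVaR instead of volatility, equalizing the contributions requires the numerical optimization that `RiskBudgeting` performs.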

robust = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.VARIANCE,
    prior_estimator=EmpiricalPrior(
        mu_estimator=ShrunkMu(),
        covariance_estimator=DenoiseCovariance(),
    ),
)
robust.fit(X_train)
ptf_robust = robust.predict(X_test); ptf_robust.name = "Max Sharpe (Robust)"


gerber = MeanRisk(
   risk_measure=RiskMeasure.VARIANCE,
   prior_estimator=EmpiricalPrior(covariance_estimator=GerberCovariance()),
)
gerber.fit(X_train)
ptf_gerber = gerber.predict(X_test); ptf_gerber.name = "Min Var (Gerber)"


assets = list(X_train.columns)


group_a = assets[:10]; group_b = assets[10:]
groups = pd.DataFrame(
    {a: ["GroupA" if a in group_a else "GroupB"] for a in assets},
    index=["Sector"],
)


constrained = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.VARIANCE,
    min_weights=0.0,
    max_weights=0.20,
    transaction_costs=0.0005,
    groups=groups,
    linear_constraints=[
        "GroupA <= 0.6",
        "GroupB >= 0.2",
    ],
    l2_coef=0.01,
)
constrained.fit(X_train)
ptf_constr = constrained.predict(X_test); ptf_constr.name = "Constrained MV"
print("\nConstrained portfolio weights:")
print(pd.Series(constrained.weights_, index=assets).round(4))


print("\nAvailable tickers:", list(X_train.columns))


bl_views = [
   "AAPL == 0.0008",
   "JPM - BAC == 0.0002",
]
bl = MeanRisk(
   objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
   risk_measure=RiskMeasure.VARIANCE,
   prior_estimator=BlackLitterman(views=bl_views),
)
bl.fit(X_train)
ptf_bl = bl.predict(X_test); ptf_bl.name = "Black-Litterman"

We improve portfolio stability by using robust estimators such as the shrunk mean, denoised covariance, and Gerber covariance. We add real-world constraints like maximum asset weights, group limits, transaction costs, and L2 regularization. We also apply Black-Litterman views to blend market-based assumptions with our own return expectations.
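The intuition behind these shrinkage-style estimators can be demonstrated with scikit-learn's own `LedoitWolf` (related to, but distinct from, skfolio's estimators) on synthetic data: with few observations relative to assets, the sample covariance is noisy, and Ledoit-Wolf blends it toward a structured target, damping spurious off-diagonal entries.

```python
import numpy as np
from sklearn.covariance import LedoitWolf, empirical_covariance

# Synthetic returns: 60 days, 20 assets, truly uncorrelated by construction,
# so every off-diagonal entry of the sample covariance is pure noise.
rng = np.random.default_rng(0)
R = rng.normal(0.0005, 0.01, size=(60, 20))

S = empirical_covariance(R)              # raw sample covariance
lw = LedoitWolf().fit(R)                 # shrunk covariance

off_diag = ~np.eye(20, dtype=bool)
print(f"shrinkage intensity: {lw.shrinkage_:.3f}")
print("sample off-diag std :", S[off_diag].std())
print("shrunk off-diag std :", lw.covariance_[off_diag].std())
```

The shrunk off-diagonal entries have visibly smaller spread, which is exactly the stabilizing effect `DenoiseCovariance` and `ShrunkMu` aim for inside the optimizer.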

factor_prices = load_factors_dataset()
X_full, F_full = prices_to_returns(prices, factor_prices)


X_tr, X_te, F_tr, F_te = train_test_split(
   X_full, F_full, test_size=0.33, shuffle=False
)


fm = MeanRisk(
   objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
   risk_measure=RiskMeasure.VARIANCE,
   prior_estimator=FactorModel(),
)
fm.fit(X_tr, F_tr)
ptf_fm = fm.predict(X_te); ptf_fm.name = "Factor Model"
print(f"\nFactor-model Sharpe: {ptf_fm.annualized_sharpe_ratio:.3f}")


pipe = Pipeline([
   ("preselect",   SelectKExtremes(k=8, highest=True)),
   ("optimize",    MeanRisk(
       objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
       risk_measure=RiskMeasure.VARIANCE)),
])
pipe.fit(X_train)
ptf_pipe = pipe.predict(X_test); ptf_pipe.name = "Top-8 + Max Sharpe"


wf_model = MeanRisk(
   objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
   risk_measure=RiskMeasure.VARIANCE,
)
mp_portfolio = cross_val_predict(
   wf_model, X,
   cv=WalkForward(train_size=252*2, test_size=63),
   n_jobs=-1,
)
mp_portfolio.name = "Walk-Forward Max Sharpe"
print(f"\nWalk-forward portfolio  Sharpe={mp_portfolio.annualized_sharpe_ratio:.3f}  "
      f"CalmarRatio={mp_portfolio.calmar_ratio:.3f}")
mp_portfolio.plot_cumulative_returns().show()


tuned = MeanRisk(
   objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
   risk_measure=RiskMeasure.VARIANCE,
   prior_estimator=EmpiricalPrior(mu_estimator=EWMu(alpha=0.1)),
)
grid = GridSearchCV(
   estimator=tuned,
   cv=WalkForward(train_size=252*2, test_size=63),
   n_jobs=-1,
   param_grid={
       "l2_coef": [0.0, 0.01, 0.1],
       "prior_estimator__mu_estimator__alpha": [0.05, 0.1, 0.2, 0.5],
   },
)
grid.fit(X_train)
print("\nBest params:", grid.best_params_)
print(f"Best CV score (Sharpe): {grid.best_score_:.3f}")


ptf_tuned = grid.best_estimator_.predict(X_test); ptf_tuned.name = "Tuned Max Sharpe"


final = Population([
    *baseline_population,
    ptf_min_var, ptf_max_sharpe,
    ptf_rb_var, ptf_rb_cvar,
    ptf_hrp, ptf_nco,
    ptf_robust, ptf_gerber,
    ptf_constr, ptf_bl, ptf_fm,
    ptf_pipe, ptf_tuned,
])


_full = final.summary()
_wanted_final = [
    "Annualized Mean", "Annualized Standard Deviation",
    "Annualized Sharpe Ratio", "Annualized Sortino Ratio",
    "CVaR at 95%", "Maximum Drawdown", "Max Drawdown",
]
_have_final = [r for r in _wanted_final if r in _full.index]
summary = _full.loc[_have_final].T.sort_values(
    "Annualized Sharpe Ratio", ascending=False
)


print("\n" + "=" * 80)
print("FINAL HORSE RACE — sorted by Sharpe (out-of-sample test set)")
print("=" * 80)
print(summary.to_string())


final.plot_cumulative_returns().show()


final.plot_composition().show()


ptf_rb_var.plot_contribution(measure=RiskMeasure.VARIANCE).show()


print("\nDone. Try swapping risk measures, adding constraints, or wiring in")
print("your own returns DataFrame — every estimator follows the sklearn API.")

We build a factor model to explain asset returns with external factor data and optimize based on that structure. We create a pre-selection pipeline, run walk-forward validation, and tune hyperparameters with GridSearchCV. Finally, we compare all portfolio strategies in a single horse race using summary metrics, cumulative returns, composition, and risk-contribution plots.
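The walk-forward scheme used above can be sketched in plain Python (this mimics, but is not, skfolio's `WalkForward`): fit on a rolling two-year window, evaluate on the next quarter, then roll forward by one test block.

```python
def walk_forward_windows(n_obs, train_size, test_size):
    """Yield (train, test) index ranges for a rolling, non-overlapping scheme."""
    windows = []
    start = 0
    while start + train_size + test_size <= n_obs:
        train_idx = range(start, start + train_size)
        test_idx = range(start + train_size, start + train_size + test_size)
        windows.append((train_idx, test_idx))
        start += test_size               # roll forward by one test block
    return windows

# Roughly two years of daily training data, one quarter of test data per fold;
# n_obs=2000 is an assumed sample length for illustration.
wins = walk_forward_windows(n_obs=2000, train_size=252 * 2, test_size=63)
print("folds:", len(wins))
print("first test block:", wins[0][1][0], "to", wins[0][1][-1])
print("last test block :", wins[-1][1][0], "to", wins[-1][1][-1])
```

Because every test block lies strictly after its training window, concatenating the per-fold predictions (as `cross_val_predict` does) yields one continuous out-of-sample track record.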

In conclusion, we completed a full portfolio optimization workflow with skfolio, moving from basic benchmark strategies to advanced model-driven portfolio construction techniques. We compared equal-weighted, inverse-volatility, mean-variance, risk-parity, hierarchical, robust, constrained, Black-Litterman, factor-based, and tuned portfolios on an out-of-sample test set. By using skfolio's scikit-learn-style API, we kept the workflow modular, readable, and easy to extend with new constraints, risk measures, estimators, or custom return data.


Check out the Full Codes here.


The post A Coding Implementation of Portfolio Optimization with skfolio for Building, Testing, Tuning, and Comparing Modern Investment Strategies appeared first on MarkTechPost.
