This is a set of RooFit/RooStats tutorials given at INFN school of statistics (3-7 June 2013) , see the school agenda.
The required materials used during the RooFit/RooStats tutorials is described. The materials consist of exercises that can be solved from what has been presented in the lecture. Two levels of help are provided: a hint and a full solution.
You can find the single code attached (links at the end of the page) or you can use the provided tar file.
Some points linked to the exercises are optional and marked with a icon: they are useful to scrutinise in more detail some aspects. Try to solve or look at them if you have the time.
Additional material can be found on:
$ROOTSYS/tutorials/roofit
and $ROOTSYS/tutorials/roostats
.
A few advice words before starting. The tutorials are based on the ROOT version 5.34, possibly the latest one, 5.34.07 which is also installed in the Virtual Machine available for the school.
./configure --enable-roofit
). You can check if roofit is installed in your version of ROOT by typing the command root-config --has-roofit
. The answer must be yes
. If not, you must re-configure ROOT and rebuild it or install a new binary version. See again the computer setup instructions for a detailed description on how to install and configure ROOT.
root> .x myMacro.C // interpreted root> .x myMacro.C+ // compiled
RooStats
namespace):
using namespace RooFit; using namespace RooStats;
RooMsgService::instance().setGlobalKillBelow(RooFit::WARNING) ;
Some links to documentation:
$ROOTSYS/tutorials
The aim of the RooFit exercises is to learn how to build a RooFit model, how to fit the model on a set of data that we will generate and how to plot the result. We will also learn how to save the full RooFit workspace containing the model and data in a file which we can read back later, for example when we will do the RooStats exercises. We will learn also how to generate the data from the model. The same generated data we will then use to fit the model
Using the snippets of code, one can build a full ROOT macro which can be run interactively in ROOT or compiled using ACLIC.
We will start with a very simple exercise. If you already know a bit of RooFit you might skip this exercise and go directly to the next one. The aim of this exercise is to show and make you practise the basic functionality of RooFit.
We will create a Gaussian model from which we will generate a pseudo-data set and then we will fit this data set. After fitting, we will plot the result. You can find the instruction for each steps in the lecture slides.
generate(...)
method of the RooAbsPdf
class to generate a data set with 1000 events.
RooPlot
in a ROOT Canvas.
fitTo(...)
method of the RooAbsPdf
class
RooWorkspace
factory to create first the Gaussian p.d.f. function with the variables (observables and parameters).
Roo
suffix. The exact signature required (variables to be passed) is defined in the Pdf class constructor (You can browse the reference documentation here.
// create Gaussian Pdf RooWorkspace w("w"); w.factory("Gaussian:pdf(x[-10,10],mu[1,-1000,1000],s[2,0,1000])"); |
RooWorkspace::var
for variables or RooWorkspace::pdf
for p.d.f. Example:
// retrieving model components RooAbsPdf * pdf = w.pdf("pdf"); // access object from workspace RooRealVar * x = w.var("x"); // access object from workspace |
x
) and the number of events
// generate an unbinned dataset of 1000 events RooDataSet * data = pdf->generate( *x, 1000); |
// create a RooPlot object and plot the data RooPlot * plot = x->frame(); data->plotOn(plot); |
RooAbsPdf::fitTo
.
// fit the data and save its result. Use the optional =Minuit2= minimiser RooFitResult * res = pdf->fitTo(*data, Minimizer("Minuit2","Migrad"), Save(true) ); |
pdf->plotOn(plot); plot->Draw(); // to show the RooPlot in the current ROOT Canvas |
// make a simple Gaussian model #include "RooWorkspace.h" #include "RooRealVar.h" #include "RooAbsPdf.h" #include "RooDataSet.h" #include "RooPlot.h" #include "TCanvas.h" using namespace RooFit; void GaussianModel(int n = 1000) { RooWorkspace w("w"); // define a Gaussian pdf w.factory("Gaussian:pdf(x[-10,10],mu[1,-1000,1000],sigma[2,0,1000])"); RooAbsPdf * pdf = w.pdf("pdf"); // access object from workspace RooRealVar * x = w.var("x"); // access object from workspace // generate n gaussian measurement for a Gaussian(x, 1, 1); RooDataSet * data = pdf->generate( *x, n); data->SetName("data"); // RooFit plotting capabilities RooPlot * pl = x->frame(Title("Gaussian Fit")); data->plotOn(pl); pl->Draw(); // now fit the data set pdf->fitTo(*data, Minimizer("Minuit2","Migrad") ); // plot the pdf on the same RooPlot object we have plotted the data pdf->plotOn(pl); pdf->paramOn(pl, Layout(0.6,0.9,0.85)); pl->Draw(); // import data in workspace (IMPORTANT for saving it ) w.import(*data); w.Print(); // write workspace in the file (recreate file if already existing) w.writeToFile("GaussianModel.root", true); cout << "model written to file " << endl; } |
The fit has only two shape parameters. How can you also add the a normalisation fit parameter, the number of events ?
RooExtendPdf
class. This can be done as following:
w.factory("Gaussian:gaus(x[-10,10],mu[1,-1000,1000],sigma[2,0,1000])"); w.factory("ExtendPdf:pdf(gaus, nevt[100,0,100000])"); |
The fit is an unbinned likelihood fit. How can you make a binned likelihood fit ?
generateBinned
which will return a RooDataHist
class or by using generate
with the option AllBinned
. In this last case a RooDataSet
class is returned.
The aim of tis exercise is to learn how to build a composite model in RooFit made of two p.d.f, one representing a signal and one a background distributions. We want to determine the number of signal events. For this we need to perform an extended maximum likelihood fit, where the signal events is one of the fit parameter. We will create the composite model formed by a Gaussian signal over a falling exponential background. We will have ~ 100 Gaussian-distributed signal event over ~ 1000 exponential-distributed background. We assume that the location and width of the signal are known. The "mass" observable is distributed in range [0 - 10] and the signal has mean ~ 2 and sigma 0.3. The exponential decay of the background is 2.
We will follow the same step as in the previous exercise:
RooAddPdf
class.
generate(...)
method of the RooAbsPdf
class to generate a data set with 1000 events.
RooPlot
in a ROOT Canvas.
fitTo(...)
method of the RooAbsPdf
class
RooWorkspace
factory to create first the Gaussian p.d.f. function with the variables (observables and parameters) and the Exponential
RooAddPdf
using the Gaussian and the Exponential and the relative number of events. Note that we could instead of the number of events a relative fraction. In this last case we would fit only for the shape and not the normalisation.
RooAddPdf
is created using the SUM
operator of the factory syntax.
// create model RooWorkspace w("w"); w.factory("Exponential:bkg_pdf(x[0,10], a[-0.5,-2,-0.2])"); w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.3])"); // create RooAddPdf w.factory("SUM:model(nsig[100,0,10000]*sig_pdf, nbkg[1000,0,10000]*bkg_pdf)"); // for extended model |
RooWorkspace::var
for variables or RooWorkspace::pdf
for p.d.f. Example:
// retrieving model components RooAbsPdf * pdf = w.pdf("pdf"); // access object from workspace RooRealVar * x = w.var("x"); // access object from workspace |
generate
function passing the observables we want to generate and the number of events. In case of an extended pdf (as the RooAddPdf
we have created), if we don't pass the number of events, the number of events will be generated according to a Poisson distribution given the expected events (nsig+nbkg
).
// generate an unbinned dataset of 1000 events RooDataSet * data = pdf->generate( *x, 1000); |
// create a RooPlot object and plot the data RooPlot * plot = x->frame(); data->plotOn(plot); |
RooAbsPdf::fitTo
.
// fit the data and save its result. Use the optional =Minuit2= minimiser RooFitResult * res = pdf->fitTo(*data, Minimizer("Minuit2","Migrad"), Save(true) ); |
pdf->plotOn(plot); //draw the two separate pdf's pdf->plotOn(plot, RooFit::Components("bkg_pdf"), RooFit::LineStyle(kDashed) ); pdf->plotOn(plot, RooFit::Components("sig_pdf"), RooFit::LineColor(kRed), RooFit::LineStyle(kDashed) ); plot->Draw(); // to show the RooPlot in the current ROOT Canvas |
w.writeToFile("GausExpModel.root",true); // true means the file is re-created |
ModelConfig
object that we save in the workspace. This will be useful for the
RooStats exercises (see later the RooStats exercise 1)
#include "RooWorkspace.h" #include "RooAbsPdf.h" #include "RooDataSet.h" #include "RooFitResult.h" #include "RooPlot.h" #include "RooRealVar.h" #include "RooRandom.h" #include "RooStats/ModelConfig.h" using namespace RooFit; void GausExpModel(int nsig = 100, // number of signal events int nbkg = 1000 ) // number of background events { RooWorkspace w("w"); w.factory("Exponential:bkg_pdf(x[0,10], a[-0.5,-2,-0.2])"); w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.3])"); w.factory("SUM:model(nsig[0,10000]*sig_pdf, nbkg[0,10000]*bkg_pdf)"); // for extended model RooAbsPdf * pdf = w.pdf("model"); RooRealVar * x = w.var("x"); // the observable // set the desired value of signal and background events w.var("nsig")->setVal(nsig); w.var("nbkg")->setVal(nbkg); // generate the data // use fixed random numbers for reproducibility (use 0 for changing every time) RooRandom::randomGenerator()->SetSeed(111); // fix number of bins to 50 to plot or to generate data (default is 100 bins) x->setBins(50); RooDataSet * data = pdf->generate( *x); // will generate accordint to total S+B events //RooDataSet * data = pdf->generate( *x, AllBinned()); // will generate accordint to total S+B events data->SetName("data"); w.import(*data); data->Print(); RooPlot * plot = x->frame(Title("Gaussian Signal over Exponential Background")); data->plotOn(plot); plot->Draw(); RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad")); r->Print(); pdf->plotOn(plot); //draw the two separate pdf's pdf->plotOn(plot, RooFit::Components("bkg_pdf"), RooFit::LineStyle(kDashed) ); pdf->plotOn(plot, RooFit::Components("sig_pdf"), RooFit::LineColor(kRed), RooFit::LineStyle(kDashed) ); pdf->paramOn(plot,Layout(0.5,0.9,0.85)); plot->Draw(); RooStats::ModelConfig mc("ModelConfig",&w); mc.SetPdf(*pdf); mc.SetParametersOfInterest(*w.var("nsig")); mc.SetObservables(*w.var("x")); // define set of nuisance parameters w.defineSet("nuisParams","a,nbkg"); mc.SetNuisanceParameters(*w.set("nuisParams")); // import model in the workspace w.import(mc); // write the workspace in the file TString fileName = "GausExpModel.root"; w.writeToFile(fileName,true); cout << "model written to file " << fileName << endl; } |
In this exercise we will create a combined model based on two channels and we will perform a simultaneous fit on both channels. We will learn how to create a RooSimultaneous
p.d.f. and a combined data set, which can be fit for one (or more) common parameters.
mu
expressing the signal strength.
We will so the same steps as before:
RooSimultaneous
).
mu
: the number of signal events is the product of mu
with a fixed number (50 for the first channel and 30 for the second).
RooWorkspace w("w"); w.factory("Exponential:bkg1_pdf(x[0,10], a1[-0.5,-2,-0.2])"); w.factory("Exponential:bkg2_pdf(x, a2[-0.25,-2,-0.2])"); w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.3])"); // express number of events as the w.factory("prod:nsig1(mu[1,0,5],xsec1[50])"); w.factory("prod:nsig2(mu,xsec2[30])"); w.factory("SUM:model1(nsig1*sig_pdf, nbkg1[1000,0,10000]*bkg1_pdf)"); // for extended model w.factory("SUM:model2(nsig2*sig_pdf, nbkg2[100,0,10000]*bkg2_pdf)"); // for extended model |
RooCategory
) * use the factory operator SIMUL
for creating the RooSimultaneous
pdf.
// Create discrete observable to label channels w.factory("index[channel1,channel2]"); // Create joint pdf (RooSimultaneous) w.factory("SIMUL:jointModel(index,channel1=model1,channel2=model2)"); |
// generate combined data set RooDataSet * data = pdf->generate( RooArgSet(*x,*index)); |
RooPlot * plot1 = x->frame(); RooPlot * plot2 = x->frame(); data->plotOn(plot1,RooFit::Cut("index==index::channel1")); data->plotOn(plot2,RooFit::Cut("index==index::channel2")); |
fitTo
method.
// fit RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad")); r->Print(); // plotting (draw the two separate pdf's ) pdf->plotOn(plot1,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel1")); pdf->plotOn(plot2,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel2")); // draw the two plots in two separate pads: c1 = new TCanvas(); c1->Divide(1,2); c1->cd(1); plot1->Draw(); c1->cd(2); plot2->Draw(); |
Code solution:
using namespace RooFit; void CombinedModel() { RooWorkspace w("w"); w.factory("Exponential:bkg1_pdf(x[0,10], a1[-0.5,-2,-0.2])"); w.factory("Gaussian:sig_pdf(x, mass[2], sigma[0.3])"); w.factory("prod:nsig1(mu[1,0,5],xsec1[50])"); w.factory("SUM:model1(nsig1*sig_pdf, nbkg1[1000,0,10000]*bkg1_pdf)"); // for extended model w.factory("Exponential:bkg2_pdf(x, a2[-0.25,-2,-0.2])"); w.factory("prod:nsig2(mu,xsec2[30])"); w.factory("SUM:model2(nsig2*sig_pdf, nbkg2[100,0,10000]*bkg2_pdf)"); // for extended model // Create discrete observable to label channels w.factory("index[channel1,channel2]"); // Create joint pdf (RooSimultaneous) w.factory("SIMUL:jointModel(index,channel1=model1,channel2=model2)"); RooAbsPdf * pdf = w.pdf("jointModel"); RooRealVar * x = w.var("x"); // the observable RooCategory * index = w.cat("index"); // the category // generate the data (since it is exetended we can generate the global model // use fixed random numbers for reproducibility // use 0 for changing every time RooRandom::randomGenerator()->SetSeed(111); // generate binned // plot the generate data in 50 bins (default is 100) x->setBins(50); // generate events of joint model // NB need to add also category as observable RooDataSet * data = pdf->generate( RooArgSet(*x,*index)); // will generate accordint to total S+B events data->SetName("data"); w.import(*data); data->Print(); RooPlot * plot1 = x->frame(Title("Channel 1")); RooPlot * plot2 = x->frame(Title("Channel 2")); data->plotOn(plot1,RooFit::Cut("index==index::channel1")); data->plotOn(plot2,RooFit::Cut("index==index::channel2")); // plot->Draw(); RooFitResult * r = pdf->fitTo(*data, RooFit::Save(true), RooFit::Minimizer("Minuit2","Migrad")); r->Print(); pdf->plotOn(plot1,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel1")); pdf->plotOn(plot2,RooFit::ProjWData(*data),RooFit::Slice(*w.cat("index"),"channel2")); //draw the two separate pdf's pdf->paramOn(plot1,Layout(0.65,0.85,0.85),Parameters(RooArgSet(*w.var("a1"),*w.var("nbkg1")))); pdf->paramOn(plot2,Layout(0.65,0.85,0.85),Parameters(RooArgSet(*w.var("a2"),*w.var("nbkg2")))); pdf->paramOn(plot2,Layout(0.65,0.85,0.7),Parameters(RooArgSet(*w.var("mu")))); c1 = new TCanvas(); c1->Divide(1,2); c1->cd(1); plot1->Draw(); c1->cd(2); plot2->Draw(); RooStats::ModelConfig mc("ModelConfig",&w); mc.SetPdf(*pdf); mc.SetParametersOfInterest(*w.var("mu")); mc.SetObservables(RooArgSet(*w.var("x"),*w.cat("index"))); // define set of nuisance parameters w.defineSet("nuisParams","a1,nbkg1,a2,nbkg2"); mc.SetNuisanceParameters(*w.set("nuisParams")); // set also the snapshot mc.SetSnapshot(*w.var("mu")); // import model in the workspace w.import(mc); // write the workspace in the file TString fileName = "CombinedModel.root"; w.writeToFile(fileName,true); cout << "model written to file " << fileName << endl; } |
In these exercises we will learn to use the RooStats calculators to perform a statistical analysis on a model. We will use two types of models:
For these model we compute:
For running the RooStats calculators we read the workspace from the ROOT file. All the Roostats calculators take as input a data set and a ModelConfig object. It is therefore convenient to save the workspace (with model and data) in a ROOT file and then read it afterwards for running the calculators.
If you have problem creating the code for running the calculators, you can run the provided solution or just the standard Roostats tutorials (in $ROOTSYS/tutorials/roostats)
Tutorials you could use are:
StandardProfileLikelihoodDemo.C
StandardBayesianNumericalDemo.C
StandardBayesianMCMCDemo.C
They can be run as following from the Root prompt by passing the workspace file name (e.g. GausExpModel.root ), workspace name, model config name and data set name:
root [0] .L $ROOTSYS/tutorials/roostats/StandardProfileLikelihoodDemo.C root [1] StandardProfileLikelihoodDemo("SPlusBExpoModel.root", "w", "ModelConfig", "data" )
Other tutorials go into more details to create various models and run the calculators. See them all at http://root.cern.ch/root/html/tutorials/roostats/index.html.
ModelConfig
from the Gaussian plus Exponential Model used in the RooFit Exercise 2.
We show first how to create the ModelConfig
object which we need to import into the workspace in order to be able to run later the RooStats calculators.
We need to specify in the ModelConfig
:
We will see later, that if we have a model where some external knowledge on a parameter is introduced, we would need to define the model global observables.
Below is how we create a ModelConfig
for the Gaussian signal plus Exponential background model:
RooStats::ModelConfig mc("ModelConfig",&w); mc.SetPdf(*pdf); mc.SetParametersOfInterest(*w.var("nsig")); mc.SetObservables(*w.var("x")); // define set of nuisance parameters w.defineSet("nuisParams","a,nbkg"); mc.SetNuisanceParameters(*w.set("nuisParams")); // import the ModelConfig in the workspace w.import(mc); // write workspace in the file w.writeToFile("GausExpModel.root"); |
In this exercise we will see how to run the RooStats Profile Likelihood calculator on the Gaussian signal plus Exponential background model. We will create a ROOT macro, doing the following:
ModelConfig
and data set objects.
using namespace RooStats;
.
using namespace RooStats; void SimplePL( const char* infile = "GausExpModel.root", const char* workspaceName = "w", const char* modelConfigName = "ModelConfig", const char* dataName = "data" ) { // open input file TFile *file = TFile::Open(infile); if (!file) return; // get the workspace out of the file RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName); // get the modelConfig out of the file RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName); // get the modelConfig out of the file RooAbsData* data = w->data(dataName); |
ProfileLikelihoodCalculator class passing the =ModelConfig
object and a data set object (RooAbsData
):
ProfileLikelihoodCalculator pl(*data,*mc); |
GetInterval(…)
:
pl.SetConfidenceLevel(0.683); // 68% interval LikelihoodInterval* interval = pl.GetInterval(); |
// find the iterval on the first Parameter of Interest RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first(); double lowerLimit = interval->LowerLimit(*firstPOI); double upperLimit = interval->UpperLimit(*firstPOI); cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<< lowerLimit << ", "<< upperLimit <<"] "<<endl; |
LikelihoodIntervalPlot
class:
LikelihoodIntervalPlot * plot = new LikelihoodIntervalPlot(interval); plot->SetRange(0,150); // change range to have interval visible (default range is variable range) plot->Draw(); |
plot->SetPoints(50); plot->Draw("tf1"); |
using namespace RooStats; void SimplePL( const char* infile = "GausExpModel.root", const char* workspaceName = "w", const char* modelConfigName = "ModelConfig", const char* dataName = "data" ) { ///////////////////////////////////////////////////////////// // First part is just to access the workspace file //////////////////////////////////////////////////////////// // open input file TFile *file = TFile::Open(infile); if (!file) return; // get the workspace out of the file RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName); // get the modelConfig out of the file RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName); // get the modelConfig out of the file RooAbsData* data = w->data(dataName); ProfileLikelihoodCalculator pl(*data,*mc); pl.SetConfidenceLevel(0.683); // 68% interval LikelihoodInterval* interval = pl.GetInterval(); // find the iterval on the first Parameter of Interest RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first(); double lowerLimit = interval->LowerLimit(*firstPOI); double upperLimit = interval->UpperLimit(*firstPOI); cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<< lowerLimit << ", "<< upperLimit <<"] "<<endl; LikelihoodIntervalPlot * plot = new LikelihoodIntervalPlot(interval); plot->SetRange(0,150); // possible eventually to change ranges //plot->SetNPoints(50); // do not use too many points, it could become very slow for some models plot->Draw(""); // use option TF1 if too slow (plot.Draw("tf1") } |
68% interval on nsig is : [60.3382, 97.0782]
In this exercise we will learn to use the RooStats Bayesian calculators. We have the choice between using the MCMC calculator or the numerical one. We will start using the MCMC and optionally in the next exercise we will show how to use the numerical Bayesian calculator. We will use the same model used before for the Profile Likelihood calculator. What we need to do is very similar to the previous exercise:
MCMCCalculator
class
GetInterval
function to compute the desired interval. The returned MCMCInterval
class will contained the resulting Markov-Chain.
MCMCIntervalPlot
class
MCMCCalculator class passing the =ModelConfig
object and a data set object (RooAbsData
) exactly as we did for the ProfileLikelihoodCalculator
:
MCMCCalculator mcmc(*data,*mc); mcmc.SetConfidenceLevel(0.683); // 683% interval |
SequentialProposal
function. This function samples each parameter independently using a gaussian pdf with a sigma given by a fraction of the parameter range. We use in this case a value of 0.1 for the fraction. It is important to check that the acceptance of the proposal is not too low. In that case we are wasting iterations.
// Define proposal function. This proposal function seems fairly robust SequentialProposal sp(0.1); mcmc.SetProposalFunction(sp); |
// set number of iterations and initial burning steps mcmc.SetNumIters(100000); // Metropolis-Hastings algorithm iterations mcmc.SetNumBurnInSteps(1000); // first N steps to be ignored as burn-in |
mcmc.SetLeftSideTailFraction(0.5); // 0.5 for central Bayesian interval and 0 for upper limit |
MCMCInterval* interval = mcmc.GetInterval(); // print out the iterval on the first Parameter of Interest RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first(); cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<< interval->LowerLimit(*firstPOI) << ", "<< interval->UpperLimit(*firstPOI) <<"] "<< endl; |
MCMCIntervalPlot
class:
// make a plot of posterior function new TCanvas("IntervalPlot"); MCMCIntervalPlot plot(*interval); plot.Draw(); |
using namespace RooFit; using namespace RooStats; void SimpleMCMC( const char* infile = "GausExpModel.root", const char* workspaceName = "w", const char* modelConfigName = "ModelConfig", const char* dataName = "data" ) { ///////////////////////////////////////////////////////////// // First part is just to access the workspace file //////////////////////////////////////////////////////////// // open input file TFile *file = TFile::Open(infile); if (!file) return; // get the workspace out of the file RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName); // get the modelConfig out of the file RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName); // get the modelConfig out of the file RooAbsData* data = w->data(dataName); RooRealVar * nsig = w->var("nsig"); if (nsig) nsig->setRange(0,200); MCMCCalculator mcmc(*data,*mc); mcmc.SetConfidenceLevel(0.683); // 68.3% interval // Define proposal function. This proposal function seems fairly robust SequentialProposal sp(0.1); mcmc.SetProposalFunction(sp); // set number of iterations and initial burning steps mcmc.SetNumIters(100000); // Metropolis-Hastings algorithm iterations mcmc.SetNumBurnInSteps(1000); // first N steps to be ignored as burn-in // default is the shortest interval // here we use central interval mcmc.SetLeftSideTailFraction(0.5); // for central Bayesian interval //mcmc.SetLeftSideTailFraction(0); // for one-sided Bayesian interval // run the calculator MCMCInterval* interval = mcmc.GetInterval(); // print out the iterval on the first Parameter of Interest RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first(); cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<< interval->LowerLimit(*firstPOI) << ", "<< interval->UpperLimit(*firstPOI) <<"] "<<endl; // make a plot of posterior function new TCanvas("IntervalPlot"); MCMCIntervalPlot plot(*interval); plot.Draw(); } |
Metropolis-Hastings progress: .................................................................................................... [#1] INFO:Eval -- Proposal acceptance rate: 10.469% [#1] INFO:Eval -- Number of steps in chain: 10469 68% interval on nsig is : [61.245, 98.2044]
We can try now to use also the numerical Bayesian calculator on the same model. As before we create the class and configure it setting for example the integration type or the number of iterations. Since the numerical integration can be tricky, it is better in this case to set before a reasonable range from the nuisance parameters, which will be integrated to obtain the marginalised posterior function. A sensible choice is to use for example an interval using 10-sigma around the best fit values for the nuisance parameters.
Note that we have not defined a prior distribution for the parameter of interest (the number of signal events). If a prior is not defined in the model, it is assumed that the prior distribution for the parameter of interest is the uniform distribution in the parameter range.
BayesianCalculator class passing the =ModelConfig
object and a data set object, RooAbsData
:
BayesianCalculator bayesianCalc(*data,*mc); |
// define reasanable range for nuisance parameters ( +/- 10 sigma around fitted values) to facilitate the integral RooArgList nuisPar(*mc->GetNuisanceParameters()); for (int i = 0; i < nuisPar.getSize(); ++i) { RooRealVar & par = (RooRealVar&) nuisPar[i]; par.setRange(par.getVal()-10*par.getError(), par.getVal()+10*par.getError()); } |
We can then configure the calculator:
// set the type of interval (not really needed for central which is the default) bayesianCalc.SetLeftSideTailFraction(0.5); // for central interval //bayesianCalc.SetShortestInterval(); // for shortest interval bayesianCalc.SetConfidenceLevel(0.683); // 68% interval // set the integration type (default is "ADAPTIVE") // possible alternative values are "VEGAS" , "MISER", or "PLAIN" but they require libMathMore // or "TOYMC" (toy MC integration, work when nuisances exist and they have a constraints pdf) bayesianCalc.SetIntegrationType("TOYMC"); // limit the number of iterations to speed-up the integral bayesianCalc.SetNumIters(1000); // compute interval by scanning the posterior function // it is done by default when computing shortest intervals bayesianCalc.SetScanOfPosterior(50); |
RooPlot
object:
SimpleInterval* interval = bayesianCalc.GetInterval(); if (!interval) { cout << "Error computing Bayesian interval - exit " << endl; return; } double lowerLimit = interval->LowerLimit(); double upperLimit = interval->UpperLimit(); RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first(); cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<< lowerLimit << ", "<< upperLimit <<"] "<<endl; // draw plot of posterior function RooPlot * plot = bayesianCalc.GetPosteriorPlot(); if (plot) plot->Draw(); |
using namespace RooStats; void SimpleBayes( const char* infile = "GausExpModel.root", const char* workspaceName = "w", const char* modelConfigName = "ModelConfig", const char* dataName = "data" ) { ///////////////////////////////////////////////////////////// // First part is just to access the workspace file //////////////////////////////////////////////////////////// // open input file TFile *file = TFile::Open(infile); if (!file) return; // get the workspace out of the file RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName); // get the modelConfig out of the file RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName); // get the modelConfig out of the file RooAbsData* data = w->data(dataName); RooRealVar * nsig = w->var("nsig"); if (nsig) nsig->setRange(0,200); // define reasanable range for nuisance parameters ( +/- 10 sigma around fitted values) // to facilitate the integral RooArgList nuisPar(*mc->GetNuisanceParameters()); for (int i = 0; i < nuisPar.getSize(); ++i) { RooRealVar & par = (RooRealVar&) nuisPar[i]; par.setRange(par.getVal()-10*par.getError(), par.getVal()+10*par.getError()); } BayesianCalculator bayesianCalc(*data,*mc); bayesianCalc.SetConfidenceLevel(0.683); // 68% interval // set the type of interval (not really needed for central which is the default) bayesianCalc.SetLeftSideTailFraction(0.5); // for central interval //bayesianCalc.SetLeftSideTailFraction(0.); // for upper limit //bayesianCalc.SetShortestInterval(); // for shortest interval // set the integration type (not really needed for the default ADAPTIVE) // possible alternative values are "VEGAS" , "MISER", or "PLAIN" (MC integration from libMathMore) // "TOYMC" (toy MC integration, work when nuisances exist and they have a constraints pdf) TString integrationType = ""; // this is needed if using TOYMC if (integrationType.Contains("TOYMC") ) { RooAbsPdf * nuisPdf = RooStats::MakeNuisancePdf(*mc, "nuisance_pdf"); if (nuisPdf) bayesianCalc.ForceNuisancePdf(*nuisPdf); } // limit the number of iterations to speed-up the integral bayesianCalc.SetNumIters(1000); bayesianCalc.SetIntegrationType(integrationType); // compute interval by scanning the posterior function // it is done by default when computing shortest intervals bayesianCalc.SetScanOfPosterior(50); RooRealVar* firstPOI = (RooRealVar*) mc->GetParametersOfInterest()->first(); SimpleInterval* interval = bayesianCalc.GetInterval(); if (!interval) { cout << "Error computing Bayesian interval - exit " << endl; return; } double lowerLimit = interval->LowerLimit(); double upperLimit = interval->UpperLimit(); cout << "\n68% interval on " <<firstPOI->GetName()<<" is : ["<< lowerLimit << ", "<< upperLimit <<"] "<<endl; // draw plot of posterior function RooPlot * plot = bayesianCalc.GetPosteriorPlot(); if (plot) plot->Draw(); } |
[#1] INFO:Eval -- BayesianCalculator::GetInterval - found a valid interval : [61.01 , 97.9247 ] 68% interval on nsig is : [61.01, 97.9247]
We create now a counting model based on Poisson Statistics. Let's suppose we observe nobs
events when we expect nexp
, where nexp = s+b
(s
is the number of signal events and b
is the number of background events. The expected distribution for nexp
is a Poisson : Poisson( nobs | s+b)
.
s
is the parameter we want to estimate or set a limit (the parameter of interest), while b
is the nuisance parameter. b
is not known exactly, but has uncertainty sigmab
. The nominal value of b
is = b0=. To express this uncertainty we add in the model a Gaussian constraint. We can interpret this term as having an additional measurement b0
with an uncertainty sigmab
: i.e. we have a likelihood for that measurement =Gaussian( b0 | b, sigma)
. In case of Bayesian statistics we can interpret the constraint as a prior knowledge on the parameter b.
nobs=3
, b0=1
, sigmab=0.2
.
ModelConfig
object and import in the workspace. We need to add in the ModelConfig
also b0
as a "global observable=. The reason is that b0
needs to be treated as an auxiliary observable in case of frequentist statistics and varied when tossing pseudo-experiments.
nobs=3
, b0=1
, sigmab=0.2
.
int nobs = 3; // number of observed events double b0 = 1; // number of background events double sigmab = 0.2; // relative uncertainty in b RooWorkspace w("w"); // make Poisson model * Gaussian constraint w.factory("sum:nexp(s[3,0,15],b[1,0,10])"); // Poisson of (n | s+b) w.factory("Poisson:pdf(nobs[0,50],nexp)"); // Gaussian constraint to express uncertainty in b w.factory("Gaussian:constraint(b0[1,0,10],b,sigmab[0.2])"); // the total model p.d.f will be the product of the two w.factory("PROD:model(pdf,constraint)"); |
b0
, the global observable, as a constant variable (this is needed when we fit the model). However, b0
often needs to have a range.
w.var("b0")->setConstant(true);
After creating the model p.d.f we create the ModelConfig
object and we set b0
as the "global observable=
ModelConfig mc("ModelConfig",&w); mc.SetPdf(*w.pdf("model")); mc.SetParametersOfInterest(*w.var("s")); mc.SetObservables(*w.var("nobs")); mc.SetNuisanceParameters(*w.var("b")); // need now to set the global observable mc.SetGlobalObservables(*w.var("b0")); // this is needed for the hypothesis tests mc.SetSnapshot(*w.var("s")); // import model in the workspace w.import(mc); |
We generate then an hypothetical observed data set. Since we have a counting model, the data set will contain only one event and the observable nobs
will have our desired value (i.e. 3).
Remember to import the data set in the workspace and then save the workspace in a ROOT file.
// make data set with the namber of observed events RooDataSet data("data","", *w.var("nobs")); w.var("nobs")->setVal(3); data.add(*w.var("nobs") ); // import data set in workspace and save it in a file w.import(data); w.writeToFile("ComtingModel.root", true); |
using namespace RooFit; using namespace RooStats; void CountingModel( int nobs = 3, // number of observed events double b = 1, // number of background events double sigmab = 0.2 ) // relative uncertainty in b { RooWorkspace w("w"); // make Poisson model * Gaussian constraint w.factory("sum:nexp(s[3,0,15],b[1,0,10])"); // Poisson of (n | s+b) w.factory("Poisson:pdf(nobs[0,50],nexp)"); w.factory("Gaussian:constraint(b0[0,10],b,sigmab[1])"); w.factory("PROD:model(pdf,constraint)"); w.var("b0")->setVal(b); w.var("b0")->setConstant(true); // needed for being treated as global observables w.var("sigmab")->setVal(sigmab*b); ModelConfig mc("ModelConfig",&w); mc.SetPdf(*w.pdf("model")); mc.SetParametersOfInterest(*w.var("s")); mc.SetObservables(*w.var("nobs")); mc.SetNuisanceParameters(*w.var("b")); // these are needed for the hypothesis tests mc.SetSnapshot(*w.var("s")); mc.SetGlobalObservables(*w.var("b0")); mc.Print(); // import model in the workspace w.import(mc); // make data set with the namber of observed events RooDataSet data("data","", *w.var("nobs")); w.var("nobs")->setVal(3); data.add(*w.var("nobs") ); // import data set in workspace and save it in a file w.import(data); w.Print(); TString fileName = "CountingModel.root"; // write workspace in the file (recreate file if already existing) w.writeToFile(fileName, true); } |
Suppose you want to define the uncertainty in the background as log-normal. How do you do this ?
Suppose instead the background is estimated from another counting experiments. How do you model this ? (This model is often call on/off model)
You can look at this past RooStats exercise (RooStatsTutorialsAugust2012) for these solutions.
Using the previously defined Counting model we can compute a 95% upper limit on the signal parameter s
. We could use :
Suppose we use the Bayesian calculator:
SimpleBayes.C
. We just need to change these lines, setting the tip elf interval and give as input the right ROOT file containing the workspace.
//bayesianCalc.SetConfidenceLevel(0.68); // 68% interval bayesianCalc.SetConfidenceLevel(0.95); // 95% interval // set the type of interval //bayesianCalc.SetLeftSideTailFraction(0.5); // for central interval bayesianCalc.SetLeftSideTailFraction(0.); // for upper limit |
This is the result we will obtain on the Bayesian calculator:
Try to use the Profile Likelihood Calculator. Which result do you obtain ?
We can try to use a different prior on the signal instead of the default uniform in the case of the Bayesian calculator. For example we can try the Jeffrey reference prior,
P(s) = 1./sqrt(s)
.
ModelConfig
so the Bayesian RooStats are aware of its existence:
// define as a pdf based on an expression w.factory("EXPR::prior_s('1./sqrt(s)',s)"); // define prior in ModelConfig mc.SetPriorPdf(*w.pdf("prior_s")); |
The aim of this exercise is to compute the significance of observing a signal in the Gaussian plus Exponential model. It is better to re-generate the model using a lower value for the number of signal events (e.g. 50), in order not to have a too high significance, which will then be difficult to estimate if we use pseudo-experiments.
To estimate the significance, we need to perform an hypothesis test. We want to disprove the null model, i.e the background only model against the alternate model, the background plus the signal.
In RooStats, we do this by defining two two ModelConfig
objects, one for the null model (the background only model in this case) and one for the alternate model (the signal plus background).
We have defined in the workspace saved in the ROOT file only the signal plus background model. We can create the background model from the signal plus background model by setting the value of nsig=0
. We do this by cloning the S+B model and by defining a snapshot in the ModelConfig
for nsig
with a different value:
ModelConfig* sbModel = (RooStats::ModelConfig*) w->obj(modelConfigName); RooRealVar* poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first(); // define the S+B snapshot (this is used for computing the expected significance) poi->setVal(50); sbModel->SetSnapshot(*poi); // create B only model ModelConfig * bModel = (ModelConfig*) sbModel->Clone(); bModel->SetName("B_only_model"); poi->setVal(0); bModel->SetSnapshot( *poi ); |
Using the given models plus the observed data set, we can create a RooStats hypothesis test calculator to compute the significance. We have the choice between
The first two require generate pseudo-experiment, while the Asymptotic calculator, requires only a fit to the two models. For the Frequentist and the Hybrid calculators we need to choose also the test statistics. We have various choices in RooStats. Since it is faster, we will use for the exercise the Asymptotic calculator. The Asymptotic calculator it is based on the Profile Likelihood test statistics, the one used at LHC (see slides for its definition). For testing the significance, we must use the one-side test statistic (we need to configure this explicitly in the class). Summarising, we will do:
AsymptoticCalculator
class using the two models and the data set;
GetHypoTest
function.
HypoTestResult
object
AsymptoticCalculator ac(*data, *sbModel, *bModel); ac.SetOneSidedDiscovery(true); // for one-side discovery test // this run the hypothesis test and print the result HypoTestResult * result = ac.GetHypoTest(); result->Print(); |
using namespace RooStats; using namespace RooFit; void SimpleHypoTest( const char* infile = "GausExpModel.root", const char* workspaceName = "w", const char* modelConfigName = "ModelConfig", const char* dataName = "data" ) { ///////////////////////////////////////////////////////////// // First part is just to access the workspace file //////////////////////////////////////////////////////////// // open input file TFile *file = TFile::Open(infile); if (!file) return; // get the workspace out of the file RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName); // get the data out of the file RooAbsData* data = w->data(dataName); // get the modelConfig (S+B) out of the file // and create the B model from the S+B model ModelConfig* sbModel = (RooStats::ModelConfig*) w->obj(modelConfigName); sbModel->SetName("S+B Model"); RooRealVar* poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first(); poi->setVal(50); // set POI snapshot in S+B model for expected significance sbModel->SetSnapshot(*poi); ModelConfig * bModel = (ModelConfig*) sbModel->Clone(); bModel->SetName("B Model"); poi->setVal(0); bModel->SetSnapshot( *poi ); // create the AsymptoticCalculator from data,alt model, null model AsymptoticCalculator ac(*data, *sbModel, *bModel); ac.SetOneSidedDiscovery(true); // for one-side discovery test //ac.SetPrintLevel(-1); // to suppress print level // run the calculator HypoTestResult * asResult = ac.GetHypoTest(); asResult->Print(); return; // comment this line if you want to run the FrequentistCalculator std::cout << "\n\nRun now FrequentistCalculator.....\n"; FrequentistCalculator fc(*data, *sbModel, *bModel); fc.SetToys(2000,500); // 2000 for null (B) and 500 for alt (S+B) // create the test statistics ProfileLikelihoodTestStat profll(*sbModel->GetPdf()); // use one-sided profile likelihood profll.SetOneSidedDiscovery(true); // configure ToyMCSampler and set the test statistics ToyMCSampler *toymcs = (ToyMCSampler*)fc.GetTestStatSampler(); toymcs->SetTestStatistic(&profll); if (!sbModel->GetPdf()->canBeExtended()) toymcs->SetNEventsPerToy(1); // run the test HypoTestResult * fqResult = fc.GetHypoTest(); fqResult->Print(); // plot test statistic distributions HypoTestPlot * plot = new HypoTestPlot(*fqResult); plot->SetLogYaxis(true); plot->Draw(); } |
Results HypoTestAsymptotic_result: - Null p-value = 0.00118838 - Significance = 3.0386 - CL_b: 0.00118838 - CL_s+b: 0.502432 - CL_s: 422.787
Run the Frequentist calculator
FrequentistCalculator fc(*data, *sbModel, *bModel); fc.SetToys(2000,500); // 2000 for null (B) and 500 for alt (S+B) // create the test statistics ProfileLikelihoodTestStat profll(*sbModel->GetPdf()); // use one-sided profile likelihood profll.SetOneSidedDiscovery(true); // configure ToyMCSampler and set the test statistics ToyMCSampler *toymcs = (ToyMCSampler*)fc.GetTestStatSampler(); toymcs->SetTestStatistic(&profll); if (!sbModel->GetPdf()->canBeExtended()) toymcs->SetNEventsPerToy(1); |
HypoTestResult * fqResult = fc.GetHypoTest(); fqResult->Print(); |
HypTestPlot
class for this.
// plot test statistic distributions HypoTestPlot * plot = new HypoTestPlot(*fqResult); plot->SetLogYaxis(); plot->Draw(); |
return
statement in the middle and run it again
Results HypoTestCalculator_result: - Null p-value = 0.001 +/- 0.000706753 - Significance = 3.09023 +/- 0.2099 sigma - Number of Alt toys: 500 - Number of Null toys: 2000 - Test statistic evaluated on data: 4.61654 - CL_b: 0.001 +/- 0.000706753 - CL_s+b: 0.496 +/- 0.02236 - CL_s: 496 +/- 351.262
As an optional exercise to lear more about computing significance in RooStats, we will study the significance (or equivalent the p-value) we obtain in our Gaussian signal plus exponential background model as function of the signal mass hypothesis.
For doing this we perform the test of hypothesis, using the AsymptoticCalculator, for several mass points. We can then plot the obtained p-value for the null hypothesis (p0
).
We will also estimate the expected significance for our given S+B model as function of its mass. The expected significance can be obtained using a dedicated function in the AsymptoticCalculator
:
AsymptoticCalculator::GetExpectedPValues
using as input the observed p-values for the null and the alternate models. See the Asymptotic formulae paper for the details.
using namespace RooStats; using namespace RooFit; void p0Plot() { const char* infile = "GausExpModel.root"; const char* workspaceName = "w"; const char* modelConfigName = "ModelConfig"; const char* dataName = "data"; ///////////////////////////////////////////////////////////// // First part is just to access the workspace file //////////////////////////////////////////////////////////// // Check if example input file exists TFile *file = TFile::Open(infile); // get the workspace out of the file RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName); // get the modelConfig out of the file RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName); // get the modelConfig out of the file RooAbsData* data = w->data(dataName); // get the modelConfig (S+B) out of the file // and create the B model from the S+B model ModelConfig* sbModel = (RooStats::ModelConfig*) w->obj(modelConfigName); sbModel->SetName("S+B Model"); RooRealVar* poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first(); poi->setVal(50); // set POI snapshot in S+B model for expected significance sbModel->SetSnapshot(*poi); ModelConfig * bModel = (ModelConfig*) sbModel->Clone(); bModel->SetName("B Model"); poi->setVal(0); bModel->SetSnapshot( *poi ); vector<double> masses; vector<double> p0values; vector<double> p0valuesExpected; double massMin = 0.5; double massMax = 8.5; // loop on the mass values for( double mass=massMin; mass<=massMax; mass += (massMax-massMin)/40.0 ) { cout << endl << endl << "Running for mass: " << mass << endl << endl; w->var("mass")->setVal( mass ); AsymptoticCalculator * ac = new AsymptoticCalculator(*data, *sbModel, *bModel); ac->SetOneSidedDiscovery(true); // for one-side discovery test AsymptoticCalculator::SetPrintLevel(-1); HypoTestResult* asymCalcResult = ac->GetHypoTest(); asymCalcResult->Print(); masses.push_back( mass ); p0values.push_back( asymCalcResult->NullPValue() ); double expectedP0 = AsymptoticCalculator::GetExpectedPValues( asymCalcResult->NullPValue(), asymCalcResult->AlternatePValue(), 0, false); p0valuesExpected.push_back( expectedP0 ); std::cout << "expected p0 = " << expectedP0 << std::endl; } TGraph * graph1 = new TGraph(masses.size(),&masses[0],&p0values[0]); TGraph * graph2 = new TGraph(masses.size(),&masses[0],&p0valuesExpected[0]); graph1->SetMarkerStyle(20); graph1->Draw("APC"); graph2->SetLineStyle(2); graph2->Draw("C"); graph1->GetXaxis()->SetTitle("mass"); graph1->GetYaxis()->SetTitle("p0 value"); graph1->SetTitle("Significance vs Mass"); graph1->SetMinimum(graph2->GetMinimum()); graph1->SetLineColor(kBlue); graph2->SetLineColor(kRed); gPad->SetLogy(true); } |
CLs
Limits). In our last exercise we will compute an upper limit using the frequentist hypothesis test inversion method. We need to perform the hypothesis test for different parameter of interest points and compute the corresponding p-values. Since we are interesting in computing a limit, the test null hypothesis, that we want to disprove, is the in this case the S+B model, while the alternate hypothesis is the B only model. It is important to remember this, when we construct the hypothesis test calculator.
For doing the inversion RooStats
provides an helper class, HypoTestInverter
, which is constructed using one of the HypoTest calculator (Frequentist, Hybrid or Asymptotic).
To compute the limit we need to do:
HypoTestInverter
class and configure it (e.g. to set the number and range of points to scan, to use CLs
for computing the limit, etc…)
GetInterval
which returns an HypoTestInverterResult
object.
AsymptotiCalculator
. If we use the Frequentist is the same, just we need to set also the test statistics and number of toys
// asymptotic calculator AsymptoticCalculator ac(*data, *bModel, *sbModel); ac.SetOneSided(true); // for one-side tests (limits) AsymptoticCalculator::SetPrintLevel(-1); // to avoid to much verbosity |
// create hypotest inverter // passing the desired calculator HypoTestInverter calc(ac); // set confidence level (e.g. 95% upper limits) calc.SetConfidenceLevel(0.95); // for CLS calc.UseCLs(true); |
We run now the Inverter:
int npoints = 10; // number of points to scan // min and max (better to choose smaller intervals) double poimin = poi->getMin(); double poimax = poi->getMax(); //poimin = 0; poimax=10; std::cout << "Doing a fixed scan in interval : " << poimin << " , " << poimax << std::endl; calc.SetFixedScan(npoints,poimin,poimax); // run inverter HypoTestInverterResult * r = calc.GetInterval(); |
We can then retrieve from the obtained result object the observed limit value and also the expected limit value and bands (median +/- 1 sigma bands):
double upperLimit = r->UpperLimit(); // estimate of the error on the upper limit double ulError = r->UpperLimitEstimatedError(); std::cout << "The computed upper limit is: " << upperLimit << " +/- " << ulError << std::endl; // compute expected limit and bands std::cout << "Expected upper limits, using the B (alternate) model : " << std::endl; std::cout << " expected limit (median) " << r->GetExpectedUpperLimit(0) << std::endl; std::cout << " expected limit (-1 sig) " << r->GetExpectedUpperLimit(-1) << std::endl; std::cout << " expected limit (+1 sig) " << r->GetExpectedUpperLimit(1) << std::endl; |
Plot now the result of the scan using the HypoTestInverterPlot
class:
HypoTestInverterPlot *plot = new HypoTestInverterPlot("HTI_Result_Plot","Feldman-Cousins Interval",r); // plot in a new canvas TCanvas * c1 = new TCanvas("HypoTestInverter Scan"); c1->SetLogy(false); plot->Draw("CLb 2CL"); // plot also CLb and CLs+b //plot->Draw("OBS"); // plot only observed p-value |
If we were using toys (e.g. FrequentistCalculator), we could plot in a different ROOT Canvas the test statistics distributions.
// plot test statistics distributions for the two hypothesis // when distribution is generated (case of FrequentistCalculators) const int n = r->ArraySize(); if (n> 0 && r->GetResult(0)->GetNullDistribution() ) { TCanvas * c2 = new TCanvas("Test Statistic Distributions","",2); if (n > 1) { int ny = TMath::CeilNint( sqrt(n) ); int nx = TMath::CeilNint(double(n)/ny); c2->Divide( nx,ny); } for (int i=0; i<n; i++) { if (n > 1) c2->cd(i+1); SamplingDistPlot * pl = plot->MakeTestStatPlot(i); pl->SetLogYaxis(true); pl->Draw(); } } |
using namespace RooStats; using namespace RooFit; void SimpleHypoTestInv( const char* infile = "CountingModel.root", const char* workspaceName = "w", const char* modelConfigName = "ModelConfig", const char* dataName = "data" ) { ///////////////////////////////////////////////////////////// // First part is just to access the workspace file //////////////////////////////////////////////////////////// // open input file TFile *file = TFile::Open(infile); if (!file) return; // get the workspace out of the file RooWorkspace* w = (RooWorkspace*) file->Get(workspaceName); // get the modelConfig out of the file RooStats::ModelConfig* mc = (RooStats::ModelConfig*) w->obj(modelConfigName); // get the modelConfig out of the file RooAbsData* data = w->data(dataName); ModelConfig* sbModel = (RooStats::ModelConfig*) w->obj(modelConfigName); RooRealVar* poi = (RooRealVar*) sbModel->GetParametersOfInterest()->first(); ModelConfig * bModel = (ModelConfig*) sbModel->Clone(); bModel->SetName(TString(sbModel->GetName())+TString("_with_poi_0")); poi->setVal(0); bModel->SetSnapshot( *poi ); FrequentistCalculator fc(*data, *bModel, *sbModel); fc.SetToys(1000,500); // asymptotic calculator AsymptoticCalculator ac(*data, *bModel, *sbModel); ac.SetOneSided(true); // for one-side tests (limits) // ac->SetQTilde(true); AsymptoticCalculator::SetPrintLevel(-1); // create hypotest inverter // passing the desired calculator HypoTestInverter calc(ac); // for asymptotic //HypoTestInverter calc(fc); // for frequentist // set confidence level (e.g. 95% upper limits) calc.SetConfidenceLevel(0.95); // for CLS bool useCLs = true; calc.UseCLs(useCLs); calc.SetVerbose(false); // configure ToyMC Samler (needed only for frequentit calculator) ToyMCSampler *toymcs = (ToyMCSampler*)calc.GetHypoTestCalculator()->GetTestStatSampler(); // profile likelihood test statistics ProfileLikelihoodTestStat profll(*sbModel->GetPdf()); // for CLs (bounded intervals) use one-sided profile likelihood if (useCLs) profll.SetOneSided(true); // ratio of profile likelihood - need to pass snapshot for the alt // RatioOfProfiledLikelihoodsTestStat ropl(*sbModel->GetPdf(), *bModel->GetPdf(), bModel->GetSnapshot()); // set the test statistic to use toymcs->SetTestStatistic(&profll); // if the pdf is not extended (e.g. in the Poisson model) // we need to set the number of events if (!sbModel->GetPdf()->canBeExtended()) toymcs->SetNEventsPerToy(1); int npoints = 10; // number of points to scan // min and max (better to choose smaller intervals) double poimin = poi->getMin(); double poimax = poi->getMax(); //poimin = 0; poimax=10; std::cout << "Doing a fixed scan in interval : " << poimin << " , " << poimax << std::endl; calc.SetFixedScan(npoints,poimin,poimax); HypoTestInverterResult * r = calc.GetInterval(); double upperLimit = r->UpperLimit(); std::cout << "The computed upper limit is: " << upperLimit << std::endl; // compute expected limit std::cout << "Expected upper limits, using the B (alternate) model : " << std::endl; std::cout << " expected limit (median) " << r->GetExpectedUpperLimit(0) << std::endl; std::cout << " expected limit (-1 sig) " << r->GetExpectedUpperLimit(-1) << std::endl; std::cout << " expected limit (+1 sig) " << r->GetExpectedUpperLimit(1) << std::endl; // plot now the result of the scan HypoTestInverterPlot *plot = new HypoTestInverterPlot("HTI_Result_Plot","HypoTest Scan Result",r); // plot in a new canvas with style TCanvas * c1 = new TCanvas("HypoTestInverter Scan"); c1->SetLogy(false); plot->Draw("CLb 2CL"); // plot also CLb and CLs+b //plot->Draw("OBS"); // plot only observed p-value // plot also in a new canvas the test statistics distributions // plot test statistics distributions for the two hypothesis // when distribution is generated (case of FrequentistCalculators) const int n = r->ArraySize(); if (n> 0 && r->GetResult(0)->GetNullDistribution() ) { TCanvas * c2 = new TCanvas("Test Statistic Distributions","",2); if (n > 1) { int ny = TMath::CeilNint( sqrt(n) ); int nx = TMath::CeilNint(double(n)/ny); c2->Divide( nx,ny); } for (int i=0; i<n; i++) { if (n > 1) c2->cd(i+1); SamplingDistPlot * pl = plot->MakeTestStatPlot(i); pl->SetLogYaxis(true); pl->Draw(); } } } |
StandardHypoTestInvDemo.C
tutorial
This is a macro in $ROOTSYS/tutorials/roostats which can run the hypothesis test inversion with several possible options. We can find the tutorial code also
here.
Suppose we want to compute the 95% CLs
limit on the parametric unbinned model (Gaussian plus Exponential background), that we have used in the significance exercise:
root [0] .L $ROOTSYS/tutorials/roostats/StandardHypoTestInvDemo.C+ root [1] StandardHypoTestInvDemo("GausExpModel.root", "w", "ModelConfig", "", "data", 2, 3, true, 20, 0, 150 )The macro will print out the observed limit, expected limit and +/1,2 sigma bands/ It will also produce a p-value (
CLs
in this case) scan plot.
Next we can run the Frequentist calculation (it will take considerably more time, but we use only 500 toys):
root [1] StandardHypoTestInvDemo("GausExpModel.root", "w", "ModelConfig", "", "data", 0, 3, true, 10, 0, 100, 500, false )Since we are using toys to estimate the test statistics distribution, the macro will produce the sampling distribution plots for each scanned point.
Note that, since we are running on the number counting model, we need to set to true the last option
root [1] StandardHypoTestInvDemo("CountingModel.root", "w", "ModelConfig", "", "data", 0, 3, true, 10, 0, 10, 500, true )For reference, here is the list of input arguments to the macro:
StandardHypoTestInvDemo(const char * infile = 0, const char * wsName = "combined", const char * modelSBName = "ModelConfig", const char * modelBName = "", const char * dataName = "obsData", int calculatorType = 0, int testStatType = 0, bool useCLs = true , int npoints = 6, double poimin = 0, double poimax = 5, int ntoys=1000, bool useNumberCounting = false, const char * nuisPriorName = 0){Where we have for the calculatorType:
I | Attachment | Action | Size | Date | Who | Comment |
---|---|---|---|---|---|---|
![]() |
AsymptoticCls.png | manage | 19.6 K | 04-Jun-2013 - 23:16 | LorenzoMoneta | Asymptotic CLs result on Counting model |
![]() |
BayesInterval.png | manage | 10.2 K | 03-Jun-2013 - 23:05 | LorenzoMoneta | Bayesian Interval result |
![]() |
BayesUpperLimit.png | manage | 9.4 K | 04-Jun-2013 - 10:53 | LorenzoMoneta | Bayesian 95% Upper limit result |
![]() |
CombinedModel.C | manage | 3.0 K | 02-Jun-2013 - 12:35 | LorenzoMoneta | Combined Model ROOT macro |
![]() |
CombinedModelFit.png | manage | 22.6 K | 02-Jun-2013 - 12:35 | LorenzoMoneta | Combined model fit result |
![]() |
CountingModel.C | manage | 1.5 K | 04-Jun-2013 - 10:52 | LorenzoMoneta | ROOT macro for creating the counting model |
![]() |
FrequentistHTResult.png | manage | 11.3 K | 04-Jun-2013 - 15:00 | LorenzoMoneta | FrequentistCalculator result on a test of significance |
![]() |
GausExpModel.C | manage | 2.4 K | 05-Jun-2013 - 17:45 | LorenzoMoneta | Gaussian plus Exponential ROOT macro |
![]() |
GausExpModelFit.png | manage | 17.0 K | 01-Jun-2013 - 17:54 | LorenzoMoneta | Gaussian plus Exponential fit result |
![]() |
GaussianModel.C | manage | 1.2 K | 02-Jun-2013 - 23:20 | LorenzoMoneta | Gaussian Model ROOT Macro |
![]() |
GaussianModelFit.png | manage | 16.4 K | 01-Jun-2013 - 16:54 | LorenzoMoneta | Gaussian Fit Result |
![]() |
MCMCInterval.png | manage | 7.1 K | 03-Jun-2013 - 23:04 | LorenzoMoneta | MCMC interval result |
![]() |
PLInterval.png | manage | 8.2 K | 02-Jun-2013 - 23:08 | LorenzoMoneta | Profile Likelihood interval result |
![]() |
SimpleBayes.C | manage | 3.0 K | 03-Jun-2013 - 23:04 | LorenzoMoneta | Bayesian Calculator macro |
![]() |
SimpleHypoTest.C | manage | 2.5 K | 04-Jun-2013 - 15:01 | LorenzoMoneta | Hypothesis Test (Significance) ROOT macro |
![]() |
SimpleHypoTestInv.C | manage | 4.5 K | 04-Jun-2013 - 23:26 | LorenzoMoneta | Hypothesis Test Inversion macro |
![]() |
SimpleMCMC.C | manage | 2.0 K | 03-Jun-2013 - 23:03 | LorenzoMoneta | MCMC Calculator macro |
![]() |
SimplePL.C | manage | 1.6 K | 02-Jun-2013 - 23:07 | LorenzoMoneta | Profile Likelihood Interval macro |
![]() |
p0Plot.C | manage | 2.9 K | 04-Jun-2013 - 17:05 | LorenzoMoneta | ROOT macro for scanning p-values |
![]() |
p0Plot.png | manage | 10.1 K | 04-Jun-2013 - 17:05 | LorenzoMoneta | Result of p value scan |
![]() |
tutorial.css | manage | 0.2 K | 02-Jun-2013 - 11:58 | LorenzoMoneta | style for code background colour |