Exercise 0: Test your environment

In a console, change to the directory “<theta-svn>/utils2/examples/infn2013” where “<theta-svn>” is the path where you checked out theta from subversion. TODO: where is this on VM?

Execute ex0.py by typing::
../../theta-auto.py ex0.py

in the console.

In case your setup is working, you should see an output TODO

Also the other scripts in this exercise will be called this way, using::
../../theta-auto.py <script-name>

Usually, most of the code is already included and only minor modifications are required. In case you are stuck, have a look at the corresponding “solutions” python file which contains the non-trivial implementation parts.

Some ready-made routines are in “common.py”; the methods are documented with comments.

Exercise 1: Hypothesis tests for a counting experiment

This exercise will repeat common concepts of hypothesis test (p-value) from the lecture for a typical “prototype” experiment in HEP: assume that after some event selection, you expect a mean background of b, so the number of data events n is assumed to follow a Poisson distribution with mean b in absence of signal (this is the null hypothesis of the hypothesis test). For a non-zero signal s, the data will follow a Poisson distribution around s+b.

Reminder: The p-value is the probability to observe “at least as extreme” as data, where a test statistic is used to specify what “extreme” actually means. For the present case, the number of observed events is a good test statistic, so the p-value is the probability to observe at least as many events for background-only.

1.a. Analytical p-values: For different values of the background b, determine the p-value for

an observation around (or above) b by numerical evaluation of the analytic formula. For b=5.2 and nobs between 5 and 20, this is implemented in the method “ex1a” in the file “ex1.py” (comment out the call to ex1a in the code).

Question: What is the p-value and Z-value for * b=10000 and nobs=10000 * b=10000 and nobs=10100 * b=10000 and nobs=10200 * b=10000 and nobs=9900 Compare the result to the normal approximation Z_a = s/sqrt(b) = (n - b)/sqrt(b). (The code for calculating and printing Z_a is currently commented out).

1.b. Toy MC method for p-value: As in part a., compute p-values, but this time use the toy Monte-Carlo

method. First, have a look at the generated Poisson data by commenting in the code blocks.

Question: What is the p-value and Z-value for b=100 and nobs=120? Compare the result to the normal approximation
s / sqrt(b).
Extra question: Try out different values for b and nobs, paying particular attention to the quality of
the normal approximation.
[Note that you need to set the number of toys high enough to reach a reasonable accuracy for the p-value.
To get the error on p, either re-execute the script to see the change in p-value or implement the error estimate delta_p = sqrt(p*(1-p)/ntoy) ]
1.c. Toy MC method with systematic uncertainty: Compute the p-value with the MC method

for a statistical model with an uncertainty on the background with the prior-average method.

As in 1.b., print (or plot) some generated toy data values. Also try a large relative uncertainty on the Poisson mean (e.g. 100%). You will see the code fail. Why? Fix the code as you think is appropriate.

Question: What is the p-value and Z-value for b=10000, delta_b=50 and nobs = 10200? Compare
the results to the normal approximation Z_a = s / sqrt(b + delta_b^2).
Extra question: Try out different values for b, nobs and delta_b, paying particular attention to the quality of
the normal approximation.
1.d. Toy MC method with theta: We now use the “theta” program for generating the likelihood-ratio test statistic

distribution for the shape model introduced in the lecture. Follow the instructions in the code to generate the likelihood ratio test statistic for a background-only toy ensemble and for data in order to calculate the p-value and Z-value for the signal mass of 500; if you remember the lecture, this was a 3.1sigma effect.

The toy data with the 3.1sigma effect is generated with the value for the random seed of 50 (at least in my version of python ...). Try out other seeds as an argument for “build_shape_model” (defined and documented in common.py). Also manipulate other arguments to “build_shape_model” to include a background rate and shape uncertainty and re-generate test statistic distributions for those cases.

Question: How does the p-value change if including both, a 10% rate uncertainty on the background and the shape uncertainty?

Exercise 2: Neyman limits and intervals

Table Of Contents

This Page