SCIENTIFIC DATA ANALYSIS LABORATORY (a. a. 2023-2024)

Laurea Magistrale / Year-II / Semester-I

Professor: Alexis Pompili

Program/Syllabus of the course

Suggested textbooks for the theoretical part of the course:
- Cowan (ediz. 1998)
- Lista (ediz. 2020)
- Metzger (ediz. 2010)

Copyright: the material of this course may be used only with the permission of the author
        (alexis.pompiliATba.infn.it) and with proper acknowledgment.

For basic usage of the virtual machine dedicated to the course (hosted at ReCas, "212.189.202.110"),
have a look at the instructions here.

For any relevant problem with the VM, please contact edoardo.rennaATuniba.it and gioacchino.vinoATba.infn.it, and put A.P. in cc.


PRACTICAL CLASS 0a (UNIX)

Introduction to the Operating System UNIX/LINUX

- A few introductory slides (containing considerations about the use of Python vs Bash shell scripting)
- Quick course
- Tutorial
- Commands' review
- Some manuals for a wide set of software can be found here (including Linux commands).

Introduction to the editor EMACS

- Commands/I
- Commands/II

Introduction to the light editor VI

- Commands


PRACTICAL CLASS 0b (ROOT)

Introduction to ROOT: introduction to the use of ROOT and first exercises (Practical Class 0)

Further material about the introduction to ROOT

A simpler introduction to ROOT is available here.
Introductory lessons by A. Lazzaro: Lez.-1, Lez.-2, Lez.-3.
Further introductory material:
- Tutorial-1 (by Manchester University)
- Tutorial-2 (by A. Rizzi)


Online ROOT Web Page (the new frontier to read big data: RDataFrame)
Online ROOT DOCS references


PRACTICAL CLASS 1

Histogramming within ROOT

[ Operations, Absolute and Relative Normalization, Stacked Plots, Data-Monte Carlo comparison ]

In this exercise you will learn how to reproduce, starting from a given rootuple of histograms, the plots in Fig. 4 (or Fig. 6)
of the CMS paper JINST 7 P10002 (2012).

To understand the physics content (muon reconstruction and identification at CMS), please study pages 6-14.

For the description of the code (ROOT macro) and the procedure: Exercise-1

Additional code shows how to compute the simulation-to-data ratio;
a suitable rebinning makes the ratio less prone to statistical fluctuations in the tails of the distributions (see exercise-with-solution).

Simple homework exercise: try relative normalization (shape comparison) instead of the absolute normalization proposed in the main exercise code.
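
The difference between the two normalizations can be sketched in plain Python, outside ROOT. This is only an illustration under stated assumptions: the bin contents, the luminosity, the cross section and the number of generated events are invented numbers, and the function names are hypothetical (they are not taken from the course macro).

```python
def scale(hist, factor):
    """Return a copy of the bin contents multiplied by a constant factor."""
    return [factor * content for content in hist]


def absolute_normalization(mc, lumi, xsec, n_generated):
    """Scale the simulation to the yield expected in data: lumi * xsec / n_generated."""
    return scale(mc, lumi * xsec / n_generated)


def relative_normalization(mc, data):
    """Scale the simulation to the integral of the data, so that only shapes are compared."""
    return scale(mc, sum(data) / sum(mc))


# Hypothetical bin contents, invented for illustration only.
data = [12.0, 30.0, 45.0, 28.0, 10.0]
mc = [100.0, 260.0, 400.0, 250.0, 90.0]

mc_abs = absolute_normalization(mc, lumi=2.0, xsec=50.0, n_generated=1000.0)
mc_rel = relative_normalization(mc, data)

print("absolute:", mc_abs)
print("relative:", mc_rel)
```

With the absolute normalization a disagreement in the total yield remains visible, whereas the relative normalization removes it by construction and leaves only shape differences.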

Please check out the following physics note.


PRACTICAL CLASS 2

Exercise on histogram comparison (with ROOT): compatibility with other real data and with simulations

D^0 meson production cross section:
- CMS data compared with FONLL (https://arxiv.org/abs/2107.01476 [JHEP 11 (2021) 225]; Figure 5 / upper);
- CMS data compared with ALICE data (https://arxiv.org/abs/2106.08278 [PRL 120 (2022) 012001]).

For both tasks: Exercise-2 .
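
One common way to quantify the compatibility of two sets of measurements is a chi-square built from the bin-by-bin differences. The sketch below is not necessarily the method used in Exercise-2: it assumes that both sets share the same binning and that their uncertainties are symmetric and uncorrelated; the function name and the numbers are hypothetical.

```python
def chi2_compatibility(y1, e1, y2, e2):
    """Chi-square between two sets of points with uncorrelated, symmetric uncertainties.

    Returns the chi-square and the number of compared points.
    """
    chi2 = 0.0
    for a, ea, b, eb in zip(y1, e1, y2, e2):
        chi2 += (a - b) ** 2 / (ea ** 2 + eb ** 2)
    return chi2, len(y1)


# Hypothetical cross-section values in three bins, invented for illustration only.
set_a = [10.2, 6.1, 2.9]
err_a = [0.8, 0.5, 0.3]
set_b = [9.5, 6.6, 3.2]
err_b = [0.6, 0.4, 0.4]

chi2, npoints = chi2_compatibility(set_a, err_a, set_b, err_b)
print("chi2 / points =", chi2, "/", npoints)
```

For asymmetric uncertainties, as stored in a TGraphAsymmErrors, a choice has to be made about which side of each error bar enters the denominator.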

Learn the use of TGraphErrors and TGraphAsymmErrors.


PRACTICAL CLASS 3

Exercise on hypothesis testing: observables to discriminate background from signal, ROC curves

In this exercise you will deal with an application of ROC curves, in order to compare the rejection power
of two different algorithms. The physics case is taken from a study of the use of the impact parameter
of the leptons in the Higgs "golden" decay channel H→ZZ(*)→4leptons.

For the description of the code (ROOT macro) and the procedure: Exercise-3 (to be replaced with an updated version)
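
The construction of a ROC curve can be sketched in plain Python as follows. This is a minimal illustration, assuming a generic discriminating variable for which signal-like events have small values and are selected with a cut of the form "x <= threshold"; the sample values are invented and the function names are hypothetical.

```python
def roc_curve(signal, background):
    """Return (signal efficiency, background rejection) points for cuts 'x <= threshold'."""
    thresholds = sorted(set(signal) | set(background))
    points = [(0.0, 1.0)]  # cut below every value: nothing is selected
    for t in thresholds:
        efficiency = sum(1 for x in signal if x <= t) / len(signal)
        rejection = sum(1 for x in background if x > t) / len(background)
        points.append((efficiency, rejection))
    return points


def area_under_curve(points):
    """Trapezoidal area under the rejection-vs-efficiency curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (x1 - x0) * (y0 + y1)
    return area


# Hypothetical values of the discriminating variable, invented for illustration only.
signal = [0.1, 0.3, 0.4, 0.8, 1.1]
background = [0.6, 1.0, 1.5, 2.2, 3.0]

print("area under the ROC curve:", area_under_curve(roc_curve(signal, background)))
```

To compare two algorithms, the same functions can be applied to the output of each one: at a fixed signal efficiency, the curve with the higher background rejection is the better performing one.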


PRACTICAL CLASS 4

Introduction to RooFit

A preliminary introduction to the RooFit toolkit is given here.

Introductory material: quick-manual (by W. Verkerke)
Lessons by W. Verkerke @ the BaBar Analysis School (2008): Lez-1, Lez-2, Lez-3.
Online : RooFit manual.

Generation of a distribution according to a signal+background model and subsequent unbinned maximum likelihood (UML) fit (with RooFit)
[the signal is a Breit-Wigner (BW) convolved with a Gaussian resolution function, whereas the background is an exponential].

For the description of the code (RooFit macro) and the procedure: Exercise-4


PRACTICAL CLASS 5

Extended Binned Maximum Likelihood Fit with RooFit

In this exercise you will learn how to fit an invariant mass distribution (ψ'→μ+μ−) by using RooFit;
the PDF has both a signal and a background component.

For the description of the code (ROOT macro), all the details and the procedure: Exercise-5.

Note: for the theory behind fitting with MINUIT (the minimization engine of RooFit)
[Unbinned ML fit, Binned ML fit, Extended ML fit] have a look at the Addendum.
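
As a compact reminder, the three negative log-likelihoods minimized in these fits can be written as below. The notation is an assumption made here and may differ from the Addendum: f(x;θ) is the normalized PDF, x_i are the n observed events, ν is the expected total yield, and n_j and ν_j are the observed and expected contents of bin j. Constant terms that do not depend on the parameters are dropped.

```latex
% Unbinned ML fit (fixed normalization)
-\ln L(\theta) = -\sum_{i=1}^{n} \ln f(x_i;\theta)

% Extended unbinned ML fit (n is Poisson distributed with mean \nu)
-\ln L(\nu,\theta) = \nu - \sum_{i=1}^{n} \ln\left[\nu\, f(x_i;\theta)\right]

% Extended binned ML fit (Poisson statistics in each bin)
-\ln L(\nu,\theta) = \sum_{j} \left[\nu_j(\nu,\theta) - n_j \ln \nu_j(\nu,\theta)\right],
\qquad \nu_j(\nu,\theta) = \nu \int_{\mathrm{bin}\ j} f(x;\theta)\, dx
```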


PRACTICAL CLASS 6

Do the fit and find an exotic state!

This exercise relies on what was learned in the previous exercise. It was given as an exam in 2014/15
and is proposed here as an exercise to start in the classroom (and finish at home if needed).

Here you are asked to fit the signal of ψ' decaying into μ+μ−π+π−.
Check the mass resolution with respect to the previous signal of ψ' decaying into 2 muons
and appreciate how much it improves with two more tracks (pions) constrained to come from the decay vertex.

Here you can find the outline of this Exercise-6.
Use everything learned so far, including pulls and fit models.


PRACTICAL CLASS 7

In this exercise we fit the signal of φ→K+K− (di-kaon invariant mass obtained by selecting B0s→J/ψ φ
in a part of the CMS open data). In this case the experimental mass resolution (~ 1.3 MeV from CMS Monte Carlo) is
smaller than the natural width of the φ (~ 4.3 MeV), so the natural line shape is not washed out by the resolution.
Therefore the signal must be fitted with a Voigtian (a non-relativistic Breit-Wigner convolved with
a Gaussian experimental resolution function).

For the description of the code (RooFit macro) and the procedure: Exercise-7.
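
The meaning of the Voigtian as a convolution can be illustrated by generating it directly: a value drawn from a non-relativistic Breit-Wigner (a Cauchy distribution) plus an independent Gaussian smearing follows a Voigt profile. The sketch below is plain Python and not the RooFit macro of the exercise; the width and resolution are the approximate values quoted above, the central mass of 1019.46 MeV is the approximate world-average φ mass, and the function name is hypothetical.

```python
import math
import random
import statistics


def sample_voigt(mean, width, sigma, rng):
    """Draw one value from a Breit-Wigner (FWHM = width) smeared by a Gaussian (sigma)."""
    # Inverse-CDF sampling of the Cauchy distribution with half-width width / 2.
    breit_wigner = mean + 0.5 * width * math.tan(math.pi * (rng.random() - 0.5))
    # Adding an independent Gaussian term is equivalent to convolving the two shapes.
    return breit_wigner + rng.gauss(0.0, sigma)


rng = random.Random(12345)  # fixed seed, for reproducibility only
masses = [sample_voigt(1019.46, 4.3, 1.3, rng) for _ in range(20000)]

# The mean of a Breit-Wigner is not defined, so the median is used to locate the peak.
print("median mass (MeV):", statistics.median(masses))
```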


PRACTICAL CLASS 8

In this exercise we experiment with the Kolmogorov-Smirnov test, in particular comparing the distributions of two samples
in order to verify that they come from the same underlying population (as configured at generation).

For the description of the code (RooFit macro) and the procedure: Exercise-8.
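
The two-sample Kolmogorov-Smirnov test on unbinned data can be sketched in plain Python as follows. This is an illustration and not the macro of the exercise: the p-value uses the asymptotic Kolmogorov distribution, which is only an approximation for small samples, and the function names and the generated samples are hypothetical.

```python
import math
import random


def ks_statistic(a, b):
    """Maximum distance between the empirical cumulative distributions of two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance


def ks_pvalue(distance, n, m):
    """Asymptotic probability of observing a larger distance if the populations coincide."""
    lam = distance * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return 1.0  # the alternating series converges poorly here; the probability is ~1
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * series))


rng = random.Random(2023)  # fixed seed, for reproducibility only
sample_1 = [rng.gauss(0.0, 1.0) for _ in range(500)]
sample_2 = [rng.gauss(0.0, 1.0) for _ in range(700)]

d = ks_statistic(sample_1, sample_2)
print("D =", d, " p-value =", ks_pvalue(d, len(sample_1), len(sample_2)))
```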

The generation of samples is borrowed from Exercise-4.


PRACTICAL CLASS 9

Exercise-9 discusses the comparison between:
- extended and non-extended fits,
- the HESSE uncertainties and the MINOS ones,
- the MINOS uncertainties and their connection with the Profile Likelihood (Ratio).
Essentially, here we "re-discover" (from theory, as a check) that the MINOS error is equivalent to the 1σ uncertainty from the Profile Likelihood Ratio.

Some RooFit documentation for the Profile Likelihood Ratio: supplementary material (mostly used in my slides).
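
The relation between the two kinds of uncertainty can be sketched on a toy problem in plain Python: the lifetime τ of an exponential distribution. This is a simplified illustration and not the content of Exercise-9: with a single parameter there is nothing to profile, so the interval is obtained by scanning the likelihood itself until −ln L rises by 0.5 above its minimum, which is the same criterion used by MINOS; the parabolic (HESSE-like) error comes from the second derivative at the minimum. The decay times are invented and the function names are hypothetical.

```python
import math


def nll(tau, times):
    """Negative log-likelihood of an exponential PDF with lifetime tau."""
    return len(times) * math.log(tau) + sum(times) / tau


def bisect(f, lo, hi, tol=1e-10):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def parabolic_and_scan_errors(times):
    """Return the estimate, the parabolic error and the (lower, upper) scan errors."""
    n = len(times)
    tau_hat = sum(times) / n               # analytic maximum-likelihood estimate
    parabolic = tau_hat / math.sqrt(n)     # from the second derivative of the NLL at tau_hat

    def delta(tau):
        return nll(tau, times) - nll(tau_hat, times) - 0.5

    lower = bisect(delta, 0.05 * tau_hat, tau_hat)
    upper = bisect(delta, tau_hat, 20.0 * tau_hat)
    return tau_hat, parabolic, tau_hat - lower, upper - tau_hat


# Hypothetical decay times, invented for illustration only.
times = [0.3, 1.2, 0.7, 2.5, 0.9, 1.6, 0.4, 3.1, 1.1, 0.2]
tau_hat, parabolic, err_low, err_up = parabolic_and_scan_errors(times)
print("tau =", tau_hat, " parabolic:", parabolic, " scan: -", err_low, " +", err_up)
```

For a small sample the scan interval is asymmetric, while the parabolic error is symmetric by construction; the two converge as the number of events grows and the likelihood becomes Gaussian.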


PRACTICAL CLASS 10

Here a simultaneous fit of two variables, mass and proper time, is proposed for selected B+ → J/ψ K+ candidates (2D fit): Exercise-10


PRACTICAL CLASS 11

By injecting a signal on a background distribution we learn how to evaluate the local statistical significance of the signal
in 3 different, equivalent ways: Exercise11-part1.
The slides contain a detailed discussion of the theory, following the paper by Cowan, Cranmer, Gross and Vitells, Eur.Phys.J. C 71 (2011) 1554.
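
As a reminder of the asymptotic result of the cited paper, the sketch below evaluates the discovery test statistic q0 in the simplest possible case, a single counting experiment with a perfectly known background b, where Z = sqrt(q0). This is an assumption made here for illustration and not necessarily one of the three methods of the exercise, which works on a fitted distribution; the function names and the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def q0_counting(n_obs, b):
    """Discovery test statistic for n_obs observed events over a known background b."""
    if n_obs <= b:
        return 0.0  # a deficit is not treated as evidence for a signal
    return 2.0 * (n_obs * math.log(n_obs / b) - (n_obs - b))


def significance(n_obs, b):
    """Asymptotic local significance Z = sqrt(q0)."""
    return math.sqrt(q0_counting(n_obs, b))


def p_value(z):
    """One-sided p-value corresponding to a significance of z standard deviations."""
    return 1.0 - NormalDist().cdf(z)


# Hypothetical yields, invented for illustration only.
b = 100.0
n_obs = 130.0
z = significance(n_obs, b)
print("Z =", z, " p-value =", p_value(z), " s/sqrt(b) =", (n_obs - b) / math.sqrt(b))
```

Replacing n_obs with the expected s + b gives the median expected significance sqrt(2((s+b) ln(1+s/b) − s)), which reduces to s/sqrt(b) when s is much smaller than b.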

Following on from the previous exercise, we inject a stronger signal and discuss the practical definition of 3 Figures Of Merit
typically used in the selections (signal significance, signal purity and signal-to-noise ratio): Exercise11-part2.
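
The three figures of merit can be computed from the signal and background yields in the signal region. The definitions below are the common ones and are an assumption made here, since the exact conventions are those given in the slides; the function name and the yields are hypothetical.

```python
import math


def figures_of_merit(s, b):
    """Common figures of merit for a selection with s signal and b background events."""
    return {
        "significance": s / math.sqrt(s + b),  # assumed definition: S / sqrt(S + B)
        "purity": s / (s + b),                 # fraction of selected events that are signal
        "signal_to_noise": s / b,              # signal-to-background ratio S / B
    }


# Hypothetical yields in the signal region, invented for illustration only.
for name, value in figures_of_merit(s=100.0, b=300.0).items():
    print(name, "=", value)
```

Tightening a selection typically raises the purity and the signal-to-noise ratio while the significance goes through a maximum, which is why the significance is often the quantity optimized.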


PRACTICAL CLASS 12

Here we investigate how to extract a background-subtracted distribution of an observable. In a first example we use the sidebands of the signal peak.
In a second one we learn how to use the bin-wise method together with another observable (slides in preparation).
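
The sideband method of the first example can be sketched in plain Python as follows. This is an illustration under stated assumptions: the background under the peak is taken to be linear in mass, so that the background in the signal region is estimated by scaling the sideband yield by the ratio of the mass-window widths, and the bin contents are treated as Poisson counts; the function name and the numbers are hypothetical.

```python
import math


def sideband_subtract(h_signal_region, h_sidebands, width_signal, width_sidebands):
    """Subtract the scaled sideband histogram from the signal-region histogram.

    Both histograms contain the observable of interest with the same binning.
    Returns the subtracted bin contents and their statistical uncertainties.
    """
    scale = width_signal / width_sidebands  # valid for a linear background (assumption)
    values = [s - scale * b for s, b in zip(h_signal_region, h_sidebands)]
    errors = [math.sqrt(s + scale ** 2 * b) for s, b in zip(h_signal_region, h_sidebands)]
    return values, errors


# Hypothetical bin contents of the observable, invented for illustration only.
h_sr = [50.0, 80.0, 65.0]
h_sb = [40.0, 20.0, 30.0]

values, errors = sideband_subtract(h_sr, h_sb, width_signal=2.0, width_sidebands=4.0)
print("subtracted:", values)
print("errors:    ", errors)
```

The method also assumes that the observable has the same distribution for background events in the sidebands and under the peak, that is, that it is uncorrelated with the mass for the background.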


PRACTICAL CLASS 13

By exploiting the JupyterHub installed on the Virtual Machine of the course we learn how to build up the extraction of the B0s → J/ψ φ signal from part of the CMS open data by using a Python-based notebook (documentation is provided as a Jupyter notebook).


PRACTICAL CLASS 14

Homework for the practical exam preparation:

The exam given in 2016/17 is proposed here as an additional fitting exercise.

Try two different background models (Chebyshev and Exponential);
try also to use a common Alpha and N for the tails of the two Crystal Ball functions.

Warning: find a workaround for the maximum of 9 arguments in RooArgList.


Copyright: the material of this course may be used only with the permission of the author
        (pompili AT ba.infn.it) and with proper acknowledgment.