SCIENTIFIC DATA ANALYSIS LABORATORY (a. a. 2022-2023)

Laurea Magistrale / Year-II / Semester-I

Professor: Alexis Pompili

Program/Syllabus of the course

Suggested textbooks for the theoretical part of the course:
- Cowan (ediz. 1998)
- Lista (ediz. 2020)
- Metzger (ediz. 2010)

Copyright: all the material of this course may be used only with the permission of the author
        (pompili AT ba.infn.it) and with proper acknowledgment.

To connect to the virtual machine hosted at ReCas and dedicated to the course ("pompilicorso"):
- from a Unix/Linux machine: ssh -Y [username]@90.147.75.45
- from a Windows (10) machine: first download (free of charge, from https://sourceforge.net)
  Xming X Server for Windows and /Xming-fonts/7.7.0.10 (needed to be able to use Emacs)


For details have a look here.

For any problem with the VM please contact vincenzo.spinoso AT ba.infn.it (and put A.P. in cc)


PRACTICAL CLASS 0

Introduction to the Operating System UNIX/LINUX

- A couple of introductory slides (they contain considerations about the usage of Python vs Bash shell scripting)
- Quick course
- Tutorial
- Commands' review
- Some manuals for a wide set of software can be found here (including Linux commands).

Introduction to the editor EMACS

- Commands/I
- Commands/II

Introduction to the light editor VI

- Commands

Introduction to ROOT : Introduction to the use of ROOT and exercises to begin with (Practical Class 0)

Further material about the introduction to ROOT

A simpler introduction to ROOT can be found here.
Introductory lessons by A.Lazzaro: Lez.-1 , Lez.-2 , Lez.-3.
Further introductory material:
- Tutorial-1 (by Manchester University)
- Tutorial-2 (by A.Rizzi)


Online ROOT Web Page (the new frontier to read big data: RDataFrame)
Online ROOT DOCS references


PRACTICAL CLASS 1

Histogramming within ROOT

[ Operations, Absolute and Relative Normalization, Stacked Plots, Data-Monte Carlo comparison ]

In this exercise you will learn how to obtain, starting from a given rootuple of histograms, the plots in Fig.4 (or Fig.6)
of the CMS paper JINST 7 P10002 (2012).

To understand the physics content (muon reconstruction and identification at CMS) please study pages 6-14.

For the description of the code (ROOT macro) and the procedure: Esercitazione-1

Additional code shows how to compute the simulation-to-data ratio;
a suitable rebinning makes the ratio less prone to statistical fluctuations in the tails of the distributions (see exercise-with-solution)
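
As an illustration independent of the ROOT macro, the following Python sketch shows why rebinning stabilizes a simulation-to-data ratio. The bin contents are invented, the function names are hypothetical, and the error formula assumes unweighted Poisson counts in both histograms.

```python
import math

def rebin(contents, group):
    """Merge every `group` adjacent bins by summing their contents."""
    return [sum(contents[i:i + group]) for i in range(0, len(contents), group)]

def ratio_with_error(mc, data):
    """Bin-by-bin MC/data ratio; Poisson errors propagated in quadrature.

    Assumption: both histograms hold unweighted event counts.
    Bins with no entries in either histogram are returned as (None, None).
    """
    result = []
    for m, d in zip(mc, data):
        if m <= 0 or d <= 0:
            result.append((None, None))
            continue
        r = m / d
        result.append((r, r * math.sqrt(1.0 / m + 1.0 / d)))
    return result

# Invented toy histograms with sparsely populated tails.
data = [400, 380, 90, 80, 6, 3, 2, 1]
mc = [410, 370, 95, 85, 4, 5, 1, 2]

fine = ratio_with_error(mc, data)
coarse = ratio_with_error(rebin(mc, 2), rebin(data, 2))

for label, points in (("original binning", fine), ("rebinned by 2", coarse)):
    print(label)
    for r, e in points:
        print(f"  ratio = {r:.2f} +- {e:.2f}")
```

In the last bins the relative uncertainty on the ratio shrinks after merging, at the price of a coarser description of the shape.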

Simple exercise for home: try relative normalization (shape comparison) instead of absolute normalization as proposed in the main code exercise.
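
The difference between the two normalizations can be sketched in a few lines of Python. This is not the ROOT macro of the exercise: the bin contents, cross section, luminosity and number of generated events below are all hypothetical values chosen for illustration.

```python
def scale(hist, factor):
    """Return a copy of the bin contents multiplied by a constant factor."""
    return [factor * c for c in hist]

def normalize_absolute(mc, cross_section_pb, lumi_pb_inv, n_generated):
    """Scale MC to the expected yield: sigma * L / N_generated."""
    return scale(mc, cross_section_pb * lumi_pb_inv / n_generated)

def normalize_relative(mc, data):
    """Scale MC to the same integral as data (shape comparison only)."""
    return scale(mc, sum(data) / sum(mc))

# Invented toy histograms (same binning for data and simulation).
data = [12.0, 30.0, 45.0, 28.0, 10.0]
mc = [200.0, 520.0, 700.0, 430.0, 150.0]

mc_abs = normalize_absolute(mc, cross_section_pb=50.0, lumi_pb_inv=2.0, n_generated=2000.0)
mc_rel = normalize_relative(mc, data)

print("data integral:          ", sum(data))
print("MC integral (absolute): ", sum(mc_abs))
print("MC integral (relative): ", sum(mc_rel))
```

With the absolute normalization the MC integral is a prediction that can disagree with data; with the relative normalization the integrals agree by construction and only the shapes are compared.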

Please check out the following physics note.


PRACTICAL CLASS 2

Exercise on histogram comparison (with ROOT): compatibility with other real data and with simulations

D^0 meson production cross section:
- CMS data compared with FONLL (https://arxiv.org/abs/2107.01476 [JHEP 11 (2021) 225]; Figure 5 / upper);
- CMS data compared with ALICE data (https://arxiv.org/abs/2106.08278 [PRL 120 (2022) 012001]).

For both tasks: Esercitazione-2 .

Learn the use of TGraphErrors and TGraphAsymErrors.


PRACTICAL CLASS 3

Exercise on hypothesis testing : observables to discriminate background from signal, ROC curves

In this exercise you will use ROC curves to compare the rejection power
of two different algorithms. The physics case is taken from a study on the use of the impact parameter
of the leptons in the Higgs "golden" decay channel H→ZZ(*)→4leptons.

For the description of the code (ROOT macro) and the procedure: Esercitazione-3
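
A minimal Python sketch of how a ROC curve is built and used to compare two algorithms is given here. It is a toy, not the macro of the exercise: the Gaussian "impact parameter" distributions, the extra smearing of algorithm B and all names are assumptions made for illustration.

```python
import random

def roc_curve(signal, background, n_points=200):
    """Scan a 'keep if value < cut' selection.

    Returns a list of (signal efficiency, background rejection) points.
    """
    values = signal + background
    lo, hi = min(values), max(values)
    points = []
    for i in range(n_points + 1):
        cut = lo + (hi - lo) * i / n_points
        efficiency = sum(v < cut for v in signal) / len(signal)
        rejection = sum(v >= cut for v in background) / len(background)
        points.append((efficiency, rejection))
    return points

def area_under_curve(points):
    """Trapezoidal area under the rejection-vs-efficiency curve."""
    pts = sorted(points)
    return sum(0.5 * (y0 + y1) * (x1 - x0) for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

rng = random.Random(1)  # fixed seed for reproducibility
n = 2000
# Toy observable: narrow for signal (prompt leptons), wide for background.
true_sig = [rng.gauss(0.0, 1.0) for _ in range(n)]
true_bkg = [rng.gauss(0.0, 3.0) for _ in range(n)]
# Algorithm A measures the observable as is; algorithm B adds extra smearing.
sig_a = [abs(v) for v in true_sig]
bkg_a = [abs(v) for v in true_bkg]
sig_b = [abs(v + rng.gauss(0.0, 2.0)) for v in true_sig]
bkg_b = [abs(v + rng.gauss(0.0, 2.0)) for v in true_bkg]

area_a = area_under_curve(roc_curve(sig_a, bkg_a))
area_b = area_under_curve(roc_curve(sig_b, bkg_b))
print(f"area under ROC: algorithm A = {area_a:.3f}, algorithm B = {area_b:.3f}")
```

The algorithm whose curve lies closer to the (efficiency = 1, rejection = 1) corner, and therefore encloses the larger area, has the better rejection power at a given signal efficiency.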


PRACTICAL CLASS 4

Introduction to RooFit

Introductory material: quick-manual (by W.Verkerke)
Lessons by W.Verkerke @ the BaBar Analysis School (2008): Lez-1 , Lez-2 , Lez-3.
Online : RooFit manual.

First Maximum Likelihood Fit with RooFit

In this exercise you will learn how to fit an invariant mass distribution (ψ'→μ+μ−) by using RooFit;
the PDF has both a signal and a background component.

For the description of the code (ROOT macro) and the procedure: Exercise-4.

For the theory behind fitting with MINUIT (the minimization engine of RooFit)
[Unbinned ML fit, Binned ML fit, Extended ML fit] have a look at the Addendum.
Here is additional follow-up material about MIGRAD, HESSE, MINOS functions in MINUIT.

Exercise: enable MINOS and compare its (possibly asymmetric) errors with the symmetric (parabolic) error estimates.
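
To see why the two estimates can differ, consider a toy case that does not involve RooFit at all: a single Poisson count. The sketch below (plain Python, with a hypothetical observed count of 10) compares the parabolic error from the second derivative of the negative log-likelihood with the interval where the negative log-likelihood rises by 0.5 above its minimum, which is the criterion MINOS applies.

```python
import math

def nll(mu, n_obs):
    """Poisson negative log-likelihood, dropping the mu-independent term."""
    return mu - n_obs * math.log(mu)

def find_crossing(f, a, b, tol=1e-10):
    """Bisection root finder; assumes f(a) and f(b) have opposite signs."""
    fa = f(a)
    while b - a > tol:
        mid = 0.5 * (a + b)
        fm = f(mid)
        if (fa > 0) == (fm > 0):
            a, fa = mid, fm
        else:
            b = mid
    return 0.5 * (a + b)

n_obs = 10                    # hypothetical observed count
mu_hat = float(n_obs)         # maximum-likelihood estimate
parabolic = math.sqrt(n_obs)  # from d2(NLL)/dmu2 = n/mu^2 = 1/n at the minimum

def delta(mu):
    """NLL rise above the minimum, minus 0.5 (zero at the interval edges)."""
    return nll(mu, n_obs) - nll(mu_hat, n_obs) - 0.5

err_lo = mu_hat - find_crossing(delta, 1e-6, mu_hat)
err_hi = find_crossing(delta, mu_hat, 10.0 * mu_hat) - mu_hat

print(f"parabolic error:      +-{parabolic:.3f}")
print(f"likelihood-scan error: -{err_lo:.3f} / +{err_hi:.3f}")
```

The scan gives a shorter lower error and a longer upper error than the parabolic value, because the likelihood is not Gaussian for small counts; for large counts the two estimates converge.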


PRACTICAL CLASS 4b

Refine the fit previously performed:

First, add the bin-by-bin pulls as a simple goodness-of-fit check.
Second, use a (single-sided) Crystal Ball function instead of a Gaussian to describe the radiative tail.

For the description of the code (ROOT macro) and the procedure: Exercise-4b.
These slides introduce the single-sided CB implementation, the bin-by-bin pulls and their uncertainty.

Exercise-4c: discuss why the projection (i.e. the histogram) of the bin-by-bin pulls should follow a standard Gaussian distribution.
If it does not, the fit has something "pathological".
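
The expected behaviour of the pulls can be checked with a toy in plain Python. In this sketch the known generating model plays the role of the fitted curve, the pull uncertainty is taken as the square root of the expected content, and the model shape is invented; the exercise slides may define the pull uncertainty differently.

```python
import math
import random

def poisson(mu, rng):
    """Poisson variate by Knuth's algorithm (fine for moderate means)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def expected(i):
    """Invented model: flat background plus a Gaussian peak, per bin."""
    return 80.0 + 120.0 * math.exp(-0.5 * ((i - 100) / 15.0) ** 2)

rng = random.Random(7)  # fixed seed for reproducibility
n_bins = 200
observed = [poisson(expected(i), rng) for i in range(n_bins)]

# Pull = (observed - expected) / uncertainty, one value per bin.
pulls = [(observed[i] - expected(i)) / math.sqrt(expected(i)) for i in range(n_bins)]

mean = sum(pulls) / n_bins
rms = math.sqrt(sum((p - mean) ** 2 for p in pulls) / n_bins)
print(f"pull mean = {mean:.3f}, pull RMS = {rms:.3f}")
```

When the model describes the data, the mean is compatible with 0 and the RMS with 1; a shifted mean or an RMS far from 1 signals a biased model or wrongly estimated uncertainties.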


PRACTICAL CLASS 5

This exercise relies upon the fits learned in Exercise 4, but here you have to automate the fits in all the rapidity bins
in order to obtain, by means of a final fit, a functional expression that describes the variation of the mass resolution with rapidity.

Here you can find all the info needed to carry out the Exercise-5.


PRACTICAL CLASS 6

Do the fit and find an exotic state!

This exercise relies upon the things learned in Exercise 4. It was given as an exam in 2014/15.
Here it is proposed as an exercise to start in the classroom (and finish at home if needed).

Here you are asked to fit the signal of ψ' decaying into μ+μ−π+π−.
Check the mass resolution with respect to the previous signal of ψ' decaying into 2 muons
and appreciate how much it improves with two more tracks (pions) constrained to come from the decay vertex.

Here you can find the outline of this Exercise-6.
Use everything learned so far, including pulls and fit models.


PRACTICAL CLASS 7

In this exercise we fit the signal of φ→K+K− (dikaon invariant mass obtained by selecting B0s→J/ψ φ in
a part of CMS open data). In this case the experimental mass resolution (~ 1.3 MeV from CMS Monte Carlo) is
smaller than the natural width of the φ (~ 4.3 MeV).
Therefore the signal must be fitted with a Voigtian (a non-relativistic Breit-Wigner convolved with
a Gaussian experimental resolution function).
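
What the convolution does to the line shape can be sketched numerically in plain Python. This is not how RooFit evaluates its Voigtian: the integral is done here with a simple midpoint rule, the function names are hypothetical, and the peak position of about 1019.46 MeV is an approximate value inserted for illustration, while the width and resolution are the ones quoted above.

```python
import math

def breit_wigner(m, m0, gamma):
    """Non-relativistic Breit-Wigner (Lorentzian), full width gamma, unit area."""
    return (gamma / (2.0 * math.pi)) / ((m - m0) ** 2 + 0.25 * gamma ** 2)

def gaussian(x, sigma):
    """Zero-mean Gaussian resolution function, unit area."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def voigtian(m, m0, gamma, sigma, n_steps=2000, n_sigma=6.0):
    """Breit-Wigner convolved with a Gaussian (midpoint rule over +-n_sigma)."""
    step = 2.0 * n_sigma * sigma / n_steps
    total = 0.0
    for i in range(n_steps):
        t = -n_sigma * sigma + (i + 0.5) * step
        total += breit_wigner(m - t, m0, gamma) * gaussian(t, sigma)
    return total * step

M0 = 1019.46   # MeV, approximate phi mass (illustrative value)
GAMMA = 4.3    # MeV, natural width quoted in the text
SIGMA = 1.3    # MeV, mass resolution quoted in the text

peak_bw = breit_wigner(M0, M0, GAMMA)
peak_voigt = voigtian(M0, M0, GAMMA, SIGMA)
print(f"peak height: Breit-Wigner = {peak_bw:.4f}, Voigtian = {peak_voigt:.4f} (per MeV)")
```

The resolution lowers and broadens the peak while preserving the area, which is why a pure Breit-Wigner fit would not return the natural width.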

For the description of the code (RooFit macro) and the procedure: Exercise-7.


PRACTICAL CLASS 8

Exercise-8 discusses the comparison between:
- extended and not-extended fits,
- the HESSE uncertainties and the MINOS ones,
- the MINOS uncertainties and their connection with the Profile Likelihood (Ratio).
Essentially here we "re-discover" (check) that the MINOS error is equivalent to the 1σ uncertainty from the Profile Likelihood Ratio.
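
For reference, the standard definitions behind this statement can be written as follows. The notation is chosen here and is not taken from the exercise slides: θ is the parameter of interest, ν the nuisance parameters, single hats mark the global maximum-likelihood estimates and the double hat marks the values of ν that maximize L for a fixed θ.

```latex
\lambda(\theta) = \frac{L\big(\theta, \hat{\hat{\nu}}(\theta)\big)}{L\big(\hat{\theta}, \hat{\nu}\big)},
\qquad
-2 \ln \lambda\big(\hat{\theta} \pm \sigma_{\pm}\big) = 1
```

The second condition says that the negative log-likelihood, re-minimized over ν at each θ, rises by 0.5 above its global minimum at the edges of the interval; this is the same crossing that MINOS searches for.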

Some RooFit documentation for the Profile Likelihood Ratio: supplementary material (mostly used in my slides).


PRACTICAL CLASS 9

The exam given in 2016/17 is proposed here as an additional fitting exercise.

Try two different background models (Chebyshev and Exponential);
try also to use a common Alpha and N for the tails of the two Crystal Ball functions.

Warning: find a workaround for the maximum of 9 arguments in RooArgList


PRACTICAL CLASS 10

Generation of a distribution according to a signal+background model and subsequent unbinned maximum likelihood (UML) fit
(with RooFit) [the signal is a Voigtian (BW convolved with a Gaussian resolution function)
while the background is given an exponential behaviour].

For the description of the code (RooFit macro) and the procedure: Exercise-10.

To be able to set the seed of the random generator please see this Additional Note.
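
The role of the seed can be illustrated without RooFit. The Python sketch below generates a Voigtian signal (a Breit-Wigner variate plus an independent Gaussian smearing) on top of an exponential background; the mass window, shape parameters, signal fraction and function names are all hypothetical, and the actual RooFit procedure is the one described in the Additional Note.

```python
import math
import random

M_LO, M_HI = 1.00, 1.05   # mass window in GeV (hypothetical)
M0 = 1.0195               # peak position in GeV (illustrative)
GAMMA = 0.0043            # natural width in GeV (illustrative)
SIGMA = 0.0013            # Gaussian resolution in GeV (illustrative)
SLOPE = 20.0              # background falls as exp(-SLOPE * m) (hypothetical)

def draw_signal(rng):
    """Breit-Wigner variate plus Gaussian smearing, kept inside the window."""
    while True:
        m = M0 + 0.5 * GAMMA * math.tan(math.pi * (rng.random() - 0.5))
        m += rng.gauss(0.0, SIGMA)
        if M_LO <= m <= M_HI:
            return m

def draw_background(rng):
    """Truncated exponential by inverse-transform sampling."""
    u = rng.random()
    return M_LO - math.log(1.0 - u * (1.0 - math.exp(-SLOPE * (M_HI - M_LO)))) / SLOPE

def generate(n_events, signal_fraction, seed):
    """Generate a signal+background sample; the same seed gives the same sample."""
    rng = random.Random(seed)
    return [draw_signal(rng) if rng.random() < signal_fraction else draw_background(rng)
            for _ in range(n_events)]

sample_a = generate(1000, 0.3, seed=42)
sample_b = generate(1000, 0.3, seed=42)
sample_c = generate(1000, 0.3, seed=43)
print("same seed, same sample:      ", sample_a == sample_b)
print("different seed, same sample: ", sample_a == sample_c)
```

Fixing the seed makes a generated sample reproducible, which helps when debugging a fit; changing it gives statistically independent pseudo-experiments.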

Here you can find a follow-up: supplementary material.


PRACTICAL CLASS 11

By injecting a signal on a background distribution we learn how to evaluate the local statistical significance of the signal
in 3 different equivalent ways : Exercise11-part1.
The slides contain a detailed discussion of the theory following the paper by Cowan, Cranmer, Gross and Vitells, Eur.Phys.J. C 71 (2011) 1554.

Following the previous exercise, we inject a stronger signal and discuss the practical definition of 3 Figures Of Merit
typically used in selections (signal significance, signal purity and signal-to-noise ratio): Exercise11-part2.
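
The three methods used in the exercise are described in the linked slides. As a separate illustration, the Python sketch below evaluates, for a simple counting experiment with hypothetical yields s and b, the significance formula for a known background given in the cited paper, its small-signal approximation s/√b, the corresponding one-sided p-value, and one common set of definitions for the three figures of merit (the exercise may adopt different ones).

```python
import math

def z_simple(s, b):
    """Approximate significance s / sqrt(b), valid for s << b."""
    return s / math.sqrt(b)

def z_asimov(s, b):
    """Counting-experiment significance with known background b."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

def p_value(z):
    """One-sided Gaussian tail probability corresponding to significance z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def figures_of_merit(s, b):
    """Assumed definitions: S/sqrt(S+B), S/(S+B) and S/B."""
    return {
        "significance": s / math.sqrt(s + b),
        "purity": s / (s + b),
        "signal_to_noise": s / b,
    }

s, b = 10.0, 100.0  # hypothetical signal and background yields
print(f"s/sqrt(b)        = {z_simple(s, b):.3f}")
print(f"Asimov formula   = {z_asimov(s, b):.3f}")
print(f"one-sided p-value = {p_value(z_asimov(s, b)):.4f}")
for name, value in figures_of_merit(s, b).items():
    print(f"{name:16s} = {value:.3f}")
```

For s much smaller than b the two significance estimates nearly coincide; they diverge as the signal becomes comparable to the background, where s/√b overestimates the significance.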


PRACTICAL CLASS 12

Here a simultaneous fit of two variables (mass and proper time, for B+ to J/psi K+ selected candidates) is proposed: Exercise-12


PRACTICAL CLASS 13

By exploiting the JupyterHub installed on the VM, we learn how to extract the B0s to J/psi Phi signal from part of CMS open data by using a Python-based notebook.

