SCIENTIFIC DATA ANALYSIS LABORATORY (academic year 2021-2022)

Laurea Magistrale / Year-II / Semester-I

Professor: Alexis Pompili

Program/Syllabus of the course

Suggested textbooks for the theoretical part of the course:
- Cowan (1998 ed.)
- Lista (2020 ed.)
- Metzger (2010 ed.)

Copyright: the material of this course may be used only with the permission of the author
        (pompili AT ba.infn.it) and with proper acknowledgment.

To connect to the virtual machine hosted at ReCas and dedicated to the course ("pompilicorso"):
- from a Unix/Linux machine: ssh -Y [username]@90.147.75.45
- from a Windows (10) machine: first download (free of charge, from https://sourceforge.net) the
  Xming X Server for Windows and Xming-fonts (/Xming-fonts/7.7.0.10), which are needed to be able to use Emacs

For any problem with the VM please contact vincenzo.spinoso AT ba.infn.it (and put A.P. in cc)

ZOOM Connection Coordinates (Meeting ID): 271 574 7853

Some practical classes will be arranged together with Dr. Adriano Di Florio (who will also sit on the exam committee).


PRACTICAL CLASS 0

Introduction to the Operating System UNIX/LINUX

- Commands' review
- Tutorial
- Recipes

Introduction to the editor EMACS

- Commands/I
- Commands/II

Introduction to the light editor VI

- Commands

Introduction to ROOT: introduction to the use of ROOT and first exercises (Practical Class 0)

Further material about the introduction to ROOT

Introductory lectures by Alfio Lazzaro: Lecture 1, Lecture 2, Lecture 3.
Further introductory material:
- Tutorial-1 (by Manchester University)
- Tutorial-2 (by Andrea Rizzi)
Online ROOT Manual


PRACTICAL CLASS 1

Histogramming within ROOT

[ Operations, Absolute and Relative Normalization, Stacked Plots, Data-Monte Carlo comparison ]

In this exercise you will learn how to obtain, starting from a given rootuple of histograms, the plots in Fig. 4 (or Fig. 6)
of the CMS paper JINST 7 P10002 (2012).

To understand the physics content (muon reconstruction and identification at CMS) please study pp. 6-14 of the paper.

For the description of the code (ROOT macro) and the procedure: Esercitazione-1(pdf)

Additional code shows how to compute the simulation-to-data ratio; a suitable rebinning makes the ratio less prone to fluctuations in the tails of the distribution.

Exercise: try a relative normalization (shape comparison) instead of the absolute normalization used in the main exercise code.
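
As a rough illustration of the difference between the two normalizations and of the rebinned ratio, here is a minimal plain-Python sketch. It is not the course macro: the bin contents, the luminosity values and all function names are hypothetical, and plain lists stand in for ROOT histograms. In a ROOT macro the same steps would typically be written with TH1::Scale, TH1::Integral, TH1::Rebin and TH1::Divide.

```python
# Hypothetical bin contents (not taken from the course rootuple).
data = [12.0, 48.0, 95.0, 60.0, 25.0, 10.0]     # observed events per bin
mc = [30.0, 110.0, 180.0, 130.0, 40.0, 10.0]    # simulated events per bin

def scale(hist, factor):
    """Multiply every bin content by the same factor."""
    return [factor * content for content in hist]

def absolute_normalization(mc_hist, lumi_data, lumi_mc):
    """Scale the simulation by the ratio of integrated luminosities."""
    return scale(mc_hist, lumi_data / lumi_mc)

def relative_normalization(mc_hist, data_hist):
    """Scale the simulation to the data integral (shape comparison only)."""
    return scale(mc_hist, sum(data_hist) / sum(mc_hist))

def rebin(hist, ngroup):
    """Merge ngroup adjacent bins to reduce fluctuations in sparse tails."""
    return [sum(hist[i:i + ngroup]) for i in range(0, len(hist), ngroup)]

def ratio(numerator, denominator):
    """Bin-by-bin ratio; bins with an empty denominator give NaN."""
    return [n / d if d > 0 else float("nan")
            for n, d in zip(numerator, denominator)]

# Assumed luminosities, in arbitrary units, for the absolute normalization.
mc_abs = absolute_normalization(mc, lumi_data=1.0, lumi_mc=2.5)
mc_shape = relative_normalization(mc, data)

# Simulation-to-data ratio after merging bins in pairs.
ratio_shape = ratio(rebin(mc_shape, 2), rebin(data, 2))
print(sum(mc_abs), sum(mc_shape), ratio_shape)
```

With the relative normalization the integrals agree by construction, so any residual structure in the ratio reflects a difference in shape only.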


PRACTICAL CLASS 2

Exercise on histogram comparison (with ROOT): compatibility with other real data and with simulations

D^0 meson production cross section:
- CMS data compared with FONLL (https://arxiv.org/abs/2107.01476; Figure 5 / upper);
- CMS data compared with ALICE data (https://arxiv.org/abs/2102.13601) [ support description here ].

Learn the use of TGraphErrors and TGraphAsymmErrors.
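
One simple way to quantify the compatibility of two sets of points with asymmetric uncertainties is a bin-by-bin pull. The sketch below is only an illustration under stated assumptions: the numbers are hypothetical (they are not the CMS, ALICE or FONLL values), the two measurements are assumed to share the same binning, and the choice of combining in quadrature the errors on the sides that face each other is a common approximation, not necessarily the procedure used in the exercise.

```python
import math

# Each point is (value, error_low, error_high); all numbers are hypothetical.
meas_a = [(10.0, 1.0, 1.5), (6.0, 0.8, 0.8), (3.0, 0.5, 0.6)]
meas_b = [(11.0, 2.0, 1.0), (5.5, 0.6, 0.9), (3.4, 0.4, 0.4)]

def pull(point_a, point_b):
    """Difference in units of the combined uncertainty.

    Assumption: for each point, use the error on the side facing the other
    point, and add the two errors in quadrature.
    """
    value_a, low_a, high_a = point_a
    value_b, low_b, high_b = point_b
    if value_a >= value_b:
        sigma = math.hypot(low_a, high_b)
    else:
        sigma = math.hypot(high_a, low_b)
    return (value_a - value_b) / sigma

pulls = [pull(a, b) for a, b in zip(meas_a, meas_b)]
chi2 = sum(p * p for p in pulls)   # crude compatibility figure, no correlations
print(pulls, chi2, len(pulls))
```

Correlated systematic uncertainties between bins are ignored here; a realistic comparison would need the covariance information from the two papers.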


PRACTICAL CLASS 3

Exercise on hypothesis testing : observables to discriminate background from signal, ROC curves

In this exercise you will apply ROC curves in order to compare the rejection power
of two different algorithms. The physics case is taken from a study of the use of the impact parameter
of the leptons in the Higgs "golden" decay channel H→ZZ(*)→4 leptons.

For the description of the code (ROOT macro) and the procedure: Esercitazione-2(pdf)
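
The idea of a ROC curve can be sketched in a few lines of plain Python. This is not the course macro: the two "algorithms" are represented by hypothetical Gaussian discriminator outputs (not impact-parameter distributions), and the function names are made up for the illustration. For each cut value the signal efficiency and the background rejection are computed, and the area under the curve is used as one possible figure of merit to compare the two algorithms.

```python
import random

def roc_points(signal, background, n_cuts=100):
    """Scan cuts 'score > cut' from tight to loose.

    Returns a list of (signal efficiency, background rejection) pairs,
    ordered by non-decreasing efficiency.
    """
    lo = min(signal + background)
    hi = max(signal + background)
    points = []
    for i in range(n_cuts + 1):
        cut = hi - (hi - lo) * i / n_cuts
        efficiency = sum(s > cut for s in signal) / len(signal)
        rejection = sum(b <= cut for b in background) / len(background)
        points.append((efficiency, rejection))
    return points

def area_under_curve(points):
    """Trapezoidal integral of rejection versus efficiency."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

rng = random.Random(42)
n_events = 2000
# Hypothetical discriminator outputs. Algorithm A separates signal and
# background by two standard deviations, algorithm B by one.
sig_a = [rng.gauss(2.0, 1.0) for _ in range(n_events)]
bkg_a = [rng.gauss(0.0, 1.0) for _ in range(n_events)]
sig_b = [rng.gauss(1.0, 1.0) for _ in range(n_events)]
bkg_b = [rng.gauss(0.0, 1.0) for _ in range(n_events)]

auc_a = area_under_curve(roc_points(sig_a, bkg_a))
auc_b = area_under_curve(roc_points(sig_b, bkg_b))
print("area A:", auc_a, "area B:", auc_b)
```

The algorithm with the larger area rejects more background at equal signal efficiency on average; in practice the comparison is often made at a fixed working point rather than with the integrated area.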


PRACTICAL CLASS 4

Introduction to RooFit

Introductory material: quick-manual (by W. Verkerke)
Lectures by W. Verkerke at the BaBar Analysis School (2008): Lecture 1, Lecture 2, Lecture 3.
Online: RooFit manual.

First Maximum Likelihood Fit with RooFit

In this exercise you will learn how to fit an invariant mass distribution (ψ'→μ+μ−) by using RooFit;
the PDF has both a signal and a background component.

For the description of the code (ROOT macro) and the procedure: Exercise-3(pdf).

For the theory behind fitting with MINUIT (the minimization engine of RooFit)
[Unbinned ML fit, Binned ML fit, Extended ML fit] have a look at the Addendum.
Here is additional follow-up material about the MIGRAD, HESSE and MINOS functions in MINUIT.

Exercise: enable MINOS and check the difference with respect to the symmetric (parabolic) error estimates.
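
To see why the two error estimates can differ, consider a one-parameter toy case that can be solved without MINUIT. The sketch below is an assumption-laden illustration, not the RooFit exercise: it uses an exponential decay-time model with hypothetical data, computes the parabolic uncertainty from the second derivative of -ln L at the minimum (the idea behind HESSE), and finds the asymmetric uncertainties as the points where -ln L rises by 0.5 above its minimum (the idea behind MINOS). The helper names are made up.

```python
import math

def nll(tau, n, total):
    """-ln L for n exponential decay times with sum 'total' (constants dropped)."""
    return n * math.log(tau) + total / tau

def bisect(func, a, b, tol=1e-10):
    """Root of func in [a, b] by bisection; func(a) and func(b) must differ in sign."""
    fa = func(a)
    for _ in range(200):
        mid = 0.5 * (a + b)
        fmid = func(mid)
        if fa * fmid <= 0:
            b = mid
        else:
            a, fa = mid, fmid
        if b - a < tol:
            break
    return 0.5 * (a + b)

# Hypothetical decay times in arbitrary units.
times = [0.3, 1.2, 0.7, 2.5, 0.1, 1.9, 0.8, 3.4, 0.5, 1.1]
n, total = len(times), sum(times)

tau_hat = total / n                           # analytic maximum-likelihood estimate
sigma_parabolic = tau_hat / math.sqrt(n)      # from the curvature n / tau_hat**2

def delta(tau):
    """Rise of -ln L above its minimum, minus the 0.5 that defines one sigma."""
    return nll(tau, n, total) - nll(tau_hat, n, total) - 0.5

tau_low = bisect(delta, 0.2 * tau_hat, tau_hat)
tau_high = bisect(delta, tau_hat, 5.0 * tau_hat)
err_minus, err_plus = tau_hat - tau_low, tau_high - tau_hat
print(tau_hat, sigma_parabolic, err_minus, err_plus)
```

With only ten events the likelihood is visibly non-parabolic, so the upper uncertainty comes out larger than the lower one; the two estimates approach each other as the sample size grows.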


PRACTICAL CLASS 4b

Refine the fit previously performed:

First, add the bin-by-bin pulls as a goodness-of-fit check.
Second, use a (single-sided) Crystal Ball function instead of a Gaussian to describe the radiative tail.

For the description of the code (ROOT macro) and the procedure: Exercise-4b.
These slides introduce the single-sided CB implementation, the bin-by-bin pulls and their uncertainty.
This is the comparison between the old and the new fit (it is also shown in the previous slides).
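
For reference, the standard single-sided Crystal Ball shape is a Gaussian core joined continuously to a power-law tail on the low side. The sketch below is an unnormalized plain-Python version written for illustration only; it is not the RooFit implementation used in the exercise, and the parameter values (chosen near the ψ' mass in GeV) are assumptions.

```python
import math

def crystal_ball(x, mean, sigma, alpha, n):
    """Unnormalized single-sided Crystal Ball shape (low-side tail, alpha > 0).

    Gaussian core for (x - mean) / sigma > -alpha, power-law tail below,
    with coefficients chosen so that the function is continuous at the join.
    """
    t = (x - mean) / sigma
    if t > -alpha:
        return math.exp(-0.5 * t * t)
    a = (n / alpha) ** n * math.exp(-0.5 * alpha * alpha)
    b = n / alpha - alpha
    return a * (b - t) ** (-n)

# Assumed, illustrative parameter values.
mean, sigma, alpha, n = 3.686, 0.03, 1.5, 3.0

join = mean - alpha * sigma
below = crystal_ball(join - 1e-9, mean, sigma, alpha, n)
above = crystal_ball(join + 1e-9, mean, sigma, alpha, n)
tail = crystal_ball(mean - 5 * sigma, mean, sigma, alpha, n)
gauss_tail = math.exp(-0.5 * 25.0)
print(below, above, tail, gauss_tail)
```

Five standard deviations below the peak the power-law tail is several orders of magnitude above the pure Gaussian, which is what allows the function to absorb the radiative tail.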

Exercise: discuss why the distribution of the bin-by-bin pulls should follow a standard Gaussian.
If it does not, the fit has something "pathological".
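
The statement can be checked on a toy example. The following plain-Python sketch makes several assumptions that are not part of the exercise: the expected yields are a hypothetical flat background plus a Gaussian peak, Poisson fluctuations are approximated by Gaussian ones (reasonable for the large bin contents used here), and the pull is defined with the data uncertainty sqrt(observed). It compares the pull distribution for the correct model with that for a model that omits the peak.

```python
import math
import random

def bin_pulls(observed, expected):
    """Pull per bin: (observed - expected) / sqrt(observed); empty bins are skipped."""
    return [(o - e) / math.sqrt(o) for o, e in zip(observed, expected) if o > 0]

def mean_and_width(values):
    """Mean and standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)

def model(x, with_peak=True):
    """Hypothetical expected yield per bin: flat background plus a Gaussian peak."""
    peak = 1000.0 * math.exp(-0.5 * ((x - 0.5) / 0.05) ** 2) if with_peak else 0.0
    return 500.0 + peak

rng = random.Random(7)
centers = [(i + 0.5) / 1000 for i in range(1000)]
truth = [model(x) for x in centers]
# Gaussian approximation of the Poisson fluctuations in each bin.
observed = [rng.gauss(mu, math.sqrt(mu)) for mu in truth]

good_mean, good_width = mean_and_width(bin_pulls(observed, truth))
wrong_model = [model(x, with_peak=False) for x in centers]
bad_mean, bad_width = mean_and_width(bin_pulls(observed, wrong_model))
print(good_mean, good_width, bad_mean, bad_width)
```

For the correct model the pulls have a mean close to 0 and a width close to 1; for the model without the peak the width is much larger than 1, which signals the mismatch.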


PRACTICAL CLASS 5

PRACTICAL CLASS 6

An additional exercise is proposed here: the exam given in 2016/7.

Try two different background models (Chebyshev and Exponential);
try also to use a common Alpha and N for the tails of the two Crystal Ball functions.

Warning: you will need to find a workaround for the maximum of 9 arguments accepted by RooArgList.


PRACTICAL CLASS 10

Here a simultaneous fit to two variables (mass and proper time, for selected B+ → J/psi K+ candidates) is proposed: SLIDES


PRACTICAL CLASS 12

This class discusses the comparison between:
- extended and non-extended fits,
- the HESSE uncertainties and the MINOS ones,
- the MINOS uncertainties and their connection with the Profile Likelihood (Ratio).
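
For orientation, the quantities compared in this class can be written compactly in their standard textbook form. The notation below is an assumption of this summary (it is not taken from the course slides): n is the number of observed events, ν the expected yield, θ the parameter of interest and η the remaining (nuisance) parameters.

```latex
% Extended likelihood: the ordinary likelihood times a Poisson term for the yield
\begin{equation}
  L_{\mathrm{ext}}(\nu,\theta) = \frac{\nu^{n} e^{-\nu}}{n!}\,\prod_{i=1}^{n} f(x_i;\theta)
\end{equation}

% HESSE: parabolic uncertainty from the curvature of -ln L at the minimum
\begin{equation}
  \hat{\sigma}_{\theta}^{2} = \left(H^{-1}\right)_{\theta\theta},
  \qquad
  H_{jk} = -\left.\frac{\partial^{2}\ln L}{\partial\theta_{j}\,\partial\theta_{k}}\right|_{\hat{\theta},\hat{\eta}}
\end{equation}

% Profile likelihood ratio: nuisance parameters re-minimized at each value of theta
\begin{equation}
  \lambda(\theta) = \frac{L\bigl(\theta,\hat{\hat{\eta}}(\theta)\bigr)}{L\bigl(\hat{\theta},\hat{\eta}\bigr)}
\end{equation}

% MINOS: the one-sigma endpoints are where the profiled -2 ln L rises by one unit
\begin{equation}
  -2\ln\lambda(\theta_{\pm}) = 1
\end{equation}
```

When -2 ln λ is exactly parabolic in θ the MINOS endpoints are symmetric and coincide with the HESSE uncertainty; any asymmetry between the two endpoints measures the departure from the parabolic regime.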

