Introduction and Setup
Framework setup
git clone --recurse-submodules https://github.com/green-cabbage/copperheadV2.git
cd copperheadV2
git checkout main
# If already cloned the repo, then to update the submodules run:
git submodule update --remote --merge
Every time you open a new terminal session, run the following command to set up the environment variables:
source setup_env.sh
# default conda environment is `coffea_latest`.
source setup_env.sh yun
# yun is for `yun_coffea_latest`
To create the dask client, open the Jupyter notebook DaskGatewaySLURM.ipynb and run the cells up to the section "Create the gateway".
NOTE: When you no longer need the dask client, close it from the notebook to free up the resources. For this, run the cells under the section "Delete the gateway".
Running the code
Obtain the reduced ntuples
- Pre-Processing: prepares a JSON file containing all the ROOT files belonging to a particular sample (DAS name), along with metadata such as the total number of events.
- Stage-1: applies the pre-selection, corrections, scale factors, etc.
Pre-processing reads the dataset information from the YAML file and saves the list of ROOT files to be read in the next step, together with their metadata, in a JSON file.
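The exact JSON layout is defined by the framework, but conceptually the pre-processing output maps each sample to its ROOT files plus bookkeeping metadata. A minimal sketch (the field names and file paths below are illustrative, not the framework's actual schema):

```python
import json

# Hypothetical pre-processing output: one entry per sample (DAS name),
# listing its ROOT files and metadata such as the total event count.
preprocessed = {
    "dy_M-50": {
        "files": [
            "root://xrootd.example.org//store/mc/file_1.root",
            "root://xrootd.example.org//store/mc/file_2.root",
        ],
        "metadata": {"nevents": 123456, "is_mc": True, "year": "2018"},
    }
}

# Serialize as pre-processing would, then read it back as stage-1 would.
text = json.dumps(preprocessed, indent=2)
fileset = json.loads(text)

print(len(fileset["dy_M-50"]["files"]))  # number of files stage-1 will loop over
```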
bash stage1_loop_Improved.sh -v 12 -c configs/datasets/dataset_nanoAODv12.yaml -l label_for_ntuple -y 2018 -m 0
where
- -v: nanoAOD version
- -c: path to the dataset YAML file that contains the list of samples to be processed
- -l: label for the output ntuple files
- -y: year of data-taking
- -m: run mode. 0 for pre-processing, 1 for stage-1 processing.
Run stage-1 to skim the data. It also saves the weight for Z-pT reweighting and all other weights needed for the analysis.
bash stage1_loop_Improved.sh -v 12 -c configs/datasets/dataset_nanoAODv12.yaml -l label_for_ntuple -y 2018 -m 1
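Internally, a skim of this kind applies selections as boolean masks and carries per-event weights through. A simplified numpy sketch of that pattern (the cut value, branch names, and flat Z-pT weight are illustrative placeholders, not the analysis' actual selection):

```python
import numpy as np

# Toy events: leading-muon pT and a generator weight per event.
mu1_pt = np.array([18.0, 31.5, 47.2, 22.9, 55.0])
gen_weight = np.array([1.0, 1.0, -1.0, 1.0, 1.0])

# Hypothetical pre-selection: leading muon above a pT threshold.
preselection = mu1_pt > 26.0

# Hypothetical Z-pT reweighting factor, here a flat placeholder.
zpt_weight = np.ones_like(mu1_pt)

# The total event weight is the product of all correction weights,
# kept only for events passing the selection.
total_weight = (gen_weight * zpt_weight)[preselection]
print(total_weight.sum())  # weighted yield of the selected events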
Get the yields
To get the yields from the reduced ntuples, use the script get_yields.py.
python scripts/get_yields.py --years 2018
Note that this script reads the ntuples listed in the YAML file configs/trials.yml. Update the path under the key `current` to point to the label with which the reduced ntuples were created.
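A yield is just the weighted event count per process. A minimal sketch of what a yields script conceptually computes (the process names and weights below are made up for illustration, not the script's actual code):

```python
# Hypothetical per-process event weights read from the reduced ntuples.
weights_by_process = {
    "data":  [1.0] * 120,
    "dy":    [0.8, 0.9, 1.1, 1.0],
    "ttbar": [0.5, 0.6],
}

# Yield = sum of event weights for each process.
yields = {proc: sum(w) for proc, w in weights_by_process.items()}
for proc, y in sorted(yields.items()):
    print(f"{proc:8s} {y:.2f}")
```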
Sync/Compare with previous results
To compare the yields with previous results, use the script sync_parquet_dimuon.py.
python scripts/sync_parquet_dimuon.py DIR1 DIR2 -o diff.csv
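Conceptually, a sync script of this kind compares the same quantity from two result directories and writes the differences to a CSV. A simplified sketch (the column names, process names, and comparison metric are illustrative, not the script's actual output format):

```python
import csv
import io

# Hypothetical per-process yields from two result directories
# (DIR1 = reference, DIR2 = new).
yields_ref = {"dy": 1000.0, "ttbar": 250.0, "vbf": 12.5}
yields_new = {"dy": 1002.0, "ttbar": 250.0, "vbf": 12.0}

# Write one row per process with the absolute and relative difference.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["process", "ref", "new", "diff", "rel_diff"])
for proc in sorted(yields_ref):
    ref, new = yields_ref[proc], yields_new[proc]
    writer.writerow([proc, ref, new, new - ref, (new - ref) / ref])

print(buf.getvalue())
```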
Control Plots
Before running the command below, make sure to update the input and output paths and several other parameters in the run_plotter.py file.
The main plotting code is in the plotter/validation_plotter_unified.py file. In this file you may need to update the lists of datasets considered for the different processes; see validation_plotter_unified.py#L28-L83
python run_plotter.py