Instrumentation Physics: Applications of Machine Learning#

Physics 503   Fall 2024

  • Instructor: Professor Mark Neubauer

  • Class Meetings:

    • Mondays and Wednesdays from 12:30 pm to 1:45 pm

    • Room: 262 Loomis Laboratory

  • 4 credit hours

Calendar#

Note: This schedule will evolve throughout the semester.

| Week   | Topic                                                        | Homework | Projects   |
|--------|--------------------------------------------------------------|----------|------------|
| Aug 26 | Course Introduction                                          | HW 01    |            |
| Sep 02 | Visualizing & Finding Structure in Data                      | HW 02    |            |
| Sep 09 | Dimensionality, Linearity and Kernel Functions               | HW 03    |            |
| Sep 16 | Probability Theory                                           | HW 04    |            |
| Sep 23 | Kernel Density Estimation and Statistics                     | HW 05    |            |
| Sep 30 | Bayesian Statistics and Markov Chain Monte Carlo             | HW 06    |            |
| Oct 07 | Stochastic Processes, Markov Chains & Variational Inference  | HW 07    | Project 01 |
| Oct 14 | Optimization and Model Selection                             |          |            |
| Oct 21 | Learning & Cross Validation                                  |          |            |
| Oct 28 | Supervised Learning & Artificial Neural Networks             |          |            |
| Nov 04 | Deep Neural Networks                                         |          |            |
| Nov 11 | Graph Neural Networks                                        |          |            |
| Nov 18 | Unsupervised Learning, Uncertainties and Anomaly Detection   |          |            |
| Nov 25 | FALL BREAK - NO CLASSES                                      |          |            |
| Dec 02 | Explainable AI and Accelerated Machine Learning              |          |            |
| Dec 09 | TBD                                                          |          |            |

Overview#

Welcome! Data is everywhere. Efficient data analysis leading to solid conclusions requires performant tools and rigorous mathematical techniques tethered by sound scientific methods.

This course is designed to give students a solid foundation in machine learning applications to physics, positioning itself at the intersection of machine learning and data-intensive science. This course will introduce students to the fundamentals of analysis and interpretation of scientific data, and applications of machine learning to problems common in laboratory science such as classification and regression. There will be two 75-minute classes each week, split into discussions of core principles and hands-on exercises involving coding and data. There will be a few projects throughout the semester that will build on the course material and utilize open source software and open data in physics and related fields. The list of topics will evolve according to the interests of the class and instructors. Material will be clustered into units of varying duration, as indicated in the table of contents. The lists of suggested readings and references are advisory; a large amount of material of excellent quality is now available on the worldwide web, particularly on the sites of university courses addressing the topics of each unit.

A distinguishing feature of this course is its sharp focus on endeavors in the data-rich physical sciences as the arenas in which modern machine learning techniques are taught. The course uses open scientific data, open source software from data science and physics-related fields, and publicly available information as enabling elements. Research-inspired projects are an important part of the course, and students will not only execute them but will also play an active role in helping define and shape them. Example projects might include machine learning approaches to searches for new particles or interactions at high-energy colliders; methods of particle tracking and reconstruction; identification, classification and measurement of astrophysical phenomena; novel approaches to medical imaging and simulation using techniques from physics and machine learning; and machine learning in quantum information science. Through these projects and the course material, students will learn how large datasets in physics are generated, curated, and analyzed, using machine learning as a tool to generate key insights in both experimental and theoretical science.

Course Logistics#

Format#

  • This course will consist of two meetings per week: one lecture period and one in-class practical session.

  • Lecture: Monday from 12:30 pm - 1:45 pm in 262 Loomis

  • Practical Session: Wednesday from 12:30 pm - 1:45 pm in 262 Loomis

Instructor#

TAs#

Online Tools#

There are several online tools you will need to use as part of this course.

Campuswire#

We will use Campuswire as a class forum, a way to message the course staff and each other, and a means to submit your attendance question.

Google Colab#

Using Google Colab, you will be able to write your code in a Jupyter notebook and submit it for us to grade. Please sign in with your Illinois account. While working on an assignment, you will share each of your Colab assignments with the professor and the TA (but no one else).

Gradescope#

On Gradescope, you will submit your assignments and find your graded assignments.

Coursework#

Homework Assignments#

You will be assigned weekly homework assignments that will put into practice what you learned in lecture for the week.

  • You will work on the assignments both during the in-class session on Wednesdays and as homework.

  • You will submit your executed (i.e. with “RunAll”) homework notebook via Gradescope.

  • Each assignment is due at the beginning of the next class unless otherwise noted. You may turn an assignment in up to one week late for 50% credit (except that all assignments are strictly due the day before Reading Day).

  • Solutions to the homeworks will not be given.

  • You may collaborate on assignments but must submit your own work.

  • Graded homework will be available through Gradescope.

Projects#

At appropriate times throughout the course, you will select from a list of projects that involve demonstrating and extending your work in class by doing something cool and interesting in data analysis. You must work alone on each project (i.e. without collaboration).

For each project you will put together a Jupyter notebook that demonstrates your project. The notebook should contain code and demonstrate the task, but it should also be written in an expository way so that other students could, in principle, read and learn from it. It is submitted in the same way as the regular course assignments.

Each project notebook must be submitted via Gradescope for grading.

Grading#

  • Class attendance and participation: 5%

  • Homework: 65%

  • Projects: 30%

Letter grades will be assigned as follows:

  • A+   [97.0 - 100.0]

  • A     [93.0 - 96.9]

  • A-   [90.0 - 92.9]

  • B+   [87.0 - 89.9]

  • B     [83.0 - 86.9]

  • B-   [80.0 - 82.9]

  • C+   [77.0 - 79.9]

  • C     [73.0 - 76.9]

  • C-   [70.0 - 72.9]

  • D+   [67.0 - 69.9]

  • D     [63.0 - 66.9]

  • D-   [60.0 - 62.9]

  • F     [00.0 - 59.9]
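As a worked example of how the weights and cutoffs above combine, here is a minimal sketch. The function names and the sample scores are hypothetical. It assumes each component is scored as a percentage from 0 to 100 and that a total is compared against the lower bound of each range; the syllabus does not state a rounding rule, so treat that choice as an assumption.

```python
# Weights from the Grading section (fractions of the final score).
WEIGHTS = {"participation": 0.05, "homework": 0.65, "projects": 0.30}

# Lower bound of each letter-grade range, highest first.
CUTOFFS = [
    (97.0, "A+"), (93.0, "A"), (90.0, "A-"),
    (87.0, "B+"), (83.0, "B"), (80.0, "B-"),
    (77.0, "C+"), (73.0, "C"), (70.0, "C-"),
    (67.0, "D+"), (63.0, "D"), (60.0, "D-"),
]


def final_score(scores):
    """Weighted average of component percentages (each 0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total):
    """Map a total percentage to a letter using the lower bounds above.

    Assumption: no rounding is applied before the comparison.
    """
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"


# Hypothetical component scores, for illustration only.
example = {"participation": 100.0, "homework": 90.0, "projects": 85.0}
total = final_score(example)
print(f"{total:.1f} -> {letter_grade(total)}")
```

For the hypothetical scores shown, the total is 0.05 x 100 + 0.65 x 90 + 0.30 x 85 = 89.0, which falls in the B+ range.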

Datasets#

In this section we describe the datasets used in the lectures and homeworks. There are additional scientific datasets used for the projects that are described in the projects area of the course page.

Line#

A simple line with errors. Columns are x, y and dy. The reported errors are systematically too large by a constant factor, and are set to NaN for a fraction of the samples. Target is y_true.

Applications:

  • Reading CSV into a Pandas dataframe.

  • Straight line regression.

  • Handling missing values.

  • Handling (overestimated) input errors.
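The applications above can be sketched end to end on a synthetic stand-in for the Line dataset. This is only an illustration under stated assumptions: the data are generated here (they are not the course file), the true slope, intercept, noise level and inflation factor are made up, and the code uses only the Python standard library rather than Pandas so that it runs anywhere. Dropping rows with missing errors and rescaling the errors with the reduced chi-square are two common choices, not necessarily the ones used in class.

```python
import math
import random

random.seed(0)

# Synthetic stand-in with columns x, y, dy (all values below are assumptions).
TRUE_SLOPE, TRUE_INTERCEPT, TRUE_SIGMA = 2.0, -1.0, 0.1
INFLATION = 3.0  # constant factor by which the reported errors are too large

rows = []
for i in range(200):
    x = i / 199
    y = TRUE_SLOPE * x + TRUE_INTERCEPT + random.gauss(0.0, TRUE_SIGMA)
    # A fraction of the reported errors is missing (NaN).
    dy = float("nan") if random.random() < 0.1 else INFLATION * TRUE_SIGMA
    rows.append((x, y, dy))

# Handle missing values: here we simply drop samples whose error is NaN.
clean = [row for row in rows if not math.isnan(row[2])]


def weighted_line_fit(data):
    """Closed-form weighted least squares for y = slope * x + intercept."""
    s = sx = sy = sxx = sxy = 0.0
    for x, y, dy in data:
        w = 1.0 / dy**2  # inverse-variance weight
        s += w
        sx += w * x
        sy += w * y
        sxx += w * x * x
        sxy += w * x * y
    det = s * sxx - sx**2
    slope = (s * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    return slope, intercept


slope, intercept = weighted_line_fit(clean)

# If errors are too large by a factor f, the reduced chi-square is about 1/f^2.
chi2 = sum(((y - (slope * x + intercept)) / dy) ** 2 for x, y, dy in clean)
reduced_chi2 = chi2 / (len(clean) - 2)
estimated_inflation = 1.0 / math.sqrt(reduced_chi2)

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"reduced chi2={reduced_chi2:.3f} estimated inflation={estimated_inflation:.2f}")
```

A reduced chi-square well below 1 is the usual sign that the reported errors are overestimated; dividing each dy by the estimated inflation factor brings it back to about 1.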

Pong#

Each sample is a 2D trajectory of a ping-pong ball launched with different initial conditions. Trajectories are calculated with an analytic model that includes a linear drag term. There are three clusters of trajectories with similar initial conditions, identified by target ‘grp’. Target ‘th0’ gives the true initial launch angle in degrees. Target ‘hit’ identifies trajectories that pass through a fixed “hoop” at x=0.5.

Applications:

  • Reading HDF5 into a Pandas dataframe.

  • Dimensionality reduction (20D points lie on a 2D manifold).

  • Nonlinear regression (target ‘th0’).

  • Clustering (target ‘grp’).

  • Classification (target ‘hit’).
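To make the "20D points on a 2D manifold" idea concrete, here is a rough sketch of a projectile with linear drag, using the textbook analytic solution of dv/dt = -g*y_hat - k*v. Everything specific is an assumption for illustration: the function names, the drag rate k, the time step, the launch speed, and the guess that the 20 features are 10 sampled (x, y) positions. The actual dataset may be generated differently.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def position(t, v0, th0_deg, k):
    """Analytic (x, y) at time t for launch speed v0, angle th0_deg, drag rate k.

    Solves dv/dt = -g*y_hat - k*v for a launch from the origin.
    """
    th0 = math.radians(th0_deg)
    v0x, v0y = v0 * math.cos(th0), v0 * math.sin(th0)
    decay = -math.expm1(-k * t) / k  # equals (1 - exp(-k t)) / k
    x = v0x * decay
    y = (v0y + G / k) * decay - (G / k) * t
    return x, y


def trajectory_features(v0, th0_deg, k=1.0, n_steps=10, dt=0.05):
    """Flatten n_steps sampled (x, y) positions into one feature vector."""
    features = []
    for i in range(n_steps):
        features.extend(position(i * dt, v0, th0_deg, k))
    return features


# Hypothetical launch: 3 m/s at 45 degrees.
sample = trajectory_features(v0=3.0, th0_deg=45.0)
print(len(sample))  # 20 numbers, all determined by just (v0, th0_deg)
```

Because every feature is a smooth function of the same two launch parameters (with k held fixed), the feature vectors trace out a two-dimensional surface inside the 20-dimensional feature space, which is what dimensionality reduction can recover.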

Cosmo#

Each sample is an LCDM cosmology defined by input parameters ‘omega_b’, ‘omega_cdm’, ‘ln10^{10}A_s’ and ‘H0’. Corresponding targets are values of ‘sigma8’, ‘rd’, ‘DA(0.57)/rd’, ‘DH(0.57)/rd’, ‘DA(2.34)/rd’, and ‘DH(2.34)/rd’ calculated with CLASS. The CLASS calculations are relatively slow (~1 hr per 1K samples), so the goal of this dataset is to train a faster emulator. Input values are uniformly distributed on a grid centered on the Planck2015 best fit result and spanning +/-10 sigma.

Applications:

  • Dimensionality reduction.

  • Approximately linear regression.

Higgs#

Data from the 2014 Higgs Challenge which is now archived here.

This file is too large to include in the repo, so instead the Pandas notebook provides a function to generate higgs_data.hf5 and higgs_target.hf5 from the downloaded .csv.gz file and copy them into the installed data path.

Applications:

  • Dimensionality reduction.

  • Train/test split.

  • Classification.

Clusters#

Demo files for clustering: 4 in 2D with 2 clusters, and 1 in 3D with 3 clusters. Data features are ‘x0’, ‘x1’ (‘x2’) and target is ‘y’.

Applications:

  • Clustering.

Spectra#

Spectra containing two peaks with variable flux and fixed locations and widths, over a constant background, with Poisson noise added. Data features are fluxes in wavelength bins (with un-named columns). Targets are the true fluxes in each peak (‘flux1’, ‘flux2’).

Applications:

  • Dimensionality reduction.

  • Clustering.

  • Regression.

Circles#

The circles files contain 500 2D points on two concentric circles with feature names ‘x0’, ‘x1’ and target integer ‘y’ = 0,1 indicating which circle they belong to.

Applications:

  • Linear clustering in higher dimensions.

  • Kernel trick.

  • Kernel PCA.
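The idea behind "linear clustering in higher dimensions" can be sketched with an explicit feature map: adding the squared radius as a third coordinate makes two concentric circles separable by a single threshold. This is the lifting that the kernel trick performs implicitly, not the kernel trick itself. The data below are synthetic (the radii and noise level are assumptions, not the course files), and the helper names are hypothetical.

```python
import math
import random

random.seed(1)


def make_circles(n=500, radii=(1.0, 2.0), noise=0.05):
    """Synthetic points (x0, x1, y) on two concentric circles, y = 0 or 1."""
    points = []
    for i in range(n):
        y = i % 2
        angle = random.uniform(0.0, 2.0 * math.pi)
        r = radii[y] + random.gauss(0.0, noise)
        points.append((r * math.cos(angle), r * math.sin(angle), y))
    return points


def best_threshold_accuracy(values, labels):
    """Best accuracy of a 1D rule 'predict 1 if value > t' (or its mirror)."""
    best = 0.0
    for t in sorted(values):
        correct = sum((v > t) == bool(y) for v, y in zip(values, labels))
        acc = max(correct, len(labels) - correct) / len(labels)
        best = max(best, acc)
    return best


data = make_circles()
labels = [y for _, _, y in data]

# A linear cut on an original coordinate cannot separate the circles...
acc_x0 = best_threshold_accuracy([x0 for x0, _, _ in data], labels)

# ...but a linear cut on the lifted feature x0^2 + x1^2 can.
acc_r2 = best_threshold_accuracy([x0**2 + x1**2 for x0, x1, _ in data], labels)

print(f"threshold on x0: {acc_x0:.2f}, threshold on x0^2 + x1^2: {acc_r2:.2f}")
```

With a kernel method, the same separation is obtained without ever computing the lifted coordinates, because only inner products in the lifted space are needed.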

Ess#

The ess files contain 500 3D points on a 2D sheet bent into an S-shape with features named ‘x0’, ‘x1’, ‘x2’ and target value ‘y’ from 0-1 giving the coordinate along the sheet.

Applications:

  • Manifold learning.

  • Locally linear embedding (LLE).

Blobs#

The blobs files contain 2K 3D points sampled from 3 Gaussian blobs with features named ‘x0’, ‘x1’, ‘x2’ and target value ‘y’ = 0, 1, 2 giving their generated group membership.

Applications:

  • Clustering.

  • Density estimation.

Policies#

Covid#

  • Policies related to COVID-19 can be found at https://covid19.illinois.edu

  • If you feel ill or are unable to come to class or complete class assignments due to issues related to COVID-19, including but not limited to testing positive yourself, feeling ill, caring for a family member with COVID-19, or having unexpected child-care obligations, you should contact your instructor immediately, and you are encouraged to copy your academic advisor.

About using code you find on the web or generative AI for homework and projects#

The quickest way to deal with the arcana of programming is to ask Google or ChatGPT for examples of what you are seeking to accomplish. But you will need to use your own judgment about how much value these technologies add to your learning. Your generation will need to learn how to work productively in concert with AI. That technological genie is out of the bottle, and it is about as likely to find its way back in as a broken glass is to spontaneously reassemble. As with any external resource, you must always credit the original source of code and other information that you paste into your own programs, notebooks, projects, etc., in a comment that includes the original source. If an author says that his/her code is not to be copied or incorporated into your programs, then DON’T.

Students must cite all references, including any code they have used that they did not write themselves. Failure to cite references will be considered an academic integrity violation and be pursued according to University policy, which may include receiving a failing grade on an assignment or in the entire course. Citations do not need to follow any specific format (such as ACM style, etc.) but should mention the author’s name and where the cited work can be found (including a URL, if applicable). In code, a citation can be left in a comment.

Academic Integrity#

You must never submit the work of someone else as your own. We understand that many of you will find it helpful to work with other students to master the course. But when you collaborate with your study group on homework assignments, you must be a full, active participant in developing the solutions that you submit for credit.

It is cheating to receive answers from another student and then use them as your own. It is cheating to submit as your own work solutions that you find by searching on the worldwide web (though see “About using code you find on the web”) or using online tools such as ChatGPT, or by subscribing to an online service that suborns cheating. It is cheating—and a violation of U.S. copyright law—to give (or sell) course material to someone else who intends to redistribute and/or sell it.

All activities in this course are subject to the Academic Integrity rules as described in Article 1, Part 4, Academic Integrity, of the Student Code.

Sexual Misconduct Reporting Obligation#

The University of Illinois is committed to combating sexual misconduct. Faculty and staff members are required to report any instances of sexual misconduct to the University’s Title IX Office. In turn, an individual with the Title IX Office will provide information about rights and options, including accommodations, support services, the campus disciplinary process, and law enforcement options.

A list of the designated University employees who, as counselors, confidential advisors, and medical professionals, do not have this reporting responsibility and can maintain confidentiality, can be found here: wecare.illinois.edu/resources/students/#confidential.

Other information about resources and reporting is available here: https://wecare.illinois.edu and https://wellness.illinois.edu.

Mental Health Services#

Significant stress, mood changes, excessive worry, substance/alcohol misuse or interferences in eating or sleep can have an impact on academic performance, social development, and emotional wellbeing. The University of Illinois offers a variety of confidential services including individual and group counseling, crisis intervention, psychiatric services, and specialized screenings, which are covered through the Student Health Fee. If you or someone you know experiences any of the above mental health concerns, you are strongly encouraged to contact or visit any of the University’s resources provided below. Getting help is a smart and courageous thing to do for yourself and for those who care about you.

  • Counseling Center (217) 333-3704

  • McKinley Health Center (217) 333-2700

  • National Suicide Prevention Lifeline (800) 273-8255

  • Rosecrance Crisis Line (217) 359-4141 (available 24/7, 365 days a year)

If you are in immediate danger, call 911.

*This statement is approved by the University of Illinois Counseling Center.

Students with Disabilities#

To obtain disability-related academic adjustments and/or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, you may visit 1207 S. Oak St., Champaign, call 333-4603, e-mail disability@illinois.edu or go to https://www.disability.illinois.edu. If you are concerned you have a disability-related condition that is impacting your academic progress, there are academic screening appointments available that can help diagnose a previously undiagnosed disability. You may access these by visiting the DRES website and selecting “Request an Academic Screening” at the bottom of the page.

Resources#

Useful references#

Quick guides#

Tools#

Git and GitHub#

Project Jupyter#

Acknowledgements#

This course was developed by Mark Neubauer during the Fall 2023 semester. I would like to acknowledge David Kirby at the University of California at Irvine for the materials and setup on which this course is based and for the helpful discussions we have had.