Instrumentation Physics: Applications of Machine Learning#
Physics 503 Fall 2025
Instructors:
Professor Mark Neubauer
Dr. Aaron Pearlman
Class Meetings:
Mondays and Wednesdays from 12:30 pm to 1:45 pm
Room: 262 Loomis Laboratory
4 credit hours
Calendar#
Note: This schedule will evolve throughout the semester.
Week | Topic | Homework | Projects |
---|---|---|---|
Aug 25 | | | |
Sep 01 | Visualizing & Finding Structure in Data (Sep 3 only; Sep 1 is Labor Day) | | |
Sep 08 | | | |
Sep 15 | | | |
Sep 22 | | | |
Sep 29 | | | |
Oct 06 | | | |
Oct 13 | NO HW | | |
Oct 20 | | | |
Oct 27 | | | |
Nov 03 | | | |
Nov 10 | NO HW | | |
Nov 17 | | | |
Nov 24 | FALL BREAK - NO CLASSES | | |
Dec 01 | NO HW | | |
Dec 08 | NO HW | | |
Overview#
Welcome! Data is everywhere. Efficient data analysis leading to solid conclusions requires performant tools and rigorous mathematical techniques tethered by sound scientific methods.
This course is designed to give students a solid foundation in machine learning applications to physics, positioning itself at the intersection of machine learning and data-intensive science. This course will introduce students to the fundamentals of analysis and interpretation of scientific data, and applications of machine learning to problems common in laboratory science such as classification and regression. There will be two 75-minute classes each week, split into discussions of core principles and hands-on exercises involving coding and data. There will be a few projects throughout the semester that will build on the course material and utilize open source software and open data in physics and related fields. The list of topics will evolve according to the interests of the class and instructors. Material will be clustered into units of varying duration, as indicated in the table of contents. The lists of suggested readings and references are advisory; a large amount of material of excellent quality is now available on the web, particularly on the sites of university courses addressing the topics of each unit.
A distinguishing feature of this course is its sharp focus on endeavors in the data-rich physical sciences as the arenas in which modern machine learning techniques are taught. The course uses open scientific data, open source software from data science and physics-related fields, and publicly available information as enabling elements. Research-inspired projects are an important part of the course and students will not only execute them but will play an active role in helping define and shape them. Example projects might include machine learning approaches to searches for new particles or interactions at high-energy colliders; methods of particle tracking and reconstruction; identification, classification and measurement of astrophysical phenomena; novel approaches to medical imaging and simulation using techniques from physics and machine learning; machine learning in quantum information science. Through these projects and the course material, students will learn how large datasets in physics are generated, curated, and analyzed, using machine learning as a tool to generate key insights in both experimental and theoretical science.
Course Logistics#
Format#
This course will consist of two meetings per week: one lecture period and one in-class practical session.
Lecture: Monday from 12:30 pm - 1:45 pm in 262 Loomis
Practical Session: Wednesday from 12:30 pm - 1:45 pm in 262 Loomis
Instructors#
Professor Mark Neubauer
email: msn@illinois.edu
Office Hour: Wednesday 5:30 pm - 6:30 pm
Dr. Aaron Pearlman
email: aaronjp@illinois.edu
Office Hour: Wednesday 3pm - 4pm
290B Loomis Laboratory
TAs#
Pin-Yi Li
email: pinyili2@illinois.edu
Office Hour: Tuesday 3pm - 4pm
3039 Beckman Institute
Online Tools#
There are several online tools you will need to use as part of this course.
Campuswire#
We will use Campuswire as a class forum, a way to message the course staff and each other, and a means to submit your attendance question.
Google Colab#
Using Google Colab, you will be able to write your code in a Jupyter notebook and submit it for us to grade. Please sign in to your Illinois account. While working on an assignment, you will share each of your Colab assignments with the professor and the TA (but no one else).
Gradescope#
On Gradescope, you will submit your assignments and find your graded assignments.
Coursework#
Homework Assignments#
You will be assigned weekly homework assignments that will put into practice what you learned in lecture for the week.
You will work on the assignments both during the in-class session on Wednesdays and as homework.
You will submit your executed (i.e. with “Run all”) homework notebook via Gradescope.
Each assignment is due at the beginning of the next class unless otherwise noted. You may turn an assignment in up to one week late for 50% credit (except that all assignments are strictly due the day before Reading Day).
Solutions to the homeworks will not be given.
You may collaborate on assignments but must submit your own work.
Graded homework will be available through Gradescope.
Projects#
At appropriate times throughout the course, you will select from a list of projects that involve demonstrating and extending your work in class by doing something cool and interesting in data analysis. You must work alone on this (i.e. without collaboration).
For projects you will put together a Jupyter notebook that demonstrates your project. The notebook should have code and demonstrate the task but also be written in an expository way that other students could, in principle, read and learn from. It is submitted in the same way as the regular course assignments.
Each project notebook must be submitted via Gradescope for grading.
Grading#
Class attendance and participation: 5%
Homework: 65%
Projects: 30%
Letter grades will be assigned as follows:
A+ [97.0 - 100.0]
A [93.0 - 96.9]
A- [90.0 - 92.9]
B+ [87.0 - 89.9]
B [83.0 - 86.9]
B- [80.0 - 82.9]
C+ [77.0 - 79.9]
C [73.0 - 76.9]
C- [70.0 - 72.9]
D+ [67.0 - 69.9]
D [63.0 - 66.9]
D- [60.0 - 62.9]
F [00.0 - 59.9]
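As a rough illustration of how the weights and cutoffs above combine, the sketch below computes a weighted course score and looks up the letter grade. The function names and the choice to round the score to one decimal place before the lookup are assumptions made for this example, not the course's official procedure.

```python
# Weights from the grading breakdown above (fractions of the final score).
WEIGHTS = {"attendance": 0.05, "homework": 0.65, "projects": 0.30}

# Lower bound of each letter-grade band, highest first.
CUTOFFS = [
    (97.0, "A+"), (93.0, "A"), (90.0, "A-"),
    (87.0, "B+"), (83.0, "B"), (80.0, "B-"),
    (77.0, "C+"), (73.0, "C"), (70.0, "C-"),
    (67.0, "D+"), (63.0, "D"), (60.0, "D-"),
]

def weighted_score(attendance, homework, projects):
    """Combine component percentages (0-100) into a final percentage."""
    scores = {"attendance": attendance, "homework": homework, "projects": projects}
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

def letter_grade(score):
    """Map a final percentage to a letter (rounding to 0.1 is an assumption)."""
    score = round(score, 1)
    for lower, letter in CUTOFFS:
        if score >= lower:
            return letter
    return "F"

# Hypothetical component scores, for illustration only.
final = weighted_score(attendance=100.0, homework=90.0, projects=85.0)
print(f"{final:.1f} -> {letter_grade(final)}")
```

Because homework carries 65% of the weight, a one-point change in the homework average moves the final score more than twice as much as the same change in the project average.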
Datasets#
In this section we describe the datasets used in the lectures and homeworks. There are additional scientific datasets used for the projects that are described in the projects area of the course page.
Line#
A simple line with errors. Columns are x, y and dy. The reported errors are systematically too large by a constant factor, and are set to NaN for a fraction of the samples. Target is y_true.
Applications:
Reading CSV into a Pandas dataframe.
Straight line regression.
Handling missing values.
Handling (overestimated) input errors.
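The applications listed for this dataset can be sketched end to end on synthetic data. The example below is a minimal sketch, not the course's solution: it builds a stand-in CSV with the same column names (x, y, dy), and the slope, intercept, noise level, inflation factor of 3 and 10% missing fraction are all made-up values. It then reads the CSV with Pandas, drops rows with missing errors, fits a weighted straight line, and uses the reduced chi-square to estimate how much the reported errors are overestimated.

```python
import io
import numpy as np
import pandas as pd

# Build a small synthetic stand-in for the line dataset (all values are made up).
rng = np.random.default_rng(0)
n = 200
x = np.linspace(-1.0, 1.0, n)
y_true = 1.5 * x + 0.5               # hypothetical slope and intercept
sigma = 0.1                          # actual scatter of y
y = y_true + rng.normal(0.0, sigma, n)
dy = np.full(n, 3.0 * sigma)         # reported errors too large by a factor of 3
dy[rng.random(n) < 0.1] = np.nan     # some reported errors are missing
csv_text = pd.DataFrame({"x": x, "y": y, "dy": dy}).to_csv(index=False)

# Read the CSV into a DataFrame and drop rows with missing errors.
df = pd.read_csv(io.StringIO(csv_text))
clean = df.dropna(subset=["dy"])

# Weighted straight-line fit; np.polyfit expects weights of 1/sigma.
slope, intercept = np.polyfit(clean["x"], clean["y"], deg=1, w=1.0 / clean["dy"])

# A reduced chi-square well below 1 signals overestimated errors.
resid = clean["y"] - (slope * clean["x"] + intercept)
chi2_red = np.sum((resid / clean["dy"]) ** 2) / (len(clean) - 2)
scale = np.sqrt(chi2_red)            # estimated factor by which to rescale dy

print(f"slope={slope:.2f} intercept={intercept:.2f} dy scale={scale:.2f}")
```

Dropping rows is only one way to handle the missing errors; because the inflation factor is constant here, another option would be to keep those rows and assign them the typical error.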
Pong#
Each sample is a 2D trajectory of a ping-pong ball launched with different initial conditions. Trajectories are calculated with an analytic model that includes a linear drag term. There are three clusters of trajectories with similar initial conditions, identified by target ‘grp’. Target ‘th0’ gives the true initial launch angle in degrees. Target ‘hit’ identifies trajectories that pass through a fixed “hoop” at x=0.5.
Applications:
Reading HDF5 into a Pandas dataframe.
Dimensionality reduction (20D points lie on a 2D manifold).
Nonlinear regression (target ‘th0’).
Clustering (target ‘grp’).
Classification (target ‘hit’).
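For readers curious what an analytic model with a linear drag term looks like, the sketch below evaluates the standard closed-form solution for a projectile whose drag force is proportional to its velocity. It is not the code used to generate the dataset: the drag rate `k`, the launch speed, the launch point at the origin and the choice of 10 time samples (which gives 20 numbers per trajectory) are all assumptions made for illustration.

```python
import numpy as np

def trajectory(t, v0, th0_deg, k=1.0, g=9.81):
    """Analytic 2D trajectory with linear drag, launched from the origin.

    k is the drag rate (drag force per unit mass per unit speed) in 1/s,
    v0 is the launch speed in m/s and th0_deg the launch angle in degrees.
    """
    th0 = np.radians(th0_deg)
    vx0, vy0 = v0 * np.cos(th0), v0 * np.sin(th0)
    decay = 1.0 - np.exp(-k * t)
    x = vx0 / k * decay
    y = (vy0 + g / k) / k * decay - g * t / k
    return x, y

# Hypothetical sampling: 10 time steps give 10 (x, y) pairs per trajectory.
t = np.linspace(0.0, 0.5, 10)
x, y = trajectory(t, v0=3.0, th0_deg=45.0)
sample = np.concatenate([x, y])  # flattened into one feature vector
print(sample.shape)
```

With only the launch speed and angle varied, every feature vector is a smooth function of two numbers, which is one way a set of 20D points can end up on a 2D manifold.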
Cosmo#
Each sample is an LCDM cosmology defined by input parameters ‘omega_b’, ‘omega_cdm’, ‘ln10^{10}A_s’ and ‘H0’. Corresponding targets are values of ‘sigma8’, ‘rd’, ‘DA(0.57)/rd’, ‘DH(0.57)/rd’, ‘DA(2.34)/rd’, and ‘DH(2.34)/rd’ calculated with CLASS. The CLASS calculations are relatively slow (~1 hr per 1K samples), so the goal of this dataset is to train a faster emulator. Input values are uniformly distributed on a grid centered on the Planck2015 best fit result and spanning ±10 sigma.
Applications:
Dimensionality reduction.
Approximately linear regression.
Higgs#
Data from the 2014 Higgs Challenge which is now archived here.
This file is too large to include in the repo, so instead the Pandas notebook provides a function to generate higgs_data.hf5 and higgs_target.hf5 from the downloaded .csv.gz file and copy them into the installed data path.
Applications:
Dimensionality reduction.
Train/test split.
Classification.
Clusters#
Demo files for clustering: 4 in 2D with 2 clusters, and 1 in 3D with 3 clusters. Data features are ‘x0’, ‘x1’ (‘x2’) and target is ‘y’.
Applications:
Clustering.
Spectra#
Spectra containing two peaks with variable flux and fixed locations and widths, over a constant background, with Poisson noise added. Data features are fluxes in wavelength bins (with un-named columns). Targets are the true fluxes in each peak (‘flux1’, ‘flux2’).
Applications:
Dimensionality reduction.
Clustering.
Regression.
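The description above maps directly onto a small generative model, sketched here with made-up numbers. The wavelength grid, peak centers, common width and background level are hypothetical, not the values used for the course files. Because the expected counts are linear in the background and the two fluxes, an ordinary least-squares fit gives a simple baseline for the regression task.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical wavelength grid and line parameters (not the course's actual values).
wavelength = np.linspace(0.0, 1.0, 50)
centers, width = (0.3, 0.7), 0.05
background = 20.0                     # constant expected counts per bin

def peak(center):
    """Gaussian line profile normalized to unit sum over the bins."""
    profile = np.exp(-0.5 * ((wavelength - center) / width) ** 2)
    return profile / profile.sum()

def make_spectrum(flux1, flux2):
    """Two fixed peaks on a flat background, with Poisson noise added."""
    mean = background + flux1 * peak(centers[0]) + flux2 * peak(centers[1])
    return rng.poisson(mean)

spectrum = make_spectrum(flux1=500.0, flux2=200.0)

# The model is linear in (background, flux1, flux2), so least squares can estimate them.
design = np.column_stack([np.ones_like(wavelength), peak(centers[0]), peak(centers[1])])
estimate, *_ = np.linalg.lstsq(design, spectrum, rcond=None)
print("background, flux1, flux2 estimates:", np.round(estimate, 1))
```

Unweighted least squares ignores the fact that Poisson noise grows with the expected counts, so it is a starting point rather than an optimal estimator.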
Circles#
The circles files contain 500 2D points on two concentric circles with feature names ‘x0’, ‘x1’ and target integer ‘y’ = 0,1 indicating which circle they belong to.
Applications:
Linear clustering in higher dimensions.
Kernel trick.
Kernel PCA.
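The idea behind the first two applications can be shown with a few lines of NumPy. This is a sketch on synthetic data: the radii (1 and 2) and the noise level are assumptions, and the degree-2 polynomial kernel is just one convenient choice. The example checks that the kernel value equals an inner product in a higher-dimensional feature space, then shows that the two circles become linearly separable in that space.

```python
import numpy as np

rng = np.random.default_rng(2)

# 500 points on two concentric circles (radii and noise level are assumptions).
n = 500
y = rng.integers(0, 2, n)                       # which circle: 0 = inner, 1 = outer
radius = np.where(y == 0, 1.0, 2.0) + rng.normal(0.0, 0.05, n)
angle = rng.uniform(0.0, 2.0 * np.pi, n)
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])  # x0, x1

def feature_map(v):
    """Explicit degree-2 feature map whose inner product equals (a . b)^2."""
    return np.array([v[0] ** 2, np.sqrt(2.0) * v[0] * v[1], v[1] ** 2])

# Kernel trick: the kernel gives the feature-space inner product directly.
a, b = X[0], X[1]
kernel_value = np.dot(a, b) ** 2
explicit_value = np.dot(feature_map(a), feature_map(b))

# r^2 = x0^2 + x1^2 is a linear function of the mapped features, so a plane
# in feature space (here r^2 = 1.5^2) separates the two circles.
r2 = np.array([feature_map(v)[0] + feature_map(v)[2] for v in X])
predicted = (r2 > 1.5 ** 2).astype(int)

print("kernel matches feature map:", np.isclose(kernel_value, explicit_value))
print("accuracy:", np.mean(predicted == y))
```

No straight line in the original (x0, x1) plane can separate the two classes, which is why a method such as kernel PCA is useful for this dataset.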
Ess#
The ess files contain 500 3D points on a 2D sheet bent into an S-shape with features named ‘x0’, ‘x1’, ‘x2’ and target value ‘y’ from 0-1 giving the coordinate along the sheet.
Applications:
Manifold learning.
Locally linear embedding (LLE).
Blobs#
The blobs files contain 2K 3D points sampled from 3 Gaussian blobs with features named ‘x0’, ‘x1’, ‘x2’ and target value ‘y’ = 0, 1, 2 giving their generated group membership.
Applications:
Clustering.
Density estimation.
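As a minimal sketch of the clustering application, the example below generates three Gaussian blobs and recovers them with plain k-means (Lloyd's algorithm). The blob centers, the unit width and the farthest-point initialization are assumptions chosen to keep the example short and deterministic. They are not a description of the course files or of any particular library implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

# 2K points from 3 Gaussian blobs in 3D (centers and unit width are assumptions).
true_centers = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
y = rng.integers(0, 3, 2000)
X = true_centers[y] + rng.normal(0.0, 1.0, (2000, 3))

def kmeans(X, k, n_iter=20):
    """Plain k-means (Lloyd's algorithm) with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        # Next seed: the point farthest from all seeds chosen so far.
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute the centers.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

labels, centers = kmeans(X, k=3)
print(np.round(centers, 1))
```

The cluster labels returned by k-means are arbitrary, so comparing them with the target ‘y’ requires matching each found cluster to a generated group first.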
Policies#
Covid#
Policies related to COVID-19 can be found at https://covid19.illinois.edu
If you feel ill or are unable to come to class or complete class assignments due to issues related to COVID-19, including but not limited to testing positive yourself, feeling ill, caring for a family member with COVID-19, or having unexpected child-care obligations, you should contact your instructor immediately, and you are encouraged to copy your academic advisor.
About using generative AI for homework and projects#
Generative AI systems, such as ChatGPT, can be valuable tools for learning and idea refinement in this course. You are encouraged to use AI as a tutor to clarify programming concepts, debug code, or explore ideas through iterative conversations—similar to working with a peer, TA, or instructor. However, AI should not be used to directly copy-paste solutions or complete homework problems.
If you use generative AI, you must credit the source by including a comment with the original source of any code or information you incorporate into your work. Additionally, provide a brief description of how the AI was used, such as for debugging a function, refining the methodology, or improving the code efficiency. This helps ensure transparency regarding the use of AI in your work.
The goal of this course is to help you develop the skills to solve problems independently. While AI can extend your capabilities, it should be used as a tool for learning, not as a substitute for the problem-solving process. Relying on AI-generated answers or code without engaging in the problem-solving process can hinder your intellectual growth and is considered academically dishonest. As with all academic tools, AI should be used responsibly to support, not replace, your learning.
Academic Integrity#
You must never submit the work of someone else as your own. Collaboration with other students is encouraged to support learning, but you must be an active participant in developing any solutions you submit for credit. It is considered cheating to:
Receive answers from another student and submit them as your own.
Submit solutions found online or generated by tools such as ChatGPT without proper engagement and attribution (see “About using generative AI for homework and projects”).
Provide or sell course material to others who intend to redistribute it. This is also a violation of U.S. copyright law.
All activities in this course are subject to the Academic Integrity rules as described in Article 1, Part 4, Academic Integrity, of the Student Code.
Sexual Misconduct Reporting Obligation#
The University of Illinois is committed to combating sexual misconduct. Faculty and staff members are required to report any instances of sexual misconduct to the University’s Title IX Office. In turn, an individual with the Title IX Office will provide information about rights and options, including accommodations, support services, the campus disciplinary process, and law enforcement options.
A list of the designated University employees who, as counselors, confidential advisors, and medical professionals, do not have this reporting responsibility and can maintain confidentiality, can be found here: wecare.illinois.edu/resources/students/#confidential.
Other information about resources and reporting is available here: https://wecare.illinois.edu and https://wellness.illinois.edu.
Mental Health Services#
Significant stress, mood changes, excessive worry, substance/alcohol misuse or interferences in eating or sleep can have an impact on academic performance, social development, and emotional wellbeing. The University of Illinois offers a variety of confidential services including individual and group counseling, crisis intervention, psychiatric services, and specialized screenings which are covered through the Student Health Fee. If you or someone you know experiences any of the above mental health concerns, it is strongly encouraged to contact or visit any of the University’s resources provided below. Getting help is a smart and courageous thing to do for yourself and for those who care about you.
Counseling Center (217) 333-3704
McKinley Health Center (217) 333-2700
National Suicide Prevention Lifeline (800) 273-8255
Rosecrance Crisis Line (217) 359-4141 (available 24/7, 365 days a year)
If you are in immediate danger, call 911.
*This statement is approved by the University of Illinois Counseling Center.
Students with Disabilities#
To obtain disability-related academic adjustments and/or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, you may visit 1207 S. Oak St., Champaign, call 333-4603, e-mail disability@illinois.edu or go to https://www.disability.illinois.edu. If you are concerned you have a disability-related condition that is impacting your academic progress, there are academic screening appointments available that can help diagnose a previously undiagnosed disability. You may access these by visiting the DRES website and selecting “Request an Academic Screening” at the bottom of the page.
Resources#
Useful references#
Quick guides#
Jupyter Notebooks: Interface, Keyboard shortcuts
Tools#
Sharing code snippets: gist.github.com
Asking questions of broader development community: Stack Overflow
Git and GitHub#
Project Jupyter#
Acknowledgements#
This course was developed by Mark Neubauer during the Fall 2023 semester. I would like to acknowledge David Kirby at the University of California at Irvine for the materials and setup on which this course is based and for the helpful discussions we have had.