Getting Started with Python

Exploratory Data Analysis

Alp Tezbasaran

2026-02-10

Welcome

Slides: https://go.ncsu.edu/dss-python


To get credit for completing this session:

  • In-person: Sign the attendance sheet
  • Virtual: We will use Zoom attendance

Who we are

Mara Blake
Head, Data Science Services
Shannon Ricci
Associate Head, Data Science Services
Jeff Essic
Lead Librarian for GIS
Alp Tezbasaran
Data Science Librarian
Selene Schmittling
Data Science Librarian
Lynnee Argabright
Data Science Teaching Librarian

Our team comes from a variety of academic backgrounds, including nuclear engineering, electrical engineering, marine ecology, history, information science, natural resources, and geographic information sciences.

Check out our workshops

go.ncsu.edu/datavizworkshops

  • Mapping and Analyzing Urban Heat Island Effects with ArcGIS Online
  • Designing Data Visualizations for Research Presentations and Publications
  • Getting Started with Python [R]: Fundamentals
  • Getting Started with Python [R]: Loops and Functions
  • Exploratory Data Analysis with Python [R]
  • Git Moving: A Beginner’s Guide to Version Control
  • Colab and Chill: A Vibe Coding Approach to Exploratory Data Analysis using AI

Have workshop ideas? Contact us!

How to contact us

go.ncsu.edu/getdatahelp

Get Data Help

Direct email: getdatahelp@ncsu.edu

Who is this workshop for?

  • Learners who know basic Python syntax (variables, lists, simple conditionals/loops)
  • Anyone who wants to build confidence exploring tabular data in a repeatable way
  • People who use spreadsheets and want a Python workflow for analysis
  • Best fit in the full Getting Started with Python sequence:
    • Getting Started with Python - Fundamentals
    • Getting Started with Python - Loops and Functions
    • Exploratory Data Analysis with Python

What is exploratory data analysis?

  • Goal: understand quality, structure, and patterns before modeling
  • Typical cycle: inspect -> clean -> visualize -> summarize -> refine
  • Outcome: better questions and better next analysis steps

EDA Concepts

  • Shape: rows and columns
  • Schema: column data types
  • Missingness: null counts and percentages
  • Distribution: center, spread, skew, outliers
  • Cardinality: number of unique values
  • Grain: what one row represents
  • Association: variables that move together (not causation)

What can EDA help us decide?

  • Are we ready to analyze, or do we need more cleaning?
  • Which variables look useful, redundant, or suspicious?
  • Which transformations/features should we try next?
  • Which subgroups need deeper analysis?
  • What should come next: statistical test, model, or new data collection?

Important boundary

  • EDA helps generate evidence and hypotheses, but it does not prove causation.

Why Polars in this workshop?

  • Fast and memory-efficient for tabular workflows
  • Expression-based API keeps transformations readable and teachable
  • Great for core data-frame verbs (select, filter, with_columns, group_by)
  • Works well with familiar plotting tools like matplotlib

Polars vs pandas (quick view)

Topic                | Polars                                    | pandas
Execution style      | Expression-oriented, optional lazy mode   | Eager by default
Performance          | Often faster on larger tabular transforms | Strong general-purpose baseline
Teaching focus today | Clear pipeline thinking                   | Familiar ecosystem many already know
Recommendation       | Use whichever fits your workflow/team     | Concepts transfer between both

Follow the instructor on Google Colab

During the workshop:

  • Open your Colab notebook
  • Type along with the instructor
  • Run each code cell after we write it
  • Ask questions if something doesn’t work

Tips for following along:

  • Keep your notebook visible
  • Don’t worry about typos - we’ll catch them
  • Experiment with changing values
  • Save your work frequently

How to access workshop material?

  1. Go to https://go.ncsu.edu/dss-python

  2. Scroll down to the right session and click “Open in Colab”

  3. Click “Copy to Drive”

Continue learning Python!

  • NC State University Libraries workshops
  • LinkedIn Learning: lots of Python courses!
  • Python for Data Analysis, 3rd edition (Wes McKinney)
  • Search documentation for packages you want to learn!
  • Ask AI! (but learn the basics and concepts first)

Tell us how we did

go.ncsu.edu/dss-workshop-eval

THANK YOU