Python has packages for data analysis, but R has larger ecosystem
Python can be more efficient in non-statistical areas, such as text analysis
Both are useful and complement each other
Data Structures in Pandas
Series and DataFrames
Series
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.
Example of Series
Data on different occupations
0
Librarian
1
Software Developer
2
Engineer
3
IT Support Specialist
4
Research Data Coordinator
DataFrame
A DataFrame is a tabular data structure comprised of rows and columns, akin to a spreadsheet, database table, or R's data.frame object. It can be thought as a group of Series objects that share an index (column names).
Example of DataFrame
Data about different departments within a library
Department Name
Number of Employees
Number of Machines
0
Digital Library Initiatives (DLI)
12
14
1
IT
20
25
2
Special Collections
13
10
3
Building Services
2
1
4
Research Data Management
2
3
About Jupyter Notebook
What is a Jupyter Notebook?
Jupyter Notebooks are interactive documents that can contain both code and rich text elements.
Renders Python and Markdown by default.
This literate programming approach makes it easier to share code, and is popular in scientific computing and data science.