In this module, we introduce some of the most important data structures: lists, tuples, dictionaries, sets, and arrays. You will also build an understanding of two indispensable tools for advanced data analysis: NumPy and pandas.
Learning Outcomes:
Manipulate dataframes through selection, indexing, Boolean masking, grouping, aggregation, and merging or joining.
Use the features and methods of core data structures in pandas, mainly dataframes.
Use the major features and methods of core data structures in NumPy, mainly arrays.
Understand Python tools such as libraries, packages, modules, and global variables.
Understand the features and methods of built-in Python data structures, including lists, tuples, dictionaries, and sets.
This particular training is valid till October 2023.
PRACTICE QUIZ: TEST YOUR KNOWLEDGE: LISTS AND TUPLES
1. Lists and their contents are immutable, so their elements cannot be modified, added, or removed.
True
False (CORRECT)
Correct: Python lists are a mutable data type, because the elements of a list can be modified, added, or removed. A list is an ordered collection of items: the order of the items is preserved, and each item can be accessed by its index (counting from 0). Lists can contain several data types, including integers, strings, and even other lists.
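A minimal sketch of list mutability and indexing, using a hypothetical list named prices (the values are illustrative only):

```python
# Hypothetical data: a list of prices
prices = [10, 20, 30]

prices[1] = 25      # modify the element at index 1 (indexing starts at 0)
del prices[0]       # remove the element at index 0
print(prices)       # [25, 30]

# One list can hold several data types, even another list
mixed = [1, "two", 3.0, [4, 5]]
print(mixed[3][0])  # 4
```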
2. What Python method adds an element to the end of a list?
append() (CORRECT)
pop()
remove()
type()
Correct: Python’s append() method adds an element to the end of a list.
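A short sketch contrasting append() with insert(), using a hypothetical list named cities. Unlike append(), insert() takes the target index as its first argument, so calling it with only a value raises a TypeError.

```python
# Hypothetical list of cities
cities = ["Paris", "Lagos"]

cities.append("Tokyo")     # append() always adds to the end
cities.insert(0, "Lima")   # insert(index, value) adds at a given position
print(cities)              # ['Lima', 'Paris', 'Lagos', 'Tokyo']

# insert() without an index fails because it needs two arguments
insert_failed = False
try:
    cities.insert("Tokyo")
except TypeError:
    insert_failed = True
print(insert_failed)       # True
```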
3. A data professional wants to instantiate a tuple. What Python elements can they use to do so? Select all that apply.
The insert() function
Square brackets
Parentheses (CORRECT)
The tuple() function (CORRECT)
Correct: You can create a tuple with either parentheses or the tuple() function. Once a tuple is created, its contents cannot be changed because tuples are immutable. For example, trying to assign a new value to one of a tuple's elements raises a TypeError.
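A minimal sketch of both ways to create a tuple and of the error raised by item assignment; the variable names are hypothetical:

```python
# Two ways to create a tuple
point = (3, 4)                      # parentheses
colors = tuple(["red", "blue"])     # tuple() converts any iterable
single = (5,)                       # a one-element tuple needs a trailing comma

# Tuples are immutable, so item assignment raises TypeError
try:
    point[0] = 10
except TypeError as err:
    message = str(err)
    print(message)  # 'tuple' object does not support item assignment
```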
4. What Python technique formulaically creates a new list based on the values in an existing list?
List comprehension (CORRECT)
List nesting
List conversion
List sequencing
Correct: A list comprehension creates a list in a compact, readable notation by applying an expression to each item in an existing iterable, typically a list. It is usually more concise, and often faster, than building the same list with an explicit for loop.
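A short sketch comparing an explicit for loop with the equivalent list comprehension, using a hypothetical list of prices:

```python
# Hypothetical list of prices
prices = [10, 20, 30]

# Building a new list with an explicit for loop
doubled_loop = []
for p in prices:
    doubled_loop.append(p * 2)

# The same result with a list comprehension
doubled = [p * 2 for p in prices]
print(doubled)   # [20, 40, 60]

# A comprehension can also filter items with an if clause
large = [p for p in prices if p > 15]
print(large)     # [20, 30]
```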
PRACTICE QUIZ: TEST YOUR KNOWLEDGE: DICTIONARIES AND SETS
1. Fill in the blank: In Python, a dictionary’s _____ must be immutable.
Order
Keys (CORRECT)
lists
sets
Correct: In Python, dictionary keys must be hashable, which in practice means they must be of an immutable type, such as strings, numbers, or tuples of immutable values. A dictionary uses the hash values of its keys to look them up efficiently. If a key were mutable, its value, and therefore its hash, could change after insertion, and the dictionary could no longer find it reliably.
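A minimal sketch of valid and invalid dictionary keys; the keys and values are hypothetical. The exact wording of the error message varies between Python versions, so the comment only notes that it reports an unhashable type.

```python
# Immutable (hashable) types work as keys: strings, numbers, tuples
ok = {"Tokyo": 1, 42: "answer", ("Paris", "FR"): 2}

# A list is mutable and therefore unhashable, so it cannot be a key
try:
    bad = {["Paris", "FR"]: 2}
except TypeError as err:
    message = str(err)
    print(message)  # the message reports an unhashable type: 'list'
```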
2. In Python, what does the items() method retrieve?
A dictionary’s sets
Only a dictionary’s values
Both a dictionary’s keys and values (CORRECT)
Only a dictionary’s keys
Correct: The items() method returns a view object containing a dictionary's key-value pairs. Each pair is a tuple whose first element is the key and whose second element is the corresponding value.
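A short sketch of items(), keys(), and values() on a hypothetical employees dictionary that maps IDs to names:

```python
# Hypothetical employees dictionary: ID -> name
employees = {"E01": "Ana", "E02": "Ben"}

print(list(employees.items()))   # [('E01', 'Ana'), ('E02', 'Ben')]
print(list(employees.keys()))    # ['E01', 'E02']
print(list(employees.values()))  # ['Ana', 'Ben']

# Each item is a (key, value) tuple, so it unpacks naturally in a loop
for emp_id, name in employees.items():
    print(emp_id, name)
```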
3. A data professional is working with two Python sets. What function can they use to find all the elements from both sets?
union() (CORRECT)
symmetric_difference()
difference()
intersection()
Correct: In Python, the union() method combines all elements of two sets. It returns a new set containing every unique element from both sets, so the sets are merged without duplicates.
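A minimal sketch comparing union() with the other set methods that appear in this module's quizzes, using two small hypothetical sets:

```python
a = {1, 2, 3}
b = {3, 4}

print(a.union(b))                 # {1, 2, 3, 4}  every element, no duplicates
print(a.intersection(b))          # {3}           elements in both sets
print(a.difference(b))            # {1, 2}        in a but not in b
print(a.symmetric_difference(b))  # {1, 2, 4}     in exactly one of the two sets
```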
PRACTICE QUIZ: TEST YOUR KNOWLEDGE: ARRAYS AND VECTORS WITH NUMPY
1. Python libraries and packages include which of the following features? Select all that apply.
Cells
Modules (CORRECT)
Reusable collections of code (CORRECT)
Documentation (CORRECT)
Correct: In Python, a library (also called a package) is a reusable collection of code, typically organized into modules that can be imported into different programs. Libraries mostly contain prewritten functions, classes, and other tools that spare you from reinventing the wheel and save time and effort during development. Packages also contain related modules and documentation. You’ll often encounter the terms library and package used interchangeably.
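A short sketch of importing modules from Python's standard library; the commented lines show the conventional aliases for NumPy and pandas, which must be installed separately:

```python
import math                   # import a whole standard-library module
from statistics import mean   # import a single function from a module

print(math.sqrt(16))          # 4.0
print(mean([2, 4, 6]))

# Third-party libraries are imported the same way once installed, e.g.:
# import numpy as np
# import pandas as pd
```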
2. What is the core data structure of NumPy?
List
Array (CORRECT)
Dictionary
Global variable
Correct: The core data structure in NumPy is the n-dimensional array, or ndarray. An ndarray can represent data in one, two, three, or more dimensions and is designed for efficient storage and manipulation of large amounts of data. All elements of a NumPy array share the same data type, which is what allows NumPy to store and process them efficiently.
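A minimal sketch showing that an ndarray holds a single data type and can be modified in place; the values are hypothetical:

```python
import numpy as np

# Mixed ints and a float are upcast so every element shares one dtype
arr = np.array([1, 2, 3.5])
print(arr.dtype)   # float64
print(arr.ndim)    # 1

# Arrays are mutable: an element can be changed in place
arr[0] = 10
print(arr[0])      # 10.0
```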
3. A data professional wants to confirm the datatype of the contents of array x. How would they do this?
x.ndim
type(x)
x.dtype (CORRECT)
datatype(x)
Correct: dtype is a NumPy array attribute that identifies the data type of the elements stored in the array, such as an integer, float, or complex type. By contrast, type(x) returns the type of the object itself (numpy.ndarray), and x.ndim returns the number of dimensions.
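A short sketch contrasting x.dtype, type(x), and x.ndim on a hypothetical array named x:

```python
import numpy as np

x = np.array([1.5, 2.5, 3.5])

print(x.dtype)   # float64: the type of the elements
print(type(x))   # <class 'numpy.ndarray'>: the type of the object itself
print(x.ndim)    # 1: the number of dimensions
```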
PRACTICE QUIZ: TEST YOUR KNOWLEDGE: DATAFRAMES WITH PANDAS
1. Fill in the blank: In pandas, a _____ is a one-dimensional, labeled array.
key
dataframe
series (CORRECT)
CSV file
Correct: A series is a one-dimensional array with axis labels (an index). Series are widely used to represent individual rows or individual columns of a dataframe.
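A minimal sketch, using a small hypothetical dataframe, showing that both a column and a row come back as a Series with their own labels:

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({"item": ["pen", "mug"], "Price": [1.5, 8.0]})

col = df["Price"]   # a column: a Series labeled by the row index
row = df.iloc[0]    # a row: a Series labeled by the column names

print(col.index.tolist())  # [0, 1]
print(row.index.tolist())  # ['item', 'Price']
```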
2. In pandas, what is Boolean masking used for?
Merging data in a dataframe
Adding data to a dataframe
Filtering data in a dataframe (CORRECT)
Deleting data from a dataframe
Correct: Boolean masking (also called Boolean indexing) filters a pandas dataframe using a Boolean mask: a series or array of True and False values, one per row. When the mask is applied to the dataframe, only the rows that correspond to True values are selected.
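A short sketch of Boolean masking on a hypothetical sales dataframe, keeping only items priced above 5:

```python
import pandas as pd

# Hypothetical sales data
sales = pd.DataFrame({"item": ["pen", "mug", "lamp"], "Price": [1.5, 8.0, 25.0]})

mask = sales["Price"] > 5          # Boolean Series: False, True, True
expensive = sales[mask]            # keep only rows where the mask is True
print(expensive["item"].tolist())  # ['mug', 'lamp']
```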
3. What is a pandas method that groups rows of a dataframe together based on their values at one or more columns?
groupby() (CORRECT)
agg()
keys()
values()
Correct: The pandas groupby() method groups the rows of a dataframe based on the values in one or more columns, so that each group can be analyzed and manipulated separately, for example with an aggregation such as sum() or mean().
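A minimal sketch of groupby() followed by aggregation, using a hypothetical sales dataframe with a category column:

```python
import pandas as pd

# Hypothetical sales data
sales = pd.DataFrame({
    "category": ["office", "kitchen", "office", "kitchen"],
    "Price": [1.5, 8.0, 3.5, 12.0],
})

# One aggregate per group: total price for each category
totals = sales.groupby("category")["Price"].sum()
print(totals)

# Several aggregates at once with agg()
summary = sales.groupby("category")["Price"].agg(["mean", "max"])
print(summary)
```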
4. A data professional wants to join two dataframes together. The dataframes contain identically formatted data that needs to be combined vertically. What pandas function can the data professional use to join the dataframes?
insert()
concat() (CORRECT)
type()
merge()
Correct: The pandas concat() function combines dataframes along an axis. It can combine them vertically (axis=0, the default), appending the rows of one dataframe below those of another, or horizontally (axis=1), adding new columns alongside the existing rows. For identically formatted data, vertical concatenation is the natural choice, whereas merge() joins dataframes on shared key columns.
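A short sketch of vertical and horizontal concatenation, using hypothetical monthly sales dataframes:

```python
import pandas as pd

# Hypothetical, identically formatted monthly data
jan = pd.DataFrame({"item": ["pen", "mug"], "Price": [1.5, 8.0]})
feb = pd.DataFrame({"item": ["lamp"], "Price": [25.0]})

# Vertical (axis=0, the default): stack the rows; ignore_index renumbers them
combined = pd.concat([jan, feb], ignore_index=True)
print(combined.shape)   # (3, 2)

# Horizontal (axis=1): place columns side by side, aligned on the row index
extra = pd.DataFrame({"qty": [10, 4]})
wide = pd.concat([jan, extra], axis=1)
print(wide.columns.tolist())  # ['item', 'Price', 'qty']
```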
QUIZ: MODULE 4 CHALLENGE
1. Which of the following statements accurately describe Python lists? Select all that apply.
Lists are immutable.
Lists can be indexed and sliced. (CORRECT)
Lists can contain sequences of elements of any data type. (CORRECT)
Lists are mutable. (CORRECT)
Correct!
2. A data professional is working with a list named cities that contains data on global cities. What Python code can they use to add the string ‘Tokyo’ to the end of the list?
cities.append(‘Tokyo’) (CORRECT)
cities.insert(‘Tokyo’)
cities.pop(‘Tokyo’)
cities.import(‘Tokyo’)
Correct!
3. In Python, which of the following characters can a data professional use to instantiate a tuple?
{ }
( ) (CORRECT)
[ ]
< >
Correct!
4. Which of the following statements accurately describe Python dictionaries? Select all that apply.
Dictionaries are instantiated with quotation marks.
Dictionaries consist of string-tuple pairs.
Dictionaries consist of collections of key-value pairs. (CORRECT)
Dictionaries are instantiated with the dict() function. (CORRECT)
Correct!
5. A data professional is working with a dictionary named employees that contains employee data for a healthcare company. What Python code can they use to retrieve only the dictionary’s values?
values.employees()
items.employees()
employees.items()
employees.values() (CORRECT)
Correct!
6. A data professional is working with two Python sets. What function can they use to find the elements that are in one set or the other, but not in both?
union()
difference()
intersection()
symmetric_difference() (CORRECT)
Correct!
7. Where are modules accessed in Python?
Within a package or library (CORRECT)
Within a dictionary
Within a set
Within a global variable
Correct!
8. Which of the following statements accurately describe NumPy arrays? Select all that apply.
Arrays are immutable.
Arrays contain elements of the same data type. (CORRECT)
Arrays are mutable. (CORRECT)
Arrays can be multidimensional. (CORRECT)
Correct!
9. A data professional is working with a pandas dataframe named sales that contains sales data for a retail website. They want to know the price of the most expensive item. What code can they use to calculate the maximum value of the Price column?
sales = ‘Price’.max()
sales.max().[Price]
sales.max().Price
sales[‘Price’].max() (CORRECT)
Correct!
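A brief sketch of the select-then-aggregate pattern, using a small hypothetical sales dataframe; mean() follows the same pattern as max():

```python
import pandas as pd

# Hypothetical sales data
sales = pd.DataFrame({"item": ["pen", "mug", "lamp"], "Price": [1.5, 8.0, 25.0]})

highest = sales["Price"].max()    # select the column, then aggregate
average = sales["Price"].mean()
print(highest)   # 25.0
print(average)   # 11.5
```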
10. A data professional is working with a pandas dataframe. They want to select a subset of rows and columns by index. What method can they use to do so?
merge()
concat()
loc[]
iloc[] (CORRECT)
Correct!
11. A data professional wants to merge two pandas dataframes. They want to join the data so all of the keys from both dataframes get included in the merge. What technique can they use to do so?
Outer join (CORRECT)
Right join
Inner join
Left join
Correct!
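The how parameter of pandas merge() controls which keys are kept. A minimal sketch with two hypothetical dataframes that share only some keys:

```python
import pandas as pd

# Hypothetical dataframes sharing the keys B and C
left = pd.DataFrame({"key": ["A", "B", "C"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["B", "C", "D"], "y": [20, 30, 40]})

inner = pd.merge(left, right, on="key", how="inner")    # keys in both
outer = pd.merge(left, right, on="key", how="outer")    # keys in either
left_j = pd.merge(left, right, on="key", how="left")    # all keys from left
right_j = pd.merge(left, right, on="key", how="right")  # all keys from right

print(inner["key"].tolist())  # ['B', 'C']
print(outer["key"].tolist())  # ['A', 'B', 'C', 'D']
```

Keys missing from one side are filled with NaN in the outer, left, and right results.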
12. In Python, what data structure helps store and manipulate an ordered collection of items?
List (CORRECT)
Dictionary
Tuple
Set
Correct!
13. A data professional is working with a list named cities that contains data on global cities. The string ‘Houston’ is the third element in the list. What Python code can they use to remove the string ‘Houston’ from the list?
cities.pop(3)
cities.pop(1)
cities.pop(2) (CORRECT)
cities.pop(4)
Correct!
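A minimal sketch of why the third element is removed with pop(2): list indexing starts at 0. The list contents are hypothetical.

```python
# Hypothetical list in which 'Houston' is the third element
cities = ["Paris", "Lagos", "Houston", "Lima"]

removed = cities.pop(2)   # index 2 is the third element; pop() also returns it
print(removed)            # Houston
print(cities)             # ['Paris', 'Lagos', 'Lima']
```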
14. Which of the following statements accurately describe Python tuples? Select all that apply.
Tuples cannot be split into separate variables.
Tuples are immutable. (CORRECT)
Tuples can be split into separate variables. (CORRECT)
Tuples are sequences. (CORRECT)
Correct!
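A short sketch of splitting (unpacking) a tuple into separate variables and treating it as a sequence; the record is hypothetical:

```python
# Hypothetical employee record
record = ("Ana", 34, 52000.0)

name, age, salary = record   # unpack: one variable per element
print(name, age)             # Ana 34

# Tuples are sequences, so they support indexing and slicing
print(record[0])             # Ana
print(record[1:])            # (34, 52000.0)
```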
15. Fill in the blank: In Python, a dictionary’s keys must be _____.
equal
mutable
immutable (CORRECT)
identical
Correct!
16. How do global variables differ from other variables in Python? Select all that apply.
Global variables cannot be accessed from a script.
Global variables cannot be accessed from a program.
Global variables can be accessed from anywhere in a program. (CORRECT)
Global variables can be accessed from anywhere in a script. (CORRECT)
Correct!
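A minimal sketch of a global variable, defined at the top level of a script and used inside functions; the names discount, final_price, and set_discount are hypothetical:

```python
discount = 5   # global variable: defined at the top level of the script

def final_price(price):
    # A global can be read inside a function without any declaration
    return price - discount

def set_discount(new_value):
    global discount   # needed to reassign the global from inside a function
    discount = new_value

print(final_price(100))  # 95
set_discount(10)
print(final_price(100))  # 90
```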
17. Fill in the blank: A _____ NumPy array can be created from a list of lists, where each internal list is the same length.
Two-dimensional (CORRECT)
Correct: When each inner list has the same length, the list of lists maps naturally onto rows and columns, so np.array() turns it into a two-dimensional array.
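A short sketch of creating a two-dimensional NumPy array from a list of equal-length lists; the values are hypothetical:

```python
import numpy as np

rows = [[1, 2, 3], [4, 5, 6]]   # two inner lists of equal length
arr = np.array(rows)

print(arr.ndim)    # 2
print(arr.shape)   # (2, 3): 2 rows, 3 columns
print(arr[1, 2])   # 6 (row 1, column 2)
```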
18. A data professional is working with a pandas dataframe named sales that contains sales data for a retail website. They want to know the average price of an item. What code can they use to calculate the mean value of the Price column?
sales.mean().[Price]
sales.(Price).mean()
sales[‘Price’].mean() (CORRECT)
sales = mean().Price
Correct!
19. In pandas, what is the difference between the iloc[] and loc[] methods?
iloc[] selects dataframe rows and columns by name; loc[] selects dataframe rows and columns by index.
iloc[] selects dataframe rows and columns by index; loc[] selects dataframe rows and columns by name. (CORRECT)
iloc[] merges two dataframes horizontally; loc[] merges two dataframes vertically.
iloc[] merges two dataframes vertically; loc[] merges two dataframes horizontally.
Correct!
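A minimal sketch contrasting iloc[] (integer positions) with loc[] (labels), using a hypothetical dataframe whose row labels are letters:

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame(
    {"item": ["pen", "mug", "lamp"], "Price": [1.5, 8.0, 25.0]},
    index=["a", "b", "c"],
)

by_position = df.iloc[0:2, 1]        # rows 0-1 (end excluded), column at position 1
by_label = df.loc["a":"b", "Price"]  # rows 'a' through 'b' (end included), column 'Price'

print(by_position.tolist())  # [1.5, 8.0]
print(by_label.tolist())     # [1.5, 8.0]
```

One detail worth noting: position slices in iloc[] exclude the end point, while label slices in loc[] include it.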
20. In Python, what types of data can tuples contain? Select all that apply.
Modules
Floats (CORRECT)
Strings (CORRECT)
Integers (CORRECT)
Correct!
21. Fill in the blank: In Python, _____ indicate where a list starts and ends.
square brackets (CORRECT)
parentheses
quotation marks
braces
Correct!
22. In Python, which of the following characters can a data professional use to instantiate a dictionary?
< >
( )
[ ]
{ } (CORRECT)
Correct!
23. A data professional is working with a dictionary named employees that contains employee data for a healthcare company. What Python code can they use to retrieve only the dictionary’s keys?
employees.keys() (CORRECT)
items.employees()
employees.items()
keys.employees()
Correct!
24. A data professional is working with two Python sets. What function can they use to find the elements present in one set, but not the other?
intersection()
difference() (CORRECT)
union()
symmetric_difference()
Correct!
25. A data professional is working with a NumPy array that has three rows and two columns. They want to change the data into two rows and three columns. What method can they use to do so?
reshape() (CORRECT)
agg()
groupby()
type()
Correct!
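A short sketch of reshape() turning a hypothetical 3 x 2 array into a 2 x 3 array:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
b = a.reshape(2, 3)                      # same 6 elements, new shape (2, 3)
print(b)
# [[1 2 3]
#  [4 5 6]]

t = a.T   # transpose, by contrast, swaps rows and columns
```

reshape() keeps the elements in their original order and only changes the shape, so the total number of elements must stay the same; it is not the same operation as a transpose.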
26. A data professional is working with two Python sets. What function can they use to find the elements that the two sets have in common?
symmetric_difference()
difference()
union()
intersection() (CORRECT)
Correct!
27. A data professional is working with a dictionary named employees that contains employee data for a healthcare company. What Python code can they use to retrieve both the dictionary’s keys and values?
employees.items() (CORRECT)
items.employees()
keys.employees()
employees.keys()
Correct!
28. A data professional wants to merge two pandas dataframes. They want to join the data so only the keys that are in both dataframes get included in the merge. What technique can they use to do so?
Left join
Right join
Outer join
Inner join (CORRECT)
Correct!
29. Fill in the blank: Mutability refers to the ability to _____ the internal state of a data structure.
calculate
change (CORRECT)
classify
evaluate
Correct: Mutability means that the internal state of a data structure can be changed after it is created. In contrast, immutability means that the values of a data structure or its elements cannot be altered or updated once they are set.
30. A tuple is an immutable sequence that can contain elements of any data type.
True (CORRECT)
False
Correct: A tuple is an immutable sequence that can hold elements of any data type, so it can’t be changed once it has been created.
31. Fill in the blank: A dictionary is a data structure that consists of a collection of _____ pairs.
keyword
string
key-value (CORRECT)
integer
Correct: A dictionary is a data structure for storing a set of key-value pairs. To retrieve data values from a dictionary in Python, one looks up the appropriate key.
32. In Python, what type of elements does a set contain? Select all that apply.
Interchangeable
Ordered
Non-interchangeable (CORRECT)
Unordered (CORRECT)
Correct: In Python, a set is an unordered collection of unique elements; duplicate values are not allowed.
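A minimal sketch of a set dropping duplicates from a hypothetical list of city visits:

```python
visits = ["Paris", "Tokyo", "Paris", "Lima"]
unique_cities = set(visits)      # duplicates are dropped

print(len(unique_cities))        # 3
print("Tokyo" in unique_cities)  # True
# unique_cities[0] would raise a TypeError: sets are unordered and not indexable
```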
33. Fill in the blank: In NumPy, _____ enables operations to be performed on multiple components of a data object at the same time.
Vectorization (CORRECT)
evaluation
classification
conversion
Correct: In NumPy, vectorization applies an operation to many elements of a data object at once, without an explicit Python loop. This is especially useful for data professionals working with large volumes of data, because vectorized operations typically run much faster than equivalent element-by-element loops.
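A short sketch comparing an element-by-element loop with a vectorized operation on a hypothetical array of prices:

```python
import numpy as np

prices = np.array([10, 20, 30])

loop_result = np.array([p * 2 for p in prices])  # element by element in Python
vectorized = prices * 2                          # one operation on the whole array

print(vectorized)   # [20 40 60]
```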
34. Fill in the blank: In pandas a dataframe is a _____-dimensional, labeled data structure.
Two (CORRECT)
one
three
zero
Correct: A dataframe in pandas is a two-dimensional, labeled data structure that stores data in rows and columns. It is a common way of representing structured data for manipulation and analysis.
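A minimal sketch showing the two labeled dimensions of a small hypothetical dataframe:

```python
import pandas as pd

df = pd.DataFrame({"item": ["pen", "mug"], "Price": [1.5, 8.0]})

print(df.shape)             # (2, 2): rows, columns
print(df.index.tolist())    # [0, 1]: row labels
print(df.columns.tolist())  # ['item', 'Price']: column labels
```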