Blog post image

November 18, 2023

6 min read

Basic Data Structures in Python

Python has several general purpose built-in data structures:

  • List
  • Tuple
  • Dictionary
  • Set

Let’s have a look how we can use them and implement other data structures in Python!

List

List or array is the most basic data structure and also most used. Every developer is familiar with lists in any programming langauge.

In Python list is mutable sequence of items, which is also stored linear in memory. It has methods like append, insert, pop, index and sort.

items = [1, 2, 3]
items.append(4)

for i in items:
  print(i) # 1, 2, 3, 4

Lists are generally used to store collection of items. Lists performs the best in time complexity O(1) for random access. So get item by its index (if you know it of course) is super fast. But other operations like search, insert or remove element is much slower and on average has complexity of O(n) .

Stack

Stack is list like data structure and also stores a collection of elements. The order of elements added or removed from stack is LIFO (Last In First Out).

Stack has two main operations push and pop, to add and remove elements from stack.

Let’s try to implement stack in Python using built-in list data structure.

class Stack:
  def __init__(self, items = []):
    self.__items = items

  def get_items(self):
    return self.__items

  def push(self, item):
    self.__items.append(item)

  def pop(self):
    return self.__items.pop()

  def peek(self):
    return self.__items[-1]

  def is_empty(self):
    return len(self.__items) == 0


stack = Stack([1, 2, 3])
stack.push(4)
stack.push(5)
stack.peek() # 5
stack.pop() # removed 5
print(stack.get_items()) # [1, 2, 3, 4]

Queue

Queue is list like collection of elements. This is in contrast to a stack, it adds and removes elements in FIFO (First In First Out) order.

Queue has two main methods enqueue and dequeue, to add and remove elements from queue.

Let’s try to implement stack in Python using built-in list data structure.

class Queue:
  def __init__(self, items = []):
    self.__items = items

  def get_items(self):
    return self.__items

  def enqueue(self, item):
    self.__items.append(item)

  def dequeue(self):
    return self.__items.pop(0)

  def size(self):
    return len(self.__items)

  def is_empty(self):
    return len(self.__items) == 0


q = Queue()
q.enqueue('a')
q.enqueue('b')
q.enqueue('c')
q.dequeue()

print(q.get_items()) # ['b', 'c']

Main difference between stack and queue is that removing element is done from the end or the beging. Here it is achieved by using pop(0) which removes element at position zero, so beginning of the queue.

It is very important to say that there is no need to implement stack and queue by yourself 😃. Actually they exist in Python in a form of another data structure called deque.

Deque

Deque (the name is pronounced “deck” and is short for “double-ended queue) is a generalization of stacks and queues from “collections” module.

Stack and queue should have O(1) complexity for insertion and deletion. But list doesn’t have it! If you use list then complexity will be O(n). So in short the correct way is to always use deque instead of implementing it yourself with list.

from collections import deque

deq_as_queue = deque()

deq_as_queue.append('a')
deq_as_queue.append('b')
deq_as_queue.append('c')

deq_as_queue.popleft() # removes first element

print(deq_as_queue)  # deque(['b', 'c'])

deq_as_stack = deque()

deq_as_stack.append('a')
deq_as_stack.append('b')
deq_as_stack.append('c')

deq_as_stack.pop() # removes last element

print(deq_as_stack)  # deque(['a', 'b'])

Linked List

There are several types of linked lists:

  • Singly Linked list
  • Doubly Linked list
  • Circular Linked list

Let’s have a look at Singly Linked list first, because it’s the easiest. So what is a Linked List?

Linked List is represented as chain of nodes connected together. It contains nodes which has value and pointer to next node.

List usually have head so beginning of the list and tail so the end of the list. The last node should point to None.

Enough theory 😫… We can start implement our Singly Linked List in Python.

First we create a Node class. It will be very simple, just hold some data and have a reference to the next node.

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

Then create a LinkedList class. It should have head and tail for start and end of list. Usually linked list have methods like append and prepend, to add new elements to the beginning or end of the list.

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:
    def __init__(self) -> None:
        self.head = None
        self.tail = None

    def append(self, value):
        new_node = Node(value)

        if self.head is None:
            self.head = new_node
            self.tail = new_node
            return

        self.tail.next = new_node
        self.tail = new_node

    def prepend(self, value):
        new_node = Node(value)

        if self.head is None:
            self.head = new_node
            self.tail = new_node
            return

        new_node.next = self.head
        self.head = new_node

Now we can have a closer look what’s going on. In constructor we define head and tail with None values, because list is empty in the beginning.

When you append an item to the list, in case list is empty we need to set our head and tail to first node. Latter if you append more items what we need to do is only to modify our current tail pointer to new node and set new item as tail.

When we insert in the beginning, we should again handle case when list is empty, in case it is not we take current head and set it as pointer for new node also replacing head.

Both these operations of insertion at the start and adding to the end of list take constant time O(1). This is the main difference between linked list and arrays. Remember that arrays are linear structure (also in memory), but linked list is not. When you add items to the beging of the array all other elements need to be shifted to the right which takes O(n) time.

However, to insert an element into linked list at certain index takes 0(n)!! But for arrays its constant time O(1), because arrays support random access. And this is a second big difference between them. This means it is a tradeoff when using these data structures.

To demonstrate why get, insert or remove element by index takes O(n) we will implement an insert method.

# ...
def insert(self, value, position):
    new_node = Node(value)

    if position == 0:
        self.prepend(value)
        return

    counter = 0
    leader = self.head
    while counter < position - 1:
        leader = leader.next
        counter += 1

    new_node.next = leader.next
    leader.next = new_node

As you can see to insert an element at certain position we should iterate with while loop over all nodes starting at head and also keeping track of our iterations. Then we can replace found node and assign new node next to it. Very similar idea would be to delete element at certain position or print list.

Let’s also add a print method.

# ...
def print_list(self):
    current = self.head

    while current is not None:
        print(current.value)
        current = current.next

We again use while loop to start from head and iterate through every node until we found last one which has next set to None. Now let’s see how we can use this.

list = LinkedList()

list.append('A')
list.append('B')
list.prepend('C')
list.insert('D', 1)

list.print_list() # C D A B

Doubly Linked list is implemented in a very similar way just every node has also a pointer to previous node. This gives us ability to traverse list also backwards and insert elements before certain position.

With Circular Linked list tail is connected to head wich basically makes it connected as a circle. This gives us ability to traverse list in rounds, so there is no end of the list.

Again there is no need to implement Linked List yourself in Python 😬! Because we can also use deque here. Actually deque is represented internally as a doubly linked list. You can append and prepend items with constant time O(1). For more reference about time complexity click here.

Tuple

Tuple is a built-in data structure in Python. It is similar to a list with one main difference: they are immutable. Yes, once tuple is defined it can not be changed, it’s a fixed sequence of items in particular order.

book = ("The Hobbit", "John Ronald Reuel Tolkien", 1937)

print(len(book))  # 3
print(book[2])  # 1937
print(book[1:])  # ('John Ronald Reuel Tolkien', 1937)
print("The Hobbit" in book)  # True

Dictionary

Dictionary sometimes also called HashTable or HashMap, is a built-in data structure to store key: value pairs. The main purpose is to store some data by its key as index and then get item by it’s key.

book = {
    "title": "The Hobbit",
    "author": "John Ronald Reuel Tolkien",
    "year": 1937
}

print(len(book))  # 3
print(book["year"])  # 1937
print(book.get("author"))  # John Ronald Reuel Tolkien
print(book.keys())  # ['title', 'author', 'year']
print(book.values())  # ['The Hobbit', 'John Ronald Reuel Tolkien', 1937]

Dictionary is probably one of the most useful data structures, because of its amazing performance. You can check here for a dict to understand it. Dictionary has O(1) time complexity for operations like: search, get, set, delete.

To iterate over dict we can call items() method. It returns a tuple of (key,value) on every iteration. 🧐

for key, value in book.items():
    print(key, value) # title The Hobbit, author John Ronald Reuel Tolkien, year 1937

Set

Set is a built-in data structure in Python. It’s a mutable data structure similar to list, but it does not allow duplicate values. It is often used to check value membership and eliminate duplicate items.

fruits = {'apple', 'orange', 'pear', 'banana'}
fruits.add('pineapple')  # adds pineapple
fruits.discard('pear')  # removes pear
print('pear' in fruits)  # False

Be careful when you define empty set! If you define it with empty curly braces {} it will be a dictionary. If you need to initialize an empty set then call it’s initializer set().

empty_dict = {} # empty dict
empty_set = set() # empty set

Sets in Python come form math, specifically from Set Theory 🤓. If you familiar with this topic, then you know about set operations: union, intersection, and difference.

Let me give you some examples:

set_A = {1, 2, 3}
set_B = {2, 4, 5}

print(set_A | set_B)  # Union { 1, 2, 3, 4, 5 }
print(set_A & set_B)  # Intersection { 2 }
print(set_A - set_B)  # Difference { 1, 3 }

That’s it

Thank you for reading, I hope it was interesting and not very complicated 😄!

I plan to add more content about me learning Python. Stay tuned and see you soon! 🐍