Data Structures
bloom filter
A bloom filter is a space-efficient probabilistic data structure designed to test whether an element is present in a set. It is designed to be blazingly fast and use minimal memory, at the cost of potential false positives. False positive matches are possible, but false negatives are not: a query returns either "possibly in set" or "definitely not in set". Bloom proposed the technique for applications where the amount of source data would require an impractically large amount of memory if "conventional" error-free hashing techniques were applied.

An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash functions defined, each of which maps or hashes some set element to one of the m array positions, generating a uniform random distribution. Typically, k is a constant, much smaller than m, which is proportional to the number of elements to be added; the precise choice of k and the constant of proportionality of m are determined by the intended false positive rate of the filter.

Here is an example of a Bloom filter representing the set {x, y, z}. The colored arrows show the positions in the bit array that each set element is mapped to. The element w is not in the set {x, y, z}, because it hashes to one bit-array position containing 0. For this figure, m = 18 and k = 3.

There are two main operations a bloom filter can perform: insertion and search. Search may result in false positives, and deletion is not possible. In other words, the filter can take in items; when we check whether an item has previously been inserted, it can tell us either "no" or "maybe". Both insertion and search are O(1) operations.

A bloom filter is created by allotting a certain size. In our example, we use 100 as a default length, and all locations are initialized to false. During insertion, a number of hash functions, in our case 3 hash functions, are used to create hashes of the input. These hash functions output indexes, and at every index received we simply change the value in our bloom filter to true.

During a search, the same hash functions are called and used to hash the input. We then check whether the indexes received all have a value of true inside our bloom filter. If they all do, the item may have been previously inserted. However, it is not certain, because it is possible that other previously inserted values flipped those same positions to true. Absolute certainty is impossible unless only a single item has previously been inserted. While checking the bloom filter for the indexes returned by our hash functions, if even one of them has a value of false, we definitively know that the item was not previously inserted.

The probability of false positives is determined by three factors: the size of the bloom filter (m), the number of hash functions we use (k), and the number of items that have been inserted into the filter (n). The formula to calculate the probability of a false positive is:

(1 - e^(-kn/m))^k

These variables, k, m, and n, should be picked based on how acceptable false positives are. If the values are picked and the resulting probability is too high, the values should be tweaked and the probability re-calculated.

A bloom filter can be used on a blogging website. If the goal is to show readers only articles that they have never seen before, a bloom filter is perfect. It can store hashed values based on the articles. After a user reads a few articles, they can be inserted into the filter. The next time the user visits the site, those articles can be filtered out of the results. Some articles will inevitably be filtered out by mistake, but the cost is acceptable. It's ok if a user never sees a few articles as long as they have other, brand new ones to see every time they visit the site.
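The insertion and search walk-through above can be sketched as follows. The size of 100 and the three hash functions mirror the example; the hashing scheme itself (Python's built-in hash with different seeds) is a simplification for illustration, not a production recommendation.

```python
class BloomFilter:
    def __init__(self, size=100, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size  # all locations start as false

    def _indexes(self, item):
        # Derive num_hashes indexes by hashing the item with different seeds.
        return [hash((seed, item)) % self.size for seed in range(self.num_hashes)]

    def insert(self, item):
        for i in self._indexes(item):
            self.bits[i] = True  # flip every hashed position to true

    def may_contain(self, item):
        # "maybe" if every hashed position is true; "definitely not" otherwise.
        return all(self.bits[i] for i in self._indexes(item))

bf = BloomFilter()
bf.insert("hello")
print(bf.may_contain("hello"))  # True ("possibly in set")
```

Note that an item that was never inserted can still return True here (a false positive), but an item that was inserted will always return True.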
disjoint set
Disjoint-set data structure (also called a union–find data structure or merge–find set) is a data structure that tracks a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It provides near-constant-time operations (bounded by the inverse Ackermann function) to add new sets, to merge existing sets, and to determine whether elements are in the same set. In addition to many other uses (see the Applications section), disjoint-sets play a key role in Kruskal's algorithm for finding the minimum spanning tree of a graph. MakeSet creates 8 singletons. After some operations of Union, some sets are grouped together.
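The operations above can be sketched with a small union–find implementation. This version uses two standard optimizations, path compression and union by rank, which together give the near-constant amortized bounds mentioned above; the class and method names are illustrative assumptions.

```python
class DisjointSet:
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        # Create a new singleton set containing x.
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0

    def find(self, x):
        # Follow parent links to the root, compressing the path as we go.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, a, b):
        # Merge the sets containing a and b, attaching the shallower
        # tree under the deeper one (union by rank).
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

ds = DisjointSet()
for x in range(8):               # MakeSet creates 8 singletons, as in the figure
    ds.make_set(x)
ds.union(0, 1)                   # some operations of Union group sets together
ds.union(1, 2)
print(ds.find(0) == ds.find(2))  # True: 0 and 2 are now in the same set
```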
doubly linked list
In computer science, a doubly linked list is a linked data structure that consists of a set of sequentially linked records called nodes. Each node contains two fields, called links, that are references to the previous and to the next node in the sequence of nodes. The beginning and ending nodes' previous and next links, respectively, point to some kind of terminator, typically a sentinel node or null, to facilitate the traversal of the list. If there is only one sentinel node, then the list is circularly linked via the sentinel node. It can be conceptualized as two singly linked lists formed from the same data items, but in opposite sequential orders. The two node links allow traversal of the list in either direction. While adding or removing a node in a doubly linked list requires changing more links than the same operations on a singly linked list, the operations are simpler and potentially more efficient (for nodes other than first nodes) because there is no need to keep track of the previous node during traversal, nor to traverse the list to find the previous node so that its link can be modified.
Add(value)
  Pre: value is the value to add to the list
  Post: value has been placed at the tail of the list
  n ← node(value)
  if head = ø
    head ← n
    tail ← n
  else
    n.previous ← tail
    tail.next ← n
    tail ← n
  end if
end Add

Remove(head, value)
  Pre: head is the head node in the list; value is the value to remove from the list
  Post: value is removed from the list, true; otherwise false
  if head = ø
    return false
  end if
  if value = head.value
    if head = tail
      head ← ø
      tail ← ø
    else
      head ← head.next
      head.previous ← ø
    end if
    return true
  end if
  n ← head.next
  while n != ø and value != n.value
    n ← n.next
  end while
  if n = tail
    tail ← tail.previous
    tail.next ← ø
    return true
  else if n != ø
    n.previous.next ← n.next
    n.next.previous ← n.previous
    return true
  end if
  return false
end Remove

ReverseTraversal(tail)
  Pre: tail is the node of the list to traverse
  Post: the list has been traversed in reverse order
  n ← tail
  while n != ø
    yield n.value
    n ← n.previous
  end while
end ReverseTraversal

O(n)
graph
In computer science, a graph is an abstract data type that is meant to implement the undirected graph and directed graph concepts from mathematics, specifically the field of graph theory. A graph data structure consists of a finite (and possibly mutable) set of vertices (also called nodes or points), together with a set of unordered pairs of these vertices for an undirected graph, or a set of ordered pairs for a directed graph. These pairs are known as edges, arcs, or lines for an undirected graph, and as arrows, directed edges, directed arcs, or directed lines for a directed graph. The vertices may be part of the graph structure, or may be external entities represented by integer indices or references.
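An adjacency-list sketch of the ideas above (class and method names are illustrative assumptions). Because edges in an undirected graph are unordered pairs, each edge is stored in both directions:

```python
from collections import defaultdict

class Graph:
    def __init__(self, directed=False):
        self.directed = directed
        self.adjacency = defaultdict(set)  # vertex -> set of neighbours

    def add_vertex(self, v):
        self.adjacency[v]  # touching the key creates an empty neighbour set

    def add_edge(self, u, v):
        # For an undirected graph the pair is unordered, so store it both ways.
        self.adjacency[u].add(v)
        if not self.directed:
            self.adjacency[v].add(u)

    def neighbours(self, v):
        return self.adjacency[v]

g = Graph()
g.add_edge("A", "B")
g.add_edge("A", "C")
print(sorted(g.neighbours("A")))  # ['B', 'C']
```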
hash table
In computing, a hash table (hash map) is a data structure which implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found. Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash collisions where the hash function generates the same index for more than one key. Such collisions must be accommodated in some way; one common approach is to resolve hash collisions by separate chaining.
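A minimal separate-chaining sketch. The bucket count and the class/method names are illustrative assumptions; each bucket is a list that can hold several key-value pairs when their keys hash to the same slot:

```python
class HashTable:
    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash function computes an index into the array of buckets.
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # collision: chain within the bucket

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

table = HashTable()
table.set("name", "Ada")
table.set("name", "Grace")  # overwrites the earlier value
print(table.get("name"))    # Grace
```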
heap
In computer science, a heap is a specialized tree-based data structure that satisfies the heap property described below. In a min heap, if P is a parent node of C, then the key (the value) of P is less than or equal to the key of C. In a max heap, the key of P is greater than or equal to the key of C. The node at the "top" of the heap, with no parents, is called the root node.
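As a quick check of the min-heap property, here is a sketch using Python's standard heapq module (the sample keys are arbitrary). In the array representation it uses, the parent at index i has children at indexes 2i + 1 and 2i + 2:

```python
import heapq

keys = [5, 1, 9, 3]
heap = []
for k in keys:
    heapq.heappush(heap, k)   # sift up to keep the heap property

# The root (index 0) is always the minimum in a min heap.
print(heap[0])                # 1

# Every parent at index i is <= its children at 2i+1 and 2i+2.
n = len(heap)
ok = all(heap[i] <= heap[c]
         for i in range(n)
         for c in (2 * i + 1, 2 * i + 2) if c < n)
print(ok)                     # True
```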
linked list
In computer science, a linked list is a linear collection of data elements, in which linear order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of data and a reference (in other words, a link) to the next node in the sequence. This structure allows for efficient insertion or removal of elements from any position in the sequence during iteration. More complex variants add additional links, allowing efficient insertion or removal from arbitrary element references. A drawback of linked lists is that access time is linear (and difficult to pipeline). Faster access, such as random access, is not feasible. Arrays have better cache locality as compared to linked lists.

Add(value)
  Pre: value is the value to add to the list
  Post: value has been placed at the tail of the list
  n ← node(value)
  if head = ø
    head ← n
    tail ← n
  else
    tail.next ← n
    tail ← n
  end if
end Add

Prepend(value)
  Pre: value is the value to add to the list
  Post: value has been placed at the head of the list
  n ← node(value)
  n.next ← head
  head ← n
  if tail = ø
    tail ← n
  end if
end Prepend

Contains(head, value)
  Pre: head is the head node in the list; value is the value to search for
  Post: the item is either in the linked list, true; otherwise false
  n ← head
  while n != ø and n.value != value
    n ← n.next
  end while
  if n = ø
    return false
  end if
  return true
end Contains

Remove(head, value)
  Pre: head is the head node in the list; value is the value to remove from the list
  Post: value is removed from the list, true; otherwise false
  if head = ø
    return false
  end if
  n ← head
  if n.value = value
    if head = tail
      head ← ø
      tail ← ø
    else
      head ← head.next
    end if
    return true
  end if
  while n.next != ø and n.next.value != value
    n ← n.next
  end while
  if n.next != ø
    if n.next = tail
      tail ← n
    end if
    n.next ← n.next.next
    return true
  end if
  return false
end Remove

Traverse(head)
  Pre: head is the head node in the list
  Post: the items in the list have been traversed
  n ← head
  while n != ø
    yield n.value
    n ← n.next
  end while
end Traverse

ReverseTraversal(head, tail)
  Pre: head and tail belong to the same list
  Post: the items in the list have been traversed in reverse order
  if tail != ø
    curr ← tail
    while curr != head
      prev ← head
      while prev.next != curr
        prev ← prev.next
      end while
      yield curr.value
      curr ← prev
    end while
    yield curr.value
  end if
end ReverseTraversal

O(n)
priority queue
In computer science, a priority queue is an abstract data type which is like a regular queue or stack data structure, but where additionally each element has a "priority" associated with it. In a priority queue, an element with high priority is served before an element with low priority. If two elements have the same priority, they are served according to their order in the queue. While priority queues are often implemented with heaps, they are conceptually distinct from heaps. A priority queue is an abstract concept like "a list" or "a map"; just as a list can be implemented with a linked list or an array, a priority queue can be implemented with a heap or a variety of other methods such as an unordered array.
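As a sketch of the heap-backed implementation mentioned above (class and method names are illustrative assumptions). Lower numbers mean higher priority here, and an insertion counter preserves FIFO order among elements with equal priority, matching the behaviour described above:

```python
import heapq
import itertools

class PriorityQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, item, priority):
        # The counter breaks ties so equal priorities are served in
        # insertion order.
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def pop(self):
        priority, _, item = heapq.heappop(self._heap)
        return item

pq = PriorityQueue()
pq.push("low", 5)
pq.push("high", 1)
pq.push("also-high", 1)
print(pq.pop())  # high       (highest priority served first)
print(pq.pop())  # also-high  (same priority, inserted later)
```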
queue
In computer science, a queue is a particular kind of abstract data type or collection in which the entities in the collection are kept in order and the principal (or only) operations on the collection are the addition of entities to the rear terminal position, known as enqueue, and removal of entities from the front terminal position, known as dequeue. This makes the queue a First-In-First-Out (FIFO) data structure. In a FIFO data structure, the first element added to the queue will be the first one to be removed. This is equivalent to the requirement that once a new element is added, all elements that were added before have to be removed before the new element can be removed. Often a peek or front operation is also provided, returning the value of the front element without dequeuing it. A queue is an example of a linear data structure, or more abstractly a sequential collection. Representation of a FIFO (first in, first out) queue
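A sketch of enqueue, dequeue, and peek using Python's collections.deque, which supports O(1) appends and pops at both ends (the stored values are arbitrary placeholders):

```python
from collections import deque

queue = deque()
queue.append("first")    # enqueue at the rear
queue.append("second")
queue.append("third")

print(queue[0])          # first  (peek/front: inspect without removing)
print(queue.popleft())   # first  (dequeue from the front: FIFO order)
print(queue.popleft())   # second
```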
stack
In computer science, a stack is an abstract data type that serves as a collection of elements, with two principal operations:

* push, which adds an element to the collection, and
* pop, which removes the most recently added element that was not yet removed.

The order in which elements come off a stack gives rise to its alternative name, LIFO (last in, first out). Additionally, a peek operation may give access to the top without modifying the stack. The name "stack" for this type of structure comes from the analogy to a set of physical items stacked on top of each other, which makes it easy to take an item off the top of the stack, while getting to an item deeper in the stack may require taking off multiple other items first. Simple representation of a stack runtime with push and pop operations.
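A plain Python list already behaves as a stack; this sketch (with hypothetical plate names echoing the physical analogy) shows push, pop, and peek:

```python
stack = []
stack.append("plate-1")  # push
stack.append("plate-2")
stack.append("plate-3")

print(stack[-1])         # plate-3  (peek at the top without modifying)
print(stack.pop())       # plate-3  (pop: most recently added comes off first)
print(stack.pop())       # plate-2  (LIFO order)
```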
tree
- Binary Search Tree
- AVL Tree
- Red-Black Tree
- Segment Tree - with min/max/sum range queries examples
- Fenwick Tree (Binary Indexed Tree)

In computer science, a tree is a widely used abstract data type (ADT) — or data structure implementing this ADT — that simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node, represented as a set of linked nodes. A tree data structure can be defined recursively (locally) as a collection of nodes (starting at a root node), where each node is a data structure consisting of a value, together with a list of references to nodes (the "children"), with the constraints that no reference is duplicated, and none points to the root. A simple unordered tree; in this diagram, the node labeled 7 has two children, labeled 2 and 6, and one parent, labeled 2. The root node, at the top, has no parent.
avl tree
In computer science, an AVL tree (named after inventors Adelson-Velsky and Landis) is a self-balancing binary search tree. It was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations; the four rebalancing cases are the Left-Left Rotation, Right-Right Rotation, Left-Right Rotation, and Right-Left Rotation. Animation showing the insertion of several elements into an AVL tree; it includes left, right, left-right and right-left rotations. AVL tree with balance factors (green).
binary search tree
In computer science, binary search trees (BST), sometimes called ordered or sorted binary trees, are a particular type of container: data structures that store "items" (such as numbers, names etc.) in memory. They allow fast lookup, addition and removal of items, and can be used to implement either dynamic sets of items, or lookup tables that allow finding an item by its key (e.g., finding the phone number of a person by name). Binary search trees keep their keys in sorted order, so that lookup and other operations can use the principle of binary search: when looking for a key in a tree (or a place to insert a new key), they traverse the tree from root to leaf, making comparisons to keys stored in the nodes of the tree and deciding, on the basis of the comparison, to continue searching in the left or right subtrees. On average, this means that each comparison allows the operations to skip about half of the tree, so that each lookup, insertion or deletion takes time proportional to the logarithm of the number of items stored in the tree. This is much better than the linear time required to find items by key in an (unsorted) array, but slower than the corresponding operations on hash tables. A binary search tree of size 9 and depth 3, with 8 at the root. The leaves are not drawn. 
insert(value)
  Pre: value has passed custom type checks for type T
  Post: value has been placed in the correct location in the tree
  if root = ø
    root ← node(value)
  else
    insertNode(root, value)
  end if
end insert

insertNode(current, value)
  Pre: current is the node to start from
  Post: value has been placed in the correct location in the tree
  if value < current.value
    if current.left = ø
      current.left ← node(value)
    else
      insertNode(current.left, value)
    end if
  else
    if current.right = ø
      current.right ← node(value)
    else
      insertNode(current.right, value)
    end if
  end if
end insertNode

contains(root, value)
  Pre: root is the root node of the tree; value is what we would like to locate
  Post: value is either located or not
  if root = ø
    return false
  end if
  if root.value = value
    return true
  else if value < root.value
    return contains(root.left, value)
  else
    return contains(root.right, value)
  end if
end contains

remove(value)
  Pre: value is the value of the node to remove; root is the root node of the BST;
       count is the number of items in the BST
  Post: node with value is removed if found, in which case yields true; otherwise false
  nodeToRemove ← findNode(value)
  if nodeToRemove = ø
    return false
  end if
  parent ← findParent(value)
  if count = 1
    root ← ø
  else if nodeToRemove.left = ø and nodeToRemove.right = ø
    if nodeToRemove.value < parent.value
      parent.left ← nodeToRemove.right
    else
      parent.right ← nodeToRemove.right
    end if
  else if nodeToRemove.left != ø and nodeToRemove.right != ø
    next ← nodeToRemove.right
    while next.left != ø
      next ← next.left
    end while
    if next != nodeToRemove.right
      remove(next.value)
      nodeToRemove.value ← next.value
    else
      nodeToRemove.value ← next.value
      nodeToRemove.right ← nodeToRemove.right.right
    end if
  else
    if nodeToRemove.left = ø
      next ← nodeToRemove.right
    else
      next ← nodeToRemove.left
    end if
    if root = nodeToRemove
      root ← next
    else if parent.left = nodeToRemove
      parent.left ← next
    else if parent.right = nodeToRemove
      parent.right ← next
    end if
  end if
  count ← count - 1
  return true
end remove

findParent(value, root)
  Pre: value is the value of the node we want to find the parent of;
       root is the root node of the BST and is != ø
  Post: a reference to the parent node of value if found; otherwise ø
  if value = root.value
    return ø
  end if
  if value < root.value
    if root.left = ø
      return ø
    else if root.left.value = value
      return root
    else
      return findParent(value, root.left)
    end if
  else
    if root.right = ø
      return ø
    else if root.right.value = value
      return root
    else
      return findParent(value, root.right)
    end if
  end if
end findParent

findNode(root, value)
  Pre: value is the value of the node we want to find;
       root is the root node of the BST
  Post: a reference to the node of value if found; otherwise ø
  if root = ø
    return ø
  end if
  if root.value = value
    return root
  else if value < root.value
    return findNode(root.left, value)
  else
    return findNode(root.right, value)
  end if
end findNode

findMin(root)
  Pre: root is the root node of the BST; root != ø
  Post: the smallest value in the BST is located
  if root.left = ø
    return root.value
  end if
  return findMin(root.left)
end findMin

findMax(root)
  Pre: root is the root node of the BST; root != ø
  Post: the largest value in the BST is located
  if root.right = ø
    return root.value
  end if
  return findMax(root.right)
end findMax

inorder(root)
  Pre: root is the root node of the BST
  Post: the nodes in the BST have been visited in inorder
  if root != ø
    inorder(root.left)
    yield root.value
    inorder(root.right)
  end if
end inorder

preorder(root)
  Pre: root is the root node of the BST
  Post: the nodes in the BST have been visited in preorder
  if root != ø
    yield root.value
    preorder(root.left)
    preorder(root.right)
  end if
end preorder

postorder(root)
  Pre: root is the root node of the BST
  Post: the nodes in the BST have been visited in postorder
  if root != ø
    postorder(root.left)
    postorder(root.right)
    yield root.value
  end if
end postorder

O(n)
fenwick tree
A Fenwick tree or binary indexed tree is a data structure that can efficiently update elements and calculate prefix sums in a table of numbers. When compared with a flat array of numbers, the Fenwick tree achieves a much better balance between two operations: element update and prefix sum calculation. In a flat array of n numbers, you can either store the elements or the prefix sums. In the first case, computing prefix sums requires linear time; in the second case, updating the array elements requires linear time (in both cases, the other operation can be performed in constant time). Fenwick trees allow both operations to be performed in O(log n) time. This is achieved by representing the numbers as a tree, where the value of each node is the sum of the numbers in that subtree. The tree structure allows operations to be performed using only O(log n) node accesses.

A binary indexed tree is represented as an array, and each of its nodes stores the sum of some elements of the given array. The size of the binary indexed tree is equal to n, where n is the size of the input array. In the current implementation we have used size n + 1 for ease of implementation, and all indexes are 1-based. On the picture below you may see an animated example of the creation of a binary indexed tree for the array [1, 2, 3, 4, 5] by inserting the elements one by one.
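The two O(log n) operations can be sketched as follows; this mirrors the 1-based, size n + 1 layout described above (the class and method names are illustrative assumptions):

```python
class FenwickTree:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # index 0 is unused (1-based indexes)

    def update(self, i, delta):
        # Add delta to element i; i + (i & -i) jumps to the next node
        # responsible for index i. Runs in O(log n).
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):
        # Sum of elements 1..i; i - (i & -i) strips the lowest set bit.
        # Runs in O(log n).
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

ft = FenwickTree(5)
for i, value in enumerate([1, 2, 3, 4, 5], start=1):  # the array from above
    ft.update(i, value)
print(ft.prefix_sum(3))  # 6  (1 + 2 + 3)
print(ft.prefix_sum(5))  # 15
```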
red black tree
A red–black tree is a kind of self-balancing binary search tree in computer science. Each node of the binary tree has an extra bit, and that bit is often interpreted as the color (red or black) of the node. These color bits are used to ensure the tree remains approximately balanced during insertions and deletions. Balance is preserved by painting each node of the tree with one of two colors in a way that satisfies certain properties, which collectively constrain how unbalanced the tree can become in the worst case. When the tree is modified, the new tree is subsequently rearranged and repainted to restore the coloring properties. The properties are designed in such a way that this rearranging and recoloring can be performed efficiently. The balancing of the tree is not perfect, but it is good enough to allow it to guarantee searching in O(log n) time, where n is the total number of elements in the tree. The insertion and deletion operations, along with the tree rearrangement and recoloring, are also performed in O(log n) time. An example of a red–black tree:

In addition to the requirements imposed on a binary search tree, the following must be satisfied by a red–black tree:

- Each node is either red or black.
- The root is black. Since the root can always be changed from red to black, but not necessarily vice versa, this rule has little effect on analysis.
- All leaves (NIL) are black.
- If a node is red, then both its children are black.
- Every path from a given node to any of its descendant NIL nodes contains the same number of black nodes.

Some definitions: the number of black nodes from the root to a node is the node's black depth; the uniform number of black nodes in all paths from root to the leaves is called the black-height of the red–black tree. These constraints enforce a critical property of red–black trees: the path from the root to the farthest leaf is no more than twice as long as the path from the root to the nearest leaf. The result is that the tree is roughly height-balanced. Since operations such as inserting, deleting, and finding values require worst-case time proportional to the height of the tree, this theoretical upper bound on the height allows red–black trees to be efficient in the worst case, unlike ordinary binary search trees.
segment tree
In computer science, a segment tree, also known as a statistic tree, is a tree data structure used for storing information about intervals, or segments. It allows querying which of the stored segments contain a given point. It is, in principle, a static structure; that is, it's a structure that cannot be modified once it's built. A similar data structure is the interval tree.

A segment tree is a binary tree. The root of the tree represents the whole array. The two children of the root represent the first and second halves of the array. Similarly, the children of each node correspond to the two halves of the array segment corresponding to that node. We build the tree bottom up, with the value of each node being the "minimum" (or any other function) of its children's values. Building the tree takes O(n log n) time. The number of operations done per query is the height of the tree, which is O(log n). To do range queries, each node splits the query into two parts, one sub-query for each child. If a query contains the whole subarray of a node, we can use the precomputed value at the node. Using this optimisation, we can prove that only O(log n) minimum operations are done.

A segment tree is a data structure designed to perform certain array operations efficiently, especially those involving range queries. Applications of the segment tree are in the areas of computational geometry and geographic information systems. The current implementation of SegmentTree lets you pass any binary function (one with two input parameters) to it, so you are able to do range queries for a variety of functions. In the tests you may find examples of doing min, max and sum range queries on SegmentTree.
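A compact iterative sketch of such a tree. Like the implementation described above, it accepts any binary function (min, max, sum, ...); the class and method names are illustrative, and the bottom-up array layout used here is one common way to store the binary tree:

```python
class SegmentTree:
    def __init__(self, values, combine):
        self.n = len(values)
        self.combine = combine
        # A binary tree stored in an array: node i has children 2i and 2i+1,
        # and the leaves (the input array) occupy indexes n..2n-1.
        self.tree = [None] * (2 * self.n)
        self.tree[self.n:] = values
        for i in range(self.n - 1, 0, -1):  # build bottom-up
            self.tree[i] = combine(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, left, right):
        # Combine values over the half-open range [left, right) in O(log n),
        # using precomputed node values for fully covered subarrays.
        left += self.n
        right += self.n
        result = None
        while left < right:
            if left % 2 == 1:
                result = self.tree[left] if result is None else self.combine(result, self.tree[left])
                left += 1
            if right % 2 == 1:
                right -= 1
                result = self.tree[right] if result is None else self.combine(result, self.tree[right])
            left //= 2
            right //= 2
        return result

st = SegmentTree([5, 2, 8, 1, 9], min)
print(st.query(0, 3))  # 2  (minimum of 5, 2, 8)
print(st.query(2, 5))  # 1  (minimum of 8, 1, 9)
```

Passing max or a sum function instead of min gives the other range queries mentioned above.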
trie
In computer science, a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is a kind of search tree—an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are not necessarily associated with every node. Rather, values tend only to be associated with leaves, and with some inner nodes that correspond to keys of interest. For the space-optimized presentation of prefix tree, see compact prefix tree.
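A minimal sketch of a trie for string keys (class and method names are assumptions). As described above, no node stores its key; a node's position in the tree defines the key, and the root corresponds to the empty string:

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # character -> TrieNode
        self.is_word = False   # marks the nodes that correspond to stored keys

class Trie:
    def __init__(self):
        self.root = TrieNode()  # the root is associated with the empty string

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        # All descendants of the prefix's node share this common prefix.
        return self._walk(prefix) is not None

    def _walk(self, s):
        node = self.root
        for ch in s:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node

t = Trie()
t.insert("car")
t.insert("cart")
print(t.contains("car"))    # True
print(t.contains("ca"))     # False ("ca" is a prefix, not a stored key)
print(t.starts_with("ca"))  # True
```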