Discussion of Common Datastructures in Software Interviews

One key to acing the software interview at Google and companies like it is to have a firm grasp on datastructures, and employing them at the right times. Properly representing the data on hand can bring new insight to the strategy of solving the problem. Here we'll go through the common datastructures you need to know, and then look at a problem where changing the input brings us closer to the solution.

Lists are order collections of items that can contain duplicates. Many typed languages require that all items are of a single type. This isn't always the case though, for languages like python, but it is a good rule of thumb to adhere to for clarity. Lists are often the default for inputs, and for storing things, but don't be lazy, run through the others in your mind before settling on a list. Often sets are a good alternative. Strings can often be thought of and operated on as character lists.

Sets are unordered collections of singular items. Adding an item that exists in the set is a no-op. Sets have many advantages, lookup time is big oh of one, or constant time. You can also use set operations, such as union, intersection, and subtraction. These have much better runtime than their alternative in lists, and simpler to write in code. Some languages don't have a set standard type. In these cases, you can use a dictionary pointing to a boolean. True if the value is in the set, and false if not. When deleting, you can delete the key or set the value to False.

Dictionaries are key-value stores based on the hash of the key. Each key must be of a hashable type, not necessarily a primative, but simple enough that the language you're using can distill it into them. Dictionaries can be used for intermediate counts when searching a larger problem space. They are also a good choice for game boards that are large, a checkers board that is sparsely populated is better represented by a D[(r,c)]=piece than a M[r][c]=piece, since the matrix would have to allocate the memory for the entire playspace, where the dictionary wouldn't need to do this.

Trees are unidirectional acyclic graphs. Each node points to its children, whether it's a binary tree or a generic tree. Most languages require the definition of the Node class for your specific purposes. It can be quite straightforward, with only a list of children necessary. An id can be optional. Traversing the tree in a depth first or breadth first search is most easily accomplished in this form of the data. This is often how it is inputted into your algorithm method as well. Here's the python class for a generic tree:

class Node:
    # note that the name of the class is `Node`, there is never a `Tree` class, the root is merely a Node that points to other Nodes.
    def __init__(self, children=[]):
        # each member in the list is of type Node
        self.children = children

Let's look at an example of a problem that takes advantage of massaging one datatype into another.

Given a company organization chart as a tree, each node is an employee with a unique name and a salary. Create a datastructure that allows a caller to do a constant time lookup of employees by salary bands in increments of $10k.

Here we are given a root node that points to the rest of the tree or org chart. We can use two dictionaries to most cleanly organize the data into the desired result. Traverse the tree and place the nodes into a dictionary keyed off of the employee name. Use a second dictionary keyed by the lower end of the salary band, and point to a set of employee names. It is possible to do this in a single datastructure, but I'd say separating makes for a cleaner separation of concerns, and may make the system more extensible. It would look something like this:

class Employee:
    def __init__(self, name, salary, reports=[]):
        self.name = name
        self.salary = salary
        self.reports = reports

class OrganizationBySalary:
    def __init__(self):
        self.employees = dict()
        self.salaryBands = dict()

    def addEmployee(employee):
        self.employees[employee.name] = employee
        floorSalary = floor(employee.salary / 10_000) * 10_000
        alist = self.salaryBands.get(floorSalary, list())
        alist.append(employee.name)
        self.salaryBands[floorSalary] = alist
    
    def loookupEmployees(salary):
        alist = self.salaryBands.get(salary, list())
        employees = list()
        for name in alist:
            employees.append(self.employees[name])
        return employees

There are an infinite number of composite datastructures you can make, similar to our example. Having a deep understanding of these primitives will arm you to make good decisions in how to model the data in your algorithm interviews, and help make you a more confident interviewee.