Graph Processing Using Big Data Technologies

Published March 18, 2014   |   
Charles Menguy

Processing extremely large graphs has been and remains a challenge, but recent advances in Big Data technologies have made this task more practical. Tapad, a startup based in NYC focused on cross-device content delivery, has made graph processing the heart of their business model using Big Data to scale to terabytes of data.

Social networks like Facebook or Twitter contain data that naturally lends itself to a graph representation. But graphs can be used to represent less obvious data, as in the case of Tapad’s device graph. Dag Liodden, Tapad’s co-founder and CTO, describes why using a graph representation for devices makes sense:

Tapad takes a graph-oriented approach to modeling relationships between devices. Anonymous identifiers (such as cookie IDs) are represented as nodes in our Device Graph and we track marketing information to these nodes. Edges between the nodes are scored / weighted using a combination of deterministic data and probabilistic statistical models / machine learning techniques. The concept of a “device” is defined as a starting device / node (let’s say the cookie ID of a browser) and the collections of nodes (let’s say the cookie IDs of a Tablet and a Connected TV) that are reachable from that starting point given a customizable set of edge constraints. Having an actual graph structure, as opposed to just aggregated information into a single node, gives us the flexibility to balance accuracy and scale dynamically as well as more easily augment the graph with new edge inference models.

Read More