Yelp data processing using Spark and Neo4j

In this project, we are going to do network analysis using a graph database so that we can find patterns in how a social network affects business reviews and ratings.
Event Date
May - 2017
07:30pm - 10:00pm PST
What are the prerequisites for this project?
  • Students are expected to have a fair knowledge of Big Data and Hadoop, particularly HDFS, Pig, Hive and Impala.
  • An installation of the Cloudera QuickStart VM. Since development will be done in the QuickStart VM, the Scala SDK must be installed there as well. Instructions on how to set up a Scala SDK and runtime can be found here.
  • An installation of Neo4j on your host machine.
  • This project assumes that you have a good knowledge of Hadoop. If not, we recommend taking the Big Data and Hadoop course first.

What will you learn?

  • Key terminology in graph databases
  • Short introduction to Cypher
  • Spark-Neo4j connector
  • Introduction to Spark GraphX
  • Data analysis using GraphX and Neo4j

Project Description

Continuing the series "Data engineering using Yelp dataset", we have built our data warehouse to an appreciable stage, and users can now run any kind of query they want. Well done.

But not all queries are easy for users to read or write, and not all queries are easy for the query engine to execute. Some queries involve so many self-joins that they become either inefficient for the system or too confusing for the writer.
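To make the self-join problem concrete: in a relational table of friendships, each extra "friend of a friend" hop typically needs one more self-join, whereas a graph treats it as a traversal. The following is a minimal sketch in plain Python (not Spark or Neo4j) of such a traversal. The `friends` sample data and the `within_hops` function are hypothetical and exist only for illustration; they are not taken from the Yelp dataset or from the project code.

```python
from collections import deque

# Hypothetical friendship edges: user_id -> set of friend user_ids.
# Illustrative sample only, not real Yelp data.
friends = {
    "u1": {"u2", "u3"},
    "u2": {"u1", "u4"},
    "u3": {"u1"},
    "u4": {"u2", "u5"},
    "u5": {"u4"},
}

def within_hops(graph, start, max_hops):
    """Return the users reachable from `start` in 1..max_hops friendship steps."""
    seen = {start}          # users already visited, including the start user
    reached = set()         # result set, excludes the start user
    queue = deque([(start, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_hops:
            continue        # do not expand beyond the hop limit
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                reached.add(friend)
                queue.append((friend, depth + 1))
    return reached

if __name__ == "__main__":
    print(sorted(within_hops(friends, "u1", 2)))
```

Changing `max_hops` from 2 to 3 only changes one argument here, while the equivalent SQL would usually need an additional self-join on the friendship table.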

So in this hackerday, we are going to do network analysis using a graph database. The purpose is to find patterns in how a social network affects business reviews and ratings. This on its own could be an outstanding data product built from the Yelp dataset.

We will use the open-source graph database Neo4j, together with Spark, to analyze the social network of users and whether it has any effect on how ratings or reviews are given.
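One possible way to frame the question "does the social network affect ratings?" is to compare each user's rating of a business with the average rating their friends gave the same business. The sketch below does this in plain Python on a tiny in-memory sample. It is an assumption about how such an analysis could look, not the project's actual method: the `friends` and `ratings` data and the function names are hypothetical, and the real project would run this kind of logic with Spark and Neo4j rather than on Python dictionaries.

```python
from statistics import mean

# Hypothetical sample data, not real Yelp records.
friends = {"u1": {"u2", "u3"}, "u2": {"u1"}, "u3": {"u1"}}

# (user_id, business_id) -> star rating
ratings = {
    ("u1", "b1"): 4,
    ("u2", "b1"): 5,
    ("u3", "b1"): 3,
    ("u1", "b2"): 2,
    ("u2", "b2"): 1,
}

def rating_vs_friends(friends, ratings):
    """For every rating, pair it with the mean rating that the user's
    friends gave the same business (skipped if no friend rated it)."""
    pairs = []
    for (user, business), stars in ratings.items():
        friend_stars = [
            ratings[(f, business)]
            for f in friends.get(user, ())
            if (f, business) in ratings
        ]
        if friend_stars:
            pairs.append((user, business, stars, mean(friend_stars)))
    return pairs

def mean_absolute_gap(pairs):
    """Average absolute difference between a user's rating and the friends' mean."""
    return mean(abs(stars - friend_mean) for _, _, stars, friend_mean in pairs)

if __name__ == "__main__":
    pairs = rating_vs_friends(friends, ratings)
    for row in pairs:
        print(row)
    print("mean absolute gap:", mean_absolute_gap(pairs))
```

A small gap on real data would only hint at a relationship between friends' ratings; it would not by itself show that the social network causes it.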



Big Data & Enterprise Software Engineer

I am passionate about software development, databases, data analysis and the Android platform. My native language is Java, but no one has stopped me so far from learning and using Angular and Node.js. Data and data analysis are thrilling, and so are my experiences with SQL on Oracle, Microsoft SQL Server, Postgres and MyS...

What is Hackerday?

Stay updated on technology trends by working on projects

Live online coding sessions led by industry experts

Build 2-4 projects a month, each lasting 6 hours and designed to teach you advanced concepts

Code in groups and connect with your community