Big Data Processing, Visualization, and ML with Apache Spark and Zeppelin

Apache Zeppelin is an open-source project that uses interactive notebooks for data ingestion, data discovery, data analytics, data visualization, and collaboration. It supports multiple “backends” for working with your data, including Apache Spark, JDBC, and many more. With its built-in Spark interpreter, you can load JARs at runtime (either local or from a Maven repo) and code in Scala, Groovy, or Kotlin directly in the notebook. In this session, we’ll explore an example that loads a data set and shows some visualizations. We’ll write some Scala code to massage and filter the data, then show the results in our notebook. We’ll also apply machine learning to the data set, analyze the results, and plot graphs to examine the data.