Under the Patronage of H.E. Mr. Boutros Harb, Minister of Telecommunications

The 2nd International Conference on Open Source Software Computing (OSSCOM 2016)

Data Mining with Spark

- Venue: Lebanese University (UL) - Beirut - Lebanon

- Date(s): 2nd December 2016

- Time: 09:00-13:00

- By: Dr. Pascal Fares

- Introduction: Spark is built on the concept of distributed datasets, which contain arbitrary Scala, Java or Python objects. You create a dataset from external data, then apply parallel operations to it. The building block of the Spark API is its RDD API. In the RDD API, there are two types of operations: transformations, which define a new dataset based on previous ones, and actions, which kick off a job to execute on a cluster. On top of Spark’s RDD API, higher-level APIs are provided, e.g. the DataFrame API and the Machine Learning API.

- Aims Of The Workshop: Install, configure and get an overview of open tools for big data, data manipulation and clustering. Among other things, you will learn how to install Linux, Java, Scala and Spark on any Intel or AMD PC, then build a small "cluster". We will then use this cluster to demonstrate the map-reduce and machine learning paradigms by applying them to some use cases.

- Targeted Audience: Anyone interested in:

  • Installing open source tools and products
  • Learning new programming paradigms in the field of big data and clustering using only open source tools

- Prerequisites: Knowledge of programming; Java is preferred

- Laptop Requirements: At least one laptop for every two participants; Linux will be installed on it

- Workshop Flyer: click here

- Registration: click here
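
The difference between transformations and actions described in the Introduction can be tried out before any cluster is installed. What follows is a minimal pure-Python sketch and not Spark's API: the `MiniRDD` class, its methods and the `square` helper are hypothetical names chosen only to mirror the RDD vocabulary, and everything runs in a single process rather than in parallel on a cluster.

```python
import builtins
from functools import reduce as fold


class MiniRDD:
    """Toy single-process stand-in for a distributed dataset (hypothetical, not Spark)."""

    def __init__(self, source, steps=()):
        self._source = list(source)  # plays the role of the external data
        self._steps = tuple(steps)   # recorded transformations, not yet executed

    # Transformations: describe a new dataset from a previous one, compute nothing.
    def map(self, fn):
        return MiniRDD(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return MiniRDD(self._source, self._steps + (("filter", predicate),))

    def _run(self):
        # Replay the recorded steps lazily over the source data.
        items = iter(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                items = builtins.map(fn, items)
            else:
                items = builtins.filter(fn, items)
        return items

    # Actions: trigger the actual computation and return a result.
    def collect(self):
        return list(self._run())

    def reduce(self, fn):
        return fold(fn, self._run())


calls = []  # records every time the mapped function really runs


def square(x):
    calls.append(x)
    return x * x


numbers = MiniRDD(range(1, 6))
even_squares = numbers.map(square).filter(lambda v: v % 2 == 0)
calls_before_action = list(calls)  # still empty: no action has run yet

result = even_squares.collect()                  # action, gives [4, 16]
total = even_squares.reduce(lambda a, b: a + b)  # action, gives 20
```

Building `even_squares` only records the steps; `square` is not called until `collect` or `reduce` runs. In this sketch each action replays the whole chain from the source data, so `square` runs once per element for every action.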