Nuances of machine learning with Spark ML

Day 1 / Track 4 / RU / For practicing engineers

What a Java programmer needs to be able to do and understand in a typical Big Data + ML project:

  • how to choose features;
  • how to encode features;
  • how to scale features;
  • how to clean data and fill in missing values;
  • how to evaluate the quality of clustering;
  • what to do if one tree is not enough;
  • how to perform cross-validation.

And all of this in Scala + Spark!
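The steps above can be sketched as a single Spark ML pipeline. This is a minimal illustration, not the talk's actual code: the dataset layout (a Titanic-like Kaggle CSV with a categorical "Sex" column, numeric "Age"/"Fare" columns, and a "Survived" label) and all column names are assumptions.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{Imputer, StandardScaler, StringIndexer, VectorAssembler}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("sparkml-sketch").getOrCreate()
// Hypothetical Kaggle-style training file; path and schema are assumptions.
val df = spark.read.option("header", "true").option("inferSchema", "true").csv("train.csv")

// Encode a categorical feature as a numeric index.
val indexer = new StringIndexer()
  .setInputCol("Sex").setOutputCol("SexIdx").setHandleInvalid("keep")
// Fill gaps: replace missing numeric values with the column mean.
val imputer = new Imputer()
  .setInputCols(Array("Age", "Fare")).setOutputCols(Array("AgeF", "FareF"))
// Assemble the feature vector and scale it.
val assembler = new VectorAssembler()
  .setInputCols(Array("SexIdx", "AgeF", "FareF")).setOutputCol("rawFeatures")
val scaler = new StandardScaler()
  .setInputCol("rawFeatures").setOutputCol("features")
// When one tree is not enough: an ensemble of trees.
val rf = new RandomForestClassifier()
  .setLabelCol("Survived").setFeaturesCol("features")

val pipeline = new Pipeline().setStages(Array(indexer, imputer, assembler, scaler, rf))

// Cross-validation over a small hyperparameter grid.
val grid = new ParamGridBuilder()
  .addGrid(rf.numTrees, Array(20, 50))
  .addGrid(rf.maxDepth, Array(3, 5))
  .build()
val evaluator = new BinaryClassificationEvaluator().setLabelCol("Survived")
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)
val model = cv.fit(df)
```

The pipeline keeps encoding, imputation, and scaling inside the cross-validation loop, so each fold's preprocessing is fit only on that fold's training data.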

All of the above will be explained using a popular dataset from Kaggle as an example.
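One of the listed points, evaluating clustering quality, might look like this in Spark ML. A hedged sketch: it assumes a DataFrame `df` that already has a "features" vector column, and the choice of k-means with a silhouette metric is illustrative, not taken from the talk.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator

// Cluster into an assumed k = 3 groups; the seed makes runs reproducible.
val kmeans = new KMeans().setK(3).setFeaturesCol("features").setSeed(42L)
val kmModel = kmeans.fit(df)
val predictions = kmModel.transform(df)

// Silhouette score in [-1, 1]: values near 1 mean well-separated clusters.
val silhouette = new ClusteringEvaluator().setFeaturesCol("features").evaluate(predictions)
println(s"Silhouette = $silhouette")
```

In practice one would compare the silhouette across several values of k rather than trust a single run.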

Alexey Zinoviev

Just like Charon from the Greek myths, Alexey helps people get from one side to the other, the sides in his case being Java and Big Data. In simpler terms, he is a trainer at EPAM Systems. He has been working with Hadoop/Spark and other Big Data projects since 2012, forking such projects and sending pull requests since 2014, and giving talks since 2015. His favourite areas are text data and big graphs.