What a Java programmer needs to be able to do and understand in a typical BigData + ML project:
- How to choose features;
- How to encode features;
- How to scale;
- How to clean the data and fill in missing values;
- How to evaluate the quality of clustering and binary classification;
- What to do if your classification is suddenly not binary;
- How to perform cross-validation.
And all of this in Java + Spark!
We'll also talk about the pitfalls you can run into when using MLlib and the implementation details of some popular algorithms, take a few jabs at some open-source rivals, and discuss the specifics of integrating all this into existing applications.
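
As a small taste of three of the topics listed above (filling gaps, scaling, and evaluating binary classification), here is a minimal plain-Java sketch. It deliberately uses neither Spark nor MLlib. The class `FeaturePrep` and its method names are hypothetical and exist only for illustration. Mean imputation, min-max scaling, and the F1 score are assumptions here: each is just one common choice for its step, not necessarily the approach the talk recommends.

```java
import java.util.Arrays;

// Illustrative sketch only: hypothetical helper class, plain Java, no Spark.
final class FeaturePrep {

    // Fill gaps: replace NaN entries with the mean of the observed values.
    // If every value is missing, 0.0 is used as the fill value (an assumption).
    static double[] fillWithMean(double[] column) {
        double mean = Arrays.stream(column)
                .filter(v -> !Double.isNaN(v))
                .average()
                .orElse(0.0);
        return Arrays.stream(column)
                .map(v -> Double.isNaN(v) ? mean : v)
                .toArray();
    }

    // Scale: min-max scaling to [0, 1]. A constant column maps to all zeros.
    // Expects a column without NaN values (fill the gaps first).
    static double[] minMaxScale(double[] column) {
        double min = Arrays.stream(column).min().orElse(0.0);
        double max = Arrays.stream(column).max().orElse(0.0);
        double range = max - min;
        return Arrays.stream(column)
                .map(v -> range == 0.0 ? 0.0 : (v - min) / range)
                .toArray();
    }

    // Evaluate binary classification: F1 score for labels encoded as 0 / 1.
    // Returns 0.0 when there are no true positives.
    static double f1(int[] actual, int[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("label arrays differ in length");
        }
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == 1 && actual[i] == 1) tp++;
            else if (predicted[i] == 1 && actual[i] == 0) fp++;
            else if (predicted[i] == 0 && actual[i] == 1) fn++;
        }
        if (tp == 0) return 0.0;
        double precision = tp / (double) (tp + fp);
        double recall = tp / (double) (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }
}
```

A typical call order would be `FeaturePrep.minMaxScale(FeaturePrep.fillWithMean(column))`, since scaling a column that still contains NaN values gives unreliable results.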