Multinomial Logistic Regression with Apache Spark

Speaker: DB Tsai
Date: June 20, 2014
Location: Hacker Dojo, Mountain View, CA
Host: Silicon Valley Machine Learning Meetup
URL: http://www.meetup.com/Silicon-Valley-Machine-Learning/events/187398222/
Slide: http://www.slideshare.net/dbtsai/2014-0620-mlor-36132297
Video: https://www.youtube.com/watch?v=rYwZ09b1B1c

Logistic regression can model not only binary outcomes but, with some extension, multinomial outcomes as well. In this talk, DB walks through the basic idea of binary logistic regression step by step and then extends it to the multinomial case. He shows how easy it is with Spark to parallelize this iterative algorithm by using the in-memory RDD cache to scale horizontally (in the number of training examples). However, there is a mathematical limitation on scaling vertically (in the number of training features), and many recent applications in document classification and computational linguistics are of this type. He discusses how to address this problem by using the L-BFGS optimizer instead of the Newton optimizer.
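As a rough illustration of why this algorithm parallelizes well, here is a minimal sketch in plain Python. It is not code from the talk or from Spark. It assumes a softmax formulation with one weight vector per class, made-up toy data, and ordinary gradient descent as a stand-in for the optimizer (not L-BFGS, and not Newton). All function names are hypothetical. The point it tries to show is that the loss and gradient are sums over training examples, so each partition can compute its share independently and the shares are simply added. An optimizer that works from gradients alone avoids building the Hessian that Newton's method needs, whose size grows quadratically with the number of parameters.

```python
import math
from functools import reduce


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, x):
    """One linear score per class: dot product of that class's weights with x."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]


def partition_loss_and_gradient(weights, partition):
    """'Map' step: negative log-likelihood and gradient for one partition."""
    loss = 0.0
    grad = [[0.0] * len(w) for w in weights]
    for x, label in partition:
        probs = softmax(class_scores(weights, x))
        loss -= math.log(probs[label])
        for c, p in enumerate(probs):
            # Predicted probability minus the 0/1 indicator of the true class.
            err = p - (1.0 if c == label else 0.0)
            for j, x_j in enumerate(x):
                grad[c][j] += err * x_j
    return loss, grad


def combine(a, b):
    """'Reduce' step: add two (loss, gradient) pairs element-wise."""
    grad = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[1], b[1])]
    return a[0] + b[0], grad


def total_loss_and_gradient(weights, partitions):
    """Sum the per-partition contributions, as a cluster-wide aggregate would."""
    return reduce(combine, (partition_loss_and_gradient(weights, p) for p in partitions))


def train(partitions, num_classes, num_features, step=0.3, iterations=2000):
    """Plain gradient descent on the mean loss (assumption: stand-in optimizer)."""
    n = sum(len(p) for p in partitions)
    weights = [[0.0] * num_features for _ in range(num_classes)]
    for _ in range(iterations):
        _, grad = total_loss_and_gradient(weights, partitions)
        weights = [[w - step * g / n for w, g in zip(wr, gr)]
                   for wr, gr in zip(weights, grad)]
    return weights


def predict(weights, x):
    """Return the index of the class with the highest score."""
    scores = class_scores(weights, x)
    return scores.index(max(scores))


# Made-up toy data: each example is ([bias, x1, x2], label), split into two
# "partitions" to mimic data spread across workers.
partitions = [
    [([1.0, 0.0, 0.0], 0), ([1.0, 0.1, 0.2], 0), ([1.0, 2.0, 0.0], 1)],
    [([1.0, 2.2, 0.1], 1), ([1.0, 0.0, 2.0], 2), ([1.0, 0.2, 2.1], 2)],
]

weights = train(partitions, num_classes=3, num_features=3)
predictions = [predict(weights, x) for part in partitions for x, _ in part]
print(predictions)
```

Only `partition_loss_and_gradient` touches the training data, so in a distributed setting that is the part that would run next to each cached partition, while the driver only ever handles one loss value and one gradient per iteration. Swapping the update rule in `train` for a quasi-Newton method would leave the map and reduce steps unchanged.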


2014-06-23 BIG DATA · COMPUTER · HADOOP · MACHINE LEARNING · PROGRAMMING
Algorithm Hadoop L-BFGS Machine Learning MLlib Multinomial Logistic Regression Optimization Spark
