Spark NLP is the most widely used NLP library in the enterprise, thanks to its production-grade, trainable, & scalable implementations of state-of-the-art deep learning & transfer learning NLP research. It is also open source under a permissive Apache 2.0 license, officially supports the Python, Java, & Scala languages, & is backed by a highly active community & JSL members.
The Spark NLP library implements core NLP algorithms including lemmatization, part-of-speech tagging, dependency parsing, named entity recognition, spell checking, multi-class & multi-label text classification, sentiment analysis, emotion detection, unsupervised keyword extraction, & state-of-the-art Transformers such as BERT, ELECTRA, ELMo, ALBERT, XLNet, & the Universal Sentence Encoder.
The latest release, Spark NLP 3.0, comes with over 1,100 pretrained models, pipelines, & Transformers in 190+ languages. It also delivers massive speedups on both CPU & GPU devices while extending support for the latest computing platforms such as new Databricks runtimes & EMR versions.
The talk will focus on how to scale Apache Spark / PySpark applications on YARN clusters, use GPUs in Databricks' new Apache Spark 3.x runtimes, & manage large-scale datasets efficiently in resource-demanding NLP applications. We will share benchmarks, tips & tricks, & lessons learned from scaling Spark NLP.
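As a starting point for the YARN scaling discussed above, a hypothetical `spark-submit` configuration sketch is shown below; the executor counts & memory sizes are illustrative placeholders to tune per cluster, not the benchmarked settings from the talk (the Kryo serializer settings follow Spark NLP's documented recommendations):

```shell
# Illustrative spark-submit settings for a Spark NLP job on YARN
# (num-executors, cores, & memory are placeholders; tune per workload)
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 16G \
  --driver-memory 8G \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryoserializer.buffer.max=2000M \
  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.0.0 \
  my_nlp_job.py
```

Kryo serialization with a large buffer matters here because Spark NLP ships sizeable annotator models & embeddings to the executors.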