DETAILS
Talk #1: Protein Sequence Classification Using the ProtBert Model from the Hugging Face Library, by Mani Khanuja (https://www.linkedin.com/in/manikhanuja/)
Studying protein localization (where a protein sits within a cell) is important for understanding a protein's function, and it matters for drug design and other applications. In this talk, we will look at how Natural Language Processing (NLP) techniques can be applied to protein sequence classification. The idea is to treat protein sequences as sentences and their constituent amino acids as single words. It was first introduced in the research paper "ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing": https://www.biorxiv.org/content/10.1101/2020.07.12.199554v2.full
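As a rough illustration of the "sequence as sentence, amino acid as word" idea, here is a minimal Python sketch. The function name `protein_to_sentence` is hypothetical and not taken from the talk or its repository. Replacing the rare residues U, Z, O and B with X follows the preprocessing commonly used with ProtBert, and is treated here as an assumption rather than the speaker's exact pipeline.

```python
import re


def protein_to_sentence(sequence: str) -> str:
    """Turn a raw amino-acid string into a space-separated 'sentence'.

    Hypothetical helper for illustration only.
    """
    # Drop any whitespace (e.g. FASTA line breaks) and normalize to upper case.
    cleaned = re.sub(r"\s+", "", sequence).upper()
    # Assumption: map rare or ambiguous residues (U, Z, O, B) to the
    # unknown token X, as is commonly done before ProtBert tokenization.
    cleaned = re.sub(r"[UZOB]", "X", cleaned)
    # Each amino acid becomes one whitespace-delimited "word".
    return " ".join(cleaned)


if __name__ == "__main__":
    print(protein_to_sentence("MKTAYIAKQRU"))
```

A string in this spaced form could then be passed to a BERT-style tokenizer, which would treat each residue as a separate token.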
We will cover the following:
* What is ProtBert?
* Feature engineering of protein sequences.
* Fine-tuning and deploying a PyTorch ProtBert model from the Hugging Face library on Amazon SageMaker.
* Leveraging the Amazon SageMaker Distributed Data Parallel (SDP) feature during training.
GitHub link: https://github.com/aws-samples/amazon-sagemaker-protein-classification