Natural Language Processing
Learn how to work with natural language processing in Python using traditional machine learning methods. Then, dive deep into the realm of sequential models and state-of-the-art language models.
Introduction to NLP
Natural language processing applies computational linguistics to build real-world applications that work with languages of varying structures. We try to teach the computer to learn a language, and then expect it to understand that language, using suitable, efficient algorithms. This module will walk you through the introduction to NLP and all the essential concepts you need to know.
Preprocessing text data
Text preprocessing is the process of cleaning and preparing text data. This module will teach you all the steps involved in preprocessing text, such as text cleansing, tokenization, and stemming.
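The steps above can be sketched in plain Python. This is a minimal illustration: the suffix-stripping "stemmer" is a toy, not a real algorithm such as Porter's, and libraries like NLTK or spaCy would normally handle these steps.

```python
import re

def preprocess(text):
    """Minimal preprocessing sketch: cleanse, tokenize, then crudely stem."""
    text = text.lower()                      # normalise case
    text = re.sub(r"[^a-z\s]", " ", text)    # text cleansing: drop punctuation/digits
    tokens = text.split()                    # whitespace tokenization
    stems = []
    for tok in tokens:
        # toy stemming: strip a few common suffixes (illustrative only)
        for suffix in ("ing", "ed", "s"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems

print(preprocess("The cats were running, jumping and played!"))
```

In practice you would swap the toy stemmer for `nltk.stem.PorterStemmer` or a lemmatizer, but the pipeline shape stays the same.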
Bag of Words Model
Bag of words is a Natural Language Processing technique for text modelling. In technical terms, we can say that it is a method of feature extraction from text data. This approach is a flexible and straightforward way of extracting features from documents. In this module, you will learn how the model keeps track of word counts while disregarding grammatical details and word order.
TF-IDF
TF is the term frequency of a word in a document. There are several ways of calculating this frequency, the simplest being a raw count of the instances a word appears in a document. IDF is the inverse document frequency of the word across a set of documents. It indicates how common or rare a word is in the entire document set: the closer it is to 0, the more common the word.
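The two quantities can be computed directly. This sketch uses the simplest textbook definitions (length-normalised raw count for TF, log of N over document frequency for IDF); the example corpus is invented.

```python
import math

docs = [
    "the cat sat",
    "the dog barked",
    "the cat and the dog",
]

def tf(word, doc):
    """Term frequency: raw count normalised by document length."""
    words = doc.split()
    return words.count(word) / len(words)

def idf(word, docs):
    """Inverse document frequency: log(N / document frequency)."""
    df = sum(1 for d in docs if word in d.split())
    return math.log(len(docs) / df)

def tf_idf(word, doc, docs):
    return tf(word, doc) * idf(word, docs)

# "the" appears in every document, so its IDF is log(3/3) = 0
print(idf("the", docs))
print(round(idf("cat", docs), 3))
```

Note how the ubiquitous word "the" gets an IDF of exactly 0, so its TF-IDF weight vanishes, which is the point of the scheme.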
N-grams
An N-gram is a series of N words. N-grams are broadly used in text mining and natural language processing tasks.
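Extracting N-grams from a token sequence is a one-liner; a common sketch using `zip`:

```python
def ngrams(tokens, n):
    """Return the list of N-grams (as tuples) from a token sequence."""
    return list(zip(*[tokens[i:] for i in range(n)]))

tokens = "natural language processing is fun".split()
print(ngrams(tokens, 2))  # bigrams
```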
Word2Vec
Word2Vec is a method to create word embeddings efficiently using a two-layer neural network. It was developed by Tomas Mikolov et al. at Google in 2013 to make neural-network-based training of embeddings more efficient, and it has since become the de facto standard for developing pre-trained word embeddings.
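The network training itself is usually left to a library such as gensim, but it is instructive to see how Word2Vec's skip-gram variant turns raw text into training examples. This sketch shows only that pair-construction step, under the assumption of a symmetric context window:

```python
def skipgram_pairs(tokens, window=2):
    """Build (centre, context) training pairs as in the skip-gram variant of
    Word2Vec. The neural-network training step itself is omitted here."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

print(skipgram_pairs(["i", "like", "deep", "learning"], window=1))
```

The network is then trained to predict each context word from its centre word, and the learned hidden-layer weights become the embeddings.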
GloVe
GloVe (Global Vectors for Word Representation) is an unsupervised learning algorithm and an alternative method for creating word embeddings. It is based on matrix factorisation techniques applied to the word-context matrix.
POS Tagging & Named Entity Recognition
We learned the differences between the various parts of speech, such as nouns, verbs, adjectives, and adverbs, in elementary school. Associating each word in a sentence with the proper POS (part of speech) is known as POS tagging or POS annotation. POS tags are also known as word classes, morphological classes, or lexical tags. NER, short for Named Entity Recognition, is a standard Natural Language Processing problem that deals with information extraction. The primary objective is to locate and classify named entities in text into predefined categories such as the names of persons, organisations, locations, events, expressions of time, quantities, monetary values, percentages, etc.
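To make the tagging task concrete, here is a deliberately simple dictionary-lookup tagger. Real taggers (e.g. `nltk.pos_tag` or spaCy) use statistical or neural models; the lexicon below is a made-up illustration.

```python
# Toy lexicon mapping words to POS tags (illustrative, not a real resource)
LEXICON = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN",
    "runs": "VERB", "sleeps": "VERB",
    "quickly": "ADV", "lazy": "ADJ",
}

def pos_tag(tokens):
    """Tag each token by lexicon lookup; unknown words fall back to NOUN,
    a common baseline heuristic."""
    return [(tok, LEXICON.get(tok, "NOUN")) for tok in tokens]

print(pos_tag("the lazy dog sleeps quickly".split()))
```

Ambiguity is what makes the real problem hard: "run" can be a noun or a verb, so production taggers condition on context rather than a static lookup.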
Introduction to Sequential models
A sequence, as the name suggests, is an ordered collection of items. In this module, you will learn how to predict the next letter or word in a sequence using sequential models in NLP.
Need for memory in neural networks
This module will teach you why memory is critical in neural networks that process sequences.
Types of sequential models – One to many, many to one, many to many
In this module, you will go through all the types of Sequential models like one-to-many, many-to-one, and many-to-many.
Recurrent Neural networks (RNNs)
An artificial neural network that uses sequential data or time-series data is known as a Recurrent Neural Network. It can be used for language translation, natural language processing (NLP), speech recognition, and image captioning.
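The defining feature of an RNN is that the hidden state at each step is a function of the current input and the previous hidden state. A minimal sketch of one recurrent step, using scalar weights for readability (real RNNs use weight matrices; the weight values here are arbitrary examples):

```python
import math

def rnn_step(x_t, h_prev, w_xh, w_hh, b):
    """One recurrent step: h_t = tanh(w_xh * x_t + w_hh * h_prev + b)."""
    return math.tanh(w_xh * x_t + w_hh * h_prev + b)

# run a small input sequence through the recurrence
h = 0.0
for x in [1.0, 0.5, -0.3]:
    h = rnn_step(x, h, w_xh=0.8, w_hh=0.5, b=0.1)
    print(round(h, 4))
```

Because `h` is fed back in at every step, information from earlier inputs can influence later outputs, which is exactly the "memory" plain feed-forward networks lack.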
Long Short Term Memory (LSTM)
LSTM is a type of Artificial Recurrent Neural Network that can learn order dependence in sequence prediction problems.
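The LSTM achieves this with three sigmoid gates and a separate cell state. The sketch below uses scalar weights so each gate equation is readable; real LSTMs use weight matrices, and the weight values here are arbitrary examples.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step. `w` maps each gate name to (input weight, recurrent
    weight, bias); all quantities are scalars for clarity."""
    def gate(name, squash):
        wx, wh, b = w[name]
        return squash(wx * x + wh * h_prev + b)

    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    i = gate("input", sigmoid)        # how much new information to write
    g = gate("candidate", math.tanh)  # the candidate cell content
    o = gate("output", sigmoid)       # how much of the cell state to expose
    c = f * c_prev + i * g            # updated long-term cell state
    h = o * math.tanh(c)              # new hidden state (short-term memory)
    return h, c

# arbitrary example weights: (w_x, w_h, bias) per gate
w = {k: (0.5, 0.5, 0.0) for k in ("forget", "input", "candidate", "output")}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, w)
print(round(h, 4), round(c, 4))
```

The additive update of `c` (rather than repeated multiplication through a squashing function) is what lets gradients, and hence long-range dependencies, survive across many steps.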
Gated Recurrent Unit (GRU)
The Gated Recurrent Unit (GRU) is a gating mechanism in recurrent neural networks. You will learn all you need to know about the mechanism in this module.
Applications of LSTMs
You will go through all the significant applications of LSTM in this module.
Sentiment analysis using LSTM
An NLP technique for determining whether a piece of text is positive, negative, or neutral is known as Sentiment Analysis. A common application is gauging sentiment in Twitter posts.
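Before training an LSTM classifier, it helps to see the task itself. The sketch below is a deliberately simple lexicon-based baseline, not an LSTM, and the word lists are made-up examples; an LSTM would instead learn sentiment cues (including word order and negation) from labelled training data.

```python
# Tiny illustrative sentiment lexicons (invented for this example)
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def sentiment(text):
    """Classify text as positive, negative, or neutral by counting
    sentiment-bearing words from the lexicons above."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great movie"))
print(sentiment("the film was sad"))
```

A baseline like this fails on phrases such as "not good", which is precisely the kind of sequential context an LSTM captures.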
Time series analysis
Time-Series Analysis comprises methods for analysing time-series data to extract meaningful statistics and other relevant information. Time-Series forecasting is used to predict future values based on previously observed values.
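One of the simplest forecasting baselines makes the idea concrete: predict the next value as the mean of the last few observations. The sales figures are invented for illustration.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations,
    a minimal time-series forecasting baseline."""
    recent = series[-window:]
    return sum(recent) / len(recent)

sales = [10, 12, 13, 12, 15, 16, 18]
print(moving_average_forecast(sales))  # mean of the last three values
```

Sequential models such as LSTMs generalise this: instead of a fixed average, they learn which parts of the history matter for the prediction.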
Neural Machine Translation
Neural Machine Translation (NMT) is an approach to machine translation that uses an artificial neural network to automatically convert source text in one language into text in another language.
Advanced Language Models
This module will teach you several other widely used, advanced language models in NLP.