The field of natural language processing (NLP) aims at getting computers to perform useful and interesting tasks with human language. This course introduces students to the three pillars underlying modern NLP: probabilistic language models, simple neural networks with a focus on gradient-based learning, and vector-based meaning representations in the form of word embeddings. At the end of the course, students will be able to implement and analyze probabilistic language models based on N-grams, text classifiers using logistic regression and gradient-based learning, and vector-based approaches to word meaning and text classification.



Fundamentals of Natural Language Processing

Instructor: James Martin
What you'll learn
Analyze corpora to develop effective lexicons using subword tokenization.
Develop language models that can assign probabilities to texts.
Design, implement, and evaluate the effectiveness of text classifiers using gradient-based learning techniques.
Design, implement, and evaluate unsupervised methods for learning word embeddings.
Details to know

4 assignments
March 2025

There are 4 modules in this course
This first week of Fundamentals of Natural Language Processing introduces the core concepts of natural language processing (NLP), focusing on how computers process and analyze human language. You will explore key linguistic structures, including words and morphology, and learn essential techniques for text normalization and tokenization.
What's included
5 videos, 5 readings, 1 assignment
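The subword tokenization mentioned above can be illustrated with a minimal byte-pair-encoding (BPE) sketch. This is our own toy example, not course code; the corpus and function name are invented for illustration:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merges from a toy word list.

    Each word starts as a tuple of characters plus an end-of-word
    marker; at each step the most frequent adjacent symbol pair
    is fused into a single symbol.
    """
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab

merges, vocab = bpe_merges(["low", "low", "lower", "newest", "newest", "widest"], 3)
```

Frequent words collapse toward single tokens while rare words stay decomposed into subword pieces, which is what makes subword lexicons effective.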
This week explores foundational language modeling techniques, focusing on n-gram models and their role in statistical natural language processing. You will learn how n-gram language models are constructed, smoothed, and evaluated for effectiveness.
What's included
4 videos, 4 readings, 1 assignment, 1 programming assignment
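The construction and smoothing described above can be sketched as a tiny add-one-smoothed bigram model. This is an illustrative toy, not course code; the corpus and function name are our own:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate add-one-smoothed bigram probabilities from a toy corpus."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])              # history counts
        bigrams.update(zip(tokens, tokens[1:]))   # bigram counts
    V = len(vocab)

    def prob(word, history):
        # Add-one (Laplace) smoothing: every bigram gets one pseudo-count.
        return (bigrams[(history, word)] + 1) / (unigrams[history] + V)

    return prob

prob = train_bigram_lm(["I am Sam", "Sam I am", "I do not like green eggs"])
```

Smoothing matters because any unseen bigram would otherwise receive probability zero and make an entire sentence impossible; here even an unseen pair gets a small positive probability.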
This week introduces text classification and explores logistic regression as a powerful classification technique. You will learn how logistic regression models work, including key mathematical concepts such as the logit function, gradients, and stochastic gradient descent. The week also covers evaluation metrics for assessing classifier performance.
What's included
6 videos, 3 readings, 1 assignment, 1 programming assignment
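The logistic-regression-with-SGD pipeline described above can be sketched in a few lines. The toy features and hyperparameters below are our own choices for illustration, not course material:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_logistic(examples, lr=0.5, epochs=200, seed=0):
    """Fit a binary logistic-regression classifier with stochastic gradient descent."""
    rng = random.Random(seed)
    data = list(examples)
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        rng.shuffle(data)                      # "stochastic": random example order
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y                        # gradient of cross-entropy loss wrt the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Hypothetical features: (count of positive words, count of negative words)
train = [([3, 0], 1), ([2, 1], 1), ([0, 2], 0), ([1, 3], 0)]
w, b = sgd_logistic(train)
```

The update rule falls directly out of differentiating the cross-entropy loss: the gradient with respect to each weight is simply the prediction error times the feature value.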
This final week explores how words can be represented as vectors in a high-dimensional space, allowing computational models to capture semantic relationships between words. You will learn about both sparse and dense vector representations, including TF-IDF, Pointwise Mutual Information (PMI), Latent Semantic Analysis (LSA), and Word2Vec. The module also covers techniques for evaluating and applying word embeddings.
What's included
7 videos, 4 readings, 1 assignment, 1 programming assignment
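The sparse-vector ideas above (co-occurrence counts, PMI, cosine similarity) can be sketched as follows. The corpus and function names are invented for illustration and are not course code:

```python
import math
from collections import Counter

def ppmi_vectors(sentences, window=2):
    """Build sparse positive-PMI word vectors from co-occurrence counts."""
    cooc = Counter()
    total = 0
    for sent in sentences:
        toks = sent.split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if i != j:
                    cooc[(w, toks[j])] += 1
                    total += 1
    # Marginal counts for words (rows) and contexts (columns coincide by symmetry).
    marginal = Counter()
    for (w, _), n in cooc.items():
        marginal[w] += n
    vecs = {}
    for (w, c), n in cooc.items():
        pmi = math.log2(n * total / (marginal[w] * marginal[c]))
        if pmi > 0:                    # PPMI: keep only positive associations
            vecs.setdefault(w, {})[c] = pmi
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

sents = ["the cat chased the mouse", "the dog chased the cat", "the cat and the dog played"]
vecs = ppmi_vectors(sents)
```

Dense embeddings such as Word2Vec serve the same goal — placing similar words near each other — but learn low-dimensional vectors by prediction rather than counting.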