
Theory

  1. Linear Algebra: Provide answers (if invalid, say so) to the following operations, where [a b; c d]
    denotes a 2×2 matrix, [a b] a row vector, and [a; b] a column vector. A short numpy sketch for
    checking your answers follows this problem.
    (a) [3 5; 7 0] [1 2]
    (b) [3 5; 7 0] [1; 2]
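
A minimal numpy sketch, assuming the operands are the 2×2 matrix and the two vectors written above, for checking whether each product is defined; numpy raises a ValueError when the inner dimensions do not match.

import numpy as np

A = np.array([[3, 5],
              [7, 0]])        # the 2x2 matrix from parts (a) and (b)
row = np.array([[1, 2]])      # 1x2 row vector from (a)
col = np.array([[1],
                [2]])         # 2x1 column vector from (b)

for name, B in [("(a) A [1 2]", row), ("(b) A [1; 2]", col)]:
    try:
        print(name, "=", (A @ B).tolist())
    except ValueError as err:
        print(name, "is not defined:", err)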


  2. Probability: The entropy of a discrete random variable X is defined as:

    H(X) = −∑_{x∈X} P(x) ln P(x)    (1)
    (a) Compute the entropy of the distribution P(x) = Multinomial([0.5, 0.4, 0.1]).
    (b) Compute the entropy of the uniform distribution P(x) = 1/m ∀x ∈ [1, m].
    (c) Plot the entropy of the Bernoulli distribution: H(X) = −φ ln(φ) − (1 − φ) ln(1 − φ), with φ (i.e., the
    probability of drawing a 1) ranging from 0 to 1 on the x-axis. Where does its entropy peak? Where
    is its minimum? Why do you think this is?
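
A small Python sketch, assuming natural-log (nats) entropy as in Equation (1), that can be used to check the hand computations in (a) and (b) and to produce the plot in (c).

import numpy as np
import matplotlib.pyplot as plt

def entropy(p):
    """H = -sum_x p(x) ln p(x), ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

print(entropy([0.5, 0.4, 0.1]))                  # part (a)
m = 10                                           # part (b): any m works
print(entropy(np.full(m, 1.0 / m)), np.log(m))   # uniform entropy equals ln(m)

# Part (c): Bernoulli entropy as a function of phi.
phi = np.linspace(1e-6, 1 - 1e-6, 500)
H = -phi * np.log(phi) - (1 - phi) * np.log(1 - phi)
plt.plot(phi, H)
plt.xlabel("phi (probability of drawing a 1)")
plt.ylabel("H(X) in nats")
plt.show()
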
  3. Naive Bayes: Given the following documents with counts for the sentiment vocabulary V = {good, poor,
    great, terrible}, and positive or negative class labels:

    Document   “good”   “poor”   “great”   “terrible”   Class Label
    D1         1        1        2         0            +
    D2         0        0        0         1            –
    D3         1        2        0         1            –
    D4         1        3        2         2            –
    D5         3        0        2         0            +
    D6         0        0        0         1            –

    ¹ This assignment is adapted from course material by Drs. Chen and Narasimhan at Princeton University.
    (a) Provide all the relevant probabilities for a naive Bayes model with add-1 smoothing, and assign a
    class label to the following test sentence: great characters and good acting, but terrible plot
    (b) Often whether or not a word occurs is more important than the frequency. A variant of naive Bayes,
    called binarized naive Bayes, clips the word counts in each document at 1. Compute a binarized
    naive Bayes model with add-1 smoothing on the same documents and predict the label of the same
    test sentence as before. Do the two models agree? Which model do you prefer, and why?
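
A sketch, assuming the vocabulary order and counts in the table above, that can be used to sanity-check the hand-computed probabilities in (a) and the binarized variant in (b); the binarize flag clips each training document's counts at 1.

from collections import defaultdict
import math

VOCAB = ["good", "poor", "great", "terrible"]
DOCS = [  # (counts over VOCAB, class label), one entry per document D1-D6
    ([1, 1, 2, 0], "+"), ([0, 0, 0, 1], "-"), ([1, 2, 0, 1], "-"),
    ([1, 3, 2, 2], "-"), ([3, 0, 2, 0], "+"), ([0, 0, 0, 1], "-"),
]
# Vocabulary counts for "great characters and good acting, but terrible plot";
# words outside VOCAB are ignored, and each count is already at most 1.
TEST = [1, 0, 1, 1]

def train(docs, binarize=False):
    prior = defaultdict(int)
    word_counts = defaultdict(lambda: [0] * len(VOCAB))
    for counts, label in docs:
        if binarize:
            counts = [min(c, 1) for c in counts]  # binarized naive Bayes
        prior[label] += 1
        for i, c in enumerate(counts):
            word_counts[label][i] += c
    return prior, word_counts

def predict(prior, word_counts, test_counts):
    n_docs = sum(prior.values())
    scores = {}
    for label in prior:
        total = sum(word_counts[label])
        score = math.log(prior[label] / n_docs)
        for i, c in enumerate(test_counts):
            # add-1 smoothing: (count(w, label) + 1) / (total(label) + |V|)
            p = (word_counts[label][i] + 1) / (total + len(VOCAB))
            score += c * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get), scores

for flag in (False, True):
    prior, wc = train(DOCS, binarize=flag)
    print("binarized" if flag else "standard", predict(prior, wc, TEST))
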
    Programming
  4. GitHub Classroom
    Sign up for GitHub if you don’t already have an account by going to github.com. We will use GitHub
    Classroom to submit assignments. Go to this link to create your own private repository and to import
    the data I have provided for you: https://classroom.github.com/a/iRbehYmE
  5. Language Models—lm.py
    Edit the Python 3 script named lm.py to train a language model on the provided Brown corpus.
    (a) First, process the training data by tokenizing into words. The simplest approach is to split strings
    on spaces. However, tokens are typically split into sub-word units when there is punctuation. For
    example, “Kellogg’s cereal” will get tokenized into “Kellogg”, “’s”, and “cereal”. As an optional
    extension, explore the nltk or spacy toolkits for tokenizing text.
    (b) You will need to keep track of the vocabulary (i.e., all tokens in your training data). Make a plot
    of the frequencies of each word (frequency on the y-axis), ordered by most frequent word to second
    most frequent, and so on (words on the x-axis). What pattern do you see? Does it follow Zipf’s law?
    (c) Compute bigram probabilities with add-α smoothing on the training data, and use these to calculate
    the perplexity on the training and validation datasets. Plot the training and validation perplexities
    as a function of α for values of 10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², 10⁻¹, 1, and 10. What do you notice? Hint:
    You should see values of roughly 136 for training perplexity and 530 for validation perplexity when
    α = 0.01, and around 70 for training perplexity and 880 for validation perplexity when α = 0.0001.
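
A skeleton for lm.py under assumptions the handout does not fix: the corpus file names below are placeholders, <s> marks a sentence boundary, and out-of-vocabulary words back off to an <unk> symbol. Treat it as a sketch of the tokenize / Zipf-plot / bigram add-α / perplexity pipeline rather than the reference solution; exact perplexities will depend on these choices.

import math
from collections import Counter

def tokenize(line):
    # Part (a): the simplest approach is to split on whitespace.
    # Optionally swap in nltk.word_tokenize or spaCy here.
    return line.strip().split()

def read_corpus(path):
    with open(path, encoding="utf-8") as f:
        return [["<s>"] + tokenize(line) for line in f if line.strip()]

def zipf_plot(sentences):
    # Part (b): word frequencies ordered from most to least frequent.
    import matplotlib.pyplot as plt
    freqs = Counter(w for sent in sentences for w in sent)
    counts = [c for _, c in freqs.most_common()]
    plt.loglog(range(1, len(counts) + 1), counts)
    plt.xlabel("word rank")
    plt.ylabel("frequency")
    plt.show()

class BigramLM:
    # Part (c): bigram probabilities with add-alpha smoothing.
    def __init__(self, sentences, alpha):
        self.alpha = alpha
        self.history_counts = Counter()
        self.bigram_counts = Counter()
        for sent in sentences:
            for prev, cur in zip(sent, sent[1:]):
                self.history_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1
        self.vocab = {w for sent in sentences for w in sent} | {"<unk>"}

    def prob(self, prev, cur):
        prev = prev if prev in self.vocab else "<unk>"
        cur = cur if cur in self.vocab else "<unk>"
        num = self.bigram_counts[(prev, cur)] + self.alpha
        den = self.history_counts[prev] + self.alpha * len(self.vocab)
        return num / den

    def perplexity(self, sentences):
        log_prob, n_tokens = 0.0, 0
        for sent in sentences:
            for prev, cur in zip(sent, sent[1:]):
                log_prob += math.log(self.prob(prev, cur))
                n_tokens += 1
        return math.exp(-log_prob / n_tokens)

if __name__ == "__main__":
    train = read_corpus("brown.train.txt")   # placeholder file names
    valid = read_corpus("brown.valid.txt")
    zipf_plot(train)
    for alpha in [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10]:
        lm = BigramLM(train, alpha)
        print(alpha, lm.perplexity(train), lm.perplexity(valid))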