
Theory

  1. Linear Algebra: Provide answers (if invalid, say so) to the following operations, where [a b; c d]
    denotes a 2×2 matrix, [a b] a row vector, and [a; b] a column vector. A short numpy sketch for
    checking your answers follows this problem.
    (a) [3 5; 7 0] [1 2]
    (b) [3 5; 7 0] [1; 2]
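
A minimal numpy sketch, assuming the operands are the 2×2 matrix and the two vectors written above, for checking whether each product is defined; numpy raises a ValueError when the inner dimensions do not match.

import numpy as np

A = np.array([[3, 5],
              [7, 0]])        # the 2x2 matrix from parts (a) and (b)
row = np.array([[1, 2]])      # 1x2 row vector from (a)
col = np.array([[1],
                [2]])         # 2x1 column vector from (b)

for name, B in [("(a) A [1 2]", row), ("(b) A [1; 2]", col)]:
    try:
        print(name, "=", (A @ B).tolist())
    except ValueError as err:
        print(name, "is not defined:", err)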


  2. Probability: The entropy of a discrete random variable X is defined as:

    H(X) = −∑_{x∈X} P(x) ln P(x)    (1)
    (a) Compute the entropy of the distribution P(x) = Multinomial([0.5, 0.4, 0.1]).
    (b) Compute the entropy of the uniform distribution P(x) = 1/m ∀x ∈ [1, m].
    (c) Plot the entropy of the Bernoulli distribution: H(X) = −φ ln(φ) − (1 − φ) ln(1 − φ), with φ (i.e., the
    probability of drawing a 1) ranging from 0 to 1 on the x-axis. Where does its entropy peak? Where
    is its minimum? Why do you think this is?
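
A small Python sketch, assuming natural-log (nats) entropy as in Equation (1), that can be used to check the hand computations in (a) and (b) and to produce the plot in (c).

import numpy as np
import matplotlib.pyplot as plt

def entropy(p):
    """H = -sum_x p(x) ln p(x), ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

print(entropy([0.5, 0.4, 0.1]))                  # part (a)
m = 10                                           # part (b): any m works
print(entropy(np.full(m, 1.0 / m)), np.log(m))   # uniform entropy equals ln(m)

# Part (c): Bernoulli entropy as a function of phi.
phi = np.linspace(1e-6, 1 - 1e-6, 500)
H = -phi * np.log(phi) - (1 - phi) * np.log(1 - phi)
plt.plot(phi, H)
plt.xlabel("phi (probability of drawing a 1)")
plt.ylabel("H(X) in nats")
plt.show()
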
  3. Naive Bayes: Given the following documents with counts for the sentiment vocabulary V = {good, poor,
    great, terrible}, and positive or negative class labels:

    Document   “good”   “poor”   “great”   “terrible”   Class Label
    D1         1        1        2         0            +
    D2         0        0        0         1            –
    D3         1        2        0         1            –
    D4         1        3        2         2            –
    D5         3        0        2         0            +
    D6         0        0        0         1            –

    ¹ This assignment is adapted from course material by Drs. Chen and Narasimhan at Princeton University.
    (a) Provide all the relevant probabilities for a naive Bayes model with add-1 smoothing, and assign a
    class label to the following test sentence: great characters and good acting, but terrible plot
    (b) Often whether or not a word occurs is more important than the frequency. A variant of naive Bayes,
    called binarized naive Bayes, clips the word counts in each document at 1. Compute a binarized
    naive Bayes model with add-1 smoothing on the same documents and predict the label of the same
    test sentence as before. Do the two models agree? Which model do you prefer, and why?
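
A sketch, assuming the vocabulary order and counts in the table above, that can be used to sanity-check the hand-computed probabilities in (a) and the binarized variant in (b); the binarize flag clips each training document's counts at 1.

from collections import defaultdict
import math

VOCAB = ["good", "poor", "great", "terrible"]
DOCS = [  # (counts over VOCAB, class label), one entry per document D1-D6
    ([1, 1, 2, 0], "+"), ([0, 0, 0, 1], "-"), ([1, 2, 0, 1], "-"),
    ([1, 3, 2, 2], "-"), ([3, 0, 2, 0], "+"), ([0, 0, 0, 1], "-"),
]
# Vocabulary counts for "great characters and good acting, but terrible plot";
# words outside VOCAB are ignored, and each count is already at most 1.
TEST = [1, 0, 1, 1]

def train(docs, binarize=False):
    prior = defaultdict(int)
    word_counts = defaultdict(lambda: [0] * len(VOCAB))
    for counts, label in docs:
        if binarize:
            counts = [min(c, 1) for c in counts]  # binarized naive Bayes
        prior[label] += 1
        for i, c in enumerate(counts):
            word_counts[label][i] += c
    return prior, word_counts

def predict(prior, word_counts, test_counts):
    n_docs = sum(prior.values())
    scores = {}
    for label in prior:
        total = sum(word_counts[label])
        score = math.log(prior[label] / n_docs)
        for i, c in enumerate(test_counts):
            # add-1 smoothing: (count(w, label) + 1) / (total(label) + |V|)
            p = (word_counts[label][i] + 1) / (total + len(VOCAB))
            score += c * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get), scores

for flag in (False, True):
    prior, wc = train(DOCS, binarize=flag)
    print("binarized" if flag else "standard", predict(prior, wc, TEST))
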
    Programming
  4. GitHub Classroom
    Sign up for GitHub if you don’t already have an account by going to github.com. We will use GitHub
    Classroom to submit assignments. Go to this link to create your own private repository and to import
    the data I have provided for you: https://classroom.github.com/a/iRbehYmE
  5. Language Models—lm.py
    Edit the Python 3 script named lm.py to train a language model on the provided Brown corpus.
    (a) First, process the training data by tokenizing into words. The simplest approach is to split strings
    on spaces. However, tokens are typically split into sub-word units when there is punctuation. For
    example, “Kellogg’s cereal” will get tokenized into “Kellogg”, “’s”, and “cereal”. As an optional
    extension, explore the nltk or spacy toolkits for tokenizing text.
    (b) You will need to keep track of the vocabulary (i.e., all tokens in your training data). Make a plot
    of the frequencies of each word (frequency on the y-axis), ordered by most frequent word to second
    most frequent, and so on (words on the x-axis). What pattern do you see? Does it follow Zipf’s law?
    (c) Compute bigram probabilities with add-α smoothing on the training data, and use these to calculate
    the perplexity on the training and validation datasets. Plot the training and validation perplexities
    as a function of α for values of 10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², 10⁻¹, 1, and 10. What do you notice? Hint:
    You should see values of roughly 136 for training perplexity and 530 for validation perplexity when
    α = 0.01, and around 70 for training perplexity and 880 for validation perplexity when α = 0.0001.
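
A skeleton for lm.py under assumptions the handout does not fix: the corpus file names below are placeholders, <s> marks a sentence boundary, and out-of-vocabulary words back off to an <unk> symbol. Treat it as a sketch of the tokenize / Zipf-plot / bigram add-α / perplexity pipeline rather than the reference solution; exact perplexities will depend on these choices.

import math
from collections import Counter

def tokenize(line):
    # Part (a): the simplest approach is to split on whitespace.
    # Optionally swap in nltk.word_tokenize or spaCy here.
    return line.strip().split()

def read_corpus(path):
    with open(path, encoding="utf-8") as f:
        return [["<s>"] + tokenize(line) for line in f if line.strip()]

def zipf_plot(sentences):
    # Part (b): word frequencies ordered from most to least frequent.
    import matplotlib.pyplot as plt
    freqs = Counter(w for sent in sentences for w in sent)
    counts = [c for _, c in freqs.most_common()]
    plt.loglog(range(1, len(counts) + 1), counts)
    plt.xlabel("word rank")
    plt.ylabel("frequency")
    plt.show()

class BigramLM:
    # Part (c): bigram probabilities with add-alpha smoothing.
    def __init__(self, sentences, alpha):
        self.alpha = alpha
        self.history_counts = Counter()
        self.bigram_counts = Counter()
        for sent in sentences:
            for prev, cur in zip(sent, sent[1:]):
                self.history_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1
        self.vocab = {w for sent in sentences for w in sent} | {"<unk>"}

    def prob(self, prev, cur):
        prev = prev if prev in self.vocab else "<unk>"
        cur = cur if cur in self.vocab else "<unk>"
        num = self.bigram_counts[(prev, cur)] + self.alpha
        den = self.history_counts[prev] + self.alpha * len(self.vocab)
        return num / den

    def perplexity(self, sentences):
        log_prob, n_tokens = 0.0, 0
        for sent in sentences:
            for prev, cur in zip(sent, sent[1:]):
                log_prob += math.log(self.prob(prev, cur))
                n_tokens += 1
        return math.exp(-log_prob / n_tokens)

if __name__ == "__main__":
    train = read_corpus("brown.train.txt")   # placeholder file names
    valid = read_corpus("brown.valid.txt")
    zipf_plot(train)
    for alpha in [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10]:
        lm = BigramLM(train, alpha)
        print(alpha, lm.perplexity(train), lm.perplexity(valid))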