dataset.txt contains the raw dataset; the images are from MS COCO. Each line
has five tab-separated fields:
  <img><tab><action><tab><motivation><tab><scene><tab><traintest>
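A minimal Python sketch for reading dataset.txt follows. It assumes that every
line holds exactly the five tab-separated fields listed above, in that order,
and that no field contains a tab of its own. The names Example, parse_line and
load_dataset are hypothetical helpers and are not part of this release. The
traintest value is kept as a raw string because its exact spelling is not
documented here.

```python
from typing import List, NamedTuple


class Example(NamedTuple):
    """One row of dataset.txt (hypothetical helper type)."""
    img: str
    action: str
    motivation: str
    scene: str
    traintest: str


def parse_line(line: str) -> Example:
    """Split one line of dataset.txt into its five tab-separated fields."""
    # Strip the line ending (handles both \n and \r\n), then split on tabs.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    return Example(*fields)


def load_dataset(path: str = "dataset.txt") -> List[Example]:
    """Read every non-empty line of the file into an Example."""
    with open(path, encoding="utf-8") as f:
        return [parse_line(line) for line in f if line.strip()]
```

For example, splitting the whole list into the two partitions can be done with
a comprehension on ex.traintest, once you have checked which values actually
occur in the file.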

skipthoughts.npz contains the skip-thought embedding (4800-dimensional) of each
token. To load it in Python, just do: numpy.load("skipthoughts.npz")
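numpy.load on an .npz file returns a lazy, dict-like NpzFile whose .files
attribute lists the names of the stored arrays. This README does not name the
arrays inside skipthoughts.npz, so the sketch below simply reports the name and
shape of each one. describe_npz is a hypothetical helper. If loading fails with
an error about pickled objects, the archive contains object arrays, and
allow_pickle=True should only be passed for files you trust.

```python
import numpy as np


def describe_npz(path: str, allow_pickle: bool = False) -> dict:
    """Return {array name: shape} for every array stored in an .npz archive."""
    # Using the NpzFile as a context manager closes the underlying zip file.
    with np.load(path, allow_pickle=allow_pickle) as data:
        return {name: data[name].shape for name in data.files}
```

Calling describe_npz("skipthoughts.npz") shows which array to index. If the
embeddings are stored as one matrix, its last dimension should be 4800.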

clusters_K.npz contains our K-means clustering of the skip-thought vectors for a
few different values of K.
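The following sketch rests on two assumptions that this README does not
confirm. First, that the files are named clusters_<K>.npz with K replaced by
the number of clusters. Second, that you can pull a (K, 4800) centroid matrix
out of one of them after inspecting its .files. The hypothetical
find_cluster_files helper maps each K to its file. The hypothetical
nearest_centroid helper assigns vectors to their closest centroid under squared
Euclidean distance, which is the criterion K-means minimizes.

```python
import glob
import os
import re

import numpy as np


def find_cluster_files(directory: str = ".") -> dict:
    """Map K -> path for files named clusters_<K>.npz (assumed naming)."""
    found = {}
    for path in glob.glob(os.path.join(directory, "clusters_*.npz")):
        match = re.fullmatch(r"clusters_(\d+)\.npz", os.path.basename(path))
        if match:
            found[int(match.group(1))] = path
    return dict(sorted(found.items()))


def nearest_centroid(vectors: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the closest row of centroids (K, D) for each row of vectors (N, D)."""
    # ||v - c||^2 = ||v||^2 - 2 v.c + ||c||^2, computed for all (N, K) pairs.
    d2 = (
        (vectors ** 2).sum(axis=1, keepdims=True)
        - 2.0 * vectors @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1)
```

The expanded distance form avoids building an (N, K, D) intermediate, which
matters when D is 4800.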

language_model_100_256_100.npz contains scores from the language model. The
scores are ordered as action, motivation, scene.
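This README gives only the order of the scores. The sketch below therefore
assumes that they are stacked along the last axis of a single array. If the
file lays them out differently, index the matching axis instead. split_scores
and FIELDS are hypothetical names.

```python
import numpy as np

# Order documented in this README.
FIELDS = ("action", "motivation", "scene")


def split_scores(scores: np.ndarray) -> dict:
    """Split an array whose last axis is (action, motivation, scene).

    ASSUMPTION: the three scores are stacked along the last axis.
    """
    scores = np.asarray(scores)
    if scores.ndim == 0 or scores.shape[-1] != len(FIELDS):
        raise ValueError(f"expected last axis of size 3, got shape {scores.shape}")
    return {name: scores[..., i] for i, name in enumerate(FIELDS)}
```

In practice you would open the file with numpy.load, choose the array name from
its .files list, and pass that array to split_scores.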
