Multimodal Tools for Speech and Language Processing


NLP tools

·      Word embeddings: GloVe, Word2Vec, BERT, ELMo, RoBERTa

·      Stanford NLP software

·      Unigrams, bigrams, trigrams

·      Linguistic Inquiry and Word Count (LIWC)

·      POS tags: NLTK

·      Morphological analysis:

o   Polyglot

o   Morfessor

o   LegaliPy

·      Flesch reading ease (Kincaid et al 1975)

·      Speciteller: Specificity score (Li and Nenkova 2015)

·      Concreteness score (Brysbaert et al 2014)

·      Dictionary of Affect (Whissell 1989)

·      Hedge words and phrases (Ulinski et al 2018)

·      textstat: tools to extract readability measures from text (readability, complexity, and grade level)

·      Tools to restore punctuation in unpunctuated text/ASR results:

o   Punctuator

o   Bert-restore-punctuation

o   fastPunct

o   Ottokart/punctuator2
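
Two of the simpler text features listed above, n-gram counts and Flesch reading ease, can be sketched with the Python standard library alone. This is a minimal illustration, not the implementation used by NLTK or textstat: the helper names (tokenize, ngrams, ngram_counts, count_syllables, flesch_reading_ease) are hypothetical, the tokenizer is a plain regular expression, and syllables are approximated by counting vowel groups, so scores will differ somewhat from those of dedicated tools.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all n-grams in a token sequence as tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Count n-grams in raw text. N-grams may span sentence boundaries."""
    return Counter(ngrams(tokenize(text), n))


def count_syllables(word):
    """Rough estimate (assumption): one syllable per vowel group, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch reading ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    sample = "The cat sat. The dog ran."
    print(ngram_counts(sample, 1).most_common(3))
    print(ngram_counts(sample, 2).most_common(3))
    print(round(flesch_reading_ease(sample), 2))
```

Higher Flesch scores indicate easier text. For real analyses, a maintained package such as textstat is likely a better choice than the vowel-group heuristic used here.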


Speech approaches

·      Aeneas: text/speech alignment

·      MFCC features

·      Acoustic-prosodic features

o   OpenSMILE

o   Parselmouth

o   Praat

o   Prosodic labeling and detection



o   Prosodic analysis: AuToBI – A Tool for Automatic ToBI annotation

o   Video series in speech acoustics:

·      ASR

o   Kaldi

o   Google Cloud Speech-to-Text

o   And more:

o   Basic information:

·      TTS

o   Simon King Merlin video tutorial:


·      Noise reduction:

o   Calculating spectral centroids

o   MFCCs

o   Median filtering

o   Spleeter

o   Denoising script (multiple methods included)

·      Old and new speech software:  

o   SoX conversion software:


·      Spectrogram reading practice:
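
Two of the noise-reduction building blocks listed above, median filtering and the spectral centroid, are small enough to sketch in plain Python. The function names and parameters below are hypothetical, and the sketch makes simplifying assumptions: the median filter pads by repeating the edge samples, and the centroid uses a naive O(N^2) DFT on a single unwindowed frame. A real pipeline would more likely use an FFT and a window function from a signal-processing library.

```python
import cmath
import math
from statistics import median


def median_filter(signal, kernel_size=3):
    """Replace each sample with the median of its neighbourhood.

    Edges are padded by repeating the first and last samples (an assumption),
    so the output has the same length as the input.
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    if not signal:
        return []
    half = kernel_size // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [median(padded[i:i + kernel_size]) for i in range(len(signal))]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame.

    centroid = sum(f_k * |X_k|) / sum(|X_k|) over the non-negative
    frequency bins of the DFT.
    """
    n = len(frame)
    if n == 0:
        raise ValueError("frame must not be empty")
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        magnitudes.append(abs(coeff))
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: centroid is undefined, report 0 Hz
    freqs = [k * sample_rate / n for k in range(len(magnitudes))]
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total


if __name__ == "__main__":
    # A single impulsive spike is removed by a 3-sample median filter.
    print(median_filter([1, 1, 9, 1, 1], kernel_size=3))
    # A pure tone: the centroid should sit close to the tone's frequency.
    tone = [math.sin(2 * math.pi * 100 * t / 800) for t in range(64)]
    print(spectral_centroid(tone, sample_rate=800))
```

Median filtering suits impulsive noise such as clicks, while the centroid is a descriptive feature, indicating where the spectral energy is concentrated, rather than a denoising step in itself.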




Visual features

·      Fisher Vector encoding (FV)

·      Vector of Linearly Aggregated Descriptors (VLAD)

·      Facial expression detection (FED)


Statistical measures

·      Pearson’s correlation

·      Krippendorff’s alpha

·      Paired t-tests
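
As a rough illustration of two of the measures above, the following standard-library sketch computes Pearson's correlation and the paired t statistic. The function names are hypothetical. The t-test helper returns only the statistic and the degrees of freedom; obtaining a p-value requires the t distribution, which the standard library does not provide, so a statistics package would be needed for that step. Krippendorff's alpha is not covered here.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def paired_t(a, b):
    """Paired t statistic for matched samples a and b.

    t = mean(d) / (stdev(d) / sqrt(n)), where d = a - b element-wise
    and stdev is the sample standard deviation.
    Returns (t, degrees_of_freedom).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("a and b must have the same length (at least 2)")
    diffs = [p - q for p, q in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("t is undefined when all differences are equal")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1


if __name__ == "__main__":
    # Made-up illustrative numbers, not real data.
    print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
    print(paired_t([12, 14, 10, 13, 16], [10, 12, 9, 11, 13]))
```

The paired test applies when each observation in one sample is matched with one in the other, for example the same speakers measured under two conditions.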


Machine Learning

·      Weka

·      Scikit-learn (https:/


Some other potentially useful papers: