Multimodal Tools for Speech and Language Processing

 

NLP tools

·      Word embeddings: GloVe (https://nlp.stanford.edu/projects/glove/), Word2Vec (https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa), BERT (https://pypi.org/project/bert-embedding/), ELMo (https://allennlp.org/elmo), RoBERTa

·      Stanford NLP software (https://nlp.stanford.edu/software/)

·      Unigrams, bigrams, trigrams

·      Linguistic Inquiry and Word Count (LIWC): (https://repositories.lib.utexas.edu/bitstream/handle/2152/31333/LIWC2015_LanguageManual.pdf)

·      POS tags: NLTK

·      Morphological analysis:

o   Polyglot: https://polyglot.readthedocs.io/en/latest/MorphologicalAnalysis.html

o   Morfessor: https://morfessor.readthedocs.io/en/latest/

o   LegaliPy: http://syllabipy.com

·      Flesch reading ease (Kincaid et al 1975)

·      Speciteller: Specificity score (Li and Nenkova 2015)

·      Concreteness score (Brysbaert et al 2014)

·      Dictionary of Affect (Whissell 1989)

·      Hedge words and phrases (Ulinski et al 2018)

·      textstat: tools to extract readability measures from text (readability, complexity, and grade level)

·      Tools to restore punctuation in unpunctuated text/ASR results:

o   Punctuator

o   Bert-restore-punctuation

o   fastPunct

o   Ottokart/punctuator2
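
Two of the items listed above, n-gram counts and Flesch reading ease, are simple enough to sketch directly. The block below is only an illustration using the standard library: the function names and the naive regex tokenizer are assumptions made for this example, not part of NLTK, textstat, or any other tool in the list. The reading-ease function takes word, sentence, and syllable counts as inputs, so syllable counting (which a package such as textstat handles) is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Flesch reading ease from raw counts; higher scores indicate easier text."""
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


tokens = tokenize("the cat sat on the mat and the cat slept")
unigram_counts = Counter(ngrams(tokens, 1))
bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))
```

The resulting Counter objects can be used directly as sparse feature dictionaries, for example as input to a bag-of-n-grams classifier.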

 

Speech approaches

·      Aeneas: text/speech alignment (https://www.readbeyond.it/aeneas/)

·      MFCC features

·      Acoustic-prosodic features

o   OpenSMILE (https://www.audeering.com/opensmile/)

o   Parselmouth (https://parselmouth.readthedocs.io/en/stable/)

o   Praat (https://www.fon.hum.uva.nl/praat/)

o   Prosodic labeling and detection

o   http://www.speech.cs.cmu.edu/tobi/

o   https://www.ling.ohio-state.edu/research/phonetics/E_ToBI/

o   Prosodic analysis:  AuToBI – A Tool for Automatic ToBI annotation (https://github.com/AndrewRosenberg/AuToBI)

o   Video series in speech acoustics:

·      ASR

o   Kaldi (https://github.com/kaldi-asr/kaldi)

o   Google Cloud Speech-to-Text (https://cloud.google.com/speech-to-text)

o   And more: https://www.goodfirms.co/blog/best-free-open-source-speech-recognition-software

o   Basic information: https://cmusphinx.github.io/wiki/tutorialconcepts/

·      TTS

o   Simon King Merlin video tutorial:  http://www.speech.zone/courses/one-off/merlin-interspeech2017/

o   http://www.cs.cmu.edu/~awb/synthesizers.html

·      Noise reduction: (https://dl.acm.org/doi/10.1145/2964284.2967306)

o   Calculating spectral centroids

o   MFCCs

o   Median filtering

o   Spleeter

o   Denoising script (multiple methods included)

·      Old and new speech software:  

o   SoX conversion software: http://sox.sourceforge.net

o   http://linux-sound.org/speech.html

·      Spectrogram reading practice:  

o   https://home.cc.umanitoba.ca/~robh/howto.html

o   https://linguistics.ucla.edu/people/hayes/103/SpectrogramReading/index.htm
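
Two of the noise-reduction items in the list above, spectral centroids and median filtering, can be sketched in a few lines. This is a rough standard-library illustration, not the method of the cited paper or of any listed tool: the function names are made up for this example, the DFT is a naive O(N^2) version suitable only for short frames, and real pipelines would normally use an FFT and windowing.

```python
import cmath
import math
import statistics


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 of one frame."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    mags = magnitude_spectrum(samples)
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    n = len(samples)
    return sum((k * sample_rate / n) * m for k, m in enumerate(mags)) / total


def median_filter(values, width=3):
    """Replace each value by the median of a window centred on it.

    Windows are truncated at the edges of the sequence.
    """
    half = width // 2
    return [statistics.median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


# A pure 8 Hz sine sampled at 64 Hz: its centroid should sit near 8 Hz.
sine = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
centroid_hz = spectral_centroid(sine, 64)

# An isolated spike in an otherwise flat contour is removed by the filter.
smoothed = median_filter([1, 1, 50, 1, 1], width=3)
```

Median filtering is also commonly applied to pitch or energy contours to remove isolated outliers before computing acoustic-prosodic statistics.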

 

Visual features

·      Fisher Vector encoding (FV) (https://papers.nips.cc/paper/1998/file/db1915052d15f7815c8b88e879465a1e-Paper.pdf)

·      Vector of Locally Aggregated Descriptors (VLAD) (https://lear.inrialpes.fr/pubs/2010/JDSP10/jegou_compactimagerepresentation.pdf)

·      Facial expression detection (FED) (https://www.jstor.org/stable/30204706?seq=1#metadata_info_tab_contents)
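
As a rough illustration of the VLAD item in the list above: each local descriptor is assigned to its nearest codebook centroid, the residuals (descriptor minus centroid) are summed per centroid, and the concatenated sums are L2-normalized. The sketch below assumes the codebook has already been learned (for example with k-means) and uses plain Python lists; the function name and the toy data are hypothetical, and refinements such as power normalization or PCA are omitted.

```python
import math


def vlad_encode(descriptors, centroids):
    """Encode a set of local descriptors as one L2-normalized VLAD vector."""
    dim = len(centroids[0])
    residual_sums = [[0.0] * dim for _ in centroids]

    for d in descriptors:
        # Index of the centroid closest to this descriptor (squared Euclidean distance).
        nearest = min(
            range(len(centroids)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, centroids[k])),
        )
        # Accumulate the residual for that centroid.
        for j in range(dim):
            residual_sums[nearest][j] += d[j] - centroids[nearest][j]

    # Concatenate per-centroid sums, then L2-normalize the result.
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy example: a 2-word codebook and three 2-D descriptors.
codebook = [[0.0, 0.0], [10.0, 10.0]]
local_descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
encoding = vlad_encode(local_descriptors, codebook)
```

The output length is the number of centroids times the descriptor dimension, so every image (or video frame) yields a vector of the same size regardless of how many local descriptors it contains.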

 

Statistical measures

·      Pearson’s correlation

·      Krippendorff’s alpha

·      Paired t-tests
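
Two of the measures listed above reduce to short formulas, sketched here with the standard library only. The function names are illustrative assumptions. The paired test returns just the t statistic and degrees of freedom; turning that into a p-value requires the t distribution, which a statistics package such as SciPy provides (scipy.stats.ttest_rel and scipy.stats.pearsonr cover both measures). Krippendorff's alpha is not sketched here.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for a paired t-test on matched samples."""
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n), n - 1


r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
t_stat, dof = paired_t_statistic([10, 12, 9, 11], [12, 14, 10, 14])
```

Both functions divide by zero when an input has no variance (constant values, or identical differences in the paired case), so callers would need to guard against that.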

 

Machine Learning

·      Weka

·      Scikit-learn (https://scikit-learn.org/stable/)

 

Some other potentially useful papers:

 

https://www.aclweb.org/anthology/W16-0301.pdf

 

https://www.aclweb.org/anthology/W17-3101.pdf

 

http://www.cs.columbia.edu/speech/PaperFiles/2019/clpsych19.pdf

 

http://www.cs.columbia.edu/speech/PaperFiles/2010/Hirschberg_etal2010.pdf