You can do this in R using the quanteda package, which can detect multi-word expressions as statistical collocates, which would be the multi-word expressions that you are probably referring to in English. To remove the collocations containing stop words, you would first tokenise the text, then remove the stop words leaving a "pad" in place to prevent false adjacencies in the results (two words that were not actually adjacent before the removal of stop words between them). please follow the below link you can get clear idea.