Siri is built on two main technologies, speech recognition and natural language processing (NLP), both underpinned by machine learning.
Speech recognition converts the human voice into its corresponding textual form. This is a very challenging task: accents vary from country to country and even from state to state, and the speed at which people talk varies drastically. Developers at Apple have trained the system on huge datasets covering many variations in speech and voice.
Once the speech is converted to text, NLP algorithms are run to understand the actual meaning of the text, i.e. the intent behind it.
Let me explain with an example. When we say "Hey Siri", speech recognition algorithms are activated in the back end and convert the speech into text. Once it's converted to the text "Hey Siri", NLP tries to understand the intent of that text. NLP is what allows the system to understand the many variations of a sentence. Suppose we want Siri to set an alarm for 8AM tomorrow; it can be said in multiple ways:
Wake me up tomorrow 8AM
Set an alarm for 8AM tomorrow
Set an alarm tomorrow for 8AM
Wake me up at 8 in the morning
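To make intent detection concrete, here is a toy, rule-based sketch (nothing like Apple's actual pipeline; the intent name and patterns are made up for illustration) in which all four phrasings above map to the same set_alarm intent:

```python
import re

# Toy intent table: each intent is a set of regex patterns. A real assistant
# uses trained models, not hand-written rules like these.
INTENT_PATTERNS = {
    "set_alarm": [
        r"\bwake me up\b",
        r"\bset an alarm\b",
    ],
}

def detect_intent(text):
    """Return the intent name for `text`, or None if nothing matches."""
    normalized = text.lower()
    for intent, patterns in INTENT_PATTERNS.items():
        if any(re.search(p, normalized) for p in patterns):
            return intent
    return None

phrases = [
    "Wake me up tomorrow 8AM",
    "Set an alarm for 8AM tomorrow",
    "Set an alarm tomorrow for 8AM",
    "Wake me up at 8 in the morning",
]
# All four phrasings resolve to the same intent.
print(all(detect_intent(p) == "set_alarm" for p in phrases))  # True
```

Hand-written rules like these break down quickly as phrasings multiply, which is exactly why real systems learn intent models from data instead.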
Last but not least, machine learning is used to train the system so that it learns by itself. I'm sure we've all used Siri or Google Assistant in our daily lives; the next time you use them, just think of all the processing (technology magic) going on in the back end.
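To give a feel for what "learning by itself" means, here is a toy bag-of-words classifier (nothing like Siri's real models; the intent labels and training phrases are invented for this sketch) that picks up intent labels from example utterances rather than hand-written rules:

```python
from collections import Counter

# Made-up training examples: (utterance, intent label).
TRAINING_DATA = [
    ("wake me up tomorrow 8am", "set_alarm"),
    ("set an alarm for 8am tomorrow", "set_alarm"),
    ("will it rain tomorrow", "weather"),
    ("what is the weather like today", "weather"),
]

def train(examples):
    """Accumulate per-intent word counts from (text, label) pairs."""
    model = {}
    for text, label in examples:
        model.setdefault(label, Counter()).update(text.lower().split())
    return model

def predict(model, text):
    """Pick the intent whose training words best overlap the input."""
    words = text.lower().split()
    return max(model, key=lambda label: sum(model[label][w] for w in words))

model = train(TRAINING_DATA)
print(predict(model, "set an alarm for 7am"))  # set_alarm
print(predict(model, "is it going to rain"))   # weather
```

Notice that "set an alarm for 7am" and "is it going to rain" never appear in the training data; the classifier generalizes from word overlap, which is the essence (in miniature) of what large-scale models do with far richer features.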