Let me take a shot at the customer churn example.
So your training/test examples would have feature values collected at a timestamp that precedes the churn/not-churn label. For example, you would collect the purchases, contacts, active time, etc. of all users up to 2019-02-22, and then on 2019-02-23 assign each user a label of churned/not churned depending on whether they were active on 2019-02-23. The model is a function that takes the feature values up to 2019-02-22 as input and outputs a prediction to match the observations on 2019-02-23.
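Here's a minimal sketch of that construction in pandas. The `events` table and its columns (`user_id`, `purchases`, `contacts`), and the set of users active on the label date, are hypothetical stand-ins for whatever your actual activity logs contain:

```python
import pandas as pd

# Hypothetical activity log; column names are assumptions for illustration.
events = pd.DataFrame({
    "user_id":   [1, 1, 2, 3],
    "timestamp": pd.to_datetime(["2019-01-10", "2019-02-20",
                                 "2019-02-01", "2019-02-15"]),
    "purchases": [2, 1, 5, 0],
    "contacts":  [0, 1, 2, 3],
})
active_on_label_date = {2}  # user_ids seen active on 2019-02-23 (assumed)

cutoff = pd.Timestamp("2019-02-22")

# Features: aggregate each user's history up to and including the cutoff,
# so nothing observed on the label date leaks into the inputs.
features = (
    events[events["timestamp"] <= cutoff]
    .groupby("user_id")[["purchases", "contacts"]]
    .sum()
)

# Label: 1 = churned (not active on 2019-02-23), 0 = retained.
features["churned"] = [0 if uid in active_on_label_date else 1
                       for uid in features.index]
print(features)
```

The one detail that matters here is the cutoff filter: every feature is computed strictly from history before the label date.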
You can now use the model to make business decisions. For example, say you want to give $25 coupons to retain your at-risk users, and you obviously want to be as frugal as possible while giving out these coupons. So sometime in the future, say 2019-04-01, you look at the purchases, contacts, active time, etc. of all your users, run the model on them, and get a prediction. Because of the way the model training was set up, this prediction represents your estimate of whether each user will churn on 2019-04-02. You can now spend your coupon budget on the users at highest risk.
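Continuing the sketch above, the scoring and budget-allocation step might look like this. The logistic regression, the $500 budget, and reusing `features` as a stand-in for fresh aggregates recomputed up to 2019-04-01 are all assumptions for illustration:

```python
from sklearn.linear_model import LogisticRegression

# Fit on the training set built above (features up to 2019-02-22,
# labels from 2019-02-23).
model = LogisticRegression().fit(
    features[["purchases", "contacts"]], features["churned"]
)

# Stand-in for the same aggregates, recomputed up to 2019-04-01.
fresh = features[["purchases", "contacts"]].copy()
fresh["risk"] = model.predict_proba(fresh)[:, 1]  # estimate of P(churn on 2019-04-02)

# Spend the coupon budget on the highest-risk users first.
budget, coupon_cost = 500, 25
coupon_users = fresh.sort_values("risk", ascending=False).head(budget // coupon_cost)
print(coupon_users.index.tolist())  # user_ids that get a $25 coupon
```

Because inference reuses the exact same feature construction as training (just with a later cutoff), the model's output keeps the meaning it was trained with: "probability of churning on the day after the cutoff."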
So there is an implicit causal relationship between the model's inputs and its output prediction, due to the way the training examples were constructed. (There are fancier modeling techniques that try to enforce this causal relationship explicitly.) Constructing your training examples correctly, so that they represent a useful real-life prediction use case, is part of building the model, and it is often tricky.