To do this, use a single collection with your XML as its data feed. In the Template of that collection, include a Text, an Image Asset and a Video Asset, and bind each feed element as the source of both the Image and Video Assets.
Then bind the visibility of each asset to the filename through a custom converter, so that only the relevant asset is displayed.
For instance, if your converter detects that the media file path from your feed ends with “.jpg”, it’s an image, so you show the Image Asset and hide the Video and Text ones. If the file path ends with “.mp4”, show the Video Asset and hide the two others, and so on.
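As a rough illustration of that converter logic, here is a minimal TypeScript sketch. It is not the product's actual converter API: the names `AssetKind`, `detectAssetKind` and `isVisible` are hypothetical, and falling back to the Text asset for any unrecognised extension is an assumption. In practice you would reproduce the same test in whatever form your custom converter takes.

```typescript
// Hypothetical sketch of the visibility decision, not a real converter API.
type AssetKind = "image" | "video" | "text";

// Extensions taken from the example above; extend these lists as needed.
const IMAGE_EXTENSIONS: string[] = [".jpg"];
const VIDEO_EXTENSIONS: string[] = [".mp4"];

// Decide which asset should display a given media path from the feed.
function detectAssetKind(mediaPath: string): AssetKind {
  // Normalise so that "Photo.JPG" is treated like "photo.jpg".
  const path = mediaPath.trim().toLowerCase();
  if (IMAGE_EXTENSIONS.some((ext) => path.endsWith(ext))) return "image";
  if (VIDEO_EXTENSIONS.some((ext) => path.endsWith(ext))) return "video";
  // Assumption: anything else is shown by the Text asset.
  return "text";
}

// The value each asset's visibility binding would receive:
// true for the one matching asset, false for the two others.
function isVisible(mediaPath: string, asset: AssetKind): boolean {
  return detectAssetKind(mediaPath) === asset;
}
```

Each of the three assets would call the same check with its own kind, for example `isVisible(path, "video")` for the Video Asset, so exactly one of them is visible for any given feed item.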
We have an article explaining this for an Excel Data Feed here, but the same principle applies when an API feed is the source.