As the name implies, vision.PointTracker tracks points (using the KLT algorithm).
But to track those points, you first have to find them. This is usually done with an object detector. What you are aiming for is to find a specific object in each frame and label it. If it's something simple like people or faces, you could try an already trained cascade object detector (MATLAB ships some pre-trained variants of it).
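To make the pre-trained route concrete, here is a minimal sketch. It assumes the Computer Vision Toolbox is installed and that frame.png is a placeholder for one of your own images; the default model of vision.CascadeObjectDetector looks for frontal faces.

```matlab
% Assumption: 'frame.png' is a placeholder for one of your own images.
frame = imread('frame.png');

% With no arguments the detector uses its default frontal-face model.
detector = vision.CascadeObjectDetector();

% Each row of bboxes is [x y width height]; it is empty if nothing was found.
bboxes = detector(frame);

if isempty(bboxes)
    disp('No object found in this frame.');
else
    % Draw the detections so they can be checked by eye.
    labeled = insertShape(frame, 'Rectangle', bboxes, 'LineWidth', 3);
    imshow(labeled);
end
```

On releases older than R2016b you would probably have to write step(detector, frame) instead of detector(frame).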
If you want to detect a more unusual object, you are better off if you either
a.) write a detection algorithm of your own (for example, if you only want to detect white objects you can search for white pixels, or if you want to detect a cube you can search for it with edge detection), or
b.) label it yourself. Depending on how many pictures you have for training, it might be faster to label them by hand than to write the algorithm and then check each picture to see whether it is labeled correctly.
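As an illustration of option a.), the white-object case could look roughly like the sketch below. The file name, the brightness threshold of 240 (for a uint8 image) and the minimum blob size of 50 pixels are all assumptions you would have to tune; bwareaopen and regionprops come from the Image Processing Toolbox.

```matlab
% Assumption: 'frame.png' is a placeholder for a uint8 image of your own.
frame = imread('frame.png');

% Work on a single intensity channel.
if size(frame, 3) == 3
    gray = rgb2gray(frame);
else
    gray = frame;
end

% Assumed threshold: treat pixels brighter than 240 (out of 255) as "white".
mask = gray > 240;

% Drop blobs smaller than 50 pixels (assumed value) to suppress noise.
mask = bwareaopen(mask, 50);

% Take the largest remaining blob as the object.
stats = regionprops(mask, 'BoundingBox', 'Area');
if isempty(stats)
    disp('No white object found.');
else
    [~, idx] = max([stats.Area]);
    bbox = stats(idx).BoundingBox;   % [x y width height]
    imshow(insertShape(frame, 'Rectangle', bbox, 'LineWidth', 3));
end
```

This only works if the object really is the brightest large region in the picture, so you would still have to look over the results.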
The vision.PointTracker itself can't detect anything. It needs points that you give it, which it can then track.
Now, if you are able to find your object the first time and get enough points on it, the point tracker can follow those points over multiple images. So basically yes, you can use a point tracker to follow points over multiple images. But you have to make sure that you give it enough points to follow (I'd recommend 10 or more), and after you have processed all images you should still check that the labeling was done correctly.
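Put together, the whole thing might look like the sketch below. The video file name and the initial bounding box are placeholders (the box would come from your detector or from labeling the first frame by hand), and picking the points with detectMinEigenFeatures is just one possible choice. The limit of 10 points is the rule of thumb from above, not something the tracker enforces.

```matlab
% Assumptions: 'video.mp4' and the initial box are placeholders.
reader = VideoReader('video.mp4');
frame  = readFrame(reader);
bbox   = [100 100 80 80];   % hypothetical [x y width height] from detection/labeling

% Pick corner points inside the box for the tracker to follow.
corners = detectMinEigenFeatures(rgb2gray(frame), 'ROI', bbox);
points  = corners.Location;

% MaxBidirectionalError discards points that do not track back consistently.
tracker = vision.PointTracker('MaxBidirectionalError', 2);
initialize(tracker, points, frame);

while hasFrame(reader)
    frame = readFrame(reader);
    [points, isFound] = tracker(frame);
    visible = points(isFound, :);

    % Too few points left: stop here and detect/label the object again.
    if size(visible, 1) < 10
        disp('Lost too many points, re-detection needed.');
        break;
    end

    % Label the object with the box that encloses the tracked points.
    xyMin = min(visible, [], 1);
    xyMax = max(visible, [], 1);
    box   = [xyMin, xyMax - xyMin];

    out = insertShape(frame, 'Rectangle', box, 'LineWidth', 3);
    out = insertMarker(out, visible, '+');
    imshow(out); drawnow;

    % Keep following only the points that are still valid.
    setPoints(tracker, visible);
end
```

Watching the output frame by frame is also the easiest way to do the final check of whether the labels are still correct.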