"For video anomaly detection, we apply pretrained models to obtain the foreground and the optical flow, which serve as ground truth. Our model then estimates this information from only a single frame. For human behaviors, we take human poses as input and use a GCN-based model to predict future poses. In both works, the anomaly score is given by the estimation error.
For defect detection, our model takes image patches as input and learns to extract their features. The anomaly score of each patch is given by its distance to all the training patches."
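For the video and pose settings, "the anomaly score is the estimation error" can be sketched as below. This is a minimal illustration, not the authors' code. It assumes the error is mean squared error between the model's estimate (e.g. predicted optical flow, foreground, or future poses) and the pseudo ground truth from the pretrained models. The per-video min-max normalization of frame scores is a common convention that we assume here; the function names are hypothetical.

```python
import numpy as np


def estimation_error_score(predicted: np.ndarray, target: np.ndarray) -> float:
    """Anomaly score = mean squared error between an estimate and its target.

    Assumption: `predicted`/`target` could be an estimated optical-flow field
    and the flow from a pretrained model, or predicted and observed poses.
    """
    if predicted.shape != target.shape:
        raise ValueError("prediction and target must have the same shape")
    diff = predicted.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(diff ** 2))


def normalize_scores(scores) -> np.ndarray:
    """Min-max normalize per-frame scores within one video (assumed convention)."""
    s = np.asarray(scores, dtype=np.float64)
    span = s.max() - s.min()
    if span == 0:
        # All frames scored equally: no frame stands out as anomalous.
        return np.zeros_like(s)
    return (s - s.min()) / span
```

For example, an all-zero flow estimate of shape `(4, 4, 2)` compared against an all-ones target yields a score of `1.0`. A higher score means the single-frame estimate failed to reproduce what the pretrained models observed, which is taken as evidence of abnormal motion or appearance.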
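The patch-based defect score can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. We assume "distance to all the training patches" is reduced to the Euclidean distance to the *nearest* training patch in feature space. The `features` function is a hypothetical stand-in that simply flattens each patch; in the described method, the learned feature extractor would take its place. Patch size and stride are also illustrative.

```python
import numpy as np


def extract_patches(image: np.ndarray, size: int, stride: int):
    """Slide a size x size window over an (H, W) or (H, W, C) image."""
    h, w = image.shape[:2]
    patches, coords = [], []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(image[y:y + size, x:x + size])
            coords.append((y, x))
    return np.stack(patches), coords


def features(patches: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for the learned encoder: flatten each patch."""
    return patches.reshape(len(patches), -1).astype(np.float64)


def patch_anomaly_scores(test_feats: np.ndarray, train_feats: np.ndarray) -> np.ndarray:
    """Score each test patch by its L2 distance to the nearest training patch."""
    # Pairwise squared distances via broadcasting, shape (N_test, N_train).
    d2 = ((test_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=-1)
    return np.sqrt(d2.min(axis=1))


# Usage: a defect-free training image and a test image with one bright block.
train_patches, _ = extract_patches(np.zeros((8, 8)), size=4, stride=4)
test_image = np.zeros((8, 8))
test_image[:4, :4] = 1.0
test_patches, coords = extract_patches(test_image, size=4, stride=4)
scores = patch_anomaly_scores(features(test_patches), features(train_patches))
```

Here `scores` is `[4.0, 0.0, 0.0, 0.0]`, so the patch at `(0, 0)` is flagged as defective. The brute-force distance matrix grows as N_test × N_train × D; with many training patches, a nearest-neighbor index would usually be preferred.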