CoCLR is a self-supervised method for learning video representations. It uses only visual data, namely RGB frames and the optical flow computed from them, and co-trains one encoder per view with the InfoNCE objective in a MoCo-style setup. Each network uses the other view to mine extra positive samples that plain instance discrimination would treat as negatives. Because it needs no labels, it can learn from large collections of unlabeled video, which makes it valuable where labeled data is scarce or unavailable.
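The sketch below shows the training signal in plain Python. It rests on several assumptions: embeddings are compared by cosine similarity, a MoCo-style queue stores embeddings of the same clips in both views, and the temperature is 0.07 (the MoCo default, assumed here). The names `multi_positive_info_nce` and `mine_positives` are hypothetical, and the momentum encoder, queue updates and gradients are omitted. The loss for one view (say RGB) treats the clip's own momentum key as positive. It also treats as positive the queue entries whose other-view (say flow) embeddings are nearest to the query's.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def multi_positive_info_nce(query, key, queue, positive_idx, temperature=0.07):
    """InfoNCE over [key] + queue (hypothetical helper, not CoCLR's code).

    The momentum key is always a positive; queue[i] for i in positive_idx
    are extra positives. With positive_idx == [] this is the plain
    MoCo instance-discrimination loss.
    """
    logits = [cosine(query, key) / temperature]
    logits += [cosine(query, n) / temperature for n in queue]
    # Shift queue indices by 1 because logits[0] is the key.
    positives = [0] + [i + 1 for i in positive_idx]
    # -log( sum over positives / sum over positives and negatives )
    return logsumexp(logits) - logsumexp([logits[i] for i in positives])


def mine_positives(other_query, other_queue, k):
    """Top-k queue indices ranked by similarity in the *other* view (e.g. flow)."""
    sims = [cosine(other_query, v) for v in other_queue]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
```

Under these assumptions, one co-training step would work in four stages. First, compute `pos = mine_positives(flow_query, flow_queue, k)` with the flow network held fixed. Second, pass `pos` as `positive_idx` to the RGB loss. Third, update the RGB network. Finally, swap the roles of the two views. Adding mined positives lowers the penalty for pulling together different clips that look alike in the other view, which is the point of co-training.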