Abstract The Transformer-based DETR family of networks continues to push the boundaries of object detection accuracy and speed in computer vision. However, detecting non-cooperative objects in infrared images remains challenging because of environmental complexity and poor image quality. To address this problem, a high-accuracy object detection algorithm was proposed in this study, using Deformable DETR as the baseline. First, an image enhancement module called CLAHE-GB was designed to enhance infrared images and was integrated with Deformable DETR. Next, the algorithm was pre-trained on a large-scale general dataset. Data augmentation and transfer learning were then used to retrain the parameters of the detection head network on a self-made dataset of small infrared images of aerial objects. Finally, a comprehensive analysis of the results was conducted. The results show that the proposed algorithm achieves promising image enhancement effects and detection accuracy on infrared image data.