The RL model learns a control policy directly from experience, predicting states and rewards during the learning procedure. Hence, we designed a medical image environment comprising US images, a set of actions, and rewards; an agent learns in this environment to extract the ALN region and evaluate its status. Our proposed method achieves an accuracy of 83.6%, a sensitivity of 88.6%, and a specificity of 89.0%.
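
To make the environment idea concrete, the following is a minimal sketch of how an image environment with states, actions, and rewards could be organized. It is not the implementation used in this work: the class name `BoxEnv`, the four window-shifting actions, the `stride` parameter, the overlap-based reward, and the 0.9 stopping threshold are all hypothetical choices made for illustration, and the image is assumed to be a 2D array with an annotated target window.

```python
import numpy as np


class BoxEnv:
    """Toy environment: an agent slides a square window over a 2D image.

    All names and the reward design are illustrative assumptions.
    """

    # action id -> (row delta, col delta): left, right, up, down
    ACTIONS = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0)}

    def __init__(self, image, target, size=8, stride=2):
        self.image = np.asarray(image, dtype=float)
        self.target = target  # (row, col) of the annotated window's top-left corner
        self.size = size
        self.stride = stride
        self.pos = (0, 0)

    def _iou(self, pos):
        # Intersection over union of two equal-sized, axis-aligned squares.
        h = max(0, self.size - abs(pos[0] - self.target[0]))
        w = max(0, self.size - abs(pos[1] - self.target[1]))
        inter = h * w
        return inter / (2 * self.size ** 2 - inter)

    def _state(self):
        # The state is the image patch currently under the window.
        r, c = self.pos
        return self.image[r:r + self.size, c:c + self.size]

    def reset(self):
        self.pos = (0, 0)
        return self._state()

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        max_r = self.image.shape[0] - self.size
        max_c = self.image.shape[1] - self.size
        # Move the window and keep it inside the image bounds.
        r = min(max(self.pos[0] + dr * self.stride, 0), max_r)
        c = min(max(self.pos[1] + dc * self.stride, 0), max_c)
        old, new = self._iou(self.pos), self._iou((r, c))
        self.pos = (r, c)
        # Reward is +1 if overlap with the target grew, -1 if it shrank, 0 if unchanged.
        reward = float(np.sign(new - old))
        done = new >= 0.9  # assumed stopping threshold
        return self._state(), reward, done
```

An agent would call `reset()` once per image and then repeatedly call `step(action)`, using the returned reward to update its policy until `done` is true. A sign-based reward is only one simple option; the reward actually used for ALN extraction may differ.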