The overall system operates in two stages. First, a neural network model extracts embedded speaker identity features. The deep neural network speech enhancement model then takes the features augmented with these embeddings as input and generates the enhanced spectra. The additional embedded features guide the speech enhancement system toward the output best matched to the speaker identity.
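
To make the data flow of this two-stage pipeline concrete, the following is a minimal Python sketch and not the implementation used in this work. Several details are assumptions made purely for illustration: the speaker embedding is obtained by averaging per-frame outputs of a single layer, the augmented features are formed by concatenating the embedding to every noisy frame, all weights are random placeholders standing in for trained parameters, and the names and sizes (`speaker_embedding`, `enhance`, `N_BINS`, `EMB_DIM`, `HIDDEN`) are hypothetical.

```python
import math
import random


def dense(x, weights, bias, activation=None):
    """Apply one fully connected layer to a single feature vector."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    if activation == "relu":
        out = [max(0.0, v) for v in out]
    return out


def init_layer(rng, n_in, n_out):
    """Random placeholder weights; a real system would load trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def speaker_embedding(frames, layer):
    """Stage 1 (assumed form): embed each frame, then average over time."""
    per_frame = [dense(f, *layer, activation="relu") for f in frames]
    n = len(per_frame)
    return [sum(col) / n for col in zip(*per_frame)]


def enhance(frames, embedding, hidden, output):
    """Stage 2: append the embedding to every noisy frame (assumed augmentation),
    then map each augmented frame to an enhanced spectral frame."""
    enhanced = []
    for f in frames:
        augmented = f + embedding  # list concatenation: spectral bins + embedding
        enhanced.append(dense(dense(augmented, *hidden, activation="relu"), *output))
    return enhanced


# Hypothetical sizes: spectral bins, embedding dimension, hidden units.
N_BINS, EMB_DIM, HIDDEN = 8, 4, 16
rng = random.Random(0)
emb_layer = init_layer(rng, N_BINS, EMB_DIM)
hid_layer = init_layer(rng, N_BINS + EMB_DIM, HIDDEN)
out_layer = init_layer(rng, HIDDEN, N_BINS)

# Five frames of toy spectral features standing in for a noisy utterance.
noisy = [[rng.random() for _ in range(N_BINS)] for _ in range(5)]
emb = speaker_embedding(noisy, emb_layer)
enhanced = enhance(noisy, emb, hid_layer, out_layer)
print(len(emb), len(enhanced), len(enhanced[0]))
```

The point of the sketch is only the interface between the two stages: the enhancement network sees each noisy frame together with a fixed, utterance-level speaker representation, so its input dimension grows by the embedding size while its output keeps the dimension of the spectral frame.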