The technology is an integrated multimodal emotion-sensing solution for speech-based services that incorporates retrievable personality information. Because the personality information is retrieved rather than configured by hand for each user, the framework can estimate a target speaker's emotional states without manual personalization, which gives it the flexibility needed for real-world deployment.