Technical Name |
Numeral Understanding and Mining |
Project Operator |
National Taiwan University |
Project Host |
Hsin-Hsi Chen (陳信希) |
Summary |
We propose numeral understanding and mining tasks, together with expert-annotated datasets and tailor-made methods. First, we present a taxonomy for the numerals in financial data and address the numeral attachment problem. Second, we find that investors often support a claim with an estimate expressed as a numeral. We therefore design a novel numeral argument mining task and train the model jointly on it and the numeral understanding task. Third, we use the findings of the numeral argument mining task to evaluate opinion quality, which is important but seldom addressed in previous work. Finally, we extend our work to false information detection and to applications in the financial domain. The findings can also be adopted in other domains, such as clinical records and geographical documents. |
Scientific Breakthrough |
Numerals provide rich and crucial information in documents across many domains. For example, in clinical records, dosage is an important piece of information expressed by numerals; in recipes, numerals give ingredient proportions; in financial statements, numerals carry many different meanings. However, most previous work treats a numeral as an ordinary word token and does not provide a fine-grained understanding of the information it carries. To bridge this gap, we introduce several novel numeral understanding tasks and point out possible research directions in numeral understanding and mining. We mainly use documents from the financial domain to explore the proposed tasks. The findings can be applied to other domains whose narratives contain many numerals. |
Industrial Applicability |
Although the proposed numeral understanding and mining tasks are important, few previous works have paid attention to them. In our work, we use financial textual data to develop methods for numeral understanding and mining and to demonstrate their usefulness in the financial domain. Numerals also carry crucial information in other domains, and we plan to apply our findings to other numeral-rich documents, such as clinical records, recipes, and geographic documents. Our results won first prize and second prize at the Jih Sun Securities and Microsoft FinTech Hackathon in 2019 and 2018, respectively. These results are evidence that the proposed tasks and methods can be applied to real-world applications in the financial industry. |
Keywords |
Numeral Understanding, Numeral Mining, Financial Opinion Mining, Opinion Mining, Financial Technology, Natural Language Processing, Information Extraction, Quality Evaluation, Argument Mining, False Information Detection |