SOFIISKI UNIVERSITET SVETI KLIMENT OHRIDSKI
Founded in 1889, Sofia University "St. Kliment Ohridski" is the oldest and the leading research and teaching university of Bulgaria. The University consists of 16 Faculties, 3 Departments and numerous scientific centers and laboratories. The academic staff of the University consists of 1800 lecturers, among them 600 full professors and associate professors. In the academic year 2010/2011 the University has a total of 25,000 students.
The Faculty of Mathematics and Informatics (FMI), which is involved in the CULTURA project, has its origins in the Department of Physics and Mathematics, established in 1889. The faculty has 70 Professors and Associate Professors and over 80 Assistant Professors, many of whom have taught at renowned universities in Europe, the USA and Canada, and have participated in international scientific forums and symposia. Over 2200 undergraduate and graduate students currently study in the faculty. FMI is the Sofia University teaching and research center for Pure Mathematics, Applied Mathematics and Computer Science.
Contribution to CULTURA
Sofia University will bring its expertise to the CULTURA project, particularly in the areas of computational linguistics, mathematical logic, discrete mathematics and finite-state automata. The team at Sofia University has developed mathematically grounded approaches to text correction and approximate search, and has implemented them efficiently for real-life scenarios. This knowledge is crucial for the execution of CULTURA. Sofia University will bring IP to CULTURA in the form of advanced mathematical algorithms. Digital Humanities is not only a natural area in which to apply this experience, but also a source of challenging problems that call for further improvement of the existing techniques and the development of new, more efficient ones.
Members of FMI have participated in two EU projects relevant to CULTURA:
- OCoRrect aims at the correction and normalisation of large content collections. The main objective of this project was to expand and develop methods for improving the OCR correction of multilingual documents. To achieve this, it explored the concept of Levenshtein automata and extended it with arbitrary weights (probabilities) to better reflect the specificities of the languages.
- IMPACT focuses on improving OCR and information retrieval from historical repositories. One of the main objectives of the project is to provide innovative language technologies that remove the historical language barrier caused by spelling variation and the lack of normalised writing rules.
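
To illustrate the idea of weighted edit operations mentioned in the OCoRrect description, here is a minimal Python sketch. It is not the project's implementation: it uses plain dynamic programming rather than a Levenshtein automaton, and the function names, the confusion table and its weights are all hypothetical values chosen for illustration. The point is only that a substitution that is typical for OCR (such as "l" read as "1") can be made cheaper than an arbitrary one.

```python
from typing import Callable, Dict, Tuple


def weighted_edit_distance(
    source: str,
    target: str,
    sub_cost: Callable[[str, str], float],
    indel_cost: float = 1.0,
) -> float:
    """Edit distance where the substitution cost depends on the character pair.

    Insertions and deletions share a single cost (a simplifying assumption).
    """
    # prev[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    prev = [j * indel_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [i * indel_cost]
        for j, t in enumerate(target, start=1):
            curr.append(
                min(
                    prev[j] + indel_cost,          # delete s
                    curr[j - 1] + indel_cost,      # insert t
                    prev[j - 1] + sub_cost(s, t),  # substitute s -> t (or match)
                )
            )
        prev = curr
    return prev[-1]


# Hypothetical OCR confusion weights; real values would be estimated from data.
CONFUSIONS: Dict[Tuple[str, str], float] = {
    ("l", "1"): 0.2,
    ("O", "0"): 0.2,
    ("e", "c"): 0.4,
}


def ocr_sub_cost(a: str, b: str) -> float:
    """Return 0 for a match, a reduced cost for a known confusion, else 1."""
    if a == b:
        return 0.0
    return CONFUSIONS.get((a, b), CONFUSIONS.get((b, a), 1.0))


def unit_sub_cost(a: str, b: str) -> float:
    """Classical Levenshtein substitution cost."""
    return 0.0 if a == b else 1.0
```

With these assumed weights, the OCR-like error in "he1lo" is scored as closer to "hello" than under the classical unit-cost distance, which is the kind of language- and error-specific ranking that weighted distances are meant to provide when choosing correction candidates.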