Research & Development – Social Semantics
The Research & Development unit is working on “BS2R – Beyond Social Semantic Recommendation”, a PIA, POR FESR 2013 project based on a “Social Semantic Search” platform. Consulthink, CRS4 and the “Dipartimento di Ingegneria Elettrica ed Elettronica” (Department of Electrical and Electronic Engineering) of the University of Cagliari have been working on it since spring 2016, and the project is scheduled to end in April 2018.
The BS2R platform is an “intelligent” search engine that analyzes the behavior and web searches of “authoritative” and “trusted” users and anticipates their likely interests by suggesting “potentially useful” content, helping them find their way among the myriad of available results. A user who finds a piece of content interesting during a search recommends it, explicitly or implicitly, to their contacts. Those contacts are then “directed” towards that content, which appears in the top positions of the SERP (Search Engine Results Page).
The social interaction supported by the platform is based on TrustRank mechanisms, which measure the relationships between users and resources, and on an evolution of the social correlation engine AVIC (a project that originated from a call financed under POR SARDEGNA 2000 – 2006 MISURA 3.13). AVIC addresses issues related to social networking and integrates with open source Knowledge Management systems to manage business knowledge in a unified way. In addition, the platform uses social and semantic technologies to correlate documents and users.
The project is distinctive and original. Its purpose is to create a platform that formalizes explicit knowledge and makes the most of implicit knowledge: knowledge that is difficult to encode, store and transmit, but that forms a personal store of experience, professional or otherwise, which is extremely important and useful. The project therefore aims to catalog and manage the huge and growing amount of data that companies produce every day and to improve internal business communication. It is not just a matter of storing data, but of creating a system that can guarantee the authoritativeness, quality and reliability of that data through social-semantic algorithms.
BS2R aims to overcome the problems and limits of using social networks in business, such as data being under the control of third parties and the cultural reluctance to use social networks at work. The platform is designed for the corporate environment rather than for the global, public one. This means that employees do not need to sign up for Facebook, LinkedIn, Twitter or other social networks.
The work is divided among three partners. Consulthink develops the platform. The Department of Electrical and Electronic Engineering of the University of Cagliari works on image semantics, developing techniques for analyzing visual content so that it can be related to the social interactions of the network. CRS4 works on the semantics of textual content, developing tools based on Computational Linguistics, a discipline that uses automatic tools to study human language and defines algorithms to extract meaning from the texts analyzed.
Pictured: the BS2R logo.
On 6 August 1991 at CERN in Geneva, Tim Berners-Lee’s research group created the first website written in HTML, a language that remains fundamental to the web today. The first browser, “WorldWideWeb” (WWW), was created in the same period, and the network took its name from it.
In 1999 there were a million websites; today there are more than a billion, and the number of pages and the amount of content keep growing exponentially. For this reason, search engines have focused on how to give quick access to content. For a long time the major players competed on indexing, that is, on the ability to index the largest possible portion of the accessible web. With the exponential increase in data, however, traditional keyword-based search increasingly fails to return the pages that are actually relevant to the user’s query. Relevance is the point on which search engines are now focusing, also taking the semantic web into account.
From web 2.0 to web 3.0
Web 2.0 is characterized by interaction, participation and sharing, all made easier by social networking sites. Web 3.0 is harder to define because it is still emerging. In any case, web 3.0 aims to turn the web into a database, to make the most of technologies based on artificial intelligence, and to make room for the enhanced web (a web capable of influencing reality), for the 3D web, which turns many web spaces into three-dimensional ones, and for the semantic web. Semantics holds that words do not have an absolute meaning and that their meaning depends on context. Searches will therefore be possible that take into account the keywords in a document, their context, hyperlinks, images and much more.
From AVIC to AVIC 2.0
BS2R builds on and strengthens the AVIC (Virtual Assistant Intelligent Collaborative) platform, financed by “Sardegna Ricerche” in 2008. AVIC is an integrated platform with a search engine that works together with a recommendation system, and it is promoted by INAIL within its own intranet portal. AVIC implements social re-ranking algorithms, algorithms that compute trust between users, and implicit and explicit feedback. It reorders the results returned by a search engine using the feedback (implicit and explicit) left during normal web browsing by users who are connected to each other through a social network. However, AVIC has some limits that BS2R addresses. Experience with AVIC showed that open source systems are extremely slow at retrieving information, which made AVIC inadequate for real-world use. The BS2R system, in fact, needs to record a large amount of semantic data, which depends on the number of users of the social network and on the number of interactions they have with each other and with the resources they use (especially documents).
Moreover, in the evolution of AVIC, two users who do not know each other but share the same interests can find each other.
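The social re-ranking idea described for AVIC can be illustrated with a minimal sketch. It is not the AVIC or BS2R implementation: the `Feedback` record, the `rerank` function and the two weights are hypothetical names and values chosen only to show how feedback from trusted contacts might lift a result in the ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    user: str        # contact who left the feedback
    uri: str         # resource the feedback refers to
    explicit: bool   # True for a recommendation, False for a simple visit


def rerank(results, feedback, trust, explicit_weight=1.0, implicit_weight=0.3):
    """Reorder (uri, base_score) pairs using feedback from trusted contacts.

    `trust` maps a contact to the searcher's trust in that contact (0..1).
    The weights are illustrative assumptions, not values taken from BS2R.
    """
    boost = {}
    for fb in feedback:
        weight = explicit_weight if fb.explicit else implicit_weight
        boost[fb.uri] = boost.get(fb.uri, 0.0) + weight * trust.get(fb.user, 0.0)
    rescored = [(uri, score + boost.get(uri, 0.0)) for uri, score in results]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Base ranking from a conventional search engine (made-up scores).
results = [("doc/a", 0.9), ("doc/b", 0.6), ("doc/c", 0.5)]
feedback = [
    Feedback("alice", "doc/c", explicit=True),
    Feedback("bob", "doc/b", explicit=False),
]
trust = {"alice": 0.8, "bob": 0.5}
ranked = rerank(results, feedback, trust)
```

In this example the explicit recommendation from a highly trusted contact moves `doc/c` from last place to first, while the implicit visit by a less trusted contact is not enough to move `doc/b` above `doc/a`.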
BS2R is designed for any setting where explicit and/or implicit user experience can considerably improve the results of searches carried out on a body of content that is considered relevant within that setting. Its ideal habitat is therefore companies and public administration, where traditional knowledge management tools such as document management systems and indexing engines are already in place.
Large companies, public bodies, and local and central public administrations see the volume of information in their systems grow every day. They are beginning to understand that, to deliver successful services to citizens, it is essential that citizens can easily find the information they are looking for, both because they do not have time to refine their searches and because they can flag the information they consider useful for the next visitors to the site.
Trust & Correlation Engine: Columbus’s egg
The relationships among users, and between users and content, are measured with TrustRank mechanisms that rely on the correlation engine used by the platform. A user’s access to a resource is identified by the resource’s URI (a sequence of characters that uniquely identifies a generic resource), which is saved in the system. When a user accesses a resource, the system retrieves all of that user’s social network contacts who have visited the same resource and determines the degree of trust between the user and each contact according to the feedback detected. The more visits to the same or similar resources two users have in common, the higher their degree of reliability and mutual trust.
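The mechanism just described (save each access by URI, look up the contacts who visited the same resource, and let trust grow with shared visits) can be sketched as follows. This is a simplified illustration under stated assumptions: the `VisitLog` class, its in-memory storage and the saturating formula `shared / (shared + saturation)` are hypothetical and are not taken from the BS2R correlation engine.

```python
from collections import defaultdict


class VisitLog:
    """Records which users accessed which URIs (hypothetical in-memory store)."""

    def __init__(self):
        self._visitors = defaultdict(set)  # uri -> users who accessed it
        self._visited = defaultdict(set)   # user -> uris the user accessed

    def record(self, user, uri):
        self._visitors[uri].add(user)
        self._visited[user].add(uri)

    def contacts_who_visited(self, user, uri, contacts):
        """Contacts of `user` who have already accessed the same URI."""
        return sorted((self._visitors[uri] & set(contacts)) - {user})

    def trust(self, a, b, saturation=5.0):
        """Score in [0, 1) that grows with the number of shared visits.

        The saturating formula is an assumption made for this sketch.
        """
        shared = len(self._visited[a] & self._visited[b])
        return shared / (shared + saturation)


log = VisitLog()
for user, uri in [("ann", "u1"), ("ann", "u2"),
                  ("bob", "u1"), ("bob", "u2"),
                  ("cid", "u1")]:
    log.record(user, uri)

same_resource = log.contacts_who_visited("ann", "u1", ["bob", "cid", "dan"])
```

Here `ann` shares two visits with `bob` and one with `cid`, so `log.trust("ann", "bob")` is higher than `log.trust("ann", "cid")`. A fuller model would also weight each shared visit by the feedback left on it.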
To determine the trust between users, BS2R analyzes not only the resources they have in common but also similar resources. Similarity can be based on statistical analysis of the text, on analysis of low-level visual features such as color, texture and highlights in the text, or on semantic similarity, which requires the definition of multiple domain ontologies.
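As an illustration of similarity based on statistical analysis of the text, the sketch below computes the cosine similarity between the term-count vectors of two documents. The source does not say which statistical measure BS2R uses, so cosine similarity over raw term counts is an assumption, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two term-count vectors (0.0 to 1.0)."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


close = cosine_similarity("semantic search engine", "semantic search platform")
far = cosine_similarity("semantic search engine", "annual budget report")
```

Two resources whose similarity exceeds a chosen threshold could then be treated as "similar" when counting the visits that two users have in common. Visual and ontology-based similarity would need different feature extractors but could feed the same comparison.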
Social relationships and useful relationships
Many social networking platforms have a problem with relationships. In real life, distinct social groups (friends, family, colleagues) rarely come into contact; on social networking platforms this distinction does not exist and the different groups mix. This makes relationships easier to manage, but the lack of distinction can lead to incorrect suggestions during a search (false positives). A false positive can be a document that has no correlation with the search query, but also a resource whose content is too complex for the user’s level of competence. BS2R aims to overcome these limitations with semantic web tools that make it possible to model relationships such as: “A trusts B in the search for resources that deal with the topic C”.
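A relationship of the form “A trusts B in the search for resources that deal with the topic C” can be modeled as a trust edge that carries a topic. The sketch below uses a plain dictionary rather than semantic web tooling; the `TopicTrustGraph` class, its method names and the 0.5 threshold are hypothetical and only illustrate the data model.

```python
class TopicTrustGraph:
    """Stores trust levels scoped to a topic (illustrative data model)."""

    def __init__(self):
        self._edges = {}  # (truster, trusted, topic) -> level in 0..1

    def set_trust(self, truster, trusted, topic, level):
        self._edges[(truster, trusted, topic)] = level

    def trust(self, truster, trusted, topic):
        """Trust on one topic; unknown combinations default to 0.0."""
        return self._edges.get((truster, trusted, topic), 0.0)

    def advisors(self, truster, topic, minimum=0.5):
        """Users trusted by `truster` on `topic`, most trusted first."""
        matches = [
            (level, trusted)
            for (who, trusted, about), level in self._edges.items()
            if who == truster and about == topic and level >= minimum
        ]
        return [trusted for level, trusted in sorted(matches, reverse=True)]


graph = TopicTrustGraph()
graph.set_trust("A", "B", "network security", 0.9)
graph.set_trust("A", "D", "network security", 0.6)
graph.set_trust("A", "B", "accounting", 0.1)
```

With this model, feedback from B counts when A searches for network security resources but has little effect on A's accounting searches, which is one way to reduce the false positives caused by mixing social groups.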
Collaborative semantic web
To retrieve documents, traditional search engines use keywords and statistical analysis. Semantic search aims to define ontologies that are used both at query time, to disambiguate the user’s query, and at indexing time, to mark the semantic concepts that appear in the text. The scientific literature offers numerous established solutions and techniques in this field, but they are still not widely adopted because they are difficult to apply. The semantic web is characterized by the definition of complex ontologies and their application with automatic techniques. BS2R instead proposes a semi-automatic approach based on the active collaboration of users, which is one of the objectives of web 3.0.
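One way to picture a semi-automatic approach is a disambiguation step that decides automatically when the context is clear and asks the user only when it is not, remembering the answer. The sketch below is an assumption-laden illustration: the toy `ONTOLOGY`, the concept identifiers, the cue-overlap scoring and the `ask_user` callback are all hypothetical and do not describe the BS2R ontologies or interface.

```python
# Toy ontology: each ambiguous term maps candidate concepts to cue words.
ONTOLOGY = {
    "jaguar": {
        "animal:jaguar": {"cat", "wildlife", "habitat", "predator"},
        "car:jaguar": {"engine", "car", "dealer", "speed"},
    },
}


def disambiguate(term, context_words, ask_user, confirmed):
    """Return a concept id for `term`, asking the user only when needed.

    `ask_user(term, options)` must return one of the options; its answers
    are stored in the `confirmed` dict so the question is not repeated.
    """
    candidates = ONTOLOGY.get(term, {})
    if not candidates:
        return None
    key = (term, frozenset(context_words))
    if key in confirmed:
        return confirmed[key]
    context = set(context_words)
    scores = {concept: len(cues & context) for concept, cues in candidates.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    tie = len(ranked) > 1 and scores[ranked[1]] == scores[best]
    if scores[best] == 0 or tie:
        # Automatic evidence is inconclusive: fall back on the user.
        best = ask_user(term, ranked)
        confirmed[key] = best
    return best


questions = []


def ask_user(term, options):
    """Stand-in for a real dialog: records the question, picks the animal."""
    questions.append(term)
    return "animal:jaguar"


confirmed = {}
automatic = disambiguate("jaguar", ["engine", "speed"], ask_user, confirmed)
assisted = disambiguate("jaguar", ["price"], ask_user, confirmed)
```

The first query is resolved automatically from its context, while the second triggers exactly one question to the user. The same concept identifiers could be attached to documents at indexing time.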
Consulthink partners for BS2R: CRS4
CRS4 (“Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna”) is an interdisciplinary research center founded in 1990 that promotes the study, development and application of innovative solutions to problems arising in natural, social and industrial systems. It is located in the “Parco Scientifico e Tecnologico” (Polaris), 40 km from Cagliari. The center studies enabling computational technologies applied to the fields of biomedicine, biotechnology, information society, energy and the environment.
A multidisciplinary approach, highly specialized skills and knowledge, agreements with the academic, business and scientific worlds, and participation in important national and international projects are the keys to the development and application of the innovative solutions on which the center focuses.
The Center is focused on 4 thematic areas: Biomedicine, Data Fusion, Energy and the Environment and Information Society.
The goal of CRS4 is therefore innovation, and its mission is to help Sardinia create and grow a fabric of hi-tech companies, which is essential for the region’s economic and cultural development.
Consulthink partners for BS2R: Dipartimento di Ingegneria Elettrica ed Elettronica (DIEE) University of Cagliari
The Department of Electrical and Electronic Engineering was founded in 1995 as the successor to the Institute of Electrical Engineering, founded in 1945. It has 52 permanent members, including professors, researchers, and technical and administrative staff, and over 100 collaborators, including doctoral students, research grant holders and self-employed contractors.
The courses on offer consist of two three-year bachelor’s degree programs (electrical and electronic engineering, and biomedical engineering) and four master’s degree programs (electrical, electronic, energy and telecommunications). In addition, two PhD programs belong to the DIEE: the PhD program in Electronic and IT Engineering and the PhD program in Industrial Engineering.
DIEE has close cooperative links with various public and private research centers and with other academic institutions in Italy and abroad.
The research activities carried out by DIEE take place within projects financed by the European Union and the Region of Sardinia, and within collaboration agreements with companies in the area. The sectors studied are many, and the research results are documented each year by numerous publications in international journals.
Social Networking, Natural Language Processing and Voice Recognition
Semantic interpretation of texts.
Study of specific tools and development of algorithms based on the syntactic and semantic analysis of natural language, with the aim of improving the performance of semantic disambiguation tools and of interpreting natural language commands entered through a dedicated voice interface.
Analysis of the results achieved by recommendation tools based on user profiling. Description of the features needed to increase the relevance of the results returned by the search queries that users of the system run.
SardaNet: specially created automatic tools make it possible to link the information obtained from the analysis of terms and their meanings to the existing structure of WordNet. Development of SardaNet, which makes it possible to apply the tools of the Toolkit to the existing body of documents in the Sardinian language.
Multimedia Semantic Engine
Using techniques such as image retrieval, object detection and object classification, the most suitable visual features for representing images, videos and other multimedia content will be identified, considering factors such as extraction time, the different semantic concepts to be detected, the characteristics of the features and how easily they can be integrated into the system.
Construction of a visual vocabulary of semantic concepts in order to describe a scene globally. The possibility of characterizing a scene through the presence or absence of more elementary semantic concepts, such as objects and image attributes, is also studied.
The machine learning techniques developed deal with the detection of objects that can be mapped to elementary concepts.
Development of techniques that measure the relevance of a multimedia content item with respect to a given semantic concept and integrate it with social analysis.
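The idea of characterizing a scene through the presence or absence of elementary semantic concepts can be sketched as a binary vector over a visual vocabulary. The vocabulary below, the function names and the use of the Jaccard index to compare scenes are assumptions made for illustration. In a real system the list of detected concepts would come from object detectors, which are outside the scope of this sketch.

```python
# Hypothetical visual vocabulary of elementary concepts.
VOCABULARY = ["person", "desk", "monitor", "tree", "car"]


def scene_vector(detected):
    """Binary presence/absence vector over the visual vocabulary."""
    found = set(detected)
    return [1 if concept in found else 0 for concept in VOCABULARY]


def jaccard(vec_a, vec_b):
    """Share of concepts present in both scenes among those present in either."""
    both = sum(a & b for a, b in zip(vec_a, vec_b))
    either = sum(a | b for a, b in zip(vec_a, vec_b))
    return both / either if either else 0.0


# Concept lists as an object detector might report them (made-up examples).
office = scene_vector(["desk", "monitor", "person"])
street = scene_vector(["car", "tree", "person"])
overlap = jaccard(office, street)
```

The two example scenes share only the concept `person` out of five concepts present overall, so their overlap is low. A relevance score of this kind could then be combined with a social signal such as trust in the users who viewed the content.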