Synonyms Detection in Folk Tag Set: A Novel Hybrid Solution
Keywords:
Tags, Folksonomy, Synonyms, Collaborative tagging, Search, Retrieval
Abstract
Collaborative tagging is one of the most important applications of Web 2.0. It allows users to
associate tags (free-form text chosen by the users) with a resource, and these tags serve as
metadata for that resource. The tags are later used for search and retrieval of the resources.
One of the issues in a folk tag set is ambiguity, which causes incorrect resources to be
retrieved. To make search more precise, this ambiguity needs to be removed. One cause of
ambiguity is the presence of synonyms in a tag set. In this work, we propose a novel solution
for synonym detection. The proposed solution produces a concise tag set that is associated
with the resource. Our methodology consists of four major steps. First, we remove misspelled
tags. Second, we detect synonyms using the WordNet and Microsoft Word dictionaries. Third, we
use Euclidean distance to find the remaining synonyms. Finally, we obtain a precise tag set
without synonyms.
The dictionaries cover tags that are standard English words, while the mathematical formula
(Euclidean distance) covers tags that come from folk vocabulary and are not present in the
dictionaries. We tested our approach on image resources with which a tag set of twenty tags is
associated. We compared our results with five state-of-the-art techniques: cosine, Jaccard,
projection, mutual information, and Dice. The comparison indicates that our approach is more
accurate in finding synonyms.
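
The hybrid idea described above (a dictionary lookup first, then a distance measure for tags
the dictionaries miss) can be sketched roughly as follows. This is a minimal illustration
under stated assumptions, not the authors' implementation: the tag vectors, the small
synonym table standing in for WordNet and Microsoft Word, the threshold value, and all
function names are hypothetical, and the misspelling-removal step is omitted. In particular,
the abstract does not say how tags are represented as vectors; per-resource usage counts are
assumed here.

```python
import math
from itertools import combinations

# Hypothetical data: each tag maps to its usage counts over the same five
# resources. The vector representation is an assumption, not from the paper.
TAG_VECTORS = {
    "car":      [4, 0, 3, 1, 0],
    "auto":     [4, 1, 3, 1, 0],
    "nyc":      [0, 5, 0, 0, 2],
    "bigapple": [0, 4, 0, 0, 2],
    "sunset":   [1, 0, 0, 6, 3],
}

# Hypothetical stand-in for the dictionary stage (WordNet / Microsoft Word).
DICTIONARY_SYNONYMS = {frozenset({"car", "auto"})}


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def detect_synonyms(vectors, dictionary_pairs, threshold):
    """Return synonym pairs: dictionary hits first, then close vectors."""
    # Stage 1: keep dictionary pairs whose tags both occur in this tag set.
    found = {p for p in dictionary_pairs if all(t in vectors for t in p)}
    # Stage 2: for the remaining pairs, fall back to Euclidean distance.
    for a, b in combinations(sorted(vectors), 2):
        pair = frozenset({a, b})
        if pair in found:
            continue
        if euclidean_distance(vectors[a], vectors[b]) <= threshold:
            found.add(pair)
    return found


def concise_tag_set(tags, synonym_pairs):
    """Merge synonym groups and keep one representative tag per group."""
    parent = {t: t for t in tags}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    for pair in synonym_pairs:
        a, b = sorted(pair)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the alphabetically first tag as the group representative.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep
    return sorted({find(t) for t in tags})


pairs = detect_synonyms(TAG_VECTORS, DICTIONARY_SYNONYMS, threshold=1.5)
print(concise_tag_set(TAG_VECTORS, pairs))
```

With this toy data, "car"/"auto" are paired by the dictionary stand-in and "nyc"/"bigapple" by
distance, since their vectors differ in a single count. The threshold of 1.5 is arbitrary; a
real tag set would need it tuned against labelled synonym pairs.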