Forghan, a corpus of The Quran

Introduction

Imam Ali: “The Quran is the interpretation of the ages.”
From quotes like this, we can understand why Allameh Tabatabaei believed that every decade needs its own interpretation of the Quran, and achieving this clearly requires contemporary technology. With this outlook, the Quran mining research network was formed in the last decade so that the discovery of the Quran’s hidden information could benefit from text mining and advanced artificial intelligence tools. Researchers in this field try not only to answer open questions and unsolved problems related to the Quran, but also to discover hidden aspects of humanity’s best source of guidance and present them to the world. To this end, representing the Quran in RDF format, including its syntactic and semantic information, makes it possible to apply intelligent mining techniques.
To grasp the Quran’s linguistic wonders, we need to discover its hidden meanings. Once a proper platform for text mining on the Quran has been built, text mining tools can be used to discover hidden semantic meanings in its text. Text mining, a recent interdisciplinary field linking IT, linguistics and literature, pursues this goal by mining the text of documents; it requires labeled corpora that contain a digital version of the syntactic and semantic information of those documents.
The textual corpus and infrastructure known as the Forghan corpus, produced for the Quran, is the result of an intelligent process designed and implemented at WTlab at Ferdowsi University of Mashhad. The corpus comprises more than 587 megabytes of information in RDF format: the full text of the Quran, statistical information, Persian and English translations, syntactic and semantic labels for the Arabic, Persian and English texts, word stems, and much other information, making it possible to search and mine each verse of the Quran.
With information on surahs, verses, pages, syntax and alphabet characters generated in RDF format, a wide range of useful information is available for researchers to mine. By applying text mining to a labeled textual corpus of the Quran and developing a comprehensive ontology of the concepts it contains, future work may be able to explain its hidden meanings and layers.
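As a rough illustration of what verse-level information in RDF can look like, the following minimal sketch writes a few triples for one verse in N-Triples syntax using only the Python standard library. The namespace `http://example.org/forghan/`, the property names (`inSurah`, `verseNumber`, `text`) and the placeholder texts are assumptions made for this example; they are not the vocabulary actually used by the Forghan corpus.

```python
# Hypothetical namespace; the real Forghan vocabulary is not given in the text.
BASE = "http://example.org/forghan/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(local):
    """Wrap a local name in the assumed namespace as an N-Triples IRI."""
    return f"<{BASE}{local}>"


def literal(text, lang):
    """Build a language-tagged N-Triples literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{lang}'


def verse_triples(surah, verse, texts):
    """Return (subject, predicate, object) triples describing one verse.

    `texts` maps a language tag (e.g. "ar", "fa", "en") to the verse text.
    """
    subject = iri(f"verse/{surah}/{verse}")
    triples = [
        (subject, iri("inSurah"), iri(f"surah/{surah}")),
        (subject, iri("verseNumber"), f'"{verse}"^^<{XSD_INTEGER}>'),
    ]
    for lang, text in sorted(texts.items()):
        triples.append((subject, iri("text"), literal(text, lang)))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Placeholder texts stand in for the Arabic original and its translations.
example = verse_triples(1, 1, {"en": "placeholder English text",
                               "fa": "placeholder Persian text"})
print(to_ntriples(example))
```

Keeping each verse as its own subject IRI is one simple way to let translations, labels and statistics be attached to, and searched by, individual verses.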
It is worth mentioning that all the concepts in the corpus are linked to ontologies and related concepts on the web; the corpus currently includes more than 332589 links, of which 33854 are unique. The prepared corpus contains more than 13298 RDF files with a total size of almost 587 megabytes. There are also 13299 HTML files that present the RDF information.
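The difference between the total number of links and the number of unique links can be illustrated with a short sketch. The sample link targets below are hypothetical and serve only to show the counting; they are not taken from the corpus.

```python
from collections import Counter


def link_stats(links):
    """Return (total, unique) counts for an iterable of link targets."""
    counts = Counter(links)
    return sum(counts.values()), len(counts)


# Hypothetical link targets; the same concept may be linked from many verses.
sample_links = [
    "http://example.org/concept/Mercy",
    "http://example.org/concept/Mercy",
    "http://example.org/concept/Guidance",
]

total, unique = link_stats(sample_links)
print(total, unique)
```

A total much larger than the unique count, as in the figures reported above, simply means that many distinct places in the corpus point to the same external concept.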
Currently, the tool for parsing the syntactic information of verses has been designed, and work on a tool for running SPARQL queries over the RDF data is ongoing. The following are possible directions for future work to enrich the output and to generate knowledge from the existing corpus:
- To highlight the main concept of each surah or analyze the concepts of the verses inside a surah
- To construct an ontology of the concepts of the Quran
- To relate the verses to concepts
- Following the previous step, to determine the relations between words, verses, surahs, parts, etc. by linking them to existing information on the web
- To complete and expand the ontologies of the Quran’s concepts by using machine learning approaches and …
- To ask questions and draw inferences from the produced corpus using SPARQL on the RDF data and by parsing the XML files containing the Arabic syntax of verses
- To intelligently analyze the results to discover hidden linguistic and semantic relations in the Quran’s text
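
The querying step mentioned in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's tool: the XML layout (`verse` and `word` elements with `position`, `pos` and `stem` attributes) and all attribute values are hypothetical, and a plain Python triple-pattern match stands in for a SPARQL basic graph pattern so that the example needs only the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout and placeholder values; the real schema of the
# Forghan syntax files is not described in the text.
SAMPLE_XML = """
<verse surah="1" number="1">
  <word position="1" pos="NOUN" stem="stem-a"/>
  <word position="2" pos="PROPN" stem="stem-b"/>
  <word position="3" pos="ADJ" stem="stem-c"/>
  <word position="4" pos="ADJ" stem="stem-d"/>
</verse>
"""


def xml_to_triples(xml_text):
    """Turn one verse's syntax XML into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text.strip())
    verse_id = f"verse/{root.get('surah')}/{root.get('number')}"
    triples = []
    for word in root.iter("word"):
        word_id = f"{verse_id}/word/{word.get('position')}"
        triples.append((verse_id, "hasWord", word_id))
        triples.append((word_id, "pos", word.get("pos")))
        triples.append((word_id, "stem", word.get("stem")))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None is a wildcard, like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


triples = xml_to_triples(SAMPLE_XML)

# Roughly analogous to: SELECT ?w WHERE { ?w :pos "ADJ" }
adjectives = [s for s, _, _ in match(triples, p="pos", o="ADJ")]
print(adjectives)
```

With a real RDF store, the `match` call would be replaced by an actual SPARQL query; the pattern-matching idea stays the same.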

Run Date

2012