A weekly list introducing two online resources on Islamic law, ranging from e-archives to e-libraries and from digitized personal collections to online depositories of primary and secondary sources on Islamic law.
OpenITI MAKHZAN is a “large aggregation of Arabic-script ground truth and evaluation data drawn from a wide variety of Persian, Arabic, Ottoman Turkish, and Urdu print and handwritten (manuscript) documents.” “Comprising nearly 1,500 page images across 208 documents sourced from 30 repositories worldwide, the dataset spans seven languages, around 20 unique and mixed script types,…”
The “LATOC (Latin-transliterated Ottoman Turkish Corpus) includes 143 Ottoman Turkish books, 13,252,350 words, written between the 15th and 20th centuries. The books were transliterated by domain experts and publicly shared on the Internet. The books in the corpus were automatically structured via a rule‑based approach and manually checked.”
“Al-Muwatta AI is an Islamic knowledge platform specialized in Maliki jurisprudence, combining retrieval-augmented generation (RAG) with multiple LLM providers. Named after Imam Malik’s foundational hadith compilation, the platform provides authenticated responses from classical Maliki texts.”
Hipajattul Islam [Feb 1930 - Dec 1936] is a digitized periodical collection “mainly focusing on the doctrines of Islam and spread of Islam in Tamilnadu [India]. Original material is held by Anjuman Nusrathul Islam, Islamic Public Library, Kottakuppam.” The collection features 1,815 images.
“Ptolemaeus Arabus et Latinus (PAL) is dedicated to the edition and study of the Arabic and Latin versions of Ptolemy’s astronomical and astrological texts, and related material. These include works by Ptolemy or attributed to him, commentaries thereupon, and other works that are of immediate relevance to understanding Ptolemy’s reception and legacy in the Middle…”
The Manuscripts of Arabic Handwriting (Muharaf) dataset is a “machine learning dataset consisting of more than 1,600 historic handwritten page images transcribed by experts in archival Arabic. Each document image is accompanied by spatial polygonal coordinates of its text lines as well as basic page elements. This dataset was compiled to advance the state of…”
“Connected Papers creates a graph-based network that clusters similar works together, even if they don’t directly cite or reference each other. Instead of showing a citation tree, it uses data from Semantic Scholar to visualise academic relationships based on bibliographic coupling and co-citation analysis. This means you can quickly see groups of publications that share…”
Tashkeela-Model is a natural language processing (NLP) diacritization model for the Arabic language that uses “a collection of Arabic vocalized texts, which covers modern and classical Arabic…[and] contains over 75 million of fully vocalized words obtained from 97 books, structured in text and XML files.”
ARCH (Archives Research Compute Hub) “helps users easily conduct and support computational research with digital collections at scale – e.g., text and data mining, data science, digital scholarship, machine learning, and more. Users can build custom research collections relevant to a wide range of subjects, generate and access research-ready datasets from collections, and analyze those…”
“The RASAM dataset was developed by the Research Consortium Middle-East and Muslim Worlds (GIS MOMM), DISTAM, Calfa, and the BULAC Library between 2021 and 2023. It comprises a diverse collection of Maghrebi Arabic manuscripts from the BULAC Library, featuring a wide variety of handwriting styles, layouts, states of preservation, and other characteristics representative of Arabic…”