A weekly list introducing two online resources of Islamic law, ranging from e-archives to e-libraries, from digitized personal collections to online depositories of primary and secondary sources on Islamic law
Tashkeela-Model is a Natural Language Processing (NLP) diacritization model for the Arabic language that uses “a collection of Arabic vocalized texts, which covers modern and classical Arabic…[and] contains over 75 million of fully vocalized words obtained from 97 books, structured in text and XML files.”
ARCH (Archives Research Compute Hub) “helps users easily conduct and support computational research with digital collections at scale – e.g., text and data mining, data science, digital scholarship, machine learning, and more. Users can build custom research collections relevant to a wide range of subjects, generate and access research-ready datasets from collections, and analyze those…
“The RASAM dataset was developed by the Research Consortium Middle-East and Muslim Worlds (GIS MOMM), DISTAM, Calfa, and the BULAC Library between 2021 and 2023. It comprises a diverse collection of Maghrebi Arabic manuscripts from the BULAC Library, featuring a wide variety of handwriting styles, layouts, states of preservation, and other characteristics representative of Arabic…
Bahadin H. Kerborani’s (University of Chicago) “Short Introduction to Kurdish Online Resources” compiles a list of “online primary and secondary Kurdish resources and collections that might make knowledge and sources more accessible” to researchers.
The BADR project database “includes 43 texts (Sīra-maghāzī texts, ṭābaqāt, recollections of hadiths, tafsīr and dalāʾil al-nubuwwa), around 700,000 words, and several tens of thousands of encoded named entities” extracted from al-Maktaba al-Shamela and the OpenITI repository. It “can be freely accessed on the server of the University of Strasbourg.” Learn more about BADR here.
“Arabic LLM Benchmarks” is a comprehensive GitHub repository “of Arabic LLMs benchmarks and evaluation benchmarks, curated from systematic research on evaluating Arabic Large Language Models” and organized into four categories: Knowledge includes benchmarks evaluating acquired knowledge and reasoning capabilities, along with domain-specific benchmarks in fields such as law and medicine. Natural Language Processing (NLP) encompasses…
“Whisper-Small-Quran is a fine-tuned version of OpenAI’s Whisper-Small model specialized for Quranic recitations in Arabic. It’s part of the QV Finder project — a research system for AI-powered Quran verse transcription and retrieval. Unlike general ASR systems, this model captures Tajweed-influenced pronunciations, regional recitation styles, and noisy real-world recordings, achieving high accuracy across both professional…
The Saiful Bahri Collection contains “digital images of 23 manuscripts owned by Saiful Bahri of Lambunot, Besar Regency. The manuscripts contain texts on a range of topics, including Islamic law, Sufism, theology, history and fiction, in prose and poetic form; the manuscripts date from the 17th to the 20th century” in Achinese, Arabic, and Malay.
“M-Classi is a new digital tool in the field of knowledge organization. It is conceived primarily as a means of cataloging and interrogating the classifications of the sciences in Islam and those of the cultures with which the Islamicate world came into contact from antiquity to the pre-modern era. Practically, M-Classi is focused by priority…
Digital Ottoman Corpora is “a pioneering project aimed at transforming the way we access, analyze, and engage with Ottoman Turkish texts.” Their projects include an “artificial intelligence-based text recognition project [that] incorporates Handwritten Text Recognition (HTR) tools for Ottoman Turkish” and “the first Ottoman Turkish crowdsourcing project designed on Zooniverse.” Read more about DOC here.