Audio corpus: 8 subfolders with .wav files, each containing:
• 2 sound files containing a read story (“The sun and the wind”, each by speaker A and speaker B)
• 2 sound files each containing 30 read sentences (each by speaker A and speaker B)
• 2 x each of the 30 sentences as a single sound f...
108 WAV files of spoken Maltese newspaper texts, subdivided into 12 directories with a variable number of sentences (sometimes: clauses) each. They come together with transcriptions and tables of phoneme durations.
Web service created by exporting a UIMA-based workflow from the U-Compare text mining system.
Functionality: Identifies tokens in plain text and assigns parts of speech.
Tools in workflow: MLRS POS Tagger web service (University of Malta).
NOTE: The licence provided covers the web service only. To...
142,397 Maltese texts from 10 genres. The file “corpus.zip” expands into a folder “corpus”, containing the file “tagged.zip”, which expands into the folder “cwb.final”. This folder contains the files:
• filelist.txt
• malti02.academic.txt
• malti02.law.txt
• malti02.literature.txt
• malti...
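The two-stage unpacking described above can be sketched in Python. This is a minimal sketch, not a documented procedure for this resource: the helper name `unpack_corpus` is hypothetical, and it assumes that “tagged.zip” sits directly inside the “corpus” folder and that “cwb.final” is the top-level folder inside “tagged.zip”.

```python
import zipfile
from pathlib import Path


def unpack_corpus(outer_zip, dest):
    """Extract corpus.zip, then the nested tagged.zip; return the cwb.final path.

    Hypothetical helper. Assumes the layout corpus/tagged.zip -> cwb.final/.
    """
    dest = Path(dest)

    # Stage 1: corpus.zip expands into the folder "corpus".
    with zipfile.ZipFile(outer_zip) as outer:
        outer.extractall(dest)

    # Stage 2: tagged.zip (assumed to sit in "corpus") expands into "cwb.final".
    inner_zip = dest / "corpus" / "tagged.zip"
    with zipfile.ZipFile(inner_zip) as inner:
        inner.extractall(inner_zip.parent)

    return inner_zip.parent / "cwb.final"
```

Calling `unpack_corpus("corpus.zip", "data")` would then return the path `data/corpus/cwb.final`, provided the archive really follows the assumed layout.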
This lexicon is part of the collection of the Wikimedia Dumps which was retrieved as an XML file from http://dumps.wikimedia.org/mtwiktionary/20121105/ on November 5, 2012. In the Wikimedia dump, it is accompanied by a text file mtwiktionary-20121105-pages-articles-multistream-index.txt which li...
The corpus contains the Laws of Malta in Maltese from the official government website. The unannotated raw text files were extracted from the PDF files available on that website.
This corpus is part of the collection of Wikipedia dumps and was retrieved from wikipedia.org on April 8, 2010. It comes with two XML files, one containing the Wikipedia articles and another containing metadata about them.
The full editions of ILLUM from 12/11/2006 to 30/05/2010 (185 issues).
This is a wordlist which was created from 32 Maltese fiction books. These texts were originally in PDF file format and were converted to txt format. In the next step, the text file was tokenized and a frequency count was performed on the separate tokens. The resulting list (with about 50,000 entr...
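The tokenize-and-count step described above might look roughly like the following Python sketch. The tokenizer actually used for this wordlist is not stated, so the choices here are assumptions: tokens are taken to be lowercased runs of Unicode word characters, which also splits hyphenated forms such as “il-kelb” into two tokens. The function name `build_wordlist` is hypothetical.

```python
import re
from collections import Counter


def build_wordlist(text, top_n=None):
    """Return (token, frequency) pairs, most frequent first.

    Assumption: a token is a lowercased run of Unicode word characters;
    the tokenizer used for the original wordlist is not documented here.
    """
    tokens = re.findall(r"\w+", text.lower())
    # most_common(None) returns every entry, sorted by descending count.
    return Counter(tokens).most_common(top_n)


if __name__ == "__main__":
    sample = "Il-kelb u l-qattus. Il-kelb jiġri."
    for token, freq in build_wordlist(sample):
        print(f"{token}\t{freq}")
```

A regular expression keeps the sketch dependency-free. A tokenizer that treats Maltese articles and apostrophes differently would produce different counts.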
This is the Maltese version of the Acquis Communautaire (AC), which is the total body of European Union (EU) law applicable in the EU Member States. It consists of selected texts between the 1950s and today, translated to Maltese.