Institutional Repository Software

The chief aim of an institutional repository (IR) is to provide open access to an institution's research output through self-archiving, and to store and preserve the institution's other digital assets, including unpublished or otherwise easily lost ("grey") literature such as dissertations. In some respects IRs are related to the concept of a digital library, i.e., collecting, managing, classifying, cataloguing, preserving and providing access to digital content, much like the traditional functions of a library. Several software packages exist that can fulfil these aims.

Open source describes a method of software development that harnesses distributed peer review and transparency of process (O’Mahony, 2003). Source code and rights that were traditionally reserved for licence holders are now offered under a free licence that permits users to study, modify and improve the software and, in some cases, to redistribute it (Open Source Initiative, 2012).

With many Open Source Software (OSS) applications now available for library and information management, institutions have new options for acquiring and implementing systems. DSpace, Greenstone and EPrints are examples of popular open source applications for library and information management.

The open source IR packages mentioned above are examined in more detail here. The aim is to identify the features and functionality of these packages with a view to the requirements of a repository of government publications. Open source IR software offers extensible features to repository managers and allows an institution to showcase its digital archive to scholars around the world. Because implementers have full rights to the software and its source code, institutions can extend its functionality to meet their specific requirements. A brief description of each of the three packages follows:

1. DSpace:

DSpace is a joint venture between the MIT Libraries and HP Labs. It is a digital asset management system that allows organizations, including libraries, to capture, archive, index and disseminate the scholarly and research output of a community. Developed at MIT using a combination of technologies, it is mainly used to capture bibliographic details describing articles, reports, dissertations and research papers. According to Madalli (2003), DSpace can be adapted to the needs of different communities, has built-in interoperability between systems and adheres to international standards for metadata formats; because it is an open source platform, it can be modified to extend its capabilities. Some of its features, as described in the DSpace documentation, include:

It is a service model for open access and/or digital archiving for long-term access.

It offers a platform for building an institutional repository whose collections can be searched and accessed over the Web, and which are open and interoperable across different systems.

The DSpace data model is designed to reflect the structure of the institution. A repository is divided into communities, which can be further divided into sub-communities mirroring the institution's organizational structure. Communities contain collections, which are groupings of related content, and a collection may appear in more than one community. Each collection contains items, which are the basic archival elements of the repository. An item may appear in more than one collection, but every item has exactly one owning collection. Items are in turn subdivided into named bundles of bitstreams. Bitstreams are, as the name suggests, streams of bits, usually ordinary computer files. Bitstreams that are closely related (for example, the HTML file and the images that make up a single HTML document) are grouped into the same bundle.

Figure 1: DSpace Data Model
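The containment rules described above (communities and sub-communities, collections, items with a single owning collection, bundles and bitstreams) can be sketched as plain Python data classes. This is a minimal illustration under our own naming: the class and function names are hypothetical and are not DSpace's Java API, and "ORIGINAL" is used only as an example bundle label.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    name: str         # usually an ordinary computer file
    format_name: str  # each bitstream has exactly one bitstream format


@dataclass
class Bundle:
    name: str  # example label such as "ORIGINAL" for the submitted files
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    title: str
    # Exactly one owning collection; excluded from repr/eq to avoid cycles.
    owning_collection: "Collection" = field(repr=False, compare=False)
    bundles: List[Bundle] = field(default_factory=list)
    # Other collections the item also appears in (they do not own it).
    mapped_collections: List["Collection"] = field(
        default_factory=list, repr=False, compare=False)


@dataclass
class Collection:
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    name: str
    sub_communities: List["Community"] = field(default_factory=list)
    collections: List[Collection] = field(default_factory=list)


def add_item(owner: Collection, title: str) -> Item:
    """Create an item whose single owning collection is `owner`."""
    item = Item(title=title, owning_collection=owner)
    owner.items.append(item)
    return item


def map_item(item: Item, other: Collection) -> None:
    """Show an item in another collection without changing its owner."""
    item.mapped_collections.append(other)
    other.items.append(item)


# Hypothetical structure: University > Library > {Theses, Reports}
university = Community("University")
library = Community("Library")
university.sub_communities.append(library)
theses, reports = Collection("Theses"), Collection("Reports")
library.collections.extend([theses, reports])

thesis = add_item(theses, "A sample thesis")
thesis.bundles.append(Bundle("ORIGINAL", [Bitstream("thesis.pdf", "Adobe PDF")]))
map_item(thesis, reports)
```

Keeping the owner as a single attribute, separate from the list of mapped collections, is what enforces "one owning collection" while still letting an item appear in several places.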

Each bitstream is associated with exactly one bitstream format. Because preservation services are an important aspect of the DSpace service, it is important to capture the specific formats of the files that users submit (Tansley, 2003). The bitstream format is a unique and consistent way of referring to a particular file format.
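Recording a single format per bitstream amounts to looking each uploaded file up in a format registry. The sketch below is hypothetical: it identifies formats by file extension only, and the registry entries and support-level labels are illustrative rather than the contents of DSpace's actual registry.

```python
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class BitstreamFormat:
    short_name: str
    mime_type: str
    support_level: str  # illustrative label for how well the format is preserved


# Hypothetical registry keyed by lower-case file extension.
REGISTRY = {
    ".pdf": BitstreamFormat("Adobe PDF", "application/pdf", "supported"),
    ".htm": BitstreamFormat("HTML", "text/html", "supported"),
    ".html": BitstreamFormat("HTML", "text/html", "supported"),
    ".jpg": BitstreamFormat("JPEG", "image/jpeg", "supported"),
    ".tif": BitstreamFormat("TIFF", "image/tiff", "supported"),
    ".mp3": BitstreamFormat("MPEG audio", "audio/mpeg", "known"),
}
UNKNOWN = BitstreamFormat("Unknown", "application/octet-stream", "unsupported")


def identify_format(filename: str) -> BitstreamFormat:
    """Return exactly one format for an uploaded file (UNKNOWN if unrecognised)."""
    return REGISTRY.get(PurePath(filename).suffix.lower(), UNKNOWN)
```

For example, `identify_format("Thesis.PDF")` returns the PDF entry, while a file with an unrecognised extension is still given one (unknown) format rather than none.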

As figure 1 (the DSpace data model) shows, each item has one qualified Dublin Core metadata record. Although an item may hold other metadata stored as a serialized bitstream, Dublin Core is always the record used for interoperability and ease of discovery. The Dublin Core may be entered by end users when they submit content, or it may be derived from other metadata as part of an ingest process.
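When Dublin Core is derived from other metadata during ingest, the step is essentially a crosswalk: a mapping from local field names to Dublin Core elements and qualifiers. The local field names below (`thesis_title`, `student`, `year`, `abstract`, `supervisor`) are hypothetical; the dotted target names follow the style of DSpace's qualified Dublin Core fields such as `dc.contributor.author`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical crosswalk: local field -> (schema, element, qualifier)
CROSSWALK: Dict[str, Tuple[str, str, Optional[str]]] = {
    "thesis_title": ("dc", "title", None),
    "student": ("dc", "contributor", "author"),
    "year": ("dc", "date", "issued"),
    "abstract": ("dc", "description", "abstract"),
}


def to_dublin_core(local: Dict[str, str]) -> List[Tuple[str, str]]:
    """Map a local record to (Dublin Core field, value) pairs, skipping unmapped fields."""
    record = []
    for key, value in local.items():
        if key not in CROSSWALK:
            continue  # no Dublin Core equivalent; could be kept in a separate bitstream
        schema, element, qualifier = CROSSWALK[key]
        name = f"{schema}.{element}" if qualifier is None else f"{schema}.{element}.{qualifier}"
        record.append((name, value))
    return record


dc_record = to_dublin_core({
    "thesis_title": "Digital repositories for government publications",
    "student": "Doe, Jane",
    "year": "2017",
    "supervisor": "Roe, Richard",  # not mapped in this sketch
})
```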

The main features of DSpace as digital asset management software are as follows:

i. Authentication: Authentication is the process by which the system securely identifies its users. DSpace also allows contributors to restrict access to items at both the collection and the individual item level (Bass, 2002).

ii. Non-dynamic HTML Document Support: According to the documentation, DSpace simply supports uploading and downloading of bitstreams. This works for most file formats, such as PDF and Word documents. HTML documents are more complex, because they consist of several files that link to one another, which has important consequences for digital preservation. Web pages also link to, or include content from, other sites, often without the end user noticing. A few years later, someone viewing the preserved website is therefore likely to find many broken links, or links to sites that are no longer useful. As Tansley et al. (2003) point out, it may also be unclear to end users when they are viewing content stored in DSpace and when they are viewing content from another website, or have navigated to a page that is not stored in DSpace. DSpace can store self-contained HTML documents and make them browsable online, provided that links to the preserved images, videos and the like are kept as relative links.

iii. OAI-PMH Support: OAI-PMH is a protocol for harvesting metadata. It allows sites to programmatically retrieve, or "harvest", metadata from many sources and to offer services based on that metadata, such as indexing or linking services (Open Archives Initiative, 2012). DSpace exposes the Dublin Core metadata of items that are publicly (anonymously) accessible, and the collection structure is also exposed through the OAI protocol's "sets" mechanism. Deletion information for withdrawn items is not exposed by DSpace's OAI interface. DSpace also supports OAI-PMH resumption tokens, and its OAI interface reflects the content hierarchy (communities, collections and items).

iv. Object Management: Items are ingested into DSpace either through a web interface or with the batch item importer. The submission workflow may include one or more steps, depending on the institution's requirements. Communities and collections are created through the web interface.

v. Import & Export: DSpace supports import and export of communities, collections and items. It also includes batch tools to import and export items in a simple directory format, in which the Dublin Core metadata is stored in an XML file. This can serve as a basis for moving content between DSpace and other systems.

vi. Statistics: Statistics are provided for administrators. According to Bradley & Blackall (2007), the statistical reports and summaries can be used to evaluate the repository, giving details such as the number of items uploaded, the number of searches and the number of e-people registered with the system.

vii. Handle System: To assign a persistent identifier to each item, DSpace uses the global resolution service of the Handle System. DSpace needs a storage- and location-independent mechanism for creating and maintaining identifiers. A Handle server runs as a separate process that receives TCP requests from other Handle servers and forwards resolution requests to a global server or servers when a Handle entered locally does not correspond to any local content (Corporation for National Research Initiatives, 2010).

viii. Customization & Types of Document Supported: DSpace can be customized to meet the multidisciplinary requirements of an institution, and it offers a flexible data object model. However, because of its database-oriented architecture, DSpace does not allow the creation of highly varied objects with independent metadata sets (DSpace System Documentation, 2011). DSpace collections can hold audio, video or text, depending on the institution's requirements, and the system handles many file types, such as PDF, HTML, JPEG, TIFF, MP3 and AVI.

ix. Optimized Search & Browse: The system allows end users to find content in several ways. By default, DSpace indexes a basic set of qualified Dublin Core metadata, and it supports fielded search, stemming and stop-word removal. Browsing is by title, author and date by default. In addition, DSpace uses a CNRI Handle as the persistent identifier for each item (Donohue, 2015).
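Item ii above notes that DSpace can serve self-contained HTML documents whose links to images and other parts are relative. A pre-ingest check for that condition might look like the following sketch, which uses only Python's standard-library HTML parser; the function names and the idea of running such a check before upload are assumptions, not a DSpace feature.

```python
import posixpath
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def links_outside_bundle(html: str, bundle_files: Set[str]) -> List[str]:
    """Return links that would not resolve to a file stored with the document."""
    collector = LinkCollector()
    collector.feed(html)
    problems = []
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.scheme or parsed.netloc or parsed.path.startswith("/"):
            problems.append(link)  # absolute link: likely to break over time
        elif parsed.path and posixpath.normpath(parsed.path) not in bundle_files:
            problems.append(link)  # relative link to a file that was not uploaded
    return problems


page = ('<a href="chapter2.html">Next</a> <img src="./images/fig1.png">'
        '<a href="#top">Top</a> <a href="http://example.org/">External</a>')
bundle = {"index.html", "chapter2.html", "images/fig1.png"}
problems = links_outside_bundle(page, bundle)
```

An empty result suggests the document is self-contained; anything reported points to content that may be missing, or may break later.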
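The harvesting described in item iii above follows the standard OAI-PMH request pattern: a `ListRecords` request with `metadataPrefix=oai_dc` (optionally restricted to a `set`), followed by requests that carry only the `resumptionToken` until the list is complete. The sketch below builds those URLs and parses one response page offline; the base URL, set name and sample response are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, token: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken must be sent on its own."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if set_spec:
            params["set"] = set_spec  # e.g. one collection exposed as an OAI set
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on one page and the token for the next page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iterfind(".//oai:record/oai:header/oai:identifier", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty or absent resumptionToken means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


sample_page = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repo.example.org:123456789/1</identifier></header></record>
    <record><header><identifier>oai:repo.example.org:123456789/2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, next_token = parse_page(sample_page)
next_url = list_records_url("https://repo.example.org/oai/request", token=next_token)
```

A real harvester would fetch `next_url`, call `parse_page` on the response, and stop once the token comes back as `None`.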
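The "simple directory format" in item v above is DSpace's Simple Archive Format: one directory per item, containing a `dublin_core.xml` file of `<dcvalue>` elements and a `contents` file listing the item's bitstreams. The writer below is a sketch of producing such a directory from Python; the function name and sample values are hypothetical, and options such as bundle assignments or additional metadata schemas are left out.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple
from xml.sax.saxutils import escape, quoteattr


def write_saf_item(item_dir: Path, dc_values: List[Tuple[str, str, str]],
                   files: Dict[str, bytes]) -> None:
    """Write dublin_core.xml, a contents file and the bitstream files for one item."""
    item_dir.mkdir(parents=True, exist_ok=True)
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<dublin_core>"]
    for element, qualifier, value in dc_values:
        lines.append(f"  <dcvalue element={quoteattr(element)} "
                     f"qualifier={quoteattr(qualifier)}>{escape(value)}</dcvalue>")
    lines.append("</dublin_core>")
    (item_dir / "dublin_core.xml").write_text("\n".join(lines) + "\n", encoding="utf-8")
    # The contents file lists one bitstream file name per line.
    (item_dir / "contents").write_text("".join(f"{name}\n" for name in files),
                                       encoding="utf-8")
    for name, data in files.items():
        (item_dir / name).write_bytes(data)


with tempfile.TemporaryDirectory() as tmp:
    item = Path(tmp) / "item_000"
    write_saf_item(item,
                   [("title", "none", "Budget & Plans 2016"), ("date", "issued", "2016")],
                   {"report.pdf": b"%PDF-1.4 placeholder"})
    dc_xml = (item / "dublin_core.xml").read_text(encoding="utf-8")
    contents = (item / "contents").read_text(encoding="utf-8")
```

Escaping the values matters because titles of government publications often contain characters such as `&` that are not valid as raw XML text.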
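The Handle behaviour in item vii above rests on two ideas: a handle ("prefix/suffix") stays fixed while the location it points to may change, and handles that cannot be resolved locally are passed on to a global service. The sketch below models both with an in-memory table; the prefix `123456789`, the URLs and the `ask_global` callback are placeholders, not the Handle System protocol or its client library.

```python
import itertools
from typing import Callable, Dict

LOCAL_PREFIX = "123456789"            # placeholder naming authority
_suffixes = itertools.count(1)
local_handles: Dict[str, str] = {}    # handle -> current location of the item


def mint(location: str) -> str:
    """Assign a new persistent identifier to an item stored at `location`."""
    handle = f"{LOCAL_PREFIX}/{next(_suffixes)}"
    local_handles[handle] = location
    return handle


def relocate(handle: str, new_location: str) -> None:
    """Storage may move; the identifier given to users does not change."""
    local_handles[handle] = new_location


def resolve(handle: str, ask_global: Callable[[str], str]) -> str:
    prefix, _, suffix = handle.partition("/")
    if not prefix or not suffix:
        raise ValueError(f"not a handle: {handle!r}")
    if handle in local_handles:
        return local_handles[handle]
    return ask_global(handle)  # not local content: defer to the global resolver


h = mint("https://old-server.example.org/items/1")
relocate(h, "https://repo.example.org/items/1")
```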
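Item ix above mentions fielded search, stemming and stop-word removal. The toy index below shows what each of these means in practice. DSpace itself relies on a search engine library for this; the suffix-stripping `stem` function here is a deliberately crude stand-in for a real stemmer such as Porter's, and all names and records are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def stem(word: str) -> str:
    """Crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ies", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + ("y" if suffix == "ies" else "")
    return word


def terms(text: str) -> List[str]:
    """Lower-case, drop stop words, then stem what remains."""
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


class FieldedIndex:
    def __init__(self) -> None:
        # (field, term) -> ids of items whose field contains the term
        self.postings: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, item_id: str, record: Dict[str, str]) -> None:
        for field_name, value in record.items():
            for term in terms(value):
                self.postings[(field_name, term)].add(item_id)

    def search(self, field_name: str, query: str) -> Set[str]:
        """Items whose `field_name` contains every remaining query term."""
        matches = [self.postings.get((field_name, t), set()) for t in terms(query)]
        return set.intersection(*matches) if matches else set()


index = FieldedIndex()
index.add("item-1", {"title": "Digital Libraries in India", "author": "Doe, J."})
index.add("item-2", {"title": "Archiving the Web", "author": "Roe, R."})
```

Because both documents and queries pass through `terms`, "library" matches "Libraries" and "archived" matches "Archiving", while a search restricted to the author field ignores title words.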

2. Greenstone:

Figure 2: Greenstone Data Repository

Greenstone Digital Library Software offers a new way of organizing information so that it can be accessed over the Internet. Some of the distinctive features of Greenstone are listed below:

i. Accessible via Web Browser: Collections are accessed through a web browser, and the powerful search facilities make them easy to use over the Internet.

ii. Full Text and Field Search: Most collections provide separate indexes for full documents, sections, titles, authors, etc. Users can search the full text of documents or choose among these indexes, and the results can be ranked by relevance or sorted by a metadata element.

iii. Flexible Browsing Facilities: A particularly user-friendly facility is the ability to browse through collections using lists of titles, authors, classification structures, dates, etc. Browsing facilities are configured separately for each collection, and a wide range of browsing interfaces is available.

iv. Create Access Structures Automatically: The software is well regarded for creating collections that are extremely easy to maintain. This is mainly because all searching and browsing structures are built directly from the documents themselves (figure 1); no links are inserted by hand, but the existing links in the original documents are preserved (Bainbridge et al., 2004). New documents in the same format can therefore be merged into existing collections automatically. For some collections this happens at regular intervals without manual intervention, so the indexes are rebuilt to include new material shortly after it arrives.

v. Make Use of Available Metadata: Metadata, i.e. descriptive information such as author, title, date and keywords, serves as the raw material for browsing indexes. Metadata can be associated with each document as a whole or with individual sections within documents (Don et al., 2002). Metadata must either be provided explicitly or be derivable automatically from the source documents. Most electronic documents are found to use the Dublin Core metadata scheme.

vi. Plug-ins Extend the System's Capabilities: Greenstone allows "plug-ins" to be written for new document types, so that it can accommodate different kinds of source document. Plug-ins exist for Word, plain text, HTML, PDF, PostScript, e-mail, some proprietary formats, etc.

vii. Designed for Multi-gigabyte Collections: The software is designed to store millions of documents in each collection, totaling up to several gigabytes.

viii. Multilingual Support: The software has extensive built-in multilingual support, covering languages such as French, Chinese, Spanish, Maori and Arabic. It uses Unicode throughout, and on-the-fly conversion allows any language to be processed consistently.

ix. Collections Support Multiple Formats: Collections can contain a wide range of textual and non-textual material, such as text, pictures, audio and video clips (Witten et al., 2012). Non-textual material is included in one of two ways: either it is linked to textual documents, or it is accompanied by textual descriptions that allow full-text searching and browsing, as shown in figure 2.

x. Administrative Functions Provided: An administrative function allows existing users to authorize new users, and collections can be protected so that only registered users can access them on presentation of a password (Witten et al., 2012). The software can also keep logs of user activity, recording the queries made to each Greenstone collection.

xi. Collections Can Be Published on the Internet or on CD-ROM: The software can present collections on the World Wide Web and on CD-ROM in exactly the same form, and the CD-ROM versions are compatible with all versions of the Windows operating system.
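Item ii above notes that Greenstone results can be ranked by relevance or sorted by a metadata element. The difference can be shown with a few hypothetical documents; the score here is a plain count of query-term occurrences, a simplification of the weighted ranking a real search engine would use, and none of the names come from Greenstone itself.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical documents with a little metadata and their full text.
docs: Dict[str, Dict[str, str]] = {
    "d1": {"title": "Rice farming", "date": "1998", "text": "Rice yields and rice farming methods"},
    "d2": {"title": "Water use", "date": "1995", "text": "Water for rice paddies"},
    "d3": {"title": "Maori songs", "date": "1987", "text": "Traditional songs"},
}


def words(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def search(query: str, sort_by: Optional[str] = None) -> List[str]:
    """Full-text match; rank by term count, or sort by a metadata element instead."""
    q = words(query)
    scores = {}
    for doc_id, doc in docs.items():
        counts = Counter(words(doc["text"]))
        score = sum(counts[t] for t in q)
        if score:
            scores[doc_id] = score
    if sort_by:
        return sorted(scores, key=lambda d: docs[d][sort_by])
    return sorted(scores, key=lambda d: (-scores[d], d))
```

`search("rice")` puts the document that mentions rice most often first, whereas `search("rice", sort_by="date")` returns the same matches in date order.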
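Item v above says that metadata may be derived automatically from source documents. For HTML sources, one simple form of this is reading the `<title>` element and `<meta name=... content=...>` tags, as in the hypothetical extractor below. Greenstone's own plug-ins do considerably more, and the `DC.*` tag names in the sample page are only an example.

```python
from html.parser import HTMLParser
from typing import Dict


class MetadataExtractor(HTMLParser):
    """Derive metadata from an HTML source document's <title> and <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: Dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        values = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and values.get("name") and values.get("content"):
            # Keep the first value seen for each metadata name.
            self.metadata.setdefault(values["name"], values["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.metadata.setdefault("Title", data.strip())


extractor = MetadataExtractor()
extractor.feed('<html><head><title> Annual Report 2016 </title>'
               '<meta name="DC.Creator" content="Ministry of Finance">'
               '<meta name="DC.Date" content="2016"></head><body>...</body></html>')
```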
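The plug-in mechanism in item vi above is essentially a dispatch table from document type to converter. Greenstone's real plug-ins are Perl modules; the Python sketch below only illustrates the dispatch idea with two trivial converters, and all names in it are hypothetical. Converting Word or PDF files would need external tools that are not shown.

```python
import re
from pathlib import PurePath
from typing import Callable, Dict

Converter = Callable[[bytes], str]
PLUGINS: Dict[str, Converter] = {}


def plugin(*extensions: str) -> Callable[[Converter], Converter]:
    """Register a converter for the given file extensions."""
    def register(func: Converter) -> Converter:
        for ext in extensions:
            PLUGINS[ext] = func
        return func
    return register


@plugin(".txt")
def text_plugin(data: bytes) -> str:
    return data.decode("utf-8")


@plugin(".html", ".htm")
def html_plugin(data: bytes) -> str:
    # Naive tag stripping and whitespace collapsing, enough for the illustration.
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", data.decode("utf-8"))).strip()


def import_document(filename: str, data: bytes) -> str:
    converter = PLUGINS.get(PurePath(filename).suffix.lower())
    if converter is None:
        raise ValueError(f"no plug-in registered for {filename}")
    return converter(data)
```

Supporting a new document type then means registering one more converter, without touching `import_document`.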
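Item viii above describes Greenstone's use of Unicode with on-the-fly conversion. The underlying idea, decoding text from whatever legacy encoding it arrived in and normalizing it so that the same characters are always stored the same way, can be sketched with Python's standard codecs. The sample strings and encodings are illustrative, and this is not Greenstone's own conversion code.

```python
import unicodedata


def to_unicode(raw: bytes, encoding: str) -> str:
    """Decode bytes from a source encoding and normalize to NFC."""
    return unicodedata.normalize("NFC", raw.decode(encoding))


samples = {
    "gb2312": "中文".encode("gb2312"),              # Chinese
    "cp1256": "مكتبة".encode("cp1256"),            # Arabic ("library")
    "latin-1": "bibliothèque".encode("latin-1"),   # French
    "utf-8": "Ma\u0304ori".encode("utf-8"),        # 'a' followed by a combining macron
}
converted = {enc: to_unicode(raw, enc) for enc, raw in samples.items()}
```

After normalization, "Māori" typed with a combining macron and "Māori" typed with the precomposed character compare equal, which keeps searching consistent across languages.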


3. EPrints:

EPrints is free software created at the University of Southampton, England. An EPrints repository collects, stores and disseminates research output produced by a research community in digital form. It allows members of the community to deposit their preprints, postprints and other scholarly papers through a web interface, and organizes these publications for easy access. It is described as the world's first, and one of the most widely used and functional, open access IR software packages. EPrints is a flexible content management system. Although it was designed mainly to meet the needs of scholars and researchers in disseminating and recording their work, with some changes to its configuration it can easily be used to preserve and disseminate images, research data, audio archives or anything else that can be stored in digital form (EPrints Services, 2006).

EPrints is user-friendly software. The submission process consists of a few simple steps in which users provide metadata along with an electronic copy of the document. Users simply enter metadata such as document type, title, author name and date via a web form; they need no knowledge of HTML or XML. The metadata fields that appear on the form are chosen by the administrator, who can easily adapt the metadata format so that only the fields relevant to a particular collection are presented to end users. Users can easily manage their submissions to the archive, and can edit, revise and remove documents after submission (although the administrator can restrict these operations).

Any metadata field in a collection can be used for browsing in EPrints, so various criteria such as author, year and publisher can be used; the administrator controls which browse views are available to users. Documents in an EPrints archive can be indexed by search engines such as Google, which helps ensure wider access to, and dissemination of, the items deposited in the archive. However, EPrints does not support Boolean searching (Repositories Support Project, 2010). It is also easy to run a search that returns no results. End users familiar with modern search engines and databases may find it frustrating to get an empty result with no suggestions for alternative search strategies.
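The administrator-chosen deposit form described above can be thought of as a per-collection field configuration plus a validation step. The configuration format, collection name and field names below are hypothetical; EPrints uses its own configuration mechanism for this, which is not reproduced here.

```python
from typing import Dict, List

# Hypothetical per-collection deposit form chosen by the administrator.
FORM_CONFIG: Dict[str, List[Dict[str, object]]] = {
    "government_publications": [
        {"name": "title", "required": True},
        {"name": "issuing_body", "required": True},
        {"name": "date", "required": True},
        {"name": "document_number", "required": False},
    ],
}


def form_fields(collection: str) -> List[str]:
    """Only these fields are shown to depositors in this collection."""
    return [str(f["name"]) for f in FORM_CONFIG[collection]]


def validate(collection: str, record: Dict[str, str]) -> List[str]:
    """Return problems with a deposit; an empty list means it can be accepted."""
    problems = [f"missing {f['name']}" for f in FORM_CONFIG[collection]
                if f["required"] and not record.get(str(f["name"]), "").strip()]
    allowed = set(form_fields(collection))
    problems += [f"unexpected field {key}" for key in record if key not in allowed]
    return problems
```

Changing what depositors see then means editing the configuration, not the form code.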
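Because EPrints keeps item metadata in a MySQL database, fielded searches can in principle also be expressed directly in SQL. The sketch uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the two-table schema and sample rows are invented for illustration; this is not the EPrints database schema.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, type TEXT, year INTEGER);
    CREATE TABLE item_creators (item_id INTEGER REFERENCES items(id), name TEXT);
""")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", [
    (1, "Budget 2016", "report", 2016),
    (2, "Rail statistics", "report", 2015),
    (3, "Economic survey", "book", 2016),
])
conn.executemany("INSERT INTO item_creators VALUES (?, ?)", [
    (1, "Ministry of Finance"), (2, "Ministry of Railways"), (3, "Ministry of Finance"),
])


def fielded_search(creator: str, min_year: int, item_type: str) -> List[Tuple[int, str]]:
    """Combine conditions on three metadata fields; values are bound, not pasted in."""
    rows = conn.execute(
        """SELECT id, title FROM items
           WHERE year >= ? AND type = ?
             AND id IN (SELECT item_id FROM item_creators WHERE name LIKE ?)
           ORDER BY year DESC, id""",
        (min_year, item_type, f"%{creator}%"),
    )
    return rows.fetchall()
```

Binding values with `?` placeholders, rather than formatting them into the SQL string, keeps user input from altering the query.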

The distinct attributes of EPrints include:

i. Accessibility via Web Browser: EPrints provides a web-based interface, which makes it easy to use and distribute.

ii. Full Text and Field Search: Searches are run over the metadata. EPrints' simple and advanced search forms allow all kinds of metadata fields in the database to be searched, and any metadata field can also be searched at a fine level of granularity by querying the database with SQL.

iii. Administrative Functions Provided: An EPrints archive can use any metadata schema defined by the administrator, who decides which metadata fields are held for each EPrints item.

iv. Open Source Software: EPrints uses the MySQL database and the Apache web server. MySQL is the world's most popular open source database management system, known for its speed and reliability, and since April 1996 Apache has been the most popular web server on the Internet. EPrints is programmed in Perl, a powerful scripting language (Jayakanth, 2002).

v. OAI-PMH Support: The Open Archives protocol allows sites to programmatically retrieve, or "harvest", metadata from many sources and to offer services based on that metadata, such as indexing or linking services. This makes possible an international network of cross-searchable research information, since the content of e-print servers around the world can be searched simultaneously using the OAI (Open Archives Initiative) protocol.

vi. Statistics: Statistics are provided for administrators. Statistical reports and summaries can be used to evaluate the performance of the repository.

vii. Customization: The EPrints data model supports user-defined metadata, and plug-ins can be written to export data in other formats. For developers who want to access the core digital library functionality, a core API is provided in Perl.

viii. Item Preview in EPrints: Once a file is uploaded, EPrints automatically creates a thumbnail preview of documents and images.

Table 1 compares the three packages.

Original Reference Article:

  • Jain, C. (2017). Evolving a Model for National Digital Repository of Indian Government Publications using Institutional Repository Infrastructure.

Md. Ashikuzzaman

Works at North South University Library, Bangladesh.
