How Libraries can support Digital Humanities: reflections on #GaleDHDay

By Antony Groves (Learning and Teaching Librarian at the University of Sussex) @AntonyGroves

At the beginning of May, Gale organised their first Digital Humanities Day at the British Library. The event brought together a diverse range of speakers from around the world who spoke about different aspects of Digital Humanities (DH) scholarship, from infrastructure through to research and teaching. This post draws out three themes from the day in an effort to better understand how we can support this growing area of work:

  1. Collaboration in DH research is key and libraries can play a role within these collaborations.
  2. There are many different datasets, techniques and tools being used, yet there is a common approach we can take to developing training.
  3. We should work on our own data projects if we wish to really understand what is needed to support the academic community.

Collaboration in DH research is key and libraries can play a role within these collaborations.

In the afternoon session, Dr Sarah Ketchley stressed that “Digital Humanities projects are inherently a collaborative undertaking”, and the earlier presentations by Professors Mark Algee-Hewitt and Joris Van Eijnatten highlighted this. The work done by Prof Algee-Hewitt and others at the Stanford Literary Lab has involved a number of ‘distant reading’ projects in which participants have used a variety of computational techniques to analyse large collections of digital texts. Prof Algee-Hewitt’s research looked at grammar in digital novels, whereas Prof Van Eijnatten looked at language in newspapers using The Times Digital Archive; both are resources that libraries can provide.

Throughout the day, signals such as these indicated potential roles for libraries in DH collaborations. For example, Dr Julianne Nyhan reflected on infrastructure and the challenges researchers face in obtaining data in a format that can be ‘mined’, in one case having to obtain a hard drive from a provider. This is an area where librarians can help, and Lisa Mcintosh, Director of Access Services at the University of Sydney Library, shared an impressive list of services their library offers in support of digital research:

  • Providing content for text and data mining
  • Licence permission and copyright support
  • Recommending tools and TDM (Text and Data Mining) resources
  • Integrating text mining into Information Literacy classes in the Humanities
  • Assisting humanities teaching staff to integrate text mining in the classroom
  • Getting started with data visualisation training
  • Data analysis and visualisation guide

There are many different datasets, techniques and tools being used, yet there is a common approach we can take to developing training.

For those wondering which students this area of scholarship might appeal to, the answer is all of them. In an inspiring talk about introducing DH in the undergraduate classroom, Dr Sarah Ketchley showed that her 2018 ‘Introduction to Digital Humanities’ module was full, with 35 students from 21 different departments across campus. Not only is this type of scholarship appealing to students, it is also invaluable to them. One reason is that, as Dr Melodee Beals explained, “evidence is merely data with a direction”. If we want students to engage critically with evidence-based research, helping them to analyse the underlying data is of great importance.

The tools that students use in Dr Ketchley’s class have included OpenRefine, Voyant Tools and, more recently, the Gale Digital Scholar Lab: a cloud-based platform containing a range of software that can be used with the Gale databases to which the institution subscribes. This cloud-based approach avoided issues encountered by previous cohorts, where a whole lesson had to be dedicated to downloading and installing the required programs. Dr Tomoji Tabata also introduced Stylo, an open source tool that can be used for ‘rolling stylometry’, a technique for detecting stylistic changes across passages of text.
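To give a feel for what ‘rolling stylometry’ involves, here is a deliberately simplified Python sketch. It is not Stylo’s implementation (Stylo is an R package with its own, more sophisticated methods); it simply slides a window along a text and measures how far the relative frequencies of a few common function words in each window drift from those of the whole text. The word list, window size, step size and distance measure are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical list of function words; real studies use far more frequent words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it"]


def profile(tokens):
    """Relative frequency of each function word in a list of tokens."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def rolling_distance(text, window=200, step=100):
    """Return (start_token, distance) pairs for windows sliding along the text.

    The distance is the mean absolute difference between a window's
    function-word profile and the profile of the whole text: a crude
    stand-in for the distance measures used in stylometry.
    """
    tokens = text.lower().split()
    baseline = profile(tokens)
    results = []
    for start in range(0, max(len(tokens) - window, 0) + 1, step):
        window_profile = profile(tokens[start:start + window])
        diffs = [abs(a - b) for a, b in zip(window_profile, baseline)]
        results.append((start, sum(diffs) / len(FUNCTION_WORDS)))
    return results


# Toy text whose "style" changes halfway through.
sample = "the cat and the dog " * 60 + "of it to in it " * 60
for start, distance in rolling_distance(sample, window=100, step=100):
    print(start, round(distance, 3))
```

Windows whose distance rises sharply mark places where the style shifts; a real analysis would use a proper corpus, many more features and a tool such as Stylo.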

Throughout the day, reference was made to many different techniques (e.g. topic modelling, named entity recognition, sentiment analysis), tools (e.g. Gephi, Google Fusion Tables, MALLET) and data sources (e.g. Trove, HathiTrust, Gale Historical Newspapers). With so much out there, it can be hard to know how best to start providing support. Thankfully, Associate Professor Ryan Cordell brought clarity to this undertaking by proposing four steps to teaching humanities data analysis:

  • Start with creativity
  • Teach using domain-specific data
  • Foreground corpus over method
  • Foreground mind-set over method (‘programmatic thinking’ is more important than ‘programming’)

We take a similar approach to developing our Information Literacy training sessions and find that it works well. In the short time we often get with students in workshops, making the content of the session as relevant as possible to a given cohort increases engagement. In addition, focusing on how to approach searching (as opposed to how to use a particular tool) means that students can apply this learning to the range of tools they may encounter, not just the one or two included in the session.

“Work on your own data projects to understand what is really needed to support your academic community”.

This is a direct quote from the final presentation by Lisa Mcintosh, which was the perfect way to finish the day. While listening to the research presented throughout the day was fascinating and certainly highlighted areas where we can support this scholarship, managing our own data projects and facing the same barriers that our researchers encounter is what will really help us to understand the support that is most needed.

This may sound daunting, but hopefully this post has shared at least a few resources worth exploring further. We can also take encouragement from Prof Van Eijnatten, who asserted that “if I can write a few lines of code anyone can”.


An introduction to text mining with AntLab and Voyant Tools

By MmIT Committee Member Antony Groves


Increasingly you may hear researchers, librarians and other information professionals talk about “text mining”. Although text mining is closely aligned with information retrieval, it is not always clear how we can support and engage with it. The following post brings together a number of resources that show the value and benefits of text mining, and introduces two free tools to help you start exploring this growing area of work.

The introduction to the PLOS Text Mining Collection, a useful selection of open access research and essays relating to text mining, explains that:

“The maturing field of Text Mining aims to solve problems concerning the retrieval, extraction and analysis of unstructured information in digital text, and revolutionize how scientists access and interpret data that might otherwise remain buried in the literature”.

An example of this is Yale University’s Robots Reading Vogue project, in which a huge volume of text and data (over 6 TB) has been analysed to show, among other things, how the use of particular words has risen and fallen over the magazine’s history (the n-gram Search). At the University of Sussex, numerous projects from the Text Analysis Group and the Sussex Humanities Lab explore large corpora (collections of written texts by particular authors or about particular subjects) through text mining. We have even started to run workshops in the Library introducing tools to help students who are interested in this area of research. I would like to share two of these resources here: AntLab and Voyant Tools (you can find even more in the TAPoR collection).
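The basic idea behind an n-gram search like the one in Robots Reading Vogue can be sketched in a few lines of Python: count how often a phrase occurs in each year’s text, then divide by the number of n-grams in that year so that years with more text do not dominate. This is only an illustration, not how the Yale project is built; the tiny year-to-text corpus below is hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams in a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_trend(corpus_by_year, phrase):
    """Relative frequency of `phrase` in each year of a {year: text} corpus."""
    n = len(phrase.split())
    trend = {}
    for year, text in sorted(corpus_by_year.items()):
        grams = ngrams(text.lower().split(), n)
        counts = Counter(grams)
        # Normalise by the number of n-grams so larger years are not favoured.
        trend[year] = counts[phrase.lower()] / len(grams) if grams else 0.0
    return trend


# Hypothetical toy corpus: a few words standing in for each year's issues.
corpus = {
    1950: "the new look is the look of the season",
    1990: "the supermodel era begins with a new supermodel",
}
print(ngram_trend(corpus, "new look"))  # {1950: 0.125, 1990: 0.0}
```

Plotting these relative frequencies year by year gives the rising and falling lines that an n-gram viewer displays.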

AntLab contains a number of freely available tools (although donations and patronage are welcome) built by Dr Laurence Anthony, which can be found in the Software section of his website. For the purpose of this post, I would like to highlight AntFileConverter, a tool for converting PDF and Word files into plain text for analysis, something that can also be helpful for improving accessibility. To use AntFileConverter, download and open the appropriate version of the software for your computer, drag the file you wish to convert into the ‘Input Files’ box, and click ‘Start’. For this demonstration I have used the PDF of the first Open Access volume of the MmIT Journal:

[Screenshot: converting the PDF of the first Open Access volume of the MmIT Journal in AntFileConverter]

As explained in the user support documentation, “the converted files will be saved in the same directory as the original files with the same name but with the ‘.txt’ extension added”. This .txt file can then be used with other AntLab software, although here it will be analysed with Voyant Tools, a free “web-based reading and analysis environment for digital texts”. To do this, upload the .txt file created with AntFileConverter into the Voyant Tools box:

[Screenshot: uploading the converted .txt file into the Voyant Tools box]

Click on ‘Reveal’ to run the analysis and view the results:

[Screenshot: Voyant Tools results for the converted file]

The default tools include Cirrus, Reader, Trends, Summary and Contexts, which you can learn more about in the Getting Started Guide. There are also a number of additional tools, including the TermsBerry. To use this particular tool, click on TermsBerry next to Reader above the second panel:

[Screenshot: the TermsBerry tool in Voyant Tools]

The TermsBerry shows how often particular terms occur and how frequently they appear next to other terms. The TermsBerry I have shared above shows that in Volume 43 of the MmIT Journal, the words ‘library’ and ‘information’ are two of the most common (they are in larger bubbles). If you hover over one of the terms, for example ‘digital’, you will see that this word appears 121 times in the text, most commonly co-occurring with ‘literacy’ (29 times), followed by ‘skills’, ‘media’ and ‘information’; topics that should interest MmIT readers!
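The two kinds of number the TermsBerry reports, how often a term occurs and which terms it most often appears near, can be approximated in a short Python sketch. This is not Voyant’s own code: it assumes that two terms co-occur when they fall within a fixed window of words of each other (the window of 5 is an assumption, and Voyant’s setting may differ) and removes a short, hypothetical stop-word list. The sample sentence is invented for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "is", "for", "with"}


def tokenize(text):
    """Lower-case word tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def term_frequencies(tokens):
    """How many times each term occurs."""
    return Counter(tokens)


def cooccurrences(tokens, term, window=5):
    """Count terms appearing within `window` tokens either side of each `term`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        counts.update(n for n in neighbours if n != term)
    return counts


text = ("Digital literacy and information literacy are digital skills; "
        "media literacy is one of the digital skills.")
tokens = tokenize(text)
print(term_frequencies(tokens)["digital"])                  # 3
print(cooccurrences(tokens, "digital").most_common(2))      # [('literacy', 6), ('skills', 3)]
```

Because each occurrence of the target term is counted separately, a neighbour that sits close to two occurrences is counted twice; other tools may count co-occurrence differently.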

To enable this mining and sharing, reforms to UK copyright legislation mean that copies of a work can be made for the purposes of text and data analysis for non-commercial research (provided you have lawful access to the original work, which in this case is open access). Additionally, as explained in the ‘Sharing outputs’ section of this Jisc guide, the results of the analysis can usually be shared with anyone (although there are exceptions when the analysis goes beyond counts and ‘facts’ about the work and includes large amounts of the original copyright material). So, armed with a few tools and with copyright law on our side, it’s time to make text mining yours.


Preparing for #GDPR

The General Data Protection Regulation (GDPR), which replaces the Data Protection Act 1998, is due to come into effect on 25 May 2018. With only a month to go until GDPR is introduced, your employer will almost certainly have taken steps to ensure compliance and (we trust) briefed its employees. However, if you’re looking to extend your awareness of what is involved, we’ve rounded up some resources that may help:

We need to talk about #Storify…

It would be no exaggeration to say that there were groans of anguish across the library and information community when Storify (now owned by Adobe) announced that the service would close in May 2018. While the lengthy notice period was appreciated, with only a month until Storify closes, how can we preserve existing stories, and what can we use as an alternative?

Archiving existing Storify stories

Wakelet very quickly rose to the challenge with a two-step process to Import your Stories to Wakelet. Storify users can create Stories until the end of April 2018 and have until May 16 to move their Stories across.

Alternatives to Storify

Wakelet, obviously.  However, it has taken the recent introduction of the Import from Twitter feature to make it more of a Storify experience: see the brief Twitter Import video.

If you are primarily interested in curating content, there are still many alternative social bookmarking sites that can fill the void, e.g. Scoop.it or Pocket. The excellent C4LPT website has a list of Curation & Social Bookmarking Tools.


Why is Storify closing? 

If you are interested in why the social media service Storify is coming to an end, it is due to a sequence of acquisitions and the growth in chronology and curation tools. In a blog post, Ian Milligan reminds us how vulnerable user-generated content can be online, and that we need to steward our data responsibly.


Discovery AND disorder


Committee member Antony Groves from the University of Sussex writes about discovery and how a curve ball can sometimes be thrown at you when you least expect it.

Discovery is not a straightforward process; if it were, some of us would be out of a job. However, this should not excuse unpredictable tools and searches; some obstacles are reasonable to expect and some are not. How would a 110m hurdler feel if an extra barrier were added, or if the first were moved 10ft forward? The answer is that we’d only know how they felt if we asked them, or perhaps observed their next race. This post is not intended to focus on UX, though, but on teaching: specifically, how we talk to our users about fallible discovery services.

The anomaly that prompted this post is the re-ordering of results when AND is inserted between search terms in Ex Libris Primo (as of March 8th, this appears to be happening at 15 Russell Group libraries). This can be tested by typing the search terms academic integrity into your discovery tool, then academic AND integrity, and comparing the two. Although the number of results stays the same, some of you will see that the order of the items changes. This appears to be predominantly a Primo issue (although it is not happening at every Primo institution), but Summon has its own mysteries: if you compare the same two searches in Summon, at several Russell Group libraries you will get a different number of results (although admittedly only a very slight difference).
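If you want to make the comparison between academic integrity and academic AND integrity a little more systematic, one option is to note down the record identifiers of the top results for each search (for example by copying them from the results page; no Primo or Summon API is assumed here) and compare the two lists. The Python sketch below checks whether both lists contain the same records and measures how far their order differs using Kendall’s tau, a standard rank-correlation measure, computed by hand. The record IDs are hypothetical.

```python
from itertools import combinations


def compare_rankings(results_a, results_b):
    """Compare two ranked lists of record IDs returned by two searches.

    Returns (same_items, tau): same_items says whether both lists hold exactly
    the same records; tau is Kendall's tau over the records they share, where
    1.0 means identical order and -1.0 means completely reversed order.
    Assumes each record ID appears at most once per list.
    """
    in_b = set(results_b)
    shared = [rec for rec in results_a if rec in in_b]
    position_b = {rec: i for i, rec in enumerate(results_b)}
    concordant = discordant = 0
    # Each pair from `shared` is in results_a's order; check results_b agrees.
    for x, y in combinations(shared, 2):
        if position_b[x] < position_b[y]:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    tau = (concordant - discordant) / pairs if pairs else 1.0
    return set(results_a) == in_b, tau


# Hypothetical top-five record IDs for the two searches.
implicit = ["rec1", "rec2", "rec3", "rec4", "rec5"]
explicit = ["rec2", "rec1", "rec3", "rec5", "rec4"]
print(compare_rankings(implicit, explicit))  # (True, 0.6)
```

A result of True with a tau below 1.0 is exactly the pattern described above: the same items, in a different order.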

The Association of College & Research Libraries (ACRL) Framework for Information Literacy for Higher Education establishes “Searching as Strategic Exploration” as one of its six concepts, and further explains that “searching for information is often nonlinear” (ACRL 2015). However, is this intended to excuse tools that give inconsistent results, or to explain that searching is an iterative process, or both? Yes, if we’re teaching our users to search for resources in a strategic and systematic way, we should also be showing them the other databases we subscribe to and not relying solely on our discovery tools; but shouldn’t the discovery tool provide a solid foundation on which to build? If our discovery services are not as good as they can possibly be, students will very quickly turn to Google instead.

When we have noticed anomalies, we have reported them to Ex Libris, who have worked to resolve them or provided an explanation of why certain things are happening. The answer to a previous irregularity was that “the results of different searches aren’t necessarily comparable in a linear relation” (Ex Libris Knowledge Center, 2017). Is this a satisfactory response, though? Within the Library we continue to user-test our discovery tool (as do Ex Libris), and during our next round of testing we may find that students don’t mind these minor aberrations, or perhaps are already used to shifting results from using Google. It could be that they haven’t asked, or even noticed, but as information professionals we should be ready to help those looking for the answer. Evidently, including or excluding AND between search terms does make a difference, perhaps not to the number of results but certainly to the way they are ordered. I cannot currently explain to users why this is happening or which set of results really is more relevant. What I can do is show them other ways of sorting and narrowing their searches. Like that first 110m hurdle, it is an obstacle that can still be cleared; I just feel I would be a better coach if I could explain why it has been moved 10ft forward.

References

ACRL (2015) Framework for Information Literacy for Higher Education. Available at: http://www.ala.org/acrl/standards/ilframework#exploration (Accessed: 5 March 2017).


Ex Libris Knowledge Center (2017) Boolean searches in Primo don’t work as expected. Available at: https://knowledge.exlibrisgroup.com/Primo/Knowledge_Articles/Boolean_searches_in_Primo_doesn’t_work_as_expected (Accessed: 5 March 2017).