2006 AHC-UK Annual Conference in conjunction with AHDS History
Digital Deluge: History in the 21st Century
Friday 17th November 2006
Wolfson Room, Institute of Historical Research, Senate House, Malet Street, London
9.30-10.30 Session 1
'Dealing with the Digital Deluge: a response from the archive'
Matthew Woollard, Head of Digital Preservation, UKDA
This paper will be split into two strands. The first will discuss the huge amounts of data which are being created by historians both for their own purposes and for others. I will outline some of the problems with these data and the fact that they are not being preserved properly. I will make a clear case for the implementation of generic data formats/models for standard historical sources, and the necessity for historians and their funders to adhere to certain standards both for analysis and for subsequent preservation and reuse. This section will look at the acquisitions process of AHDS History and the competing demands faced. The second strand will attempt to deal with some of the issues surrounding the vast quantities of data which are currently being created and which could be used by historians in the future. This section will deal with identification, commercial confidentiality, disclosure, version control, acquisition and preservation, though issues of dissemination will not be ignored.
10.30 - 11.00 Coffee
11.00 - 12.00 Session 2
'Digital deluge: the impact on the virtual scholar'
David Nicholas, CIBER, UCL Centre for Publishing
The paper presents the key findings of a number of recent and ongoing international investigations conducted by CIBER into the behaviour of scholars in virtual environments. It utilises an evidence base of literally millions of visits to scholarly websites and, while much of the work covers the scientific and social science communities, there are lessons for us all here from the ‘early leaders’. The paper covers: 1) the impact of massive digital choice on the scholar – especially changes in/new forms of information seeking behaviour; 2) coping strategies; 3) the next digital deluge (the moves towards open access); 4) the changes being wrought by the ubiquitous search engines – increased volatility among other things; 5) subject diversity. It argues that, in a digital environment, things never go quite as expected and, as a result, digital services should be subject to routine user analysis, preferably by deep log analysis methods, which will be explained.
12.00-1.00 Lunch and AHC-UK AGM
1.00-2.00 Session 3
'Integrating GIS and Data Warehousing in a Web Environment: A Case Study of the US 1880 Census'
Richard Healy, Dept. of Geography, University of Portsmouth
While there is an established tradition of work utilising GIS and standard relational database methods for the mapping and analysis of published census tabulations, use of more specialised database structures optimised for high-performance analysis of very large multi-dimensional datasets has yet to become widespread. These de-normalised database structures form a major component of data warehouses. They comprise ‘fact tables’ or ‘cubes’, which hold data records at a very fine level of disaggregation and contain search keys that allow the records to be linked to ‘dimension’ tables. The dimension tables govern how the detailed data can be classified and aggregated into a wide variety of summary tabulations. The aggregation process is accomplished by On-Line Analytical Processing (OLAP) tools, which may run stand-alone, or be implemented as extensions to the SQL query language.
Since many data warehouses are now built on top of commercial ‘universal server’ databases, which also include facilities for the storage and querying of spatial datasets, it is now possible to conceive of integrating GIS and data warehouse functionality in a common processing environment that is also fully web-enabled.
At the same time as the components of the relevant technological jigsaw are coming together, the international North Atlantic Population Project has made available individual person records for several 19th century censuses, including that of the United States for 1880. This paper examines the issues involved in conversion of a subset of these data for Pennsylvanian heavy industrial workers into an integrated web-based system that allows interactive menu-based generation of custom census tabulations from the individual records, automated mapping of OLAP results using SVG graphics, map-driven querying of the data warehouse, and pipelining of GIS and OLAP operations for complex spatial analytic queries. Results of the pilot implementation indicate that such an integrated approach offers a very flexible and powerful new methodology for investigating this kind of large social science dataset.
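As a rough illustration of the fact-table and dimension-table arrangement this abstract describes, the following minimal sketch builds a tiny in-memory star schema and aggregates individual person records into a summary tabulation. It is not the system presented in the paper: the table names, column names and every data row are invented for illustration, SQLite stands in for a commercial ‘universal server’ database, and a plain SQL GROUP BY stands in for dedicated OLAP tooling (SQLite has no ROLLUP or CUBE extensions).

```python
import sqlite3


def build_warehouse():
    """Create a toy star schema: one fact table keyed to two dimension tables.

    All names and rows are hypothetical placeholders, not real census data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_occupation (
            occupation_id INTEGER PRIMARY KEY,
            title TEXT,
            industry TEXT
        );
        CREATE TABLE dim_county (
            county_id INTEGER PRIMARY KEY,
            name TEXT
        );
        -- Fact table: one row per person, the finest level of disaggregation.
        CREATE TABLE fact_person (
            person_id INTEGER PRIMARY KEY,
            occupation_id INTEGER REFERENCES dim_occupation(occupation_id),
            county_id INTEGER REFERENCES dim_county(county_id),
            age INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_occupation VALUES (?, ?, ?)",
        [
            (1, "puddler", "iron and steel"),
            (2, "roller", "iron and steel"),
            (3, "coal miner", "mining"),
        ],
    )
    conn.executemany(
        "INSERT INTO dim_county VALUES (?, ?)",
        [(1, "County A"), (2, "County B")],
    )
    conn.executemany(
        "INSERT INTO fact_person VALUES (?, ?, ?, ?)",
        [
            (1, 1, 1, 34),
            (2, 2, 1, 28),
            (3, 3, 1, 41),
            (4, 3, 2, 19),
            (5, 3, 2, 25),
            (6, 1, 2, 30),
        ],
    )
    return conn


def tabulate_by_county_and_industry(conn):
    """Aggregate person records via the dimension tables into a summary table."""
    query = """
        SELECT c.name, o.industry, COUNT(*) AS persons
        FROM fact_person AS f
        JOIN dim_occupation AS o USING (occupation_id)
        JOIN dim_county AS c USING (county_id)
        GROUP BY c.name, o.industry
        ORDER BY c.name, o.industry
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for county, industry, persons in tabulate_by_county_and_industry(build_warehouse()):
        print(county, industry, persons)
```

Classifying occupations differently, for example by a finer or coarser industry grouping, only requires changing the dimension table; the person-level fact rows stay untouched, which is what makes custom tabulations cheap to generate.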
2.00-3.00 Session 4
'Wiki me no wiki, the response of the academy to the internet'
Michael Moss, Professor of Archival Research, University of Glasgow
Writing in the THES in September this year, Clive Bloom, professor of English and American studies at Middlesex University, and John Higgins, associate professor of English at the University of Cape Town, railed against endemic plagiarism under the time-honoured headline, ‘You, sir, are a cad, a cheat and a bounder’. Their main target was Wikipedia, which they confuse with the mind-numbing, eye-straining boredom of PowerPoint. In the past there was no alternative but to respond to such a challenge with pistols at dawn. They rightly pointed out that the availability of a host of online resources and cut-and-paste facilities has made plagiarism much easier, but they went further in claiming that the internet somehow negates thinking. They looked back to the old certainties of a positivist universe in which students visited libraries where knowledge was mediated in the print culture. Such nostalgia is widespread amongst many academics, particularly in the humanities, but profoundly misplaced. It fails to recognise that a good deal of copying went on in libraries, but much more importantly refuses to accept the far-reaching changes in resource supply and discovery that are being enabled by the web, in which librarians and archivists are playing a part. These changes threaten the power relationship between professional groups and their clients (doctors and patients, teachers and students and so on), whose response to knowledge encountered on the web is often hostile, of the ‘how do you know it is true?’ variety.
There are issues about disintermediated information supplied from multiple sites of production on the internet that need to be explored, but to reject all such information as suspect is foolish. In history it overlooks the origin of our discipline amongst men and women whom we might now characterise as amateur antiquarians. The internet has placed a powerful tool in the hands of interest groups (family and local historians) to distribute information of interest to themselves, which because of the power of Google anyone can access. We might dislike some of this content, but we can do little to prevent its declaration or stop ourselves or our students from encountering it. This paper will take metaphoric pistols to those who see red at URLs in footnotes and explore what is happening on the internet from the twin perspectives of an archivist and an historian, drawing on the emerging body of information theory and the experience of writing my newly published study with Laurence Brockliss of the University of Oxford of military doctors in the French wars, Advancing with the Army: Medicine, the Professions and Social Mobility in the British Isles 1790-1850 (OUP, 2006).
3.30-4.30 Session 5
e-Science and Grid Technology
Lorna Hughes, Arts and Humanities e-Science Support Centre (AHESSC), King's College, London
At the end of the Nineties, a national e-Science Core Programme was established in the UK. Its agenda was driven by scientists who needed new technologies and concepts to cope with the ever increasing amount of data, from experiments and simulations as well as knowledge-gathering exercises. Faced with this 'data deluge', a new data-driven science was conceptualized with the scientist and research methods at the center of new data technologies. The idea of e-Science and the e-Scientist was accompanied by the development of new high-speed computing networks that promised solutions to a variety of problems in coping with the vast amount of information. 'Grid technologies' were the result of a global effort from computer scientists working together with practitioners to advance existing network technologies like the internet in order to create a global space of sharing resources and services.
The arts and humanities have not, up until now, been served by these developments. This is despite the fact that digital resources in these disciplines have mushroomed over the past decade: the Arts and Humanities Research Council commits roughly half its annual budget to projects which produce some form of digital content, as did its predecessor, the Arts and Humanities Research Board. This burgeoning of digital material, which is often fuzzy, incomplete or inaccessible, brings exactly the kind of challenges which e-Science can address.