What can distant reading say about a blog, when we know its theme and we follow it either from the author’s side or that of the reader? What is expected from a digital analysis of a non-commercial blog?
Numbers and ratios are retrieved, along with lists of the most commonly used words and the links between them. A web is revealed and a mapping is done. The analysis is both quantitative and qualitative, the two tightly intertwined.
A good number of digital analysis tools for texts have been developed and put to use over the last 10-15 years. Those who understand such tools better set the terms of the analysis themselves, to some extent; for example, which common words (a, the, and, etc.) to exclude when composing the word frequency lists. This is not an impossible task, but it takes a lot of work and a brave brain squeeze. Though I find something intriguing in it, I don’t feel brave enough to meddle with commands, expressions, and you name it. I have done it, and even got some results. But the ratio (!) of success to failure is well below one. A simple job can be done with ready-to-use free online tools such as Voyant Tools (with thanks).
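For the curious, here is a minimal sketch of what "setting the terms" can look like when done by hand: counting words while excluding a list of common ones. It is not how Voyant works internally; the stop-word list, the function names and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list, invented for this sketch;
# real tools ship with much longer ones.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "is", "it"}

def tokenize(text):
    """Lower-case the text and pull out words (ASCII letters, inner apostrophes)."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def most_frequent(text, n=10, stop_words=STOP_WORDS):
    """Return the n most common words that are not in the stop-word list."""
    words = [w for w in tokenize(text) if w not in stop_words]
    return Counter(words).most_common(n)

# Hypothetical sample text, not taken from the blog.
sample = "The art of the artist is the art of seeing, and the artist sees the sea."
print(most_frequent(sample, n=3))
```

Changing what goes into `STOP_WORDS` changes the list that comes out, which is exactly the kind of choice the analyst has to own.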
Summary of the five most recent posts (here seen as a ‘corpus’):
This corpus has 1 document with 5,077 total words and 1,541 unique word forms. Vocabulary Density (ratio found by dividing the Total Words by the Unique Words): 3.30 (not too bad) [see literary examples: Vocabulary Analysis of Project Gutenberg].
Average Words Per Sentence: 22.3
Most frequent words in the corpus: art (49); artists (33); artist (23); like (22); work (20); blog (15); authority (13); time (13); words (13); life (10); sea (10); book (9); march (9); music (9); way (9); world (9); april (8); arts (8); comment (8); january (8); p.s (8); people (8); read (8); status (8); books (7); don’t (7); end (7); essay (7); facebook (7); film (7); google (7); irony (7); kapnissi (7); kind (7); leave (7); linkedin (7); loading (7); market (7); order (7); pinterest (7); poetry (7); posts (7); reddit (7); september (7); share (7)
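The two ratios in this summary are simple arithmetic, and a rough sketch may make them less mysterious. The function names are my own, and the sentence splitting is deliberately crude (it cuts after ., ! or ?), so it should not be expected to reproduce the tool's figure of 22.3 exactly.

```python
import re

def vocabulary_ratio(words):
    """Total words divided by unique word forms, as defined in this post."""
    return len(words) / len(set(words))

def average_words_per_sentence(text):
    """Rough average: split after ., ! or ? when followed by whitespace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    word_counts = [len(re.findall(r"\w+", s)) for s in sentences]
    return sum(word_counts) / len(sentences)

# The totals reported in the summary for the five posts.
total_words, unique_words = 5077, 1541
print(total_words / unique_words)
```

With the totals from the summary, the division comes to roughly 3.3, in line with the density reported there.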
The list of most frequent words already sets the theme of the blog, with a little surprise in the mention of the ‘sea’. The social media names were inevitable, as they are part of each blog post even though not of the actual text (that is why I did not remove them). While here we see about 50 words, the visualization named Cirrus lets us view many more words at a glance; I set it to retrieve 150, so this is what the cloud-like word list shows:
Interestingly, though not a real surprise, the word ‘depression’ pops up as a prominent one, yet not as prominent as ‘sea’ or ‘music’. And it is possible to go even further and expand the view of the words used in this part of the blog, in this beautiful arch, which works its way from word to word in a rhythmical progression:
As artists, we find and we make links between whatever lies in this world of ours. Words are more specific in this; that is why they are regarded as more appropriate for conveying meaning and for transferring knowledge (make a note for another post, though just one will not be enough for this topic). Digital analysis tools also find links between words in the analysed text. The result of such a search can be presented, for example, like this:
On a very quick viewing of this visualization, the word ‘status’ is linked to ‘artists’, ‘artist’ is linked to ‘authority’, and ‘art’ is linked to ‘artists’, to ‘history’, and to the ‘market’.
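One plausible way such links come about is by counting which words appear near each other. The sketch below is only my assumption of the general idea, not the tool's actual method; the function name, the window size and the sample sentences are invented for illustration.

```python
import re
from collections import Counter

def cooccurrences(text, keyword, window=3):
    """Count the words found within `window` words of each occurrence of `keyword`."""
    words = re.findall(r"[a-z]+", text.lower())
    neighbours = Counter()
    for i, word in enumerate(words):
        if word == keyword:
            start = max(0, i - window)
            # Words just before and just after the keyword, excluding the keyword itself.
            context = words[start:i] + words[i + 1:i + 1 + window]
            neighbours.update(context)
    return neighbours

# Hypothetical sample text, not taken from the blog.
sample = "The status of artists depends on the market. Artists question authority."
print(cooccurrences(sample, "artists", window=2).most_common(5))
```

A wider `window` finds more links but also more accidental ones, so the choice again shapes the picture.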
Reversing the findings, what is not there also says something about the analysed text. In this case, what is absent are the names of people, and specifically of (famous) artists.
Text analysis tools give a variety of options for breaking down the text into its components and re-composing it in an untangled form. The new forms, in the plural, are untangled from whatever we have in mind regarding the text(s). However, these tools also leave room, to some extent, for manipulation (of input and result). This makes the analysis a game whose seriousness depends on you. A lot of responsibility again; here is a knot representing the vicinity or correlation (not clear) of the words ‘art’, ‘artists’, ‘work’, ‘authority’, and ‘time’:
I must say that the first time I saw a visualization of a data set (or of a text, not sure) I was so impressed that I have looked for such things ever since, mostly with the artist’s hat on. There are sophisticated people out there who can make real use of the analysis tools, systems, methods, etc. I am happy I managed to catch a glimpse (and I have some fun ideas…).
P.S. Text analysis and visualization are not necessarily connected. They can also live apart. Visualization lives in science and in art, and relevant studies can be done in either field. Here is someone who combines both; have a look, there are interesting things in here: http://manovich.net/