Websites as graphs: what does it mean? An intriguing link for making some beautiful graphs (link provided at robot wisdom by our friend Jorn Barger). -- Edward Tufte, May 26, 2006 |
|
Response to Websites as graphs The pretty graphs are the work of Sala, a young conceptual artist living in Zurich, Switzerland. He also shows an intriguing 1,000 paintings of 1,000 numbers at his site: http://www.onethousandpaintings.com/home/ He has a show in Basel, which maybe I can see when I go to ArtBasel in a couple of weeks. Now for the website graphs, there is a link to an applet so that you can view any website as a graph as well as watch the graph being constructed. To interpret the graphic requires learning a color code and understanding the design/conceptual meaning of HTML tags. What do the graphs mean? It would be very helpful to have more instructions on how to read the meaning of the graphs. For the 7 dot colors, what does each color mean beyond a tag; what programming-analysis concept does it measure? For example, why should tags for linebreaks and blockquotes (BR, P, and BLOCKQUOTE tags) be plotted? What is the substantive meaning of the gray dots (all other tags)? These questions no doubt reflect my profoundly limited knowledge of HTML structure. I need some help in how to read and interpret the graphs beyond their artistic nature. There are brief interpretations with good samples (for cnn, boingboing, apple, yahoo, msn, wired, google), but it would be helpful to see more explanatory detail in those interpretations. Perhaps several thorough sample readings of a few graphs would be helpful for the naive reader, such as me. Could a Kindly Contributor provide a detailed and annotated analysis of, say, the graphs for nytimes.com and edwardtufte.com? (The nytimes.com graph takes a while to generate.)
-- Edward Tufte, May 27, 2006 |
|
First note: the applet isn't showing a web site, but instead a single web page. Each graph is radiating from a single black node which represents the root of an HTML document. These pictures are fundamentally trees, much as you would draw if you were looking at a family tree, or a species hierarchy (I'd reference a particular textual example, but to my surprise I didn't find one in the three Tufte hardbacks on my shelf). They don't /look/ like trees because the nodes are pushing away from each other, rather than hanging down from the top (the top here being the black node). "what does each color mean conceptually" Starting with the easy ones first: The color GRAY only means "anything that the artist thought was uninteresting". There are a number of different concepts trapped in the gray, and they cannot easily be distinguished by color alone. Black: black is the top of the document. Most HTML documents have two pieces inside of the black element - a small HEAD section, which usually includes the title of the document, and some meta information about it, and the BODY element, which is where the meat is. The HEAD section will normally appear as a small grey flower, the BODY will be the multicolored monster spider thing. Red vs Green: these two colors are being used primarily to highlight structure in the document used to describe the presentation of the material. For a long time, authors controlled the location of text, images, and other content elements by working with tables (and tables within tables). Now that a larger share of browsers supports stylesheets, authors are using div sections where once they used tables. So a red skeleton vs a green skeleton tells something of how the layout of the page has been done. Of course, sometimes a table is a table - in that case it will look like a chain of red flowers (possibly with some blue and grey mixed in).
Example: http://www.aharef.info/static/htmlgraph/?url=http%3A%2F%2Fwww.whiterose.org%2F%7Edanil%2Fciv4%2Fa4%2Fopening.html Violet and Orange: these colors tell you that you are looking at content. Orange will normally represent text, violet will normally represent images. One violet dot is one picture, but there's not necessarily any clear relationship between an orange dot and the amount of text it represents. Blue: hyperlinks - connections from this page to somewhere else on the web. There are some very common patterns that appear regularly: blue attached to purple is a picture you can click on. A blue flower is normally a list of links. Orange with blue is a bunch of text with links in it. Yellow: web forms - conceptually, groups of elements which allow the user to communicate back to the author/webserver/program in some way. Yellow will usually be flowers. These aren't particularly the choices I would have made, were the object to use these pictures as an analysis tool. In particular, the markup concept (headings, bold, italics, etc) is absent. (More to follow) -- Danil (email), May 27, 2006 |
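The color key described above can be tried out by hand. What follows is only a minimal sketch in Python, not the applet's actual code: it walks a page's tags with the standard-library `html.parser` and tallies how many nodes would land in each color. The mapping for black, blue, red, green, orange and gray follows what is said in this thread; the exact tag sets for violet and yellow are an assumption, and the names `COLOR_OF_TAG`, `ColorCounter` and `color_counts` are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Tag-to-color key as described in this thread. The tag sets for
# violet (images) and yellow (forms) are assumptions, not confirmed here.
COLOR_OF_TAG = {
    "html": "black",                                   # root of the document
    "a": "blue",                                       # hyperlinks
    "table": "red", "tr": "red", "td": "red",          # table layout
    "div": "green",                                    # div layout
    "img": "violet",                                   # images (assumed)
    "br": "orange", "p": "orange", "blockquote": "orange",
    "form": "yellow", "input": "yellow", "textarea": "yellow",
    "select": "yellow", "option": "yellow",            # forms (assumed)
}


class ColorCounter(HTMLParser):
    """Counts one node per opening tag, grouped by color."""

    def __init__(self):
        super().__init__()
        self.colors = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; anything unlisted is gray.
        self.colors[COLOR_OF_TAG.get(tag, "gray")] += 1


def color_counts(html_text):
    """Return a Counter mapping color name to number of nodes."""
    parser = ColorCounter()
    parser.feed(html_text)
    parser.close()
    return parser.colors


if __name__ == "__main__":
    sample = (
        "<html><head><title>t</title></head>"
        "<body><p>Hi <a href='x'>link</a><br></p><img src='y'></body></html>"
    )
    for color, n in sorted(color_counts(sample).items()):
        print(color, n)
```

Note that the gray count in the sample comes from HEAD, TITLE and BODY, which illustrates the point that several unrelated concepts end up lumped together in gray.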
|
This is very helpful! Thank you Kindly Contributor Danil. Thus the graph is about the coding anatomy of the homepage, not about website structure as a whole. How can we reach conclusions about a website (as the text accompanying the graph does) on the basis of the homepage HTML coding practices? -- Edward Tufte, May 27, 2006 |
|
OK, so let's look at http://www.edwardtufte.com/tufte/ What can we see immediately? The first thing that catches my attention is the blue and orange flower - that's a group of text and links, where everything is at the same level. A quick glance at the page tells us what we are looking at: that's your "Ask ET: Selected Topics" section. The next thing I notice is that the skeleton is mostly red. In other words, the presentation of the page is being controlled by table elements. It looks like we have four flowers of clickable images (these appear as nests of red nodes surrounded by a ring of blue with a ring of violet outside it). Each of those flowers represents a section of your clickable pictures. The tiny flower (blue and violet, with a single red node and a single orange node in the middle) is the menu at the top of the page "home book courses...." The small orange and violet flower is your course description (the violet represents the bullets, with the orange being the text of the bullet points), and the big red flower with blue and orange at the very end is your travel schedule. (These two flowers are proximate to each other because - in the tables that are used to control page layout - they are proximate to each other. Not a big surprise, given that they are adjacent on the page). The small gray flower? That's the title section of the document, as I mentioned earlier, and probably the easiest way to find the one black node that represents where your page starts. -- Danil (email), May 27, 2006 |
|
I was going to complain of the page's colour key,
What do the colors mean? -- Derek Cotter, May 27, 2006 |
|
Now what we need to see are frontpages, several of them, in parallel to their representation in the graphs, several of them. My friend Philip Greenspun is flying down to visit tomorrow Sunday; I'll discuss the graphs with him.
-- Edward Tufte, May 27, 2006 |
|
This is an interesting system, but I'm not sure how useful it is as a metric of quality or complexity (though the fact that Google comes up so simple is suggestive). My personal theory is that complexity in page flow, i.e., having to click many times to accomplish a task, is more difficult for users than complexity in individual page layout, which is what these graphs are depicting. A second shortcoming in the graphs is that they would show a site that uses Cascading Style Sheets as much simpler than a site that uses old-school HTML tags for formatting. The two pages might look identical to a reader, yet one would be depicted as vastly simpler by this visualization tool (mostly because the complex formatting instructions had been pushed into a separate file, i.e., the style sheet). So... I guess it is a good measure of whether a site is using modern Web design techniques (lots of DIVs) and, if not, a measure of whether a site is formatted reasonably simply and sensibly. -- Philip Greenspun (email), May 27, 2006 |
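The stylesheet point above is easy to see with a toy comparison. This is a rough sketch only: the two snippets and the class name `promo` are invented for illustration, and `count_nodes` is a hypothetical helper that counts one node per opening tag, which is roughly what the graph draws.

```python
from html.parser import HTMLParser


class NodeCounter(HTMLParser):
    """Counts one graph node per opening tag."""

    def __init__(self):
        super().__init__()
        self.nodes = 0

    def handle_starttag(self, tag, attrs):
        self.nodes += 1


def count_nodes(html_text):
    counter = NodeCounter()
    counter.feed(html_text)
    counter.close()
    return counter.nodes


# Two invented snippets that could render much the same to a reader.
# In the second, the formatting lives in a stylesheet (not shown).
OLD_SCHOOL = (
    '<p><center><font color="red" size="4">'
    "<b><i>Sale today</i></b></font></center></p>"
)
WITH_CSS = '<p class="promo">Sale today</p>'

if __name__ == "__main__":
    print("old-school tags:", count_nodes(OLD_SCHOOL))
    print("with stylesheet:", count_nodes(WITH_CSS))
```

The formatting has not gone away in the second version; it has only moved to a file the graph never looks at.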
|
As a career web designer / developer I can confirm Philip's comments about how we can use websitesasgraphs to differentiate between homepages using modern tagging techniques. By spooling sites through the tool we can see who is using modern HTML (semantic markup) vs. a <table> based layout. The current trend, as it should be, in modern HTML markup is to use appropriate tags (semantically correct) to mark up the actual content. As for the grey nodes, the first commenter suggested these were considered uninteresting to the designer....the truth is that the grey nodes reflect the metadata within a page. The metadata is what the search engine crawlers use to index web pages. Some designers, depending on who they work for, consider these nodes the most important in markup. Below is what I posted to my corporate intranet blog about this specific tool a few days ago. websiteasgraphs has a fun flashy way of visualizing a page's HTML markup. The tool distinguishes the <table>, <tr>, and <td> tags from <div> tags using colored nodes as follows: blue: for links (the A tag) By comparing how many red nodes there are vs. green nodes, the visualization can be a quick way of determining who is building pages using the <div> tag approach vs. <table> markup. I was listening in to Maggie Blayney Web Strategy and Innovation call this morning where we were looking at Sun, Oracle, and Microsoft's homepages which are considered our dignified competitors when it comes to planning and creating homepages for large IT companies. So I thought I'd run all four sites through the graphing tool. Below are screenshots of what the tool came up with. The first three sites don't reveal much of anything other than they all make serious use of a <table> style markup, but Sun's screenshot is vastly different in that they've opted for the <div> approach, using only four tables. Visiting their landing page reveals nothing visually out of the ordinary from the other three sites.
It's a great proof of concept that site design without the use of tables is finally being adopted even at the corporate IT level. [Screenshots: ibm.com, microsoft.com, oracle.com, sun.com] -- Jeremy Graston (email), June 9, 2006 |
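The red-versus-green comparison above can also be done without the pictures. Here is a minimal sketch, assuming red means TABLE, TR and TD and green means DIV as stated in this thread; `red_vs_green` is a hypothetical helper, not part of the tool, and the two sample strings are invented.

```python
from html.parser import HTMLParser

TABLE_TAGS = {"table", "tr", "td"}  # drawn as red nodes


class SkeletonTally(HTMLParser):
    """Tallies table-family tags (red) against div tags (green)."""

    def __init__(self):
        super().__init__()
        self.red = 0
        self.green = 0

    def handle_starttag(self, tag, attrs):
        if tag in TABLE_TAGS:
            self.red += 1
        elif tag == "div":
            self.green += 1


def red_vs_green(html_text):
    """Return (red_count, green_count) for one page's markup."""
    tally = SkeletonTally()
    tally.feed(html_text)
    tally.close()
    return tally.red, tally.green


if __name__ == "__main__":
    table_page = "<table><tr><td>a</td><td>b</td></tr></table>"
    div_page = "<div><div>a</div><div>b</div></div>"
    print("table layout:", red_vs_green(table_page))
    print("div layout:  ", red_vs_green(div_page))
```

A raw count cannot tell a layout table from a genuine data table, so a high red count is a hint about layout practice rather than proof of it.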
|
From Jeremy's post - As for the grey nodes, the first commenter suggested these were considered uninteresting to the designer....the truth is that the grey nodes reflect the metadata within a page. The metadata is what the search engine crawlers use to index web pages. Some designers, depending on who they work for, consider these nodes the most important in markup. My first thought concerning this graph tool was search engine optimization and how useful it would be to see graphically how search engine friendly a particular website was; then, with changes, the map would become more dynamic as its optimization improved. I wish I were a brilliant software engineer who could develop this further for the exact purpose of SEO. There are plenty of businesses claiming expertise in SEO, but with the changing algorithms of the search engines it's a never-ending process. -- Christine (email), July 13, 2006 |
|
I love the look of the graphs, as well as the way that they unfold themselves. Does anyone know if there is any commercially available application that would allow you to draw this kind of graph with *any* relational data set, rather than just web-page data? -- Andrew Abela (email), July 13, 2006 |
|
The site also makes beautiful graphs for websites that don't exist -- try it with the fictitious URL of your choice. Isn't it fake? Am I missing something? -- Gail Elber (email), October 17, 2006 |
|
Brilliant idea. Why didn't someone do this earlier? Why didn't I do it earlier? -- Edward Tufte, October 17, 2006 |
|
Possible explanation for fictitious-URL graphs To my eye, any fictitious URL turns up the same graph (notice the yellow and blue clusters), suggesting this hypothesis: the program is graphing the default error page that its server delivers when it comes across a non-existent URL. -- John Jones (email), October 17, 2006 |
|
more than 2000 such pictures in flickr, tagged as websitesasgraphs: http://www.flickr.com/search/?q=websitesasgraphs&z=t Enjoy ;-) -- christianhauck (email), October 17, 2006 |
|
It might be interesting to see how similar organizations lay out their websites. For example, how do the website graphs of Ivy League universities compare? Or, how do the website graphs of various television networks compare? That might shed some light on how web designers in different industries approach their craft. Any takers? -- Miklos Z. Kiss (email), October 18, 2006 |
|
Here's an interesting time-series visualisation of the development of a Wikipedia entry for a controversial subject (evolution). Not a graph per se but a lovely way for a layperson to understand how a mature Wikipedia entry arrives at completion. -- Matt R (email), May 4, 2007 |
|
These are quite attractive graphs. I echo an earlier sentiment about wanting to use the software for arbitrary graph visualization. (Especially if the diameter of the circles representing nodes can be adjusted.) Since the HTML is meant to be consumed by a computer, viewing a graph like this is, IMO, primarily interesting to understand how easy or hard it would be for the computer to consume a particular web page. The grossest measurement we can get from these visualizations -- the size of the graph -- is not that relevant; simply-structured graphs may have high node counts. (It might matter on mobile web browsers with limited available memory.) However, perhaps other constructions indicate improved or degraded rendering performance. For instance, a common problem with table-driven designs several years ago was that the browser would show nothing until it saw the complete table. A large table that might cause such a problem would show up here as a flower centered on a red node, with lots of red child nodes. (So shouldn't it be easier to see the color of the node at the center of a flower?) Another problematic design might be long chains of nodes indicating deeply-nested tags. Such a design requires the renderer to maintain a large stack of currently-open tags; in a resource-poor environment (again, mobile browsers), that might exhaust memory and make the page unrenderable. I'm not convinced that this visualization has a great deal of utility, but if it does, I think it is in highlighting problems such as these. -- Phil Groce (email), May 4, 2007 |
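The two warning signs described above, long chains (deep nesting) and big flowers (one node with many children), can be measured directly. This is a sketch under stated assumptions rather than a faithful model of a browser: it uses Python's `html.parser`, which does not close unclosed tags the way a real renderer does, so depths on sloppy markup may come out too high. `ShapeProbe` and `page_shape` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class ShapeProbe(HTMLParser):
    """Tracks the deepest chain and the node with the most direct children."""

    def __init__(self):
        super().__init__()
        self.stack = []          # open elements as [tag, child_count]
        self.max_depth = 0
        self.widest = ("", 0)    # (tag, number of direct children)

    def handle_starttag(self, tag, attrs):
        if self.stack:
            parent = self.stack[-1]
            parent[1] += 1
            if parent[1] > self.widest[1]:
                self.widest = (parent[0], parent[1])
        self.max_depth = max(self.max_depth, len(self.stack) + 1)
        if tag not in VOID_TAGS:
            self.stack.append([tag, 0])

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def page_shape(html_text):
    """Return (max_depth, (widest_tag, child_count)) for one page."""
    probe = ShapeProbe()
    probe.feed(html_text)
    probe.close()
    return probe.max_depth, probe.widest


if __name__ == "__main__":
    page = (
        "<html><body>"
        "<table><tr><td>1</td><td>2</td><td>3</td></tr></table>"
        "<div><div><div><p>deep<br></p></div></div></div>"
        "</body></html>"
    )
    depth, (tag, children) = page_shape(page)
    print("deepest chain:", depth)
    print("biggest flower:", tag, "with", children, "children")
```

A large table that delays rendering would show up here as a widest node tagged `table` or `tr` with a high child count, and deeply nested markup as a large depth figure.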
|