
The reports accessible
from this area are generated on a regular basis from the access logs for most
web servers hosted at CGNET. More servers will be included soon.
There are two types of reports:
- Cumulative (this year and last year): This report is updated daily.
- Previous Month: This report is created during the first five days of the month for the previous
month.
The following
is a brief explanation of some of the information provided in these reports:
-
General Statistics
- Hits vs. Page Views: Hits
represent actual transactions between a web browser and the web server.
If you visit a web page that has 5 pictures (graphics files) referenced
on it, then the number of hits generated will be 6. Page views, on the
other hand, represent only the actual pages you requested, so in the above
example there would be only one page view generated. Successful hits are
hits where the web server returned a status code indicating success (See
Appendix A: Status Codes).
- Page views vs. Document
views: Page views include both visits to static HTML pages as well
as to pages containing scripts, such as ASP pages. Document views include
only the former (visits to static HTML pages).
- Visitor sessions: A
visitor session is defined as a collection of accesses from the same IP
address with no more than a 30-minute gap between them. So, if you were
to visit the site and look at several pages, that would count as one visitor
session. If you were then to walk away from your computer, with the browser
still open on one of the pages, and return more than 30 minutes later,
and again look at several pages on the site, that would count as a second
visitor session. Visitor sessions are, in most cases, a more accurate measure
of visits to your site than hits or page views.
- Visitor sessions from the
United States vs. International sessions and sessions of unknown origin:
Webtrends uses its own algorithms to attempt to determine the origin
of the browser. This is an imperfect measure, because the only hint the
browser usually gives the server about its identity is its IP address.
Every machine on the Internet has a four-part "IP," or Internet Protocol, address
(in the format aaa.bbb.ccc.ddd) that indicates where that machine
is located on the Internet. The IP address can sometimes be "resolved"
to a domain name, something like "x.company.com," via a process
called "domain name resolution" (a reverse DNS lookup). The web server or Webtrends will attempt
to resolve all IP addresses to their associated domain names. When this process
fails, which it frequently does for a variety of reasons beyond our control,
only the IP address remains. In that case, we have no information about the
geographical origin of the request.
Even if the IP address can be resolved, the resulting domain may belong
to a global ISP or to an organization or company of unknown origin.
And, finally, the IP address itself may be misleading or inaccurate (see
below).
- Unique visitors: Webtrends
counts the number of distinct IP addresses in the log and reports this
figure as "unique visitors." If a given IP address appears more than once
in the log, this is counted as a "visitor who visited more than once";
otherwise it is counted as a "visitor who visited once." But all three
of these figures may also be inaccurate:
- Proxy servers: If a visitor's
browser is behind a proxy server, then the proxy server may change
the apparent IP address of the browser, or may even aggregate many
IP addresses into a single address, reducing the "unique visitors" figure.
- Dynamic assignment of
IP addresses: If the user is dialing in to his/her ISP, and the ISP
assigns IP addresses from a pool or using DHCP, then the address a
user receives for a particular login may differ from one session to
the next. This would tend to artificially increase the "unique visitors"
figure, although there is also a remote possibility that two users
from the same pool might visit the same web site at different times
using a common IP address, decreasing the figure and appearing as
a repeat visitor.
- Browser caching: If a visitor has
recently visited the page, his/her browser may have cached the page's content. If the visitor returns, the browser
may not even request the page from the server again; thus the server will not know about this
visit and will not record it in the log. This has the effect of decreasing the number of hits, visits and sessions.
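
The counting rules described above can be sketched in a few lines of Python. This is a minimal illustration, not Webtrends' actual implementation: the record layout, the sample data, the `summarize` function, and the list of file extensions treated as pages are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified log records: (client IP, request time, requested path).
LOG = [
    ("10.0.0.1", datetime(2005, 6, 1, 9, 0), "/index.html"),
    ("10.0.0.1", datetime(2005, 6, 1, 9, 0), "/logo.gif"),
    ("10.0.0.1", datetime(2005, 6, 1, 9, 10), "/about.asp"),
    ("10.0.0.1", datetime(2005, 6, 1, 10, 0), "/index.html"),  # 50-minute gap
    ("10.0.0.2", datetime(2005, 6, 1, 9, 5), "/index.html"),
]

PAGE_EXTENSIONS = (".html", ".htm", ".asp")  # assumed definition of a "page"
SESSION_GAP = timedelta(minutes=30)


def summarize(log):
    """Count hits, page views, visitor sessions, and unique visitors."""
    hits = len(log)  # every request is a hit, including graphics
    page_views = sum(
        1 for _, _, path in log if path.lower().endswith(PAGE_EXTENSIONS)
    )

    sessions = 0
    last_seen = {}  # IP address -> time of that address's most recent hit
    for ip, when, _ in sorted(log, key=lambda record: record[1]):
        previous = last_seen.get(ip)
        if previous is None or when - previous > SESSION_GAP:
            sessions += 1  # first hit from this IP, or a gap over 30 minutes
        last_seen[ip] = when

    return {
        "hits": hits,
        "page_views": page_views,
        "visitor_sessions": sessions,
        "unique_visitors": len(last_seen),  # distinct IP addresses
    }
```

With the sample data, `summarize(LOG)` reports 5 hits, 4 page views, 3 visitor sessions, and 2 unique visitors. The 50-minute gap starts a new session for 10.0.0.1 without adding a unique visitor, which is also why a proxy or a shared address pool distorts the last figure.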
-
Resources Accessed
- Most/Least requested Pages:
The least requested pages include only pages that have been accessed
at least once.
- Top Entry Pages/Requests:
Entry Pages are the pages which begin visitor sessions. Entry Requests
are the hits (pages plus graphics and other objects) which were requested
at the beginning of visitor sessions.
- Top Exit pages: These
pages are the last pages requested by the browser before the end of the
session (the visitor went elsewhere).
- Single Access Pages: This
section identifies the pages on the site that visitors access and exit
without viewing any other page.
- Most Accessed Directories:
The directories (beginning with "/") which received the most visits.
- Top Paths Through Site:
This section shows you the most frequently traveled paths your visitors
take when accessing the specified web pages.
- Most Downloaded Files:
This section identifies the most popular file downloads for the site.
The number of downloads indicates the number of times the file was successfully
downloaded, whereas the number of session downloads indicates the number
of visitor sessions in which the file was downloaded.
Some files show as being downloaded more than once during a session. A likely but unconfirmed explanation is that if you visit a URL that ends with .pdf, hit your back button, and then display the .pdf file again, that counts as two downloads but only one session download.
If an error occurred during the transfer, that transfer is not counted.
- Dynamic Pages and Forms: This
section shows the dynamic pages and forms that are used the most. Dynamic
pages are pages whose contents change over time (e.g. they are constructed
from database queries) or whose contents are personalized.
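
As a rough sketch of how entry pages, exit pages, and single access pages relate to visitor sessions, the following Python assumes the log has already been split into sessions. The `SESSIONS` data and the `entry_exit_single` function are hypothetical and only illustrate the definitions given above.

```python
from collections import Counter

# Hypothetical sessions: the ordered list of pages viewed in each visitor session.
SESSIONS = [
    ["/index.html", "/products.html", "/contact.html"],
    ["/index.html"],
    ["/products.html", "/index.html"],
]


def entry_exit_single(sessions):
    """Return counters of entry pages, exit pages, and single access pages."""
    entries = Counter(s[0] for s in sessions if s)    # first page of each session
    exits = Counter(s[-1] for s in sessions if s)     # last page of each session
    single = Counter(s[0] for s in sessions if len(s) == 1)  # only page viewed
    return entries, exits, single
```

Here `/index.html` is the entry page for two sessions, the exit page for two sessions, and a single access page once.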
-
Visitors and Demographics
- Visitors by number of visits:
This section shows the distribution of visitors based on how many
times each visitor visited your site.
- New vs. Returning Visitors:
This section shows the number of first-time visitors to your site
and the number of returning visitors to your site. Only visitors who can
be identified with cookies are counted. First-time visitors are those
who didn't have a cookie on their first hit, but had one on later hits.
Returning visitors are those who already had a cookie on their first hit,
and whose previous visit happened before the start of this report period.
Since many sites do not place cookies at all, this report is of limited
value to them.
- Top Authenticated Visitors:
This section shows the authenticated users who visited your site the
most. This report is only applicable for sites (or areas of sites) which
are password-protected.
- Top Visitors: This
section identifies IP addresses and/or domain names of visitors and their
relative activity level.
- Most Active Countries:
This section identifies the top locations of the visitors to the site
by country. The country is determined by the suffix of their domain names.
Use this information carefully because it is based on where the domain
name of the visitor is registered, and may not always be an accurate identifier
of the visitor's actual geographic location. For example, while the vast
majority of .com domain names are registered in the United States, a small
minority are registered elsewhere. And even if a domain
name is registered in the U.S., many of its users may be outside
the U.S.
- North American States:
This section shows which of the North American States and Provinces
were the most active on the site. This information is based on where the
domain name of the visitor is registered, and may not always be an accurate
representation of the actual geographic location of this visitor (for
example, individual visitors will often be seen as coming from the state
where their ISPs are registered). Visits from users in domains with a
state string, such as jm.k12.ca.us (an elementary school in California,
USA) or boe.ca.gov (a fictional entity in the California state government),
would also be included in this report, but there are very few such domains.
There is no statistically valid way to compare the number of visits from
state X vs. state Y using the information in this report.
- Most Active Cities: This
section breaks down activity further to show which cities were the most
active. This information is based on where the visitor's domain name is
registered, and may not necessarily be an accurate representation of the
visitor's actual geographic location. For example, visitors are frequently
shown as coming from the city where their ISPs are registered. Vienna,
Virginia is so prominent because it is the home of AOL.
- Most Active Organizations:
This section identifies the companies or organizations that accessed
the site the most often.
- Organization Breakdown:
This section provides a breakdown by types of organizations (the so-called
top-level domains: .com, .net, .edu, .org, .mil, and .gov). The table
lists the types of organizations in decreasing order of the number of
hits.
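
The suffix-based classification behind the country and organization reports can be sketched as follows. This is an illustration under stated assumptions, not the Webtrends algorithm: the `ORGANIZATION_TYPES` labels, the `suffix_of` and `organization_breakdown` functions, and the sample address 192.0.2.10 are all made up for the example.

```python
from collections import Counter

# Illustrative labels for the six suffixes named above (the labels are assumptions).
ORGANIZATION_TYPES = {
    "com": "Commercial",
    "net": "Network",
    "edu": "Education",
    "org": "Organization",
    "mil": "Military",
    "gov": "Government",
}


def suffix_of(host):
    """Return the last dot-separated label, or None for an unresolved IP address."""
    label = host.rstrip(".").rsplit(".", 1)[-1].lower()
    return None if label.isdigit() else label


def organization_breakdown(hosts):
    """Count visitors by organization type, based only on the domain suffix."""
    counts = Counter()
    for host in hosts:
        suffix = suffix_of(host)
        if suffix is None:
            counts["Unresolved"] += 1  # the address never resolved to a domain name
        else:
            # Country-code and unrecognized suffixes fall into "Other".
            counts[ORGANIZATION_TYPES.get(suffix, "Other")] += 1
    return counts
```

For example, `organization_breakdown(["x.company.com", "192.0.2.10", "jm.k12.ca.us", "boe.ca.gov"])` counts one commercial, one unresolved, one other, and one government visitor. The suffix says where a domain is registered, not where the visitor actually sits, which is why these reports need careful reading.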
-
Activity Statistics
- Summary for Report Period:
This section outlines general server activity, comparing the level
of activity on weekdays and weekends. The Average Number of Visitors and
Hits on Weekdays are the averages for each individual weekday. The Average
Number of Visitors and Hits for Weekends groups Saturday and Sunday together.
Values in the table do not include failed hits.
- By Time Increment: This
section helps you understand the bandwidth requirements of the site by
indicating the volume of activity in kilobytes transferred. The table
provides various measures of activity by unit of time for the report period
(the unit of time depends on the amount of time covered by the report,
and will be the day in most cases).
- By Day of the Week: This
section shows the activity for each day of the week for the report period
(for example, if there are two Mondays in the report period, the value presented
is the sum of all hits for both Mondays). The table lists the number of
hits, percentage of total hits and visitor sessions for each day of the
week for the report period. Values in this table do not include failed
hits.
- By Hour of the Day: This
section shows the most and the least active hour of the day for the report
period. The second table breaks down activity for the given report period
to show the average activity for each individual hour of the day (if there
are several days in the report period, the value presented is the sum
of all hits during that period of time for all days). All times are referenced
to the location of the system running the analysis. The table lists the
percentage of total hits and visitor sessions, as well as the totals for
work hours (8:00am - 5:00pm) and after hours (5:01pm - 7:59am).
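
The day-of-week and hour-of-day groupings can be approximated with a short Python sketch. The `SAMPLE_HITS` timestamps and the `activity_breakdown` function are hypothetical; the work-hours boundaries follow the 8:00am to 5:00pm definition above, and timestamps are assumed to be in the local time of the system running the analysis.

```python
from collections import Counter
from datetime import datetime, time

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
WORK_START = time(8, 0)   # 8:00am
WORK_END = time(17, 0)    # 5:00pm, inclusive; 5:01pm counts as after hours

# Hypothetical timestamps of successful hits.
SAMPLE_HITS = [
    datetime(2005, 6, 1, 7, 59),    # Wednesday, after hours
    datetime(2005, 6, 4, 23, 30),   # Saturday, after hours
    datetime(2005, 6, 6, 9, 15),    # Monday, work hours
    datetime(2005, 6, 13, 17, 0),   # Monday, work hours
]


def activity_breakdown(timestamps):
    """Sum hits by day of the week, by hour of the day, and by work hours."""
    by_day = Counter(DAY_NAMES[ts.weekday()] for ts in timestamps)
    by_hour = Counter(ts.hour for ts in timestamps)
    work = sum(
        1 for ts in timestamps
        # Compare at minute precision so that 5:00:30pm still counts as 5:00pm.
        if WORK_START <= ts.time().replace(second=0, microsecond=0) <= WORK_END
    )
    return by_day, by_hour, {"work_hours": work, "after_hours": len(timestamps) - work}
```

With the sample data, the two Mondays are summed into a single Monday total of 2 hits, and the hits split evenly between work hours and after hours.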
-
Technical Statistics
- Technical Statistics: This
table shows the total number of hits for the site, how many were successful,
how many failed, and calculates the percentage of hits that failed. Failed
hits are hits where a server or client error occurred. Cached hits are
those where the page was found in the cache of the browser, so the server
did not need to transfer the file.
- Dynamic Pages and Forms
Errors: This section shows you errors that occurred for both dynamic
pages and forms (pages with scripts or forms on them).
- Client Errors: This
section identifies errors caused by requests from the browsers accessing your server
(HTTP status codes in the 400 range).
The table lists all the errors that occurred in order of number of failed
hits. (See Appendix A: Status Codes).
Forbidden Access (HTTP Status code 403) usually means that someone typed
the incorrect password in a password-protected area. However, some web
servers are configured not to allow people to browse directories. For
example, if you try to access http://www.somedomain.com, you
might be told that you are not allowed to view the files in that directory.
Such an attempt would also be counted as a forbidden access.
- Page Not Found Errors:
This section identifies pages that returned "Page Not Found" (404)
errors on the server. These typically occur when someone types a URL
incorrectly or follows a link to a page that no longer exists.
- Server Errors: This
section identifies by type the errors that occurred on the server. The
table lists the errors in decreasing order of the number of failed hits.
(See Appendix A: Status Codes).
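
The way hits divide into successful, cached, and failed can be sketched from the HTTP status code alone. This is a simplified illustration: the `classify` and `technical_summary` functions are hypothetical, and treating 304 (Not modified) as a cached hit and every other non-error code as successful are assumptions, not documented Webtrends behavior.

```python
from collections import Counter


def classify(status):
    """Map an HTTP status code to a report category."""
    if status == 304:
        return "cached"          # assumed: browser already had a current copy
    if 400 <= status <= 499:
        return "client_error"    # e.g. 403 Forbidden, 404 File not found
    if 500 <= status <= 599:
        return "server_error"    # e.g. 500 Internal server error
    return "successful"          # assumed: all remaining codes count as success


def technical_summary(statuses):
    """Return total, successful, cached, and failed hits plus the failure rate."""
    counts = Counter(classify(status) for status in statuses)
    total = len(statuses)
    failed = counts["client_error"] + counts["server_error"]
    percent_failed = round(100.0 * failed / total, 1) if total else 0.0
    return {
        "total": total,
        "successful": counts["successful"],
        "cached": counts["cached"],
        "failed": failed,
        "percent_failed": percent_failed,
    }
```

For the status codes `[200, 200, 304, 404, 403, 500, 200, 200]`, this gives 8 total hits, 4 successful, 1 cached, and 3 failed, a failure rate of 37.5 percent.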
-
Referrers and Keywords
- Top Referring Sites: This
section identifies the domain names or numeric IP addresses with links
to the site.
- Top Referring URLs: This
section provides the full URLs of the sites with links to the site. It
doesn't include visitors who typed in your URL.
- Top Search Engines: The
first table identifies which search engines referred visitors to the site
the most often. The second table breaks down the keywords used with each
search engine referring your site. Note that each search may contain several
keywords. Totals in this table represent the number of searches, whether
they contain one or several keywords. The third table identifies the main
keywords for each search engine.
- Top Search Phrases: Many
visitors to your site may be reaching it using search engines like Yahoo,
Excite, etc. This section shows you the search phrases that visitors are
using to reach your site.
- Top Search Keywords: This
section tells you which search engines people are using to find your site,
and the keywords used most frequently with each search engine.
You may find that some search engines return your site for the keywords
you expect and that others do not.
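
Search phrases and keywords are typically recovered from the referring URL that the browser sends with each request. The sketch below shows one plausible way to do this with Python's standard `urllib.parse` module. The parameter names in `QUERY_PARAMETERS`, the example hostnames, and the `search_phrase` and `keyword_counts` functions are assumptions for illustration; each real search engine uses its own URL format.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Assumed names of the URL parameter that carries the search phrase.
QUERY_PARAMETERS = ("q", "p", "query")


def search_phrase(referrer):
    """Return (search engine host, phrase), or None if no phrase is present."""
    parts = urlsplit(referrer)
    params = parse_qs(parts.query)  # decodes "+" and %XX escapes
    for name in QUERY_PARAMETERS:
        if name in params:
            return parts.hostname, params[name][0].strip().lower()
    return None


def keyword_counts(referrers):
    """Count individual keywords across all search phrases found."""
    counts = Counter()
    for referrer in referrers:
        found = search_phrase(referrer)
        if found is not None:
            counts.update(found[1].split())  # one phrase may hold several keywords
    return counts
```

A referrer such as `http://search.example.com/search?q=web+statistics&hl=en` yields the phrase "web statistics" and the two keywords "web" and "statistics", while a plain link such as `http://www.example.org/links.html` yields no phrase at all.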
-
Browsers and Platforms
- Most Used Browsers: This
section identifies the most popular WWW Browsers used by visitors to the
site. Any hits identified as originating from a spider are not counted
in this table.
- Visiting Spiders: This
section identifies all robots, spiders, crawlers and search services (e.g.
Alta Vista, Lycos, and Excite) visiting the site.
- Most Used Platforms: This
section identifies the operating systems most used by the visitors to
the site.
Appendix A: Major HTTP Status Codes
- 200 OK (request succeeded)
- 204 No content (empty page)
- 301 Moved permanently
- 302 Redirect
- 304 Not modified
- 400 Bad request
- 401 Authorization required
- 403 Forbidden
- 404 File not found
- 408 Request timeout
- 500 Internal server error (server busy)
This document was updated in June 2005.