Tips for searching the KDLA Website Archive
Where can I learn more about the terminology used on this site?
You can find terms used on this site in the glossary.
How do I contact the Kentucky Website Archives located at the Kentucky State Archives?
Visit our contact page here.
What is included in the Website Archives?
The collection consists of over 360 domain names that have been identified and appraised for frequent capture. We crawl every link that is part of the originating domain name, including images, text, and video.
What file types can be stored and accessed?
Typically, if a file can be downloaded from the web without direct user intervention, it can be stored and accessed. We sometimes cannot provide password-protected files, databases, or files that require filling out a form for access. To learn more about file types that cannot be harvested, click here.
What does it mean when there's an asterisk (*) next to a date on the search results page?
Some web pages are not updated very frequently while others are updated often. When our automated system crawls the web, only about half of all pages on the web have changed from our previous visit. The asterisk indicates that the content has been updated from the previously archived copy. If you don't see an asterisk next to an archived page, then the content on the archived page is probably identical to the previously archived copy.
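As a rough illustration of how an unchanged page could be told apart from an updated one, the sketch below compares content digests of successive captures and prints an asterisk next to each capture that differs from the one before it. The function name `mark_updated` and the sample captures are hypothetical, and this FAQ does not describe how our system actually decides where to place the asterisk.

```python
import hashlib


def mark_updated(captures):
    """Return (date, changed) pairs for a list of (date, content_bytes) captures.

    A capture is flagged as changed when its SHA-1 digest differs from the
    digest of the capture before it. The first capture is always flagged,
    because there is no earlier copy to compare against (an assumption).
    """
    results = []
    previous_digest = None
    for date, content in captures:
        digest = hashlib.sha1(content).hexdigest()
        results.append((date, digest != previous_digest))
        previous_digest = digest
    return results


# Hypothetical captures of one page, oldest first.
captures = [
    ("2010-01-05", b"<html>Welcome</html>"),
    ("2010-02-05", b"<html>Welcome</html>"),
    ("2010-03-05", b"<html>Welcome, updated</html>"),
]

for date, changed in mark_updated(captures):
    print(date + (" *" if changed else ""))
```

In this sample, the second capture is identical to the first, so it is the only one listed without an asterisk.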
How do I know the date a site was archived?
The yellow band at the top of an archived page lists the date and time when a site was captured.
Why does an archived page display today's date?
If a site contains code to calculate the current date, the current date will appear on the site regardless of the date it was actually added to the collection. You should check the yellow band at the top of an archived page for the date and time when a site was captured.
Why isn't the site I am looking for in the Website Archives?
There are a couple of reasons why the site may not be in the Website Archives.
- Content or technological reasons may impede harvest.
- The content may be "out of scope," or outside the area of capture. This is especially true for sites that link to outside content. KDLA is not responsible for capturing that external content, especially content on non-state government sites.
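The idea of a link being "in scope" can be sketched as a simple host check: a link belongs to the originating domain if its host is that domain or a subdomain of it. This is only a minimal sketch under that assumption. The function name `in_scope` and the domain `agency.ky.gov` are hypothetical, and the actual scoping rules used by our crawler may differ.

```python
from urllib.parse import urlparse


def in_scope(url, seed_domain):
    """Return True if the URL's host is the seed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    seed = seed_domain.lower()
    return host == seed or host.endswith("." + seed)


# Hypothetical seed domain and links found on one of its pages.
seed = "agency.ky.gov"
links = [
    "http://agency.ky.gov/reports/annual.pdf",
    "http://news.agency.ky.gov/2010/release.html",
    "http://www.example.com/partner-page.html",
]

for link in links:
    print(link, "in scope" if in_scope(link, seed) else "out of scope")
```

Under this rule, the first two links would be crawled and the third, which points to an outside site, would not.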
What types of web content cannot be harvested?
As a crawler visits a site, it will gather and organize the contents it encounters. This is known as harvesting. However, there are certain types of content that our crawler cannot harvest. These are:
Robots.txt — A site owner puts a robots.txt file on a site to keep crawlers from crawling all or part of the site. Our crawler will not harvest content that a robots.txt file blocks.
Date Displays — If a site contains code to calculate the current date, the current date will appear on the site regardless of the date it was actually added to the collection. You should check the yellow band at the top of the archived site to determine the date the page was archived.
Server Side Image Maps — If the site needs to contact the originating server in order to work, it will fail when archived.
Streaming Media — This is a one-way transmission over a data network that is played as it is received and is not stored permanently on the requesting computer. While we can’t harvest streaming media, we can harvest downloadable media files.
Password Protected Sites — The crawler cannot collect any site that requires a password or that is database driven, because such sites require user input. This includes HTTPS sites.
Form Driven Content — If you need to fill in a form to get access to the content, the crawler typically cannot retrieve this content.
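To show how a robots.txt exclusion works in general, the sketch below uses Python's standard `urllib.robotparser` module to check whether a crawler may fetch a given address. The robots.txt contents, the crawler name `example-crawler`, and the `agency.example` addresses are all made up for illustration. This shows the general mechanism only, not the behavior of our crawler.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks one directory for all crawlers.
robots_txt = """\
User-agent: *
Disallow: /images/
"""

for url in (
    "http://agency.example/index.html",
    "http://agency.example/images/logo.gif",
):
    status = "allowed" if is_allowed(robots_txt, "example-crawler", url) else "blocked"
    print(url, status)
```

Here the home page is allowed, while anything under the /images/ directory is blocked.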
Who has access to the Website Archives?
The pages in the archives are made available to the public for use in research, teaching, and private study, pursuant to the U.S. Copyright Law. The user must assume full responsibility for any use of the materials, including but not limited to, infringement of copyright and publication rights of reproduced materials.
How can I find what I am looking for in the Website Archives?
We provide full-text search capability for the Website Archives. Alternatively, if you know the site you are looking for, you can enter the URL into the search box and view all instances of that archived URL.
Can I download sites from the Website Archives?
We do not prohibit downloading from our collection; however, the user must assume full responsibility for any use of the materials, including but not limited to, infringement of copyright and publication rights of reproduced materials. Whenever materials from our collection are used in a publication or other product, we request that the copy carry a credit line stating, “Courtesy of the Kentucky Department for Libraries and Archives.”
How can I make sure I am seeing as much of the archived content as possible?
You will need the following information to set up proxy mode to browse the KDLA Website Archives.
Host = wayback.archive-it.org
Port = 1632
Now, follow the link below that corresponds to the type of browser you wish to use.
After you have changed to proxy mode, open a browser window and type in the URL whose archived version you are interested in viewing. You will see the "archived website disclaimer" bar at the top of the screen. Keep in mind that (1) browsing in proxy mode will display only the most recent capture of the website you are browsing and (2) the settings given in this FAQ work only with sites from the KDLA Website Archives.
To return your browser to normal, you will need to follow the same instructions used above and then unselect the option to enter proxy mode.
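For readers who would rather use a script than a browser, the host and port above could be applied roughly as follows with Python's standard `urllib.request` module. This is a sketch only. The helper names `proxy_url` and `build_proxy_opener` are hypothetical, it assumes the proxy accepts plain HTTP requests, and it has not been checked against the live service.

```python
from urllib.request import ProxyHandler, build_opener

PROXY_HOST = "wayback.archive-it.org"  # Host value given in this FAQ
PROXY_PORT = 1632                      # Port value given in this FAQ


def proxy_url(host, port):
    """Build the proxy address in the form urllib expects."""
    return "http://{}:{}".format(host, port)


def build_proxy_opener(host=PROXY_HOST, port=PROXY_PORT):
    """Return an opener that sends HTTP requests through the given proxy."""
    return build_opener(ProxyHandler({"http": proxy_url(host, port)}))


# Example use (performs a network request, so it is left commented out):
# opener = build_proxy_opener()
# with opener.open("http://agency.example/", timeout=30) as response:
#     page = response.read()
```

Because the opener is a separate object, other requests made by the same program are not routed through the proxy, so there is nothing to switch back afterward.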
Why are page sections moved around or missing?
Most of the captured sites display best in either Mozilla Firefox or Internet Explorer 7, so check the page in both browsers. Download Firefox here and IE7 here. Other display issues result from frames on a web page; this is a bug in the archives. Please note, however, that we are regularly working on resolving these issues.
What does this error message mean?
Below is a list of common error messages you may encounter while searching the archives. If you see an error message that does not have the Internet Archive Wayback Machine logo in the upper left corner, you are most likely looking at an archived error page or the live web.
Failed Connection — The server that the particular piece of information is stored on is down. Generally these errors clear up within two weeks.
Robots.txt Query Exclusion — A site owner puts a robots.txt file on a site to keep crawlers from crawling all or part of the site. Our crawler will not harvest content that a robots.txt file blocks.
Blocked Site Error — Site owners or copyright holders have requested that the site be excluded from the collection. It is possible that the State Archives obtained a copy of the web site you are looking for directly from the agency without using the automated crawler. Please contact us to determine if the web site is available.
Path Index Error — A path index error message refers to a problem in our database. These errors may take time to fix. If you encounter this error message please alert us to the problem by contacting us and identifying the link that you were trying to reach and the page that you were trying to link from.
Not in Archive — The page you are trying to access is not part of the archives. Refer to this question for reasons why a site may not be included in the archives.
Why did I end up on the live web?
Why did I link to a page captured on a different date?
If you are following links from one domain to another domain, both in the collection, it is possible the new domain was captured on a different date. In that case, we will display the closest available capture date of the new domain. To make sure you know what version of the web you are viewing, pay attention to the date listed in the yellow band at the top of the archived page.
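Choosing the "closest available capture date" can be pictured as picking the capture with the smallest time difference from the page you came from. The sketch below assumes exactly that rule. The function name `closest_capture` and the sample dates are hypothetical, and if two captures are equally close this sketch simply returns the one listed first.

```python
from datetime import datetime


def closest_capture(target, captures):
    """Return the capture datetime nearest to target (earlier or later)."""
    return min(captures, key=lambda captured: abs(captured - target))


# Hypothetical capture dates for the domain being linked to.
captures = [
    datetime(2010, 1, 5),
    datetime(2010, 3, 20),
    datetime(2010, 6, 1),
]

# The page you are leaving was captured on this date.
viewing = datetime(2010, 3, 1)

print(closest_capture(viewing, captures).date())
```

In this sample the March 20 capture is selected, since it is 19 days from the page being viewed while the January capture is 55 days away.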
Why can’t I see the images on a site?
Most images display properly in the Website Archives. When there is a small red "x" where the image should be, it means that technological issues prevented the capture of the image content. When an image is grayed out, it means that the site owner used robots.txt exclusions to block access to the images directory.