ArrayAnalyzer Data Libraries

The previous page contains zip-compressed libraries for the analysis of Affymetrix microarrays. These libraries were converted from the BioConductor Project's metadata libraries. Original authorship is noted in the README file inside each library and on the original download pages at the BioConductor site. Adam Diaz converted these files for use in S-PLUS. Direct questions about their use to Insightful Technical Support. The libraries will be updated quarterly to the most current BioConductor version.

The main difference between the two versions (BioConductor and S-PLUS) is that the S-PLUS version includes the probe information as an object in the CDF library, whereas BioConductor keeps it in a separate library. The contents of each library type are described below.

The CDF libraries are needed when working with CEL files in ArrayAnalyzer. ArrayAnalyzer must have access to the CDF library that matches the Affymetrix chip you are analyzing. Each library is named after its chip: the chip name in lowercase, with hyphens and underscores removed and the suffix "cdf" appended. For example, if you are working with mgu74av2 chips, you need the library mgu74av2cdf. Each CDF library contains a named list and a data frame. The list has the same name as the library and contains the chip locations of the perfect match (PM) and mismatch (MM) probes. ArrayAnalyzer functions access this list when performing CEL-level operations. If the list is not available, ArrayAnalyzer attempts to load the library; if it cannot find the library, an error occurs. The data frame holds the probe sequences. The GCRMA function uses it to account for GC content when summarizing CEL file data into a single value per probe set.

The annotation libraries contain named lists of annotation information from various genome databases. Each chip has its own annotation library. The name of each library is the chip name in lowercase, with hyphens and underscores removed and the suffix "AnnoData" appended (e.g., hgu133bAnnoData for hgu133b chips). Each library contains named lists whose names are the chip name followed by a suffix identifying the annotation data. Table 1 lists the annotation data available in each library.
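Both naming rules (the "cdf" suffix for CDF libraries and the "AnnoData" suffix for annotation libraries) apply the same transformation to the chip name. The minimal Python sketch below illustrates that rule. It is not part of ArrayAnalyzer or S-PLUS; the function name `library_name` is hypothetical, and the sample chip names are Affymetrix-style names chosen for illustration.

```python
# Suffixes described on this page: CDF libraries and annotation libraries.
SUFFIXES = ("cdf", "AnnoData")


def library_name(chip: str, suffix: str) -> str:
    """Build a data library name from an Affymetrix chip name.

    Lower-cases the chip name, removes hyphens and underscores, and appends
    the suffix unchanged ("cdf" or "AnnoData").
    """
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown library suffix: {suffix!r}")
    base = chip.lower().replace("-", "").replace("_", "")
    return base + suffix
```

For example, `library_name("MG_U74Av2", "cdf")` returns `"mgu74av2cdf"`, and `library_name("HG-U133B", "AnnoData")` returns `"hgu133bAnnoData"`. Only the chip name is lowercased; the "AnnoData" suffix keeps its mixed case.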

ArrayAnalyzer uses the GenBank accession number and LocusLink annotation information when you run differential expression tests from the dialogs. If you select HTML output, the resulting plots contain links to the corresponding GenBank and LocusLink entries on the Web. To create these links, the annotation library for the chip being analyzed must be installed.

The libraries can be installed in the library directory under the top-level S-PLUS installation directory (run getenv("SHOME") at the S-PLUS command line to find your S-PLUS installation directory). Alternatively, you can install the libraries in any location and use the lib.loc argument of the library function when attaching them.

Installation: To install the selected library, extract the contents of the zip file into the SHOME/library directory. The files are automatically placed in a subdirectory named after the library (e.g., hgu133bAnnoData).
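If you install libraries on several machines, the extraction step can be scripted. The sketch below uses Python's standard zipfile module; the function name `install_library` is hypothetical, and `shome` is assumed to be the path returned by getenv("SHOME") in S-PLUS. It relies on each zip file already containing its own library subdirectory, as described above.

```python
import os
import zipfile


def install_library(zip_path: str, shome: str) -> str:
    """Extract a data library zip file into SHOME/library.

    `shome` is assumed to be the S-PLUS installation directory, i.e. the
    value returned by getenv("SHOME") at the S-PLUS command line.
    Returns the directory the archive was extracted into.
    """
    library_dir = os.path.join(shome, "library")
    # Fail early instead of extracting into a directory that is not an
    # S-PLUS installation.
    if not os.path.isdir(library_dir):
        raise FileNotFoundError(f"no library directory found under {shome!r}")
    # Each zip already contains its own subdirectory (e.g. hgu133bAnnoData/),
    # so extracting here puts each library in its own folder under SHOME/library.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(library_dir)
    return library_dir
```

To install into another location instead, extract the zip there and pass that directory as lib.loc when attaching the library.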