FAQ
Problem: I have data that is already normalized (rows are genes and columns are chips). How do I import this data into S+ArrayAnalyzer for differential expression testing?
Product: S+ArrayAnalyzer 2.0 Windows
Category: General Information
Discussion

Data that has already been normalized/summarized outside of S+ArrayAnalyzer can be imported
for differential expression testing. The data will need to be imported and then put into the appropriate
object type for analysis in S-PLUS. In most cases, the data to be imported has the simple matrix
structure of genes represented in the rows and different chips represented in the columns of the data.
This type of data can be imported through the graphical user interface in Windows and from the
command line in other versions. Both methods are shown below.

Import from the Graphical User Interface

In Windows it is very easy to import data using the graphical user interface. After S-PLUS is
open and the S+ArrayAnalyzer module has been loaded, select File>Import data…>From File…

Select Browse from the resulting dialog to choose your file.


Navigate to the directory that contains your data using the Select file to import dialog box.

After you have selected your file and pressed "OK" the file name and path will appear in the
"File Name:" box on the Import From File dialog. Make sure that the type of data you are
importing is correctly selected under the "Data Type:" drop down menu.


The "Data set" box will allow you to name your dataset. Simply enter a name in the space provided.
If there are existing dataframes in the current working directory, they will be shown in the drop down list.
This is useful if you want to append data to an existing dataframe. In our case we want to import
new data so we will check "Create new data set" and assign a new name in the "Data set" box.

Use the "Update Preview" button to examine what the data will look like on import. Just like it sounds,
this button enables you to get a quick peek at the data before performing an entire import. This allows
you to make sure things like the column and row names are being read in and placed correctly. A
quick look at the example data shows that we need to move the first column over to make sure we
have the gene names set as the rownames in the resulting dataframe.


In order to move the gene names over into the rownames space, select the "Options" tab.
Simply select the "Row name col" setting and change it from the default value of "Auto"
to the numeric value 1.


Again, return and examine the results of selecting "Update Preview" under the "Data Specs" tab.

It now appears that our gene names are correctly placed in the row names position. Other settings,
for changing the delimiter, setting the start column and row, and filtering the data, are also available
under the "Options" and "Filter" tabs. Those tabs will not be covered in more detail here. Simply select
"OK" and the file will be imported under the specified name.

Import from the Command Line

From the command line the import of data can be completed in one simple line of code
encompassing all the options seen in the GUI import. The real difference is that the final
result cannot be previewed. The command below will import the data in the same fashion
as just performed in the GUI:

filename<-importData(file="fidler.txt",type="ASCII",
     delimiter="\t",colNameRow=1,rowNamesCol=1,stringsAsFactors=F)

We can then examine the first few rows of the imported file.

> filename[1:5,1:5]
            X20245   X20246   X20247   X30308   X30309
212466_at 5.703759 5.865593 5.403050 5.721424 5.771488
212467_at 8.758925 8.423956 8.769344 8.427560 8.883989
212468_at 6.605608 5.897529 6.368558 5.168173 5.828893
212469_at 8.470683 8.581691 8.622024 8.593048 8.593915
212470_at 8.887343 8.188181 8.483683 8.316764 8.70789
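
Because the command line import cannot be previewed, a few quick checks on the result can stand in for the "Update Preview" step. The following is only a minimal sketch: it runs on a small stand-in dataframe (hypothetically named example) built from the first two genes and chips printed above, and the same base S calls would be applied to the imported object in a real session.

```r
# Stand-in for the imported dataframe: the first two genes and chips shown above.
# In a live session, apply the same checks to the object returned by importData.
example <- data.frame(X20245 = c(5.703759, 8.758925),
                      X20246 = c(5.865593, 8.423956),
                      row.names = c("212466_at", "212467_at"))

# rows are genes, columns are chips
dim(example)

# gene identifiers should sit in the row names, not in a data column
dimnames(example)[[1]]

# every chip column should be numeric; a FALSE here suggests a parsing problem
allNumeric <- all(sapply(example, is.numeric))
if (!allNumeric) stop("non-numeric chip column found; check delimiter and rowNamesCol")
```

If the gene names show up as a data column rather than as row names, the rowNamesCol setting is the first thing to revisit.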


Creation of the exprSet Object

In order to put the data into an object type supported by S+ArrayAnalyzer some additional
command line code must be configured and executed regardless of the import method.
The best object type for previously summarized and/or normalized objects is the exprSet
object. Full details on the exprSet object are available under the help menu
(Help>Available Help>arrayanalyzer) by looking under the keyword exprSet in the index.
The creation of the exprSet can be done in S-PLUS in two basic steps. First we will create
the phenoData object and then the exprSet. The phenoData object (see keyword phenoData)
describes the experimental conditions used in creating the data. More specifically, the
phenoData object describes how the columns (or chips) of the imported dataframe are
organized in the experiment. In our example we have two conditions, mutant (labeled "MUT") and
wild-type (labeled "WT"), that equally divide the 42 columns (or chips) in the experiment. The following
command will create a vector of condition labels, one per chip, for use in creating the phenoData object.

factorlist<-rep(c("MUT","WT"),c(21,21))

Look at the factorlist:

> factorlist
 [1] "MUT" "MUT" "MUT" "MUT" "MUT" "MUT" "MUT"
 [8] "MUT" "MUT" "MUT" "MUT" "MUT" "MUT" "MUT"
[15] "MUT" "MUT" "MUT" "MUT" "MUT" "MUT" "MUT"
[22] "WT" "WT" "WT" "WT" "WT" "WT" "WT"
[29] "WT" "WT" "WT" "WT" "WT" "WT" "WT"
[36] "WT" "WT" "WT" "WT" "WT" "WT" "WT"
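
The labels must line up with the chips: one label per column, in the same order as the columns of the imported data. A minimal sketch of that check follows. The names counts and nchips are introduced here only for illustration, and the literal 42 stands in for the column count of the imported dataframe.

```r
# condition labels, one per chip, in the same order as the data columns
factorlist <- rep(c("MUT", "WT"), c(21, 21))

# chips per condition
counts <- table(factorlist)
counts

# Number of chips in the imported data: 42 in this example.
# In a live session, use dim(filename)[2] in place of the literal.
nchips <- 42
if (length(factorlist) != nchips) stop("factorlist needs exactly one label per chip")
```

If the chips in your file are not grouped by condition, build the label vector in column order by hand rather than with rep.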

Next we create the phenoData object and use the factorlist as the pData slot and
column names from the dataframe as the varLabels slot.

pd <- new("phenoData", pData=data.frame(factorlist), varLabels=dimnames(filename)[[2]])

Finally, to create the exprSet object:

myExprSetObj <- new("exprSet", exprs=as.matrix(filename), phenoData=pd)

This object can then be used in differential expression testing through the normal S+ArrayAnalyzer
graphical user interface. An important slot to remember to set on your exprSet object is the "annotation" slot.
This slot contains character data that tells the GUI which annotation library to use. For example,
when using Affymetrix chips one could set the chip name in order to have the annotation information
available in the graphical results of any differential expression testing.

myExprSetObj@annotation<-"mgu74av2"

Quick Access Script

The commands above are collected below with brief comments and can be copied and pasted from
this web page into a script window in S-PLUS for quick editing.

     # import file
     filename<-importData(file="fidler.txt",type="ASCII",
     delimiter="\t",colNameRow=1,rowNamesCol=1,stringsAsFactors=F)
     # create factor list
     factorlist<-rep(c("MUT","WT"),c(21,21))
     # create phenoData object
     pd <- new("phenoData", pData=data.frame(factorlist), varLabels=dimnames(filename)[[2]])
     # create exprSet
     myExprSetObj <- new("exprSet", exprs=as.matrix(filename), phenoData=pd)
     # assign annotation slot
     myExprSetObj@annotation<-"mgu74av2"