
SEMANTIC AND PRAGMATIC

Find nth occurrence in a column

A solution to find the Nth occurrence of a value in a column, using a custom VBA function coupled with INDEX/INDIRECT.

Step 1: Enable the "Developer" tab in Excel:

Pathway (for Mac): Preferences --> Ribbon --> select "Developer"

Step 2: Insert the macro

Pathway: Developer --> Visual Basic Editor --> Insert Module (paste in the code below)

Function NthInstance(F As String, R As Range, N As Long) As String
' Returns the address of the cell holding the Nth occurrence of F within range R
Dim vA As Variant
Dim I As Long, ct As Long
vA = R.Value
For I = LBound(vA, 1) To UBound(vA, 1)
    If InStr(1, vA(I, 1), F) > 0 Then
        ct = ct + 1
        If ct = N Then
            NthInstance = R.Cells(I, 1).Address(1, 1)
            Exit Function
        End If
    End If
Next I
If ct < N Then NthInstance = "Only " & ct & " instances found"
End Function
 
Credit:  source of the VBA code 
http://www.mrexcel.com/forum/excel-questions/678792-find-nth-occurrence-column.html
 
Step 3: The INDEX and INDIRECT functions can then be applied to retrieve values from the row of the Nth occurrence.

e.g.

=INDEX($F$1:$H$10,ROW(INDIRECT(NthInstance($F29,$F$2:$F$10,$E29))),2)

Here NthInstance returns the address of the Nth cell in F2:F10 that contains the value in F29, INDIRECT converts that address into a reference, ROW extracts its row number, and INDEX then returns the value from column 2 of that row.

Download an example file here
 
Enjoy. 
 
Posted by Xun Liao at 14:39
Visualise US LCI database in R

This is example code to visualize the US LCI database as a network in R.

library(igraph)

# B is assumed to be the process-by-process adjacency matrix built from the US LCI data
g1 <- graph.adjacency(B, mode='undirected', diag=FALSE)

V(g1)$label <- lp1  # lp1: frequency with which each process is used as an input by other processes

V(g1)$label.color <- rgb(0, 0, .2, .8)  # set label color

V(g1)$size <- k*0.3  # scale node size by k

V(g1)$label.cex <- 1*V(g1)$size / max(V(g1)$size) + .2  # scale label size with node size

# split the plotting device into two panels
split.screen(c(2,1)) -> ind

screen(ind[1])
plot(g1, vertex.label=lp1, layout=layout.reingold.tilford)

screen(ind[2])
plot(g1, vertex.label=NA, layout=layout.random)
# plot(g1, vertex.label=NA, layout=layout.auto)  # alternative automatic layout
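
The objects B, lp1 and k are assumed to be prepared beforehand from the US LCI data. A minimal, self-contained sketch of how they could be set up (the random adjacency matrix and the choice of k here are stand-ins for illustration only):

library(igraph)

set.seed(1)
n <- 20                                     # toy number of processes
B <- matrix(rbinom(n * n, 1, 0.1), n, n)    # stand-in 0/1 "process i uses process j" matrix
B <- 1 * ((B + t(B)) > 0)                   # symmetrize for an undirected graph
diag(B) <- 0

g1  <- graph.adjacency(B, mode = 'undirected', diag = FALSE)
lp1 <- degree(g1)                           # frequency of each process being used as an input
k   <- lp1 + 1                              # node-size driver (here simply degree + 1)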

For dynamic visualization, check out a network visualization (force-directed graph) of the 2012 industry-by-industry US Input-Output Table (15 sectors) provided by BEA.

http://www-personal.umich.edu/~mingxu/ionetwork12/
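
For a force-directed view generated directly from R, one option is the networkD3 package. A sketch assuming the igraph object g1 built above (the community-detection grouping is used only to colour the nodes):

library(igraph)
library(networkD3)

V(g1)$name <- as.character(seq_len(vcount(g1)))   # give vertices explicit names
grp <- membership(cluster_walktrap(g1))           # simple grouping used only for node colours

d3 <- igraph_to_networkD3(g1, group = grp)        # convert the igraph object to networkD3 lists
forceNetwork(Links = d3$links, Nodes = d3$nodes,
             Source = "source", Target = "target",
             NodeID = "name", Group = "group",
             opacity = 0.8, zoom = TRUE)          # interactive force-directed graph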

Posted by Xun Liao at 14:54
R code for LCA database computation: ecoinvent 3.1
This is the R code to compute the ecoinvent version 3.1 database from a SimaPro raw matrix export. The motivation for this work is to handle large databases with ease. The code can be adapted for any database exported from SimaPro, and it is particularly useful if you want to compute selected impact results for an entire database at once.
 
Main objectives of the current code are: 
  • extract the sparse matrix (i, j, k) representation of ecoinvent version 3.1 for data storage 
  • compute final LCI results for all processes 
  • extract and sort impact assessment method characterization factors into a computable format following the ecoinvent v3.1 nomenclature 
  • compute impact assessment (IA) results for the whole database 
 
Step 1: Download and install the free, open-source RStudio from the URL below: 
https://www.rstudio.com/products/rstudio/download/
 
Step 2: Export the ecoinvent v3.1 database from SimaPro Developer in matrix format, then save it as a comma-delimited CSV file. Also export the selected impact assessment method as a CSV file, separated by ";". 
 
Step 3: Apply the following R code for the computation. 
Attention: lines that may need to be adjusted for your own setup (working directory and file names) are flagged in the code comments.
For the methodological and mathematical background of this code, please refer to this document:  Mathematical formula for spatial LCA computation:  1. Site-Generic LCA
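
For quick reference, the standard matrix relations that the script below implements (notation chosen to match the variables A, B, C, F and G in the code):

\[ G = B\,(I - \tilde{A})^{-1} F, \qquad F = I \]
\[ \mathrm{IA} = G^{\top} C \]

where \(\tilde{A}\) is the technology (input) matrix normalized by each process's reference flow, B is the normalized intervention (elementary-flow) matrix, F is the demand matrix (the identity, i.e. one reference unit of every process), G is the resulting life cycle inventory matrix, and C holds the characterization factors, so each row of IA contains the impact scores of one process.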
 
## start of the code
#Aug 18, 2015  Xun Liao 
 
## Section 1:  Create sparse matrices for ecoinvent version 3.1 and compute the system LCI 
# This code can be easily adapted for any other database or project work 
 
setwd("/Users/Desktop") # set up your own folder directory 
#read data
filename='EI31defaultUnitRaw.csv'
data<-read.table(filename,header=FALSE,sep=','); # change the file name accordingly 
 
#install libraries 
install.packages("Matrix")
library(Matrix)
 
# create reference matrix R 
R<-data[3,5:(dim(data)[2])];  #read reference matrix 
R<-as.matrix(R);  #format to matrix 
storage.mode(R)<-"double";  # convert data to 'double' format 
R[is.na(R)] <- 0;  # convert NA to zero 
R<-as.numeric(R); # format as a vector 
 
#Create matrix B
B<-data[(dim(data)[2]+3):(dim(data)[1]),5:(dim(data)[2])];
B<-as.matrix(B);
storage.mode(B)<-"double";
B[is.na(B)] <- 0;
B<-sweep(B, 2, R, "/");  # normalize matrix B 
B<-Matrix(B,sparse=TRUE);
Bsum<-summary(B);  # create location matrix (i,j,k) 
write.table(Bsum, file = "SparseEI31B.txt",sep = ",",row.names = FALSE,col.names=FALSE)  # save the matrix B as (i j k) format 
 
# create matrix A 
A<-data[7:(dim(data)[2]+2),5:(dim(data)[2])];
A<-as.matrix(A);
storage.mode(A)<-"double";
A[is.na(A)] <- 0;
# create diag A 
diagA<-diag(dim(A)[2]);
 
#create final invertible matrix A 
A<-sweep(A, 2, R, "/");
A<-(-A)+diagA; 
A<-Matrix(A,sparse=TRUE);
Asum<-summary(A);
write.table(Asum, file = "SparseEI31A.txt",sep = ",",row.names = FALSE,col.names=FALSE)
 
#create demand matrix F
F<-diag(dim(A)[2]);
 
#matrix inversion 
G<-B%*%solve(A,F);  # final LCI matrix G 
rm(F,R,Bsum,Asum,diagA) # clear variables that are no longer needed 
 
#save the current workspace
save.image(file='EI3data.RData')
savehistory(file='EI3data.Rhistory')
 
##  Section 2: Extract and sort characterization factors of any impact assessment methodology for LCI database IA computation  
#Suppose you also have the characterization factors of an impact assessment method (e.g. IMPACT 2002+, TRACI 2.1 or ReCiPe) exported from SimaPro. The objective here is to sort them into a computable format matching the ecoinvent elementary flows:
 
filename="IMPACT2002.CSV"; # for example, the IMPACT 2002+ method 
 
#read the IA csv file that was downloaded from SimaPro
mydata <- read.delim(filename, header = FALSE, stringsAsFactors = FALSE);
 
# split data into different columns 
CF<-data.frame(unname(t((data.frame(strsplit(mydata[, 1], ";")))))); 
 
#identify start & end location for each impact category 
s<-which(CF$X1=="Impact category"); #starting location of impact categories 
s<-c(s,dim(CF)[1]); # add final row of last impact category
 
 # read impact category name 
Name<-as.matrix(CF$X1[s[1:length(s)-1]+1]);  # read impact category name 
NameUnit<-as.matrix(CF$X2[s[1:length(s)-1]+1]);  # read impact category unit 
metaname<-data.frame(cbind(Name,NameUnit)) #combine indicator name and unit into one dataframe 
metaname<- do.call(paste, c(metaname[c("X1", "X2")], sep = ";")) #merge indicator name and unit into one column
 
#create matching ID for the impact assessment method [compartment, sub-compartment, name]; check units
CF$ID <- do.call(paste, c(CF[c("X1", "X3","X2")], sep = ";")) ;
 
#read elementary flow list for ecoinvent v3.1 database 
dbN<-data[(dim(data)[2]+3):(dim(data)[1]),1:4];
 
#create matching IDs for the inventory database; dbN holds the ecoinvent v3.1 elementary flow nomenclature [category, sub-category, name] 
 
dbN$ID1 <- do.call(paste, c(dbN[c("V1", "V2","V3")], sep = ";")) ;  # sub-category sensitive
dbN$ID2 <- do.call(paste, c(dbN[c("V1", "V2")],"(unspecified)", sep = ";")) ; # ignore sub-category
dbN$ID3 <- do.call(paste, c(dbN[c("V1", "V2","V3","V4")], sep = ";")) ; # meta info 
 
 
#create an empty data frame C for storing matched characterization factors 
C<-data.frame(matrix(nrow=dim(dbN)[1], ncol=dim(Name)[1]))
m <- vector(mode="numeric", length=dim(dbN)[1])
 
# create CF matrix 
for (i in 1:(length(s)-1) ) {
  
  m1<-match(dbN$ID1,CF$ID[s[i]:s[i+1]])+s[i]-1;
  m2<-match(dbN$ID2,CF$ID[s[i]:s[i+1]])+s[i]-1;
 k<-data.frame(cbind(m1,m2,m));
 
# match on the sub-category-specific value first; if it does not exist, fall back to the generic value based on compartment + name matching. This approach is consistent with how SimaPro matches flows. 
 
  k<- within(k,
             m<- ifelse(!is.na(m1),m1,m2)
             )
 
  C[i]<-CF$X5[k$m]
}
 
C<-as.matrix(C)
storage.mode(C)='double'
C[is.na(C)]=0;
colnames(C)=Name;  # assign impact category names to the characterization factor matrix for ecoinvent version 3.1 
print(as.matrix(colSums(C))) # check sorted CFs 
 
#format CF database
CEI<-data.frame(cbind(dbN$ID3,C))
names(CEI)=c("ElementaryFlow",metaname)
 
##  Section 3: Compute IA results, then format them for export 
#compute impact assessment results output for the database 
IA<-t(G)%*%C;
IA<-data.frame(as.matrix(IA));
 
#generate meta and output
colnames(IA)=metaname;
processname<- data.frame(t(data[1:5,4:dim(data)[2]]));
processname$X3<-NULL;
processname$X4<-NULL;
names(processname)<-as.matrix(processname[1,])
processname=processname[-1,];
 
#generate final IA results output 
output<-data.frame(cbind(processname,IA)); 
 
rm(CF,dbN,mydata,Name,s,filename) #clear variables 
 
#save the current workspace
save.image(file='EI3data.RData')
savehistory(file='EI3data.Rhistory')
 
## End of the code 
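
As a short usage sketch for the final results (the column indices shown are illustrative, since the exact IA column names depend on the impact assessment method that was exported):

# save the full impact assessment results table
write.csv(output, file = "EI31_IA_results.csv", row.names = FALSE)

# inspect the first few processes: process metadata plus the first impact categories
head(output[, 1:5])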
 
## My next steps are: 
  • to add contribution / sensitivity / supply chain analysis
  • to add uncertainty analysis
  • to visualize results 
  • to add spatialization analysis and location information (such as GIS, etc.) 
  • to extract information directly from original data providers (such as ecoinvent, NREL USLCI, etc.) 
 
Stay tuned! 
 
If you have any comments/questions, send me an email:  xun.liao@epfl.ch

 


Posted by Xun Liao at 11:38