Count duplicate rows that meet conditions and delete them

A sample file has been uploaded to MediaFile.

Background Information

Section 1: In the sample file "Sheet1"

a.  Values in "Column A" are the original names. For example, Cell A1 contains:
    ">hg19_refGene_NM_000392_0 range=chr10:101542463-101542634 5'pad=0 3'pad=0 strand=+ repeatMasking=none"

b.  Each value in "Column B" corresponds to the value in Column A of the same row. For example,  
    Cell B1, which corresponds to Cell A1, contains "ABCC2".

Section 2: In the sample file "Sheet2"

a.  In Sheet2, the values from Sheet1 have been split into separate columns to make the data  
    clearer, because in Sheet1 everything is packed into one cell.

b.  Column A is "GENE", which is the value from Column B in Sheet1, for example  
    "ABCC2" from Section 1 above.

c.  Column B is "refGENE", taken from the original name in Sheet1, for example "NM000392".

d.  Column C is "CHROMOSOME", another value derived from Column A of Sheet1,  
    for example "chr10".

e.  Similarly, "EXON START" comes from the original name in Column A of Sheet1, for  
    example "101542463".

f.  "EXON END" also comes from the original name in Column A of Sheet1, for example "101542634".
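The split from Sheet1 into the Sheet2 columns can be sketched in Python. This is only a rough sketch: the pattern is inferred from the single Cell A1 example, and the names `NAME_RE` and `split_original_name` are made up for illustration. It also assumes the refGENE value is stored without the underscore, as Sheet2 shows ("NM000392").

```python
import re

# Pattern assumed from the one Cell A1 example in the question; real rows
# may use other prefixes or layouts that this does not cover.
NAME_RE = re.compile(
    r">\w+?_refGene_(?P<refgene>[A-Z]{2}_\d+)_\d+\s+"
    r"range=(?P<chrom>chr\w+):(?P<start>\d+)-(?P<end>\d+)"
)


def split_original_name(name, gene):
    """Return (gene, refgene, chromosome, exon_start, exon_end), or None
    when the name does not look like the Cell A1 example."""
    match = NAME_RE.search(name)
    if match is None:
        return None
    # Sheet2 shows the accession without the underscore (NM000392).
    refgene = match.group("refgene").replace("_", "")
    return (
        gene,
        refgene,
        match.group("chrom"),
        int(match.group("start")),
        int(match.group("end")),
    )


cell_a1 = (">hg19_refGene_NM_000392_0 range=chr10:101542463-101542634 "
           "5'pad=0 3'pad=0 strand=+ repeatMasking=none")
print(split_original_name(cell_a1, "ABCC2"))
```

Returning `None` for names that do not match makes it easy to spot rows that need a different pattern instead of silently producing wrong columns.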

The challenge is to develop a program that can solve the following requirements:

Requirement 1: For each gene, count the number of times each refGENE is observed, for example:

    refGENE     COUNT
    NM000927    29
    NM00078     32
    NM00042     32
    ...

Note: I currently do this with SUMPRODUCT in Excel, but I don't know how to collect the results into a simple table.
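Outside Excel, the count for Requirement 1 could be sketched in Python as below. It assumes the Sheet2 rows have already been read into (GENE, refGENE) pairs; the function names and the sample rows are made up for illustration and are not taken from the sample file.

```python
from collections import Counter


def count_refgenes(rows):
    """rows: iterable of (gene, refgene) pairs.
    Returns a Counter keyed by (gene, refgene)."""
    return Counter((gene, refgene) for gene, refgene in rows)


def print_table(counts):
    """Print one line per (gene, refgene) pair with its count."""
    print(f"{'GENE':<10}{'refGENE':<12}{'COUNT':>5}")
    for (gene, refgene), n in sorted(counts.items()):
        print(f"{gene:<10}{refgene:<12}{n:>5}")


# Made-up sample rows, only to show the shape of the input.
sample = [
    ("ABCC2", "NM000392"),
    ("ABCC2", "NM000392"),
    ("ABCB1", "NM000927"),
]
print_table(count_refgenes(sample))
```

Keying the counter on the (gene, refgene) pair keeps the counts separate per gene, which is what the requirement asks for.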

Requirement 2: [...] "Sheet1" [...] "Sheet2" [...] Gene, Chromosome, EXON START, EXON END [...] refgene [...]

"Sheet1" [...] "Original Name" [...] "GENE" [...]

1: [...] B [...]. [...] 1 [...] 2 [...] ABCC2 [...] ABCC2. [...] 2, [...] GENE [...].

2: [...] "chr" [...]. [...] 1 [...] chr10, [...] 2 [...] chr10 [...].

3: [...] "exon start" [...] 101542463 [...] 1, [...] 2 [...] 101544365 [...] save [...] 4.

4: [...] "[...] exon" [...]. [...] 1 [...] 101542634, "exon end" [...] 2 [...] 101544538. [...] GENE.

[...] "GENE" [...] "chr" [...] "exon start" [...] "exon end" [...] 1. [...]? 29 NM000927, 32 NM00078. [...] "GENE" [...] NM000927.
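The steps above compare rows on GENE, chr, exon start and exon end. Assuming (this is my reading, the question text does not fully confirm it) that a row counts as a duplicate when all four of those values match an earlier row, regardless of refGENE, the deletion could be sketched in Python like this. The function name is made up, and the third sample row is invented purely to show a duplicate.

```python
def drop_duplicate_exons(rows):
    """rows: iterable of (gene, refgene, chrom, exon_start, exon_end).
    Keeps the first row for each (gene, chrom, exon_start, exon_end)
    combination and drops later repeats. ASSUMPTION: refgene is ignored
    when deciding whether two rows are duplicates."""
    seen = set()
    kept = []
    for gene, refgene, chrom, start, end in rows:
        key = (gene, chrom, start, end)
        if key in seen:
            continue  # same gene, chromosome and exon boundaries already kept
        seen.add(key)
        kept.append((gene, refgene, chrom, start, end))
    return kept


rows = [
    ("ABCC2", "NM000392", "chr10", 101542463, 101542634),
    # refGENE for this row is assumed; coordinates are from the question.
    ("ABCC2", "NM000392", "chr10", 101544365, 101544538),
    # Invented row with a hypothetical refGENE, duplicating the first exon.
    ("ABCC2", "NM_EXAMPLE", "chr10", 101542463, 101542634),
]
for row in drop_duplicate_exons(rows):
    print(row)
```

The counts from Requirement 1 can then be taken on the rows that remain.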


@Siddharth [...] Row Labels = GENE, Σ Values = Count of refGene.

[...] A [...] C1 / tick [...] / OK [...] 35 [...]

[...] A ([...] D) [...] =COUNTIF(D:D,D2) [...] E2 [...] 1 = [...]
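For comparison, here is a rough Python equivalent of filling =COUNTIF(D:D,D2) down a helper column: for every row it reports how many times that row's key appears in the whole column. What exactly column D holds is an assumption here (a combined key per row), and `countif_column` is a made-up name.

```python
from collections import Counter


def countif_column(keys):
    """Mimic =COUNTIF(D:D, Dn) filled down: for each key, return how many
    times it occurs anywhere in the list."""
    totals = Counter(keys)
    return [totals[key] for key in keys]


# Hypothetical keys, assumed to be one combined value per row.
column_d = [
    "ABCC2|chr10|101542463|101542634",
    "ABCC2|chr10|101544365|101544538",
    "ABCC2|chr10|101542463|101542634",
]
for key, n in zip(column_d, countif_column(column_d)):
    print(n, key)
```

Rows whose count is 1 occur only once; anything higher marks a repeated key.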
