Filtering CUI/TUI returned entities? #516

Open
ddofer opened this issue Jun 20, 2024 · 1 comment

Comments


ddofer commented Jun 20, 2024

When doing NER/NEL against UMLS (CUI) entities, is there any way to configure the nlp pipeline to exclude candidates using a predefined list of CUIs or TUIs? For example, to exclude any detected CUI whose TUI is T079 (Temporal Concept)?

Currently I'm doing it by post-hoc filtering, which is inelegant and inefficient, and doesn't help remove noisy detections. i.e., if the linker returns the first detected entity from a text, then removing it post hoc because of its TUI means I miss the relevant entities.

Current code extract:

```python
from itertools import compress

# `nlp` and `icu_feature_terms` are defined earlier (not shown in this extract).
nlp.add_pipe(
    "scispacy_linker",
    config={
        "resolve_abbreviations": True,
        "linker_name": "umls",
        "max_entities_per_mention": 4,  # 5
        "threshold": 0.87,  # default is 0.8, paper mentions 0.99 as thresh
    },
)
# ...

EXCLUDE_TUIS_LIST = ["T079", "T093"]  # UMLS semantic types (TUIs) to exclude

novel_cols_candidates_names = []
no_entities_list = []
novel_candidate_cuis = []
novel_candidate_cuis_nomenclatures = []
TUIs_list = []

linker = nlp.get_pipe("scispacy_linker")  # fetched once, outside the loop

for f in icu_feature_terms["name"]:
    print(f)
    doc = nlp(f)

    if len(doc.ents) > 0:
        for j, entity in enumerate(doc.ents):
            print(f"Entity #{j}:{entity}")

            list_feature_cuis = [i[0] for i in entity._.kb_ents]

            # TUI filter: keep a candidate only if its first TUI is not excluded
            tui_filter_mask = [
                linker.kb.cui_to_entity[c][3][0] not in EXCLUDE_TUIS_LIST
                for c in list_feature_cuis
            ]
            # Canonical names, looked up before the CUI list is filtered
            list_cuis_nomenclatures = [linker.kb.cui_to_entity[c][1] for c in list_feature_cuis]
            list_feature_cuis = list(compress(list_feature_cuis, tui_filter_mask))
            list_cuis_nomenclatures = list(compress(list_cuis_nomenclatures, tui_filter_mask))

            num_candidates = len(list_feature_cuis)
            for c in list_feature_cuis:
                TUIs_list.append(linker.kb.cui_to_entity[c][3][0])

            # Extend once per entity so these lists stay aligned with TUIs_list
            novel_cols_candidates_names.extend([f] * num_candidates)
            novel_candidate_cuis.extend(list_feature_cuis)
            novel_candidate_cuis_nomenclatures.extend(list_cuis_nomenclatures)
    else:
        no_entities_list.append(f)
        print(f"No Entity candidates for {f}")
```

Collaborator

dakinggg commented Jul 2, 2024

Hi, this is not something that exists right now, although it is a reasonable feature request if you wanted to give implementing it a go! Otherwise, I recommend doing what you are doing and filtering post hoc (setting the threshold such that you get enough candidates after filtering).
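
The "over-fetch, then filter" workaround suggested above might look roughly like the sketch below. It is not part of scispacy: `filter_candidates`, `cui_to_tuis`, and the toy data are hypothetical names introduced here for illustration. It assumes candidates arrive as `(CUI, score)` pairs, the shape of `entity._.kb_ents`, and it drops a candidate if any of its TUIs is excluded, which is slightly stricter than checking only the first TUI as in the code extract above.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Candidate = Tuple[str, float]  # (CUI, similarity score)


def filter_candidates(
    candidates: Iterable[Candidate],
    cui_to_tuis: Dict[str, Sequence[str]],
    excluded_tuis: Iterable[str],
    keep: int = 4,
) -> List[Candidate]:
    """Drop candidates with an excluded TUI, then keep the `keep` best-scoring ones."""
    excluded = set(excluded_tuis)
    kept = [
        (cui, score)
        for cui, score in candidates
        # CUIs missing from the mapping have no known TUIs and are kept.
        if not excluded.intersection(cui_to_tuis.get(cui, ()))
    ]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:keep]


# Toy data: the CUIs, TUI assignments, and scores are made up for illustration.
toy_types = {
    "C0000001": ["T079"],
    "C0000002": ["T047"],
    "C0000003": ["T033", "T079"],
    "C0000004": ["T184"],
}
toy_candidates = [
    ("C0000001", 0.99),
    ("C0000002", 0.95),
    ("C0000003", 0.93),
    ("C0000004", 0.90),
]

top = filter_candidates(toy_candidates, toy_types, ["T079", "T093"], keep=2)
print(top)
```

With scispacy, the mapping could presumably be built from `linker.kb.cui_to_entity[cui].types`, with the linker configured with a larger `max_entities_per_mention` (and possibly a lower `threshold`) than the number of candidates finally wanted, so that enough survive the filter.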
